Apache Submarine / SUBMARINE-347

Fix the job spec parser issue and refine the TF job on K8s document

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: Doc

    Description

      1. Deploying the TF-operator as described in the document fails because the "submarine" namespace does not exist:

      $ kubectl kustomize ./dev-support/k8s/tfjob/operator | kubectl apply -f -
      clusterrole.rbac.authorization.k8s.io/kubeflow-tfjobs-admin created
      clusterrole.rbac.authorization.k8s.io/kubeflow-tfjobs-edit created
      clusterrole.rbac.authorization.k8s.io/kubeflow-tfjobs-view created
      clusterrole.rbac.authorization.k8s.io/tf-job-operator created
      clusterrolebinding.rbac.authorization.k8s.io/tf-job-operator created
      Error from server (NotFound): error when creating "STDIN": namespaces "submarine" not found
      Error from server (NotFound): error when creating "STDIN": namespaces "submarine" not found
      Error from server (NotFound): error when creating "STDIN": namespaces "submarine" not found
      Error from server (NotFound): error when creating "STDIN": namespaces "submarine" not found

      The document should tell users to create the namespace before deploying the operator:

      kubectl create namespace submarine
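The namespace step could also be made safe to re-run. The following is a minimal sketch, not part of the Submarine document: the function name `ensure_namespace` and the `NAMESPACE` variable are hypothetical, and it assumes `kubectl` is already configured for the target cluster.

```shell
#!/usr/bin/env bash
# Sketch only: ensure_namespace and NAMESPACE are illustrative names,
# not something defined by Submarine or the TF-operator.
NAMESPACE="${NAMESPACE:-submarine}"

ensure_namespace() {
  local ns="$1"
  # `kubectl get namespace` exits non-zero when the namespace is missing.
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    kubectl create namespace "$ns"
  fi
}
```

A possible usage is `ensure_namespace "$NAMESPACE" && kubectl kustomize ./dev-support/k8s/tfjob/operator | kubectl apply -f -`, which creates the namespace only when needed and then deploys the operator.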

      2. The curl command in the document is incorrect: the "\" line continuations do not work, and the "`" characters should be "'". The corrected command, on a single line:

      curl -H "Content-Type: application/json" --request POST --data '{"name":"mnist","librarySpec":{"name":"TensorFlow","version":"2.1.0","image":"gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0","cmd":"python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150","envVars":{"ENV_1":"ENV1"}},"submitterSpec":{"type":"k8s","configPath":null,"namespace":"submarine","kind":"TFJob","apiVersion":"kubeflow.org/v1"},"taskSpecs":{"Ps":{"name":"tensorflow","replicas":2,"resources":"cpu=4,memory=2048M,nvidia.com/gpu=1"},"Worker":{"name":"tensorflow","replicas":2,"resources":"cpu=4,memory=2048M"}}}' http://127.0.0.1:8080/api/v1/jobs
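One way to sidestep the quoting and line-break problems entirely is to keep the job spec in a file and let curl read it. The sketch below is an assumption about how this might be done, not the document's method: the helper names `write_job_spec` and `submit_job` are hypothetical, while the JSON fields and the endpoint are copied from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: write_job_spec and submit_job are hypothetical helper names.

# Write the job spec to the file given as $1. The quoted 'EOF' delimiter
# stops the shell from expanding or altering anything inside the JSON.
write_job_spec() {
  cat > "$1" <<'EOF'
{
  "name": "mnist",
  "librarySpec": {
    "name": "TensorFlow",
    "version": "2.1.0",
    "image": "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0",
    "cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
    "envVars": {"ENV_1": "ENV1"}
  },
  "submitterSpec": {
    "type": "k8s",
    "configPath": null,
    "namespace": "submarine",
    "kind": "TFJob",
    "apiVersion": "kubeflow.org/v1"
  },
  "taskSpecs": {
    "Ps": {"name": "tensorflow", "replicas": 2, "resources": "cpu=4,memory=2048M,nvidia.com/gpu=1"},
    "Worker": {"name": "tensorflow", "replicas": 2, "resources": "cpu=4,memory=2048M"}
  }
}
EOF
}

# POST the spec file ($1) to the jobs endpoint ($2).
# With --data @file, curl reads the request body from the file.
submit_job() {
  curl -H "Content-Type: application/json" --request POST --data @"$1" "$2"
}
```

For example, `write_job_spec /tmp/mnist-job.json && submit_job /tmp/mnist-job.json http://127.0.0.1:8080/api/v1/jobs` would send the same payload as the single-line command.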
      

      3. The document should include a note on how to check the running job, for example with "kubectl get TFJob".
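A note of that kind might look like the sketch below. It assumes the job was submitted to the "submarine" namespace, as in the curl payload, and that the TF-operator CRD is installed so that `tfjob` resolves as a resource type; the function name `list_tfjobs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: list_tfjobs is an illustrative helper name.
# Lists the TFJob resources and the pods in the given namespace
# (defaults to "submarine", the namespace used in the job spec).
list_tfjobs() {
  local ns="${1:-submarine}"
  kubectl get tfjob -n "$ns"
  kubectl get pods -n "$ns"
}
```

Running `list_tfjobs` with no argument checks the "submarine" namespace; pass another namespace name to inspect jobs submitted elsewhere.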

            People

              Assignee: jiwq Wanqiang Ji
              Reporter: tangzhankun Zhankun Tang
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m