Apache Submarine / SUBMARINE-270

[Umbrella] Submarine-sdk pipeline

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component: SDK

    Description

      Getting from raw data ingestion to a model pushed to production is complex. The Submarine pipeline is being built to deploy portable, scalable machine learning workflows.

      This JIRA ticket was created to discuss the details and plan for the Submarine pipeline.
      The pipeline would have two main components:

      1. Workflow orchestrator - helps manage dependencies between tasks, schedule workflows, and retry on failure. There are three ways to build the orchestrator:

      • Airflow - use the Airflow API to build the pipeline
      • Submarine workflow - 10110346 suggests a built-in Submarine workflow
      • Abstract orchestrator - provide an abstraction layer, as TFX does, so that different orchestration frameworks can be supported
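
      The orchestrator responsibilities above (dependency ordering, retry on failure, and an abstraction layer over different backends) can be sketched as follows. This is only an illustration, not the Submarine design: Task, Orchestrator and LocalOrchestrator are hypothetical names, and the in-process runner stands in for a real backend such as Airflow. It assumes Python 3.9+ for the standard-library graphlib module.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List


@dataclass
class Task:
    """Hypothetical unit of work plus the names of the tasks it depends on."""
    name: str
    run: Callable[[], None]
    depends_on: List[str] = field(default_factory=list)
    max_retries: int = 2  # extra attempts after the first failure


class Orchestrator(ABC):
    """Hypothetical abstraction layer; other backends would subclass this."""

    @abstractmethod
    def execute(self, tasks: List[Task]) -> List[str]:
        """Run all tasks and return their names in completion order."""


class LocalOrchestrator(Orchestrator):
    """Runs tasks in-process, in dependency order, retrying failed tasks."""

    def execute(self, tasks: List[Task]) -> List[str]:
        by_name: Dict[str, Task] = {t.name: t for t in tasks}
        # Map each task to the set of tasks that must finish before it.
        graph = {t.name: set(t.depends_on) for t in tasks}
        completed: List[str] = []
        for name in TopologicalSorter(graph).static_order():
            task = by_name[name]
            for attempt in range(task.max_retries + 1):
                try:
                    task.run()
                    break
                except Exception:
                    # Re-raise once the retry budget is exhausted.
                    if attempt == task.max_retries:
                        raise
            completed.append(name)
        return completed
```

      A caller would build a list of Task objects and pass it to LocalOrchestrator().execute(...). Swapping in another backend would mean adding a new Orchestrator subclass without changing the task definitions.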

      2. SDK ML library - reduces routine ML code development. Building an ML pipeline involves several routine tasks, so the library provides callback functions that let users easily preprocess data, train models, and so on. It may include different frameworks to handle both small and large datasets.

      • Preprocessing (Hive, Spark, Pandas)
      • Training (TF, PyTorch)
      • Evaluation
      • Model Validator
      • Pusher
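
      The callback idea behind the stages listed above can be illustrated with a small sketch. It is not the Submarine SDK API: PipelineCallbacks and run_pipeline are hypothetical names, and the toy "model" is just a mean value so the example stays framework-free. A real pipeline would plug Spark, Pandas, TF or PyTorch code into the same slots.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class PipelineCallbacks:
    """User-supplied callbacks, one per stage (all names are hypothetical)."""
    preprocess: Callable[[Any], Any]                  # raw data -> features
    train: Callable[[Any], Any]                       # features -> model
    evaluate: Callable[[Any, Any], Dict[str, float]]  # (model, features) -> metrics
    validate: Callable[[Dict[str, float]], bool]      # metrics -> good enough?
    push: Callable[[Any], None]                       # publish the model


def run_pipeline(raw_data: Any, cb: PipelineCallbacks) -> bool:
    """Run the stages in order; push only if validation passes.

    Returns True if the model was pushed, False otherwise.
    """
    features = cb.preprocess(raw_data)
    model = cb.train(features)
    # Toy simplification: evaluates on the training features themselves.
    metrics = cb.evaluate(model, features)
    if not cb.validate(metrics):
        return False
    cb.push(model)
    return True


# Toy usage: the "model" is the mean of the inputs, scored by mean absolute error.
pushed = []
callbacks = PipelineCallbacks(
    preprocess=lambda rows: [float(x) for x in rows],
    train=lambda xs: sum(xs) / len(xs),
    evaluate=lambda model, xs: {"mae": sum(abs(x - model) for x in xs) / len(xs)},
    validate=lambda metrics: metrics["mae"] < 1.0,
    push=pushed.append,
)
```

      Calling run_pipeline(raw_rows, callbacks) then runs every stage in order. Keeping the validator separate from the pusher means a model that fails its metric threshold is never published.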

      For more details, see the link below. Feel free to edit or comment on the documents.

          People

            pingsutw Kevin Su

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 1h 40m