Description
Going from raw data ingestion to pushing a model into production is complex. The Submarine pipeline is being built to deploy portable, scalable machine learning workflows.
This JIRA ticket was created to discuss the details and the plan for the Submarine pipeline.
The pipeline would have two main components:
1. Workflow orchestrator - helps us manage dependencies between tasks, schedule workflows, and retry when a failure happens. There are 3 ways to build our orchestrator:
- Airflow - use the Airflow API to build our pipeline
- Submarine workflow - 10110346 suggests a built-in Submarine workflow
- Abstract orchestrator - provide an abstraction layer like TFX, so that we can support different orchestration frameworks
2. SDK ML library - reduces routine ML code development. Building an ML pipeline involves several routine tasks, so the SDK would provide callback functions that let users easily preprocess data, train models, and so on. It may include different frameworks to handle both small and large datasets.
- Preprocessing (Hive, Spark, Pandas)
- Training (TF, PyTorch)
- Evaluation
- Model Validator
- Pusher
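To make the two components concrete, here is a minimal sketch of how an abstract orchestrator and user-supplied callbacks could fit together. It is an illustration only, not an existing Submarine API: the names `Task`, `Orchestrator`, and `LocalOrchestrator`, and the toy callbacks, are all hypothetical. An Airflow-backed or Submarine-native backend would be another subclass of the same interface. The sketch assumes Python 3.9+ for `graphlib`.

```python
import abc
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Sequence


class Task:
    """One pipeline step: a user callback plus its upstream dependencies (hypothetical)."""

    def __init__(self, name: str, fn: Callable[..., Any],
                 upstream: Sequence[str] = (), retries: int = 0):
        self.name = name
        self.fn = fn
        self.upstream = tuple(upstream)
        self.retries = retries


class Orchestrator(abc.ABC):
    """Abstraction layer: each backend decides how tasks are actually executed."""

    @abc.abstractmethod
    def run(self, tasks: List[Task]) -> Dict[str, Any]:
        ...


class LocalOrchestrator(Orchestrator):
    """In-process backend: runs tasks in dependency order and retries on failure."""

    def run(self, tasks: List[Task]) -> Dict[str, Any]:
        by_name = {t.name: t for t in tasks}
        # Map each task to the set of tasks that must finish before it.
        graph = {t.name: set(t.upstream) for t in tasks}
        results: Dict[str, Any] = {}
        for name in TopologicalSorter(graph).static_order():
            task = by_name[name]
            # Upstream results are passed positionally, in declared order.
            inputs = [results[u] for u in task.upstream]
            for attempt in range(task.retries + 1):
                try:
                    results[name] = task.fn(*inputs)
                    break
                except Exception:
                    if attempt == task.retries:
                        raise  # retries exhausted, surface the error
        return results


# User callbacks (toy stand-ins for real preprocessing, training, evaluation).
attempts = {"train": 0}


def preprocess():
    return [1.0, 2.0, 3.0, 4.0]


def train(data):
    attempts["train"] += 1
    if attempts["train"] == 1:
        raise RuntimeError("simulated transient failure")
    return sum(data) / len(data)  # the "model" is just the mean


def evaluate(data, model):
    return max(abs(x - model) for x in data)  # worst-case absolute error


pipeline = [
    Task("preprocess", preprocess),
    Task("train", train, upstream=["preprocess"], retries=1),
    Task("evaluate", evaluate, upstream=["preprocess", "train"]),
]
results = LocalOrchestrator().run(pipeline)
```

Here `train` fails on its first attempt and succeeds on the retry, and `evaluate` runs only after both of its upstream tasks have produced results. Keeping the callbacks free of any orchestrator-specific code is what would let the same pipeline definition run on a different backend.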
To find out more, check the link below. Feel free to edit or comment on the document.
Issue Links
- Blocked
- SUBMARINE-296 [SDK] Submarine pipeline example
- Resolved