[SUBMARINE-270] [Umbrella] Submarine-sdk pipeline - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: SDK
Labels: None
Target Version: 0.7.0

Description

Going from raw data ingestion to pushing a model into production is complex. The Submarine pipeline is being built to deploy portable, scalable machine learning workflows.

This JIRA ticket was created to discuss the details and the plan for the Submarine pipeline.
The pipeline would have two main components:

1. Workflow orchestrator - manages dependencies between tasks, schedules the workflow, and retries tasks on failure. There are three ways to build our orchestrator:

Airflow - use the Airflow API to build our pipeline
Submarine workflow - 10110346 suggests a built-in Submarine workflow
Abstract orchestrator - provide an abstraction layer, as TFX does, so that we can support different orchestration frameworks
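
To make the third option concrete, here is a minimal sketch of what an abstraction layer could look like. The names `Orchestrator`, `LocalOrchestrator`, and `max_retries` are hypothetical and are not part of any existing Submarine API; an Airflow-backed or built-in Submarine backend would implement the same interface. The local backend only illustrates dependency ordering and retry on failure.

```python
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set

Task = Callable[[], None]


class Orchestrator(ABC):
    """Hypothetical abstraction layer; not an actual Submarine API."""

    @abstractmethod
    def run(self, tasks: Dict[str, Task], deps: Dict[str, Set[str]]) -> List[str]:
        """Run every task after its dependencies and return the execution order."""


class LocalOrchestrator(Orchestrator):
    """In-process backend, used here only to illustrate the interface."""

    def __init__(self, max_retries: int = 2) -> None:
        self.max_retries = max_retries

    def run(self, tasks: Dict[str, Task], deps: Dict[str, Set[str]]) -> List[str]:
        # Map each task to the set of tasks it depends on.
        # Every dependency is assumed to be a key of `tasks`.
        graph = {name: deps.get(name, set()) for name in tasks}
        order = list(TopologicalSorter(graph).static_order())
        for name in order:
            self._run_with_retry(tasks[name])
        return order

    def _run_with_retry(self, task: Task) -> None:
        # One initial attempt plus up to `max_retries` retries.
        for attempt in range(self.max_retries + 1):
            try:
                task()
                return
            except Exception:
                if attempt == self.max_retries:
                    raise


# Usage: a three-step chain in which "train" fails once and is retried.
calls: List[str] = []
attempts = {"train": 0}


def flaky_train() -> None:
    attempts["train"] += 1
    if attempts["train"] == 1:
        raise RuntimeError("transient failure")
    calls.append("train")


tasks: Dict[str, Task] = {
    "preprocess": lambda: calls.append("preprocess"),
    "train": flaky_train,
    "evaluate": lambda: calls.append("evaluate"),
}
deps = {"train": {"preprocess"}, "evaluate": {"train"}}
order = LocalOrchestrator(max_retries=2).run(tasks, deps)
```

Keeping the pipeline definition (tasks plus dependencies) separate from the backend is what would let the same pipeline run on different orchestration frameworks.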

2. SDK ML library - reduces routine ML code development. Building an ML pipeline involves several routine tasks, so the library provides callback functions that let users easily preprocess data, train models, and so on. It may include different frameworks to handle both small and large datasets.

preprocessing (Hive,Spark,Pandas)
train (TF, PyTorch)
Evaluation
Model Validator
Pusher
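
The callback idea for the stages listed above could be sketched as follows. `PipelineCallbacks` and `run_pipeline` are hypothetical names chosen for illustration, not the Submarine SDK API, and the "model" is a trivial mean predictor so that the example runs without Spark, TensorFlow, or PyTorch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class PipelineCallbacks:
    """User-supplied functions, one per pipeline stage (hypothetical interface)."""
    preprocess: Callable[[Any], Any]
    train: Callable[[Any], Any]
    evaluate: Callable[[Any, Any], Dict[str, float]]
    validate: Callable[[Dict[str, float]], bool]
    push: Callable[[Any], None]


def run_pipeline(raw_data: Any, cb: PipelineCallbacks) -> Dict[str, Any]:
    """Run the stages in order; push the model only if validation passes."""
    data = cb.preprocess(raw_data)
    model = cb.train(data)
    metrics = cb.evaluate(model, data)
    pushed = cb.validate(metrics)
    if pushed:
        cb.push(model)
    return {"metrics": metrics, "pushed": pushed}


# Usage with toy callbacks: the "model" is just the mean of the data.
registry: List[float] = []  # stands in for a model store

callbacks = PipelineCallbacks(
    preprocess=lambda rows: [x for x in rows if x is not None],
    train=lambda data: sum(data) / len(data),
    evaluate=lambda model, data: {
        "mae": sum(abs(x - model) for x in data) / len(data)
    },
    validate=lambda metrics: metrics["mae"] <= 1.5,
    push=registry.append,
)

result = run_pipeline([1.0, None, 3.0], callbacks)
```

In this shape the SDK owns the stage ordering and the validation gate, while users only write the stage bodies, for example with Pandas for small datasets or Spark for large ones.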

To find out more, check the links below. Feel free to edit or comment on the documents.

Attachments

Issue Links

Blocked

SUBMARINE-296 [SDK] Submarine pipeline example

Resolved

links to

Compare to other ML pipeline

Submarine Pipeline

Submarine workflow

Submarine Workflow Discussion & Prioritization

Sub-Tasks

There are no Sub-Tasks for this issue.

Activity

People

Assignee: Kevin Su

Reporter: Kevin Su

Votes: 0

Watchers: 3

Dates

Created: 30/Oct/19 07:00

Updated: 22/Feb/22 04:17

Resolved: 22/Feb/22 04:17

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 40m
