In this meetup, we will cover how to leverage Airflow in complex machine learning pipelines, where users need model reproducibility, the ability to run efficiently at large scale, and the power to run pipelines dynamically with changing inputs, parameters and algorithms.
After attending the session, Airflow engineers will know techniques for establishing robust MLOps, including how to use Airflow to automate every phase of the ML model lifecycle.
What is the current problem? Why are ML pipelines different from ETL pipelines?
- Achieving reproducibility and data reuse
- Running complex graphs - Multiple steps (>100) in the same pipeline
- Managing the Model Lifecycle - Research, Production and Research again
- Advantages and limitations of Apache Airflow
What are dynamically built DAGs?
- Why not just a DAG? Why not one Template for all ML flows?
- Workaround with variables and triggers
- Workaround with SkipOperator
- Workaround with external libraries
Answering ML Use Case Questions:
- Should I separate orchestration and ML code?
- How to manage my data? Vanilla XCom or a custom XCom backend?
- How to gain visibility into model performance (KPIs)?
- How to build robust automation for training, deployment, and retraining?
What is needed in the next version of Apache Airflow?