Figure 1: The gap between data collection and model deployment
The time between data collection and final model deployment can be significant, ranging from weeks and months to years. MLOps helps close this gap in several ways:

- Through self-service data and development environments
- Through enterprise model registries and monitoring operations that also improve auditability
- By adopting robust data and model governance best practices
- Through feature/code containerization and automated model training, evaluation, versioning, and deployment steps (see the sketch after this list)
- By incorporating continuous integration (CI), continuous delivery (CD), and continuous monitoring (CM) best practices
- By responding to business opportunities and changes quickly and incorporating product enhancements on a regular basis
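As a concrete illustration of the automation and CI points above, here is a minimal sketch of a training job that evaluates a freshly trained model and blocks promotion when it misses a quality bar. The dataset, model, `ACCURACY_GATE` threshold, and the promotion step are illustrative stand-ins, not a specific platform's API.

```python
"""Minimal sketch of an automated train-evaluate-deploy gate,
assuming a scikit-learn style workflow with illustrative names."""
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_GATE = 0.90  # hypothetical quality bar enforced by the CI job

def run_pipeline() -> None:
    # Stand-in for automated data extraction and processing.
    X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Automated training step.
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Automated evaluation step acting as a deployment gate.
    accuracy = accuracy_score(y_test, model.predict(X_test))
    if accuracy >= ACCURACY_GATE:
        print(f"accuracy={accuracy:.3f}; promoting model to deployment")
    else:
        raise SystemExit(f"accuracy={accuracy:.3f} below gate; blocking deploy")

if __name__ == "__main__":
    run_pipeline()
```

In a CI/CD setup, a job like this would run on every change to the feature or training code, so a regression is caught before it reaches production.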
| Principles | Key steps |
| --- | --- |
| **Integration of code.** MLOps integrates and versions every piece of code to improve its accessibility, traceability, and accuracy. This makes it possible to share and reproduce the code base across projects and teams. | ■ Automated data extraction and processing<br>■ Data validation checks<br>■ Feature engineering on processed data<br>■ Automated monitoring |
| **Scalability.** MLOps enables organizations to scale their ML initiatives by providing a model registry to store and version trained ML models. This greatly simplifies tracking models as they move through every stage of the ML lifecycle. | ■ Creation of a model registry (to capture metadata, metrics, inputs, etc.)<br>■ Continuous versioning of model and data artifacts<br>■ Seamless tracking of artifacts for quick debugging, better traceability, and auditability |
| **Continuous monitoring and continuous training.** MLOps frameworks are powered by robust monitoring techniques that enable the continuous learning, training, and retraining of ML models. | ■ Model performance monitoring to identify model and data drift<br>■ Monitoring of model latency, system metrics, and operations of ML pipelines<br>■ Measurement of business impact based on predefined use cases and KPIs |
| **Operationalizing.** MLOps orchestrates all the steps of a data pipeline, including batch or real-time processing and end-to-end automation. | ■ Automated triggers for model training and retraining alerts<br>■ Access to a centralized dashboard for seamless tracking |

The short sketches that follow illustrate each of these principles in turn.
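First, integration of code: a minimal data validation check of the kind that would run between automated extraction and feature engineering. The column names and value bounds are hypothetical.

```python
"""Minimal sketch of a data validation step in a pipeline,
assuming a pandas DataFrame input with illustrative columns."""
import pandas as pd

def validate(df: pd.DataFrame) -> None:
    # Schema check: required columns must be present.
    required = {"user_id", "amount", "event_time"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Integrity checks: no nulls in keys, values within expected bounds.
    if df["user_id"].isna().any():
        raise ValueError("null user_id found")
    if not df["amount"].between(0, 1e6).all():
        raise ValueError("amount outside expected range [0, 1e6]")

df = pd.DataFrame({"user_id": [1, 2], "amount": [9.5, 42.0],
                   "event_time": pd.to_datetime(["2024-01-01", "2024-01-02"])})
validate(df)  # raises on failure, passes silently here
```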
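Next, scalability via a model registry. This is a minimal in-memory sketch of registering and versioning models with their metadata and metrics; a production system would typically use a persistent, managed registry instead.

```python
"""Minimal sketch of a model registry capturing metadata, metrics,
and versions; names and fields are illustrative."""
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class RegistryEntry:
    name: str
    version: int
    metrics: dict
    params: dict
    created_at: str

class ModelRegistry:
    def __init__(self) -> None:
        self._entries: dict[str, list[RegistryEntry]] = {}

    def register(self, name: str, metrics: dict, params: dict) -> RegistryEntry:
        # Versions increment monotonically per model name for traceability.
        versions = self._entries.setdefault(name, [])
        entry = RegistryEntry(name, len(versions) + 1, metrics, params,
                              datetime.now(timezone.utc).isoformat())
        versions.append(entry)
        return entry

    def latest(self, name: str) -> RegistryEntry:
        return self._entries[name][-1]

registry = ModelRegistry()
registry.register("churn-model", {"auc": 0.91}, {"n_estimators": 200})
print(registry.latest("churn-model"))
```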
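For continuous monitoring, a common way to flag data drift is a two-sample statistical test between a training-time reference window and a live serving window. This sketch uses SciPy's Kolmogorov-Smirnov test on a single feature; the synthetic data and the significance threshold are illustrative choices.

```python
"""Minimal sketch of data-drift detection with a two-sample
Kolmogorov-Smirnov test; windows here are synthetic."""
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # reference window
live_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)      # serving window

statistic, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.05:  # illustrative alert threshold
    print(f"drift alert: KS={statistic:.3f}, p={p_value:.2e}")
```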
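Finally, operationalizing: an automated trigger that starts the retraining pipeline when a monitored drift score breaches its threshold. `run_training_pipeline()` and `notify()` are hypothetical hooks into an orchestrator and an alerting channel, not a specific tool's API.

```python
"""Minimal sketch of an automated retraining trigger wired to a
monitoring metric; all hooks and thresholds are illustrative."""

DRIFT_THRESHOLD = 0.2  # illustrative bound on the monitored drift score

def run_training_pipeline() -> None:
    print("retraining pipeline started")  # placeholder for a real pipeline run

def notify(message: str) -> None:
    print(f"alert: {message}")  # placeholder for paging/dashboard update

def on_metrics(drift_score: float) -> None:
    # Called on each monitoring cycle, e.g. by a scheduler.
    if drift_score > DRIFT_THRESHOLD:
        notify(f"drift score {drift_score:.2f} exceeded {DRIFT_THRESHOLD}")
        run_training_pipeline()

on_metrics(0.31)  # simulated breach triggers retraining
```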