End-to-end machine learning (ML) lifecycle

First published on November 2, 2021

Last updated on March 21, 2022

 

8 minute read

Xiaoyou Wang

TL;DR

A machine learning (ML) project requires collaboration across multiple roles in a business. We’ll introduce the high-level steps of the end-to-end ML lifecycle and how different roles can collaborate to complete an ML project.

Outline

  • Introduction

  • Define problem

  • Collect data

  • Prepare data

  • Train, evaluate, and improve model

  • Deploy and integrate model

  • Monitor model

  • Conclusion

Introduction

Machine learning is a powerful tool to help solve different problems in your business. A previous article gives you the basic ideas of what it takes to build a machine learning model. In this article, we’ll talk about what the end-to-end machine learning project lifecycle looks like in a real business. The chart below shows the high-level steps from project initiation to completion. Completing an ML project requires collaboration across multiple roles, including product manager, product developer, data scientist, and MLOps engineer. Failing to accurately execute any one of these steps will result in misleading insights or models with no practical value.

Define problem

When talking about machine learning, people usually have high expectations of what it can achieve. Before starting a machine learning project, the product team should collaborate to come up with the problem definition. Here are some questions that should be clarified at this step:

What’s the problem?

Machine learning can be used to solve various problems (e.g., reducing manual work or ranking products). Before starting the project, we need to clearly define the problem and the expected outcome. We should think about whether this is a valuable problem to solve and estimate how much value machine learning can bring.

How should we measure the success of the model?

There are different objectives when using machine learning, and we should be clear about how to measure the success of the model for each of them. If we want to use a machine learning model to reduce manual work, we should measure whether the model can give results as well as a human does. If we want to use machine learning to rank products on a website, we can measure whether we get a higher target metric, such as click-through rate, after using the model to rank the products.

Do we have enough data to build the model?

Now that we have the idea, we need to think about one practical thing: do we have the data? A machine learning model learns from past data and makes predictions on new data. If you don’t have enough data, machine learning won’t be a good choice for you.

Collect data

No matter what model we want to build, we first need to collect two types of data. The first type contains the labels (the target variable we want to predict) or can be used to create the labels. The second type can be used to generate features that’ll affect the model predictions. For example, if we want to build a model to predict whether a user will churn, we at least need a table containing data that indicates whether each user has churned. In addition, we also want to collect user events to generate more features that can contribute to the model predictions.
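
To make these two types of data concrete, here is a minimal, hypothetical sketch of what the raw tables for a churn model might look like (the table and column names are illustrative, not prescribed by any particular tool):

```python
import pandas as pd

# Type 1: data that contains the label (or can be used to create it).
subscriptions = pd.DataFrame({
    "user_id": [1, 2, 3],
    "signup_date": pd.to_datetime(["2021-01-05", "2021-02-10", "2021-03-01"]),
    "churned": [0, 1, 0],  # the target variable we want to predict
})

# Type 2: raw event data used to generate features.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "event_type": ["page_view", "purchase", "page_view", "page_view", "purchase"],
    "event_time": pd.to_datetime(
        ["2021-03-01", "2021-03-02", "2021-02-15", "2021-03-10", "2021-03-20"]
    ),
})
```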

Product developers are usually responsible for collecting the data after getting data requirements from data scientists. If you have a good habit of logging events, then you’ll be relieved when it’s time to build machine learning models. If you don’t have good logging in your product, start doing it. This data will help you understand your product better even if you don’t have immediate needs for machine learning models. Next, the work can be handed over to data scientists to prepare the data and train the model.

Prepare data

Data preparation, also called “feature engineering”, is one of the most complex steps in the machine learning lifecycle. If you don’t have data processing experience and want to learn it, this series will be a good resource for you. Here are the basic steps of feature engineering:

Create labels

In machine learning, the “label” is the target variable you want to predict with the model. To prepare the data for model training, we need to identify whether we have a label column in our dataset. If there’s no explicit label column in our datasets, we need to create the labels first.
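
As a hypothetical sketch, here is one way to create a churn label from raw event data when no explicit label column exists. The 30-day inactivity rule and the snapshot date are illustrative assumptions, not taken from the article:

```python
import pandas as pd

# Illustrative event log; in practice this comes from your product's logging.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "event_time": pd.to_datetime(["2021-03-25", "2021-03-28", "2021-01-15", "2021-03-30"]),
})

# Hypothetical rule: a user is "churned" if they had no events in the 30 days
# before the snapshot date.
snapshot_date = pd.Timestamp("2021-04-01")
last_event = events.groupby("user_id")["event_time"].max().reset_index()
last_event["churned"] = (
    (snapshot_date - last_event["event_time"]).dt.days > 30
).astype(int)

labels = last_event[["user_id", "churned"]]  # 1 = churned, 0 = active
```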

Create features

Machine learning algorithms learn from the features. Here are some ways to create features (a short pandas sketch follows the list):

  • Expand the existing features. For example, you can expand your date feature into “year”, “month”, “day”, and “days since holiday” features.

  • Aggregate the event features. One example is to count the number of user events over the past 7 days, 30 days, or 90 days. Another example is to count the number of page view events from Google and Facebook respectively.
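
The sketch below illustrates both ideas: expanding a date column into calendar components and aggregating event counts over 7/30/90-day windows and per traffic source. The data, column names, and snapshot date are illustrative assumptions:

```python
import pandas as pd

# Illustrative event log.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "source": ["google", "facebook", "google", "google"],
    "event_time": pd.to_datetime(["2021-03-25", "2021-03-28", "2021-01-15", "2021-03-30"]),
})
snapshot_date = pd.Timestamp("2021-04-01")

# Expand the date feature into calendar components.
events["year"] = events["event_time"].dt.year
events["month"] = events["event_time"].dt.month
events["day"] = events["event_time"].dt.day

# Aggregate: number of events per user over the past 7, 30, and 90 days.
features = pd.DataFrame(index=events["user_id"].unique())
for window in (7, 30, 90):
    recent = events[events["event_time"] >= snapshot_date - pd.Timedelta(days=window)]
    features[f"events_last_{window}d"] = recent.groupby("user_id").size()
features = features.fillna(0)

# Aggregate: number of events per user from each traffic source.
source_counts = events.groupby(["user_id", "source"]).size().unstack(fill_value=0)
```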

Impute, encode, and scale

After creating labels and features, we need to get our data ready for machine learning algorithms (a scikit-learn sketch follows the list below).

  • Impute: Real-world datasets usually have missing values, and most machine learning algorithms don’t handle them well. Thus, we need to fill in the missing values with data inferred from the existing data.

  • Encode: Machine learning algorithms require numeric input. Thus, we need to convert text (categorical) features into numbers.

  • Scale: Features with larger value ranges have a larger impact on the model output. We need to adjust the values of numeric columns to fall within similar ranges so that large values (such as seconds since epoch) don’t affect the prediction disproportionately more than small values (such as age).
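
Here is a minimal scikit-learn sketch of these three steps. The numeric and categorical column names are hypothetical; replace them with the features of your own dataset:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature columns.
numeric_cols = ["events_last_30d", "days_since_signup"]
categorical_cols = ["plan_type", "signup_source"]

preprocess = ColumnTransformer([
    # Numeric columns: impute missing values, then scale to similar ranges.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical columns: impute, then one-hot encode into numbers.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])
```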

Train, evaluate, and improve model

After the data is prepared, we split the dataset into a training set and a test set, select an algorithm, and then start training the model with the training set. We briefly introduced some machine learning algorithms in a previous article, and we’ll discuss different algorithms in detail in a future blog article.

After model training completes, we need to evaluate the model’s performance with the test set. We use classification metrics (e.g., accuracy, precision, recall, or AUC) to evaluate a classification model’s performance, and regression metrics (e.g., RMSE or MAE) to evaluate a regression model’s performance.
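
As a rough sketch of this step, assuming `X` is a DataFrame of prepared features, `y` is the label column, and `preprocess` is a preprocessing transformer like the one sketched earlier (the algorithm choice is illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Split the prepared data into a training set and a held-out test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a classifier on the training set.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", RandomForestClassifier(random_state=42)),
])
model.fit(X_train, y_train)

# Evaluate on the test set with a classification metric (here, ROC AUC).
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUC: {test_auc:.3f}")
```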

A previous article introduced strategies for improving models, including comparing multiple algorithms, hyperparameter tuning, and more feature engineering.

Deploy and integrate model

Once you’re done with the model training and are satisfied with the model performance, the data scientist can hand over the model to the MLOps engineer to deploy it to production. Then the product developer will integrate the model into the product.

There are generally two ways to integrate models and make predictions: online predictions and offline batch predictions.

Online prediction

For online prediction, we can deploy the model to an online web service and make API calls to the online service to get predictions. This is useful when we need real-time predictions, e.g., real-time product ranking.
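
For example, the product could call a deployed model service over HTTP. The endpoint URL, payload, and response fields below are hypothetical and depend on how the MLOps engineer deploys the service:

```python
import requests

# Hypothetical request to an online model service for a real-time prediction.
response = requests.post(
    "https://ml.example.com/models/churn/predict",
    json={"user_id": 123, "events_last_30d": 12, "plan_type": "pro"},
    timeout=1.0,  # keep latency low for real-time use cases
)
response.raise_for_status()
churn_probability = response.json()["churn_probability"]
```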

Offline batch prediction

For other models, we don’t necessarily need real-time predictions. We can use an offline batch prediction job to get predictions for a large number of data points on a regular basis. These predictions are then stored in a database and can be made available to developers or end users. For example, for a demand forecast model, we can estimate the demand for products for the upcoming year on a daily basis with an offline batch prediction job.
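
A batch job along these lines might be scheduled daily. The file paths, saved-model format, and column names below are illustrative assumptions:

```python
import joblib
import pandas as pd

# Load the trained model (e.g., the pipeline saved after training).
model = joblib.load("models/churn_model.joblib")

# Score a large batch of records and store the predictions for later use.
features = pd.read_parquet("warehouse/user_features.parquet")
features["churn_probability"] = model.predict_proba(features)[:, 1]
features[["user_id", "churn_probability"]].to_parquet(
    "warehouse/churn_predictions.parquet"
)
```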

Experimentation

After integrating the model into production, you can run an experiment to evaluate the model performance with real production traffic. For example, suppose you build a ranking model for your e-commerce website. You can split the website traffic 50/50: half of the users will see the products in the original order (control group), and the other half will see the products in the ranked order determined by the ranking model (treatment group). We can then compare the target metrics (e.g., click-through rate) between the users in the control and treatment groups.
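
A common way to implement such a split is to assign users to control or treatment deterministically by hashing their user ID, then compare the metric per group. This is a minimal sketch; the hashing scheme and experiment log format are illustrative:

```python
import hashlib
import pandas as pd

def assign_group(user_id: str) -> str:
    """Deterministically assign a user to control or treatment (50/50 split)."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "treatment" if bucket < 50 else "control"

# Hypothetical experiment log: one row per impression, with a click flag.
log = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "clicked": [1, 0, 1, 1, 0, 1],
})
log["group"] = log["user_id"].map(assign_group)

# Compare the target metric (click-through rate) between the two groups.
ctr_by_group = log.groupby("group")["clicked"].mean()
print(ctr_by_group)
```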

Monitor model

Congratulations! With the team’s hard work, your model is finally live! You evaluated the model via experimentation and got the expected outcome. Is this everything you need to do for the model? The answer is no. Model performance can degrade over time. It’s important to set up a good monitoring system to make sure the model works correctly in production over time.

Multiple things can go wrong in production. One of the most common issues is data drift, which means the distribution of the target variable or the input data changes over time. The model-monitoring system should monitor the model’s performance with production data, detect data drift issues, and provide feedback for further model improvement (e.g., retraining the model). Stay tuned for a future article about model monitoring.
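
One simple way to check for drift in an input feature is to compare its training distribution with its recent production distribution, for example with a two-sample Kolmogorov–Smirnov test. The data below is synthetic and only illustrates the idea:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_values = rng.normal(loc=10, scale=2, size=5000)    # stand-in for training data
production_values = rng.normal(loc=12, scale=2, size=5000)  # stand-in for recent production data

# A small p-value suggests the two distributions differ, i.e. possible drift.
statistic, p_value = ks_2samp(training_values, production_values)
if p_value < 0.01:
    print(f"Possible data drift detected (KS statistic = {statistic:.3f})")
```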

Conclusion

The whole machine learning lifecycle is a lengthy process, which requires expertise across multiple roles.

  • The product team defines the problem.

  • The product developer collects the data.

  • The data scientist prepares the data and trains the model.

  • The MLOps engineer deploys the model into production.

  • The product developer integrates the model into the product.

  • The MLOps engineer sets up the model monitoring system.

If you wonder whether there’s a way to simplify the process, Mage helps handle all the work from “prepare data” to “monitor model”. Mage also provides suggestions on what type of problems you can solve with ML and what data is needed.