Azure Machine Learning v2: Deploy an MLflow model to Kubernetes
I have an Arc-enabled Kubernetes cluster and would like to use the Azure ML Python SDK v2 to deploy a model registered in Azure ML to the cluster. This turned out to be more involved than expected. It requires a good understanding of quite a few concepts, including the different model types that Azure ML supports (especially the MLflow model), no-code deployment, local deployment, managed vs. unmanaged endpoints, and the different deployment capabilities of the Azure ML SDK vs. CLI vs. portal.
This article explains a few lessons learned in deploying an MLflow model to Arc-enabled Kubernetes. Source code can be found in this repo.
Prepare your Kubernetes cluster
Whether you have an Arc-enabled Kubernetes cluster or an Azure Kubernetes Service (AKS) cluster, the steps to set it up to connect to Azure ML are the same. Follow:
- Step 1 to deploy Azure ML extension to your cluster
- Step 2 to attach your cluster to an Azure ML workspace.
Completing these two steps is sufficient to deploy models to your cluster; the remaining steps in that doc are optional.
Train and register a model
If you follow the Azure ML how-to guide to train and register a model, it may not be immediately obvious that this sample trains an MLflow model:
- The training script for this sample is located here.
- Sample code to connect to Azure ML and register the model is located here.
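If you train the model yourself, registering it as an MLflow model with the SDK v2 might look roughly like the sketch below. The subscription, workspace, job name, and model name are placeholders, and the job-output path is only an example of where an MLflow training run typically writes its model.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Model
from azure.identity import DefaultAzureCredential

# Connect to the Azure ML workspace (placeholders for your own IDs).
ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace-name>"
)

# Register the MLflow model produced by a training job.
model = Model(
    path="azureml://jobs/<training-job-name>/outputs/artifacts/paths/model/",
    type=AssetTypes.MLFLOW_MODEL,  # marks this as an MLflow-format model
    name="my-mlflow-model",
)
ml_client.models.create_or_update(model)
```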
Deploy the model
Now it’s time to deploy the model. This could be confusing, because the Azure ML how-to guide to deploy a model doesn’t use the MLflow model trained above, but a scikit-learn model packaged as a pkl file, and the deployment requires a score.py. However, there’s no score.py for our MLflow model, because the scoring script for an MLflow model is auto-generated for no-code deployment.
While no-code deployment of MLflow models works for managed endpoints, it doesn’t work for Kubernetes. At the time of this writing, using Azure ML SDK v2 to deploy an MLflow model to Kubernetes is not supported. You can, however, write your own score.py and then deploy from the Azure ML Studio portal.
There is no example of score.py for an MLflow model, so how do you write one? The MLmodel file in the MLflow model package describes the input/output schema that the model expects. Note that the score.py you write is probably different from the auto-generated one, so the input data format may differ from what you would provide to an Azure ML managed endpoint the model is deployed to. Here’s a sample score.py.
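Below is a minimal sketch of what such a score.py could look like, assuming a scikit-learn MLflow model whose input matches the {"data": [[...]]} payload used in the curl commands later in this article. The model subfolder name is an assumption; check the layout of your downloaded model and adjust the path accordingly.

```python
import json
import logging
import os

import mlflow.pyfunc
import pandas as pd


def init():
    """Called once when the container starts: load the MLflow model."""
    global model
    # AZUREML_MODEL_DIR points to the root of the mounted model artifacts;
    # assumption: the MLflow model (with its MLmodel file) sits in a "model" subfolder.
    model_dir = os.getenv("AZUREML_MODEL_DIR", "")
    model_path = os.path.join(model_dir, "model")
    model = mlflow.pyfunc.load_model(model_path)
    logging.info("Model loaded from %s", model_path)


def run(raw_data):
    """Called for every scoring request: parse the JSON body and return predictions."""
    data = json.loads(raw_data)["data"]
    predictions = model.predict(pd.DataFrame(data))
    # Return something JSON-serializable.
    return pd.Series(predictions).tolist()
```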
Troubleshoot failed deployment
It’s quite unlikely that your deployment of the MLflow model to your Kubernetes cluster will succeed on the first try. Depending on where it fails, there may not even be helpful logs yet. So how do you troubleshoot? Deploying the model locally in a Docker container helps you not only troubleshoot issues more effectively, but also better understand how Azure ML works.
Wait, didn’t we just say that deploying an MLflow model to unmanaged endpoints using the SDK is not supported? And isn’t a local endpoint unmanaged? Yes and yes. So we need to do some exploration. If we use the Python SDK to deploy the model locally with our score.py, here are some issues that you might see and tips on how to work around them:
- The model can’t be mounted, with an OSError exception on Windows (WinError 123). Go to the Azure ML Studio portal and download the registered model to a folder named model. You need the conda.yaml in the model package to set up your local and Kubernetes environments anyway.
- Even though the model contains the environment info it needs to run, if you don’t specify an environment you will get RequiredLocalArtifactsNotFoundError. Create an environment by picking a base Docker image that matches your model and the conda.yaml from the downloaded model folder (see the sketch after this list).
- The downloaded conda.yaml doesn’t include the inference HTTP server required for inferencing in a local or Kubernetes deployment. You need to add this package.
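For example, a minimal environment definition with the SDK v2 could look like the sketch below. The environment name, base image, and file paths are assumptions; pick whatever matches your model, and note that this assumes the inference HTTP server (the azureml-inference-server-http pip package) has already been added to the downloaded conda.yaml.

```python
from azure.ai.ml.entities import Environment

# Assumption: the registered model was downloaded to a local folder named "model",
# and azureml-inference-server-http was added to the pip section of its conda.yaml.
env = Environment(
    name="mlflow-sklearn-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",  # base image that matches the model
    conda_file="model/conda.yaml",
)
```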
Here’s the sample code for local deployment.
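A minimal sketch of a local deployment with the Python SDK v2, reusing the env object defined above; the subscription, workspace, endpoint, and deployment names are placeholders:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace-name>"
)

# local=True runs the endpoint and deployment in a Docker container on your machine.
endpoint = ManagedOnlineEndpoint(name="mlflow-local", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint, local=True)

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint.name,
    model=Model(path="model"),  # the downloaded model folder
    environment=env,            # environment defined above
    code_configuration=CodeConfiguration(code=".", scoring_script="score.py"),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment, local=True)

# Print the local scoring URI, which includes the port used in the curl command below.
print(ml_client.online_endpoints.get(name=endpoint.name, local=True).scoring_uri)
```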
Run the following command to verify that the local endpoint works:
curl -d "{\"data\":[[1,2,3,4]]}" -H "Content-Type: application/json" localhost:<port>/score
Understand how deployment works
Take a look at the docker container with either docker inspect <container-id> or docker exec -it <container-id> /bin/bash. You will notice:
- Your model and code are mounted under /var/azureml-app
- Environment variables point to the code and model artifacts: AZUREML_MODEL_DIR points to the model folder, AML_APP_ROOT points to the code folder, and AZUREML_ENTRY_SCRIPT points to the scoring script score.py
The code generated in /var/azureml-server and /var/runit/gunicorn/run leverages these environment variables to run your code.
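For illustration only (this is not the generated server code), a script inside the container could resolve those locations like this:

```python
import os

# Where the registered model is mounted
model_dir = os.environ.get("AZUREML_MODEL_DIR")
# Where your code (including score.py) is mounted
app_root = os.environ.get("AML_APP_ROOT", "/var/azureml-app")
# Relative path of the scoring script within the code folder
entry_script = os.environ.get("AZUREML_ENTRY_SCRIPT", "score.py")

print("model:", model_dir)
print("scoring script:", os.path.join(app_root, entry_script))
```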
Deploy to Kubernetes
Once local deployment succeeds, deploy to Kubernetes:
- from the Azure ML Studio portal, using the same score.py
- or using the Python SDK with the same score.py. Here’s the sample code for Kubernetes deployment, sketched below.
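A minimal sketch with the SDK v2, assuming the cluster was attached as a compute target named <attached-k8s-compute> and reusing the same score.py and environment definition as the local deployment (names and paths are placeholders):

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    KubernetesOnlineEndpoint,
    KubernetesOnlineDeployment,
    CodeConfiguration,
    Environment,
    Model,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace-name>"
)

# Same environment as the local deployment.
env = Environment(
    name="mlflow-sklearn-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    conda_file="model/conda.yaml",
)

endpoint = KubernetesOnlineEndpoint(
    name="mlflow-k8s",
    compute="<attached-k8s-compute>",  # the compute name used when attaching the cluster
    auth_mode="key",
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name=endpoint.name,
    model=Model(path="model"),  # the downloaded model folder, or a registered model
    environment=env,
    code_configuration=CodeConfiguration(code=".", scoring_script="score.py"),
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```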
The model will be deployed to the Kubernetes namespace you specified when you attached the cluster to the Azure ML workspace. You can get the scoring endpoint and API key programmatically or from the portal.
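Programmatically, that might look like this, reusing the same ml_client and the endpoint name from the sketch above:

```python
# Fetch the scoring URI and the auth keys for the Kubernetes endpoint.
endpoint = ml_client.online_endpoints.get(name="mlflow-k8s")
keys = ml_client.online_endpoints.get_keys(name="mlflow-k8s")

print("scoring url:", endpoint.scoring_uri)
print("api key:", keys.primary_key)
```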
Run the following command to verify that the Kubernetes endpoint works:
curl -d "{\"data\":[[1,2,3,4]]}" -H "Content-Type: application/json" -H "Authorization: Bearer <your key>" <your scoring url>
Summary
While you need a basic understanding of the various concepts of Azure ML, MLflow, and Azure Arc-enabled Kubernetes to get started, Azure ML lets you centrally deploy ML models to Azure Arc-enabled Kubernetes clusters running anywhere: in Azure, on-premises, or in other clouds.