Converting Models to Core ML

Core ML is Apple's framework for integrating machine learning models into iOS, macOS, watchOS, and tvOS applications. A key part of working with Core ML is converting models trained in other popular machine learning frameworks into the Core ML format. In this blog post, we'll walk through that conversion process, with a focus on PyTorch models.

Understanding Core ML Tools

Core ML Tools is a Python package that facilitates the conversion of machine learning models to the Core ML format. It supports a wide range of model types and frameworks, including TensorFlow, Keras, scikit-learn, XGBoost, and PyTorch.

Key Features of Core ML Tools:

  - A single, unified conversion API (ct.convert) that accepts models from multiple source frameworks (illustrated in the sketch below)
  - Flexible input descriptions, including plain tensors and images with built-in preprocessing (scale and bias)
  - Control over the output format (neural network or ML Program) and the minimum deployment target
  - Utilities for inspecting converted models and attaching metadata such as author, license, and description
  - Options for reducing model size, for example by lowering numerical precision

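As a quick illustration of the unified API, the rough sketch below builds a tiny throwaway PyTorch model, traces it, and hands it to ct.convert; TensorFlow models go through the very same call (for a TensorFlow 2 SavedModel you can pass its directory path instead of a traced module):

import torch
import coremltools as ct

# Build and trace a tiny throwaway PyTorch model purely for illustration.
tiny_model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
traced_tiny = torch.jit.trace(tiny_model, torch.rand(1, 3, 32, 32))

# The same ct.convert call handles different source frameworks; for PyTorch,
# pass the traced module plus input descriptions.
mlmodel = ct.convert(traced_tiny, inputs=[ct.TensorType(shape=(1, 3, 32, 32))])
print(type(mlmodel))
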
Converting PyTorch Models to Core ML

The process of converting a PyTorch model to Core ML involves two main steps:

  1. Converting the PyTorch model to TorchScript
  2. Using Core ML Tools to convert the TorchScript model to Core ML format

Step 1: Converting PyTorch to TorchScript

TorchScript is an intermediate representation of a PyTorch model that can be run in high-performance environments such as C++. There are two ways to convert a PyTorch model to TorchScript:

1. Tracing

Tracing works by running an example input through the model and recording the operations that are executed. This method is suitable for models with a fixed control flow.


import torch

def trace_model(model, example_input):
    return torch.jit.trace(model, example_input)

# Example usage
traced_model = trace_model(my_pytorch_model, torch.rand(1, 3, 224, 224))
            

2. Scripting

Scripting analyzes the Python code of the model and converts it to TorchScript. This method is more flexible and can handle models with dynamic control flow.


import torch

def script_model(model):
    return torch.jit.script(model)

# Example usage
scripted_model = script_model(my_pytorch_model)
            

Step 2: Converting TorchScript to Core ML

Once you have a TorchScript model, you can use Core ML Tools to convert it to the Core ML format.


import coremltools as ct

def convert_to_coreml(torchscript_model, input_shape):
    mlmodel = ct.convert(
        torchscript_model,
        inputs=[ct.TensorType(shape=input_shape)]
    )
    return mlmodel

# Example usage
coreml_model = convert_to_coreml(traced_model, (1, 3, 224, 224))
coreml_model.save("my_model.mlmodel")
            

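Before moving on, it's worth a quick sanity check of the converted model from Python. The sketch below is purely illustrative: it looks up the auto-assigned input name from the model spec and runs a prediction on random data (Core ML predictions from Python require macOS):

import numpy as np

# Continuing from the snippet above: look up the auto-assigned input name,
# then run a test prediction on random data.
spec = coreml_model.get_spec()
input_name = spec.description.input[0].name

sample = np.random.rand(1, 3, 224, 224).astype(np.float32)
prediction = coreml_model.predict({input_name: sample})
print(list(prediction.keys()))
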
Model Tracing vs Model Scripting

Understanding when to use tracing versus scripting is crucial for successful model conversion. Let's dive deeper into these two approaches:

Model Tracing

Tracing is typically simpler and often produces more optimized TorchScript code. It's ideal for models with a static computation graph.

Advantages of Tracing:

  - Simple to apply: one call to torch.jit.trace with a representative example input
  - Records only the operations that actually run, which usually yields a lean, well-optimized graph
  - The most reliable path into Core ML Tools for typical vision and classification models

Limitations of Tracing:

  - Data-dependent control flow (if-statements and loops over tensor values) is frozen to the path taken by the example input
  - Input shapes other than the traced shape may not work unless flexible shapes are configured
  - Models that return dictionaries or other non-tensor structures need a small wrapper before tracing

Model Scripting

Scripting is more flexible and can handle models with dynamic behavior, but it may produce less optimized code in some cases.

Advantages of Scripting:

  - Preserves Python control flow, so branches and loops keep their data-dependent behavior
  - Does not require a representative example input
  - Can capture models whose computation changes with the input

Limitations of Scripting:

  - Only a subset of Python is scriptable, so unsupported constructs cause compilation errors
  - The resulting graph can be harder for Core ML Tools to convert and may be less optimized
  - Often requires refactoring model code (type annotations, removing unsupported idioms)

Choosing Between Tracing and Scripting

Here are some guidelines to help you choose the appropriate method:

  - Start with tracing; it is the simpler path and the one Core ML Tools handles most reliably
  - Switch to scripting (or mix the two) when the model's behavior depends on its input, such as data-dependent if-statements or loops
  - If tracing succeeds but the model contains control flow, test the traced version with inputs that exercise the other branches
  - For models with dictionary or tuple outputs, prefer a small wrapper module (as in the case study below) over switching to scripting

The short sketch after this list shows how tracing freezes a data-dependent branch while scripting preserves it.

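The toy module below exists purely to make the difference concrete; it has a branch that depends on the input's values, so tracing records only the path taken by the example input while scripting preserves the if-statement:

import torch

class Gate(torch.nn.Module):
    # Toy module with a data-dependent branch, for illustration only.
    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return torch.zeros_like(x)

model = Gate().eval()

# Tracing records only the branch taken for this positive example input
# (PyTorch also emits a TracerWarning about the tensor-to-bool conversion).
traced = torch.jit.trace(model, torch.ones(3))
print(traced(-torch.ones(3)))   # wrong: returns x * 2 instead of zeros

# Scripting keeps the if-statement, so both branches behave correctly.
scripted = torch.jit.script(model)
print(scripted(-torch.ones(3)))  # correct: returns zeros
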
Case Study: Converting a Segmentation Model

Let's walk through a real-world example of converting a PyTorch segmentation model to Core ML. We'll use the DeepLabV3 model with a ResNet-101 backbone.


import torch
import torchvision
import coremltools as ct
from PIL import Image

# Load the pre-trained model
model = torchvision.models.segmentation.deeplabv3_resnet101(pretrained=True).eval()

# Prepare a sample input
input_image = Image.open("sample_image.jpg")
preprocess = torchvision.transforms.Compose([
    torchvision.transforms.Resize((256, 256)),
    torchvision.transforms.ToTensor(),
])
input_tensor = preprocess(input_image).unsqueeze(0)

# Attempt to trace the model (this will fail)
try:
    traced_model = torch.jit.trace(model, input_tensor)
except RuntimeError as e:
    print(f"Tracing failed: {e}")

# Create a wrapper class to handle dictionary output
class WrappedDeepLabV3(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
    
    def forward(self, x):
        return self.model(x)['out']

# Wrap the model and trace it
wrapped_model = WrappedDeepLabV3(model)
traced_model = torch.jit.trace(wrapped_model, input_tensor)

# Convert to Core ML
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input", shape=input_tensor.shape, scale=1/255.0, bias=[0, 0, 0])],
    outputs=[ct.TensorType(name="output")]
)

# Set metadata
mlmodel.author = "Your Name"
mlmodel.license = "Your License"
mlmodel.short_description = "DeepLabV3 Segmentation Model"
mlmodel.version = "1.0"

# Save the model
mlmodel.save("deeplabv3_segmentation.mlmodel")
            

In this example, we encountered an issue with tracing the original model due to its dictionary output. We solved this by creating a wrapper class that extracts the relevant output tensor. This demonstrates the importance of understanding your model's structure and being prepared to adapt your approach during the conversion process.
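
As an extra check, it helps to compare the converted model's raw output against the original PyTorch model on the same image. The sketch below continues the script above and is purely illustrative; small floating-point differences are expected, while large ones usually point to a preprocessing mismatch such as a different resizing filter (Core ML predictions from Python require macOS):

import numpy as np

# Run the same image through both models and compare the raw segmentation maps.
resized_image = input_image.convert("RGB").resize((256, 256))
coreml_out = mlmodel.predict({"input": resized_image})["output"]

with torch.no_grad():
    torch_out = wrapped_model(input_tensor).numpy()

print("max abs difference:", np.abs(coreml_out - torch_out).max())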

Case Study: CLIP-Finder Image Encoder Conversion

Now, let's look at a more complex example: converting the image encoder from the CLIP model used in the CLIP-Finder project. This case study showcases the conversion of a state-of-the-art multimodal model to Core ML. For more details, please check out: 🤗 MobileCLIP Converted on Hugging Face


import coremltools
import torch
import mobileclip
from mobileclip.modules.common.mobileone import reparameterize_model
from mobileclip.clip import CLIP

# Define the model configuration
model_cfg = {
    "embed_dim": 512,
    "image_cfg": {
        "image_size": 256,
        "model_name": "mci0"
    },
    "text_cfg": {
        "context_length": 77,
        "vocab_size": 49408,
        "dim": 512,
        "ffn_multiplier_per_layer": 4.0,
        "n_heads_per_layer": 8,
        "n_transformer_layers": 4,
        "norm_layer": "layer_norm_fp32",
        "causal_masking": False,
        "model_name": "mct"
    }
}

# Create a custom CLIP class for image encoding
class CLIP_encode_image(CLIP):
    def __init__(self, cfg, output_dict=False, *args, **kwargs):
        super().__init__(cfg, output_dict, *args, **kwargs)

    def forward(self, image):
        return self.encode_image(image, normalize=True)

# Initialize and load the model
model_ie = CLIP_encode_image(cfg=model_cfg)
model_ie.eval()
chkpt = torch.load("checkpoints/mobileclip_s0.pt")
model_ie.load_state_dict(chkpt)

# Reparameterize the model for inference
reparameterized_model = reparameterize_model(model_ie)
reparameterized_model.eval()

# Trace the model
image = torch.rand(1, 3, 256, 256)
traced_model = torch.jit.trace(reparameterized_model, image)

# Convert to Core ML
input_image = coremltools.ImageType(name="input_image", shape=(1, 3, 256, 256), color_layout=coremltools.colorlayout.RGB, scale=1/255.0, bias=[0, 0, 0])
output_tensor = [coremltools.TensorType(name="output_embeddings")]

ml_model = coremltools.convert(
    model=traced_model,
    outputs=output_tensor,
    inputs=[input_image],
    convert_to="mlprogram",
    minimum_deployment_target=coremltools.target.iOS17,
    compute_units=coremltools.ComputeUnit.ALL,
    debug=True,
)

# Save the model
ml_model.save("clip_mci_image_s0.mlpackage")
            

This case study demonstrates several advanced techniques:

  - Subclassing the CLIP model so that forward exposes only the image encoder (encode_image), giving the tracer a single tensor output
  - Reparameterizing the MobileOne-style blocks into their inference form before tracing
  - Declaring the input as an ImageType with an RGB color layout and a 1/255 scale so on-device preprocessing matches training
  - Converting to the ML Program (mlprogram) format with an iOS 17 minimum deployment target and saving the result as an .mlpackage

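Because the input was declared as an ImageType, the converted encoder can be spot-checked from Python by passing a PIL image directly. The sketch below is purely illustrative and assumes an arbitrary local image file (predictions from Python require macOS):

from PIL import Image

# ImageType inputs accept PIL images; the model returns the image embedding.
image = Image.open("example.jpg").convert("RGB").resize((256, 256))
result = ml_model.predict({"input_image": image})
embedding = result["output_embeddings"]
print(embedding.shape)  # expected: (1, 512) for this configuration
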
Best Practices and Troubleshooting Tips

When converting models to Core ML, keep these best practices in mind:

  - Call model.eval() before tracing or scripting so layers such as dropout and batch normalization behave deterministically
  - Use ct.ImageType with the correct scale, bias, and color layout so on-device preprocessing matches your training pipeline
  - Prefer the ML Program (mlprogram) format and set a minimum_deployment_target that matches the OS versions you support
  - Set model metadata (author, license, description, version) so the model is self-documenting in Xcode
  - Validate the converted model by comparing its outputs against the original PyTorch model on real inputs
  - Be deliberate about numerical precision; ML Programs default to float16, and you can request float32 if accuracy drops (see the sketch after this list)

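As an example of the precision point above, here is a rough sketch that requests float32 at conversion time; it assumes traced_model is a traced module like the one from Step 1 and a recent coremltools release:

import coremltools as ct

# ML Programs use float16 by default; request float32 if the converted
# model's accuracy degrades noticeably.
mlmodel_fp32 = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=(1, 3, 224, 224))],
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT32,
)
mlmodel_fp32.save("my_model_fp32.mlpackage")
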
Common troubleshooting steps include:

  - Re-checking that the example input used for tracing has the same shape, dtype, and preprocessing as your real data
  - Wrapping the model in a small module that returns only the tensors you need when tracing fails on dictionary or tuple outputs
  - Upgrading coremltools (or rewriting the offending layer) when ct.convert reports an unsupported operation
  - Inspecting the converted model's input and output descriptions to confirm names, shapes, and types (see the sketch below)
  - Comparing outputs between the original and converted models to catch silent numerical issues

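Printing the generated model description is often the fastest way to spot a mismatched name or shape. A minimal sketch, assuming the model saved in Step 2:

import coremltools as ct

# Load a converted model and print its declared inputs and outputs.
mlmodel = ct.models.MLModel("my_model.mlmodel")
spec = mlmodel.get_spec()
print(spec.description)

# The same names must be used as keys when calling mlmodel.predict().
for inp in spec.description.input:
    print("input name:", inp.name)
for out in spec.description.output:
    print("output name:", out.name)
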
Conclusion

Converting PyTorch models to Core ML opens up a world of possibilities for deploying sophisticated machine learning models on Apple devices. By understanding the nuances of tracing and scripting, and following best practices, you can successfully convert a wide range of models, from simple classifiers to complex multimodal architectures like CLIP.

Remember that the field of machine learning and model conversion is constantly evolving. Stay updated with the latest developments in PyTorch and Core ML Tools to ensure you're using the most efficient and effective conversion techniques for your projects.
