Converting Models to Core ML

Core ML is Apple's framework for integrating machine learning models into iOS, macOS, watchOS, and tvOS applications. A key part of working with Core ML is converting models trained in other popular machine learning frameworks into the Core ML format. In this blog post, we'll walk through that conversion process, with a focus on PyTorch models.

Understanding Core ML Tools

Core ML Tools is a Python package that facilitates the conversion of machine learning models to the Core ML format. It supports a wide range of model types and frameworks, including TensorFlow, Keras, scikit-learn, XGBoost, and PyTorch.

Key Features of Core ML Tools:

  - A single, unified conversion API (ct.convert) that accepts models from multiple source frameworks (illustrated in the sketch below)
  - Flexible input descriptions, including plain tensors and images with built-in preprocessing (scale and bias)
  - Control over the output format (neural network or ML Program) and the minimum deployment target
  - Utilities for inspecting converted models and attaching metadata such as author, license, and description
  - Options for reducing model size, for example by lowering numerical precision

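As a quick illustration of the unified API, the rough sketch below builds a tiny throwaway PyTorch model, traces it, and hands it to ct.convert; TensorFlow models go through the very same call (for a TensorFlow 2 SavedModel you can pass its directory path instead of a traced module):

import torch
import coremltools as ct

# Build and trace a tiny throwaway PyTorch model purely for illustration.
tiny_model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
traced_tiny = torch.jit.trace(tiny_model, torch.rand(1, 3, 32, 32))

# The same ct.convert call handles different source frameworks; for PyTorch,
# pass the traced module plus input descriptions.
mlmodel = ct.convert(traced_tiny, inputs=[ct.TensorType(shape=(1, 3, 32, 32))])
print(type(mlmodel))
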
Converting PyTorch Models to Core ML

The process of converting a PyTorch model to Core ML involves two main steps:

  1. Converting the PyTorch model to TorchScript
  2. Using Core ML Tools to convert the TorchScript model to Core ML format

Step 1: Converting PyTorch to TorchScript

TorchScript is an intermediate representation of a PyTorch model that can be run in high-performance environments such as C++. There are two ways to convert a PyTorch model to TorchScript:

1. Tracing

Tracing works by running an example input through the model and recording the operations that are executed. This method is suitable for models with a fixed control flow.


import torch

def trace_model(model, example_input):
    return torch.jit.trace(model, example_input)

# Example usage
traced_model = trace_model(my_pytorch_model, torch.rand(1, 3, 224, 224))
            

2. Scripting

Scripting analyzes the Python code of the model and converts it to TorchScript. This method is more flexible and can handle models with dynamic control flow.


import torch

def script_model(model):
    return torch.jit.script(model)

# Example usage
scripted_model = script_model(my_pytorch_model)
            

Step 2: Converting TorchScript to Core ML

Once you have a TorchScript model, you can use Core ML Tools to convert it to the Core ML format.


import coremltools as ct

def convert_to_coreml(torchscript_model, input_shape):
    mlmodel = ct.convert(
        torchscript_model,
        inputs=[ct.TensorType(shape=input_shape)]
    )
    return mlmodel

# Example usage
coreml_model = convert_to_coreml(traced_model, (1, 3, 224, 224))
coreml_model.save("my_model.mlmodel")
            

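Before moving on, it's worth a quick sanity check of the converted model from Python. The sketch below is purely illustrative: it looks up the auto-assigned input name from the model spec and runs a prediction on random data (Core ML predictions from Python require macOS):

import numpy as np

# Continuing from the snippet above: look up the auto-assigned input name,
# then run a test prediction on random data.
spec = coreml_model.get_spec()
input_name = spec.description.input[0].name

sample = np.random.rand(1, 3, 224, 224).astype(np.float32)
prediction = coreml_model.predict({input_name: sample})
print(list(prediction.keys()))
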
Model Tracing vs Model Scripting

Understanding when to use tracing versus scripting is crucial for successful model conversion. Let's dive deeper into these two approaches:

Model Tracing

Tracing is typically simpler and often produces more optimized TorchScript code. It's ideal for models with a static computation graph.

Advantages of Tracing:

  - Simple to apply: one call to torch.jit.trace with a representative example input
  - Records only the operations that actually run, which usually yields a lean, well-optimized graph
  - The most reliable path into Core ML Tools for typical vision and classification models

Limitations of Tracing:

  - Data-dependent control flow (if-statements and loops over tensor values) is frozen to the path taken by the example input
  - Input shapes other than the traced shape may not work unless flexible shapes are configured
  - Models that return dictionaries or other non-tensor structures need a small wrapper before tracing

Model Scripting

Scripting is more flexible and can handle models with dynamic behavior, but it may produce less optimized code in some cases.

Advantages of Scripting:

  - Preserves Python control flow, so branches and loops keep their data-dependent behavior
  - Does not require a representative example input
  - Can capture models whose computation changes with the input

Limitations of Scripting:

  - Only a subset of Python is scriptable, so unsupported constructs cause compilation errors
  - The resulting graph can be harder for Core ML Tools to convert and may be less optimized
  - Often requires refactoring model code (type annotations, removing unsupported idioms)

Choosing Between Tracing and Scripting

Here are some guidelines to help you choose the appropriate method:

  - Start with tracing; it is the simpler path and the one Core ML Tools handles most reliably
  - Switch to scripting (or mix the two) when the model's behavior depends on its input, such as data-dependent if-statements or loops
  - If tracing succeeds but the model contains control flow, test the traced version with inputs that exercise the other branches
  - For models with dictionary or tuple outputs, prefer a small wrapper module (as in the case study below) over switching to scripting

The short sketch after this list shows how tracing freezes a data-dependent branch while scripting preserves it.

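The toy module below exists purely to make the difference concrete; it has a branch that depends on the input's values, so tracing records only the path taken by the example input while scripting preserves the if-statement:

import torch

class Gate(torch.nn.Module):
    # Toy module with a data-dependent branch, for illustration only.
    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return torch.zeros_like(x)

model = Gate().eval()

# Tracing records only the branch taken for this positive example input
# (PyTorch also emits a TracerWarning about the tensor-to-bool conversion).
traced = torch.jit.trace(model, torch.ones(3))
print(traced(-torch.ones(3)))   # wrong: returns x * 2 instead of zeros

# Scripting keeps the if-statement, so both branches behave correctly.
scripted = torch.jit.script(model)
print(scripted(-torch.ones(3)))  # correct: returns zeros
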
Case Study: Converting a Segmentation Model

Let's walk through a real-world example of converting a PyTorch segmentation model to Core ML. We'll use the DeepLabV3 model with a ResNet-101 backbone.


import torch
import torchvision
import coremltools as ct
from PIL import Image

# Load the pre-trained model
model = torchvision.models.segmentation.deeplabv3_resnet101(pretrained=True).eval()

# Prepare a sample input
input_image = Image.open("sample_image.jpg")
preprocess = torchvision.transforms.Compose([
    torchvision.transforms.Resize((256, 256)),
    torchvision.transforms.ToTensor(),
])
input_tensor = preprocess(input_image).unsqueeze(0)

# Attempt to trace the model (this will fail)
try:
    traced_model = torch.jit.trace(model, input_tensor)
except RuntimeError as e:
    print(f"Tracing failed: {e}")

# Create a wrapper class to handle dictionary output
class WrappedDeepLabV3(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
    
    def forward(self, x):
        return self.model(x)['out']

# Wrap the model and trace it
wrapped_model = WrappedDeepLabV3(model)
traced_model = torch.jit.trace(wrapped_model, input_tensor)

# Convert to Core ML
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input", shape=input_tensor.shape, scale=1/255.0, bias=[0, 0, 0])],
    outputs=[ct.TensorType(name="output")]
)

# Set metadata
mlmodel.author = "Your Name"
mlmodel.license = "Your License"
mlmodel.short_description = "DeepLabV3 Segmentation Model"
mlmodel.version = "1.0"

# Save the model
mlmodel.save("deeplabv3_segmentation.mlmodel")
            

In this example, we encountered an issue with tracing the original model due to its dictionary output. We solved this by creating a wrapper class that extracts the relevant output tensor. This demonstrates the importance of understanding your model's structure and being prepared to adapt your approach during the conversion process.
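
As an extra check, it helps to compare the converted model's raw output against the original PyTorch model on the same image. The sketch below continues the script above and is purely illustrative; small floating-point differences are expected, while large ones usually point to a preprocessing mismatch such as a different resizing filter (Core ML predictions from Python require macOS):

import numpy as np

# Run the same image through both models and compare the raw segmentation maps.
resized_image = input_image.convert("RGB").resize((256, 256))
coreml_out = mlmodel.predict({"input": resized_image})["output"]

with torch.no_grad():
    torch_out = wrapped_model(input_tensor).numpy()

print("max abs difference:", np.abs(coreml_out - torch_out).max())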

Case Study: CLIP-Finder Image Encoder Conversion

Now, let's look at a more complex example: converting the image encoder from the CLIP model used in the CLIP-Finder project. This case study showcases the conversion of a state-of-the-art multimodal model to Core ML. For more details, please check out: 🤗 MobileCLIP Converted on Hugging Face


import coremltools
import torch
import mobileclip
from mobileclip.modules.common.mobileone import reparameterize_model
from mobileclip.clip import CLIP

# Define the model configuration
model_cfg = {
    "embed_dim": 512,
    "image_cfg": {
        "image_size": 256,
        "model_name": "mci0"
    },
    "text_cfg": {
        "context_length": 77,
        "vocab_size": 49408,
        "dim": 512,
        "ffn_multiplier_per_layer": 4.0,
        "n_heads_per_layer": 8,
        "n_transformer_layers": 4,
        "norm_layer": "layer_norm_fp32",
        "causal_masking": False,
        "model_name": "mct"
    }
}

# Create a custom CLIP class for image encoding
class CLIP_encode_image(CLIP):
    def __init__(self, cfg, output_dict=False, *args, **kwargs):
        super().__init__(cfg, output_dict, *args, **kwargs)

    def forward(self, image):
        return self.encode_image(image, normalize=True)

# Initialize and load the model
model_ie = CLIP_encode_image(cfg=model_cfg)
model_ie.eval()
chkpt = torch.load("checkpoints/mobileclip_s0.pt")
model_ie.load_state_dict(chkpt)

# Reparameterize the model for inference
reparameterized_model = reparameterize_model(model_ie)
reparameterized_model.eval()

# Trace the model
image = torch.rand(1, 3, 256, 256)
traced_model = torch.jit.trace(reparameterized_model, image)

# Convert to Core ML
input_image = coremltools.ImageType(name="input_image", shape=(1, 3, 256, 256), color_layout=coremltools.colorlayout.RGB, scale=1/255.0, bias=[0, 0, 0])
output_tensor = [coremltools.TensorType(name="output_embeddings")]

ml_model = coremltools.convert(
    model=traced_model,
    outputs=output_tensor,
    inputs=[input_image],
    convert_to="mlprogram",
    minimum_deployment_target=coremltools.target.iOS17,
    compute_units=coremltools.ComputeUnit.ALL,
    debug=True,
)

# Save the model
ml_model.save("clip_mci_image_s0.mlpackage")
            

This case study demonstrates several advanced techniques:

  - Subclassing the CLIP model so that forward exposes only the image encoder (encode_image), giving the tracer a single tensor output
  - Reparameterizing the MobileOne-style blocks into their inference form before tracing
  - Declaring the input as an ImageType with an RGB color layout and a 1/255 scale so on-device preprocessing matches training
  - Converting to the ML Program (mlprogram) format with an iOS 17 minimum deployment target and saving the result as an .mlpackage

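Because the input was declared as an ImageType, the converted encoder can be spot-checked from Python by passing a PIL image directly. The sketch below is purely illustrative and assumes an arbitrary local image file (predictions from Python require macOS):

from PIL import Image

# ImageType inputs accept PIL images; the model returns the image embedding.
image = Image.open("example.jpg").convert("RGB").resize((256, 256))
result = ml_model.predict({"input_image": image})
embedding = result["output_embeddings"]
print(embedding.shape)  # expected: (1, 512) for this configuration
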
Best Practices and Troubleshooting Tips

When converting models to Core ML, keep these best practices in mind:

  - Call model.eval() before tracing or scripting so layers such as dropout and batch normalization behave deterministically
  - Use ct.ImageType with the correct scale, bias, and color layout so on-device preprocessing matches your training pipeline
  - Prefer the ML Program (mlprogram) format and set a minimum_deployment_target that matches the OS versions you support
  - Set model metadata (author, license, description, version) so the model is self-documenting in Xcode
  - Validate the converted model by comparing its outputs against the original PyTorch model on real inputs
  - Be deliberate about numerical precision; ML Programs default to float16, and you can request float32 if accuracy drops (see the sketch after this list)

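As an example of the precision point above, here is a rough sketch that requests float32 at conversion time; it assumes traced_model is a traced module like the one from Step 1 and a recent coremltools release:

import coremltools as ct

# ML Programs use float16 by default; request float32 if the converted
# model's accuracy degrades noticeably.
mlmodel_fp32 = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=(1, 3, 224, 224))],
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT32,
)
mlmodel_fp32.save("my_model_fp32.mlpackage")
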
Common troubleshooting steps include:

  - Re-checking that the example input used for tracing has the same shape, dtype, and preprocessing as your real data
  - Wrapping the model in a small module that returns only the tensors you need when tracing fails on dictionary or tuple outputs
  - Upgrading coremltools (or rewriting the offending layer) when ct.convert reports an unsupported operation
  - Inspecting the converted model's input and output descriptions to confirm names, shapes, and types (see the sketch below)
  - Comparing outputs between the original and converted models to catch silent numerical issues

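Printing the generated model description is often the fastest way to spot a mismatched name or shape. A minimal sketch, assuming the model saved in Step 2:

import coremltools as ct

# Load a converted model and print its declared inputs and outputs.
mlmodel = ct.models.MLModel("my_model.mlmodel")
spec = mlmodel.get_spec()
print(spec.description)

# The same names must be used as keys when calling mlmodel.predict().
for inp in spec.description.input:
    print("input name:", inp.name)
for out in spec.description.output:
    print("output name:", out.name)
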
Conclusion

Converting PyTorch models to Core ML opens up a world of possibilities for deploying sophisticated machine learning models on Apple devices. By understanding the nuances of tracing and scripting, and following best practices, you can successfully convert a wide range of models, from simple classifiers to complex multimodal architectures like CLIP.

Remember that the field of machine learning and model conversion is constantly evolving. Stay updated with the latest developments in PyTorch and Core ML Tools to ensure you're using the most efficient and effective conversion techniques for your projects.
