Apple’s M1 Pro and M1 Max Outperform Google Colab by up to 54%
2021 Apple M1 Pro and M1 Max Machine Learning speed test comparison.

The main keyboard I use is attached to a MacBook Pro. In fact, I'm typing this article on one.
And being the tech nerd I am, when Apple released a couple of new MacBook Pros with upgraded hardware (M1 Pro and M1 Max chips, a redesign and all the rest), I decided I'd better test them out.
For context, I make videos on machine learning, write machine learning code and teach machine learning.
That's what this article focuses on: comparing Apple's M1, M1 Pro and M1 Max chips against each other and against a few other chips.
How?
Four tests:
- Final Cut Pro Export – How fast can the various MacBook Pros export a 4-hour long TensorFlow instructional video (I make coding education videos) and a 10-minute long story video (using H.264 and ProRes encodings)?
- CreateML Image Classification Machine Learning Model Creation – How fast can the various MacBook Pros turn 10,000 images into an image classification model with CreateML?
- CIFAR10 TinyVGG Image Classification with TensorFlow (via `tensorflow-macos`) – Thanks to `tensorflow-metal`, you can now leverage your MacBook's internal GPU to speed up machine learning model training. How does this go with a smaller model?
- Food101 EfficientNetB0 Feature Extraction with `tensorflow-macos` – I rarely train machine learning models from scratch. So how do the new M1 Pro and M1 Max chips go with transfer learning using TensorFlow code?
This article is strictly focused on performance. For design, inputs, outputs and battery life, there are plenty of other resources out there.
Hardware specs
I currently use an Intel-based MacBook Pro 16-inch as my main machine (almost always plugged in) with a 2020 13-inch M1 MacBook Pro as a take-with-me-places option.
And for training larger machine learning models, I use Google Colab, Google Cloud GPUs or SSH into a dedicated deep learning PC with a TITAN RTX GPU (connecting over the internet).
For the TensorFlow code tests, I've included comparisons with Google Colab and the TITAN RTX GPU.
The specs here focus on the MacBook Pros: Intel-based, M1, M1 Pro and M1 Max.

For each test, all MacBook Pro's were running macOS Monterey 12.0.1 and were plugged into power.
Experiment 1: Final Cut Pro Export (small and large video)
I make YouTube videos and educational videos teaching machine learning.
So the machine I'm using has to be fast at rendering and exporting. That's one of the main reasons I bought a spec'd out 2019 16-inch MacBook Pro, so I could edit videos without lag.
Plus, the M1 Pro and M1 Max chips are targeted at pros, many of whom edit videos at far higher quality than I do (for now).
I exported each video to both H.264 (higher compression, more GPU intensive) and ProRes (lower compression, less CPU and GPU intensive) encodings.
This is because of Apple's statements that the newer M1 chips have dedicated ProRes engines.
Large video
Experiment details:
- Video: Learn TensorFlow for Deep Learning Part 2
- Length: 4 hours
- Style: 30x screen recordings of smaller videos (~10 minutes each) stitched together
- Quality: 1080p
Machine | Chip | Encoding | Export time (H:MM:SS) |
---|---|---|---|
2019 MacBook Pro 16-inch | Intel | H.264 | 0:33:06 |
2020 MacBook Pro 13-inch | M1 | H.264 | 0:50:09 |
2021 MacBook Pro 14-inch | M1 Pro | H.264 | 2:00:26 |
2021 MacBook Pro 14-inch | M1 Max | H.264 | 2:02:29 |
Interesting to see the newer M1s (Pro and Max) take over double the time of the base M1 and nearly four times as long as the Intel-based Mac when using the H.264 encoding.
It should also be noted that large amounts of hardware were left idle on both the M1 Pro (~89% idle) and M1 Max (~68% idle) when using the H.264 encoding.

Perhaps a software update is required to leverage the full power of the newer chips?
It was a different story for the ProRes encoding though.
Machine | Chip | Encoding | Export time (H:MM:SS) |
---|---|---|---|
2019 MacBook Pro 16-inch | Intel | ProRes | 0:33:24 |
2020 MacBook Pro 13-inch | M1 | ProRes | 0:30:26 |
2021 MacBook Pro 14-inch | M1 Pro | ProRes | 0:12:10 |
2021 MacBook Pro 14-inch | M1 Max | ProRes | 0:11:24 |
The ProRes encoding is where the newer M1 chips really shined.
In comparison to the H.264 encoding, huge amounts of CPU usage occurred when exporting to ProRes on both the M1 Pro and M1 Max. Perhaps this is from the dedicated ProRes cores?

However, even though the ProRes export finished far quicker than the H.264 one, the difference in file size makes it almost unusable for many.
Exporting the large video to H.264 produced a 7GB file, whereas exporting to ProRes produced a 167GB file.
I don't know about your internet speed but that kind of video would take three or four days to upload for me.
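(Back-of-the-envelope arithmetic: 167GB is roughly 1,336,000 megabits, so at an assumed ~5Mbps upload speed that's about 267,000 seconds, or just over three days of uploading.)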
Small video
The smaller video produced far closer results.
Experiment details:
- Video: How I study machine learning 5 days per week
- Length: 10 minutes
- Style: Voiceover with video clips edited to the voice
- Quality: 1080p
Machine | Chip | Encoding | Export time (MM:SS) |
---|---|---|---|
2019 MacBook Pro 16-inch | Intel | H.264 | 2:59 |
2020 MacBook Pro 13-inch | M1 | H.264 | 3:48 |
2021 MacBook Pro 14-inch | M1 Pro | H.264 | 3:28 |
2021 MacBook Pro 14-inch | M1 Max | H.264 | 3:31 |
Very similar export times here across the board.
But again there was plenty of hardware left idle on the M1 Pro and M1 Max chips when exporting to the H.264 encoding.
Machine | Chip | Encoding | Export time (MM:SS) |
---|---|---|---|
2019 MacBook Pro 16-inch | Intel | ProRes | 2:35 |
2020 MacBook Pro 13-inch | M1 | ProRes | 2:41 |
2021 MacBook Pro 14-inch | M1 Pro | ProRes | 1:09 |
2021 MacBook Pro 14-inch | M1 Max | ProRes | 1:05 |
The M1 Pro and M1 Max really shine through again when using the ProRes encoding. Checking Activity Monitor showed huge amounts of CPU usage, ~350-450% for the M1 Pro and ~300-500% for the M1 Max.
Experiment 2: CreateML Image Classification
CreateML is a machine learning app made by Apple that comes with Xcode (Apple's software for making iOS and macOS apps).
It provides an easy way to turn data into machine learning models.
I tried it because my brother and I have been using it to prototype models for Nutrify (an app to take a photo of food and learn about it).
Not only does it work well for our use case, it produces trained models which are optimized for Apple devices.
Experiment details:
- Data: Random 10% subset of all Food101 classes (~7.5k train images, ~2.5k test images)
- Training: 25 epochs, all data augmentation settings on
- Model: CreateML powered model (Apple doesn't tell you what architecture is being used)
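CreateML itself is point-and-click, so there's no training code to show. But for reference, here's a hypothetical Python sketch of how a random 10% subset like this could be built from a local Food101 download. The folder paths and seed are assumptions, not the exact script I used (see the GitHub repo linked below for the real code):

```python
import random
import shutil
from pathlib import Path

SOURCE_DIR = Path("food-101/images")       # assumption: Food101 extracted locally
TARGET_DIR = Path("food-101-10-percent")   # hypothetical output folder

random.seed(42)  # arbitrary seed for reproducibility
for class_dir in SOURCE_DIR.iterdir():
    if not class_dir.is_dir():
        continue
    images = list(class_dir.glob("*.jpg"))
    # Take a random 10% of the images in each class.
    subset = random.sample(images, k=int(0.1 * len(images)))
    for image_path in subset:
        dest = TARGET_DIR / class_dir.name / image_path.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(image_path, dest)
```

CreateML can then be pointed at a folder structured like this (one subfolder per class) to start training.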
Machine | Chip | Num Epochs | Total training time (mins) |
---|---|---|---|
2019 MacBook Pro 16-inch | Intel | 25 | 24 |
2020 MacBook Pro 13-inch | M1 | 25 | 20 |
2021 MacBook Pro 14-inch | M1 Pro | 25 | 10 |
2021 MacBook Pro 14-inch | M1 Max | 25 | 11 |
And once again, there wasn't a large difference between the M1 Pro and M1 Max, though both very much outperformed the other chips.
During training, checking Activity Monitor on the M1 Pro and M1 Max showed large amounts of CPU usage under the process name "MLRecipeExecutionService".

Notably, the GPU didn't get much usage at all during training or feature extraction.
This leads me to believe that CreateML uses the 16-core Neural Engine to accelerate training. However, this isn't 100% confirmed, since Activity Monitor doesn't disclose when the Neural Engine kicks in.
CreateML doesn't reveal what kind of model it uses either. Judging by the performance, my guess is it's using at least a pretrained ResNet50, an EfficientNetB2 or above, or something similar.
Experiment 3: CIFAR10 TinyVGG Model with TensorFlow Code
CreateML works fantastically, but sometimes you'll want to build your own machine learning models.
For that, you'll probably end up using a framework like TensorFlow.
I teach TensorFlow and code it almost every day. So I was excited to see how the new machines would perform here.
For all of the custom TensorFlow tests, every machine ran the same code on the same datasets with the same environment setup.
The only difference was between the Macs and the Nvidia GPU machines: each of the Macs ran a combination of `tensorflow-macos` (TensorFlow for Mac) and `tensorflow-metal` (for GPU acceleration), whereas Google Colab and the Nvidia TITAN RTX used standard TensorFlow.
You can see the code for all experiments and TensorFlow on Mac setup on GitHub.
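If you're setting this up yourself, a quick sanity check looks something like this (assuming both packages are installed, e.g. via `pip install tensorflow-macos tensorflow-metal` inside a Python environment):

```python
import tensorflow as tf

print(tf.__version__)
# With tensorflow-metal installed on an M1 Mac, this should list a GPU
# device (backed by Metal) alongside the CPU device.
print(tf.config.list_physical_devices())
```

If no GPU device shows up, training will silently fall back to the CPU.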
First TensorFlow experiment details:
- Data: CIFAR10 from TensorFlow Datasets (32x32 images, 10 classes, 50,000 train, 10,000 test)
- Model: TinyVGG (from the CNN Explainer website)
- Training: 10 epochs, batch size 32
- Code: See it on GitHub
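For a sense of what this experiment involves, here's a minimal sketch, assuming the TinyVGG architecture as laid out on the CNN Explainer website (two blocks of two 3x3 convolutions with 10 filters each, followed by max pooling). The exact code I ran is in the GitHub repo linked above.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load CIFAR10: 50,000 train / 10,000 test images, 32x32x3, 10 classes.
(train_data, test_data) = tfds.load(
    "cifar10", split=["train", "test"], as_supervised=True
)

def preprocess(image, label):
    # Scale pixel values to [0, 1].
    return tf.cast(image, tf.float32) / 255.0, label

train_data = train_data.map(preprocess).batch(32).prefetch(tf.data.AUTOTUNE)
test_data = test_data.map(preprocess).batch(32).prefetch(tf.data.AUTOTUNE)

# TinyVGG: two (conv -> conv -> maxpool) blocks, then a dense classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(10, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(10, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(10, 3, activation="relu"),
    tf.keras.layers.Conv2D(10, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

model.fit(train_data, epochs=10, validation_data=test_data)
```

On the Macs, `tensorflow-metal` routes this training onto the GPU with no code changes.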
Machine | Chip | Num Epochs | Total time (seconds) | Time per epoch (average, seconds) |
---|---|---|---|---|
2019 MacBook Pro 16-inch | Intel | 10 | 180 | 18 |
2020 MacBook Pro 13-inch | M1 | 10 | 137 | 14 |
2021 MacBook Pro 14-inch | M1 Pro | 10 | 161 | 16 |
2021 MacBook Pro 14-inch | M1 Max | 10 | 143 | 14 |
Google Colab* | Nvidia K80 GPU | 10 | 167 | 17 |
Custom Deep Learning PC | Nvidia TITAN RTX | 10 | 37 | 4 |
*For experiments, I use the free version of Google Colab, a fantastic offering from Google to use a Jupyter Notebook connected to a GPU with very little setup. The free version used to offer faster GPUs (Nvidia P100, T4), however, it's been a while since I've gotten access to one of those. You can get faster GPUs via Colab Pro but that isn't available where I live (Australia) yet.
It makes sense that the TITAN RTX outperformed the rest of the machines. It's a GPU purpose-built for machine learning and data science.
All of the Macs performed more or less within the same range in terms of time per epoch, with the M1 Max and plain M1 machines running the code in similar times.

Checking Activity Monitor on each of the M1 Macs showed a large amount of GPU usage during training. This is thanks to Apple's `tensorflow-metal` PluggableDevice, a software package that leverages Apple's Metal GPU framework to accelerate TensorFlow.

GPU usage during model training, accelerated by `tensorflow-metal`. Screenshot from M1 Pro.

Experiment 4: Food101 EfficientNetB0 Feature Extraction with TensorFlow Code
The final machine learning experiment went larger scale, using more images and larger images than Experiment 3.
One of the best ways to get fantastic results when training machine learning models is to use transfer learning.
Transfer learning leverages the weights (patterns) one model has learned on a similar dataset and allows you to use them for your own dataset.
For Experiment 4 I used transfer learning with an EfficientNetB0 model on the Food101 dataset.
Experiment details:
- Data: Food101 from TensorFlow Datasets (224x224 images, 101 classes, ~75k train, ~25k test)
- Model: EfficientNetB0 pretrained on ImageNet with top layer replaced (feature extraction)
- Training: 5 epochs, batch size 32
- Code: See it on GitHub
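Here's a minimal sketch of the feature extraction setup: freeze an ImageNet-pretrained EfficientNetB0 base and train only a new classification head on top. The exact code I ran is in the GitHub repo linked above; the preprocessing details here are assumptions (Keras' EfficientNet models expect raw 0-255 pixel inputs, since rescaling is built into the model).

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load Food101: ~75k train / ~25k test images across 101 classes.
(train_data, test_data) = tfds.load(
    "food101", split=["train", "validation"], as_supervised=True
)

def preprocess(image, label):
    # Resize only; EfficientNet handles rescaling internally.
    return tf.image.resize(image, (224, 224)), label

train_data = train_data.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE).batch(32).prefetch(tf.data.AUTOTUNE)
test_data = test_data.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE).batch(32).prefetch(tf.data.AUTOTUNE)

# Feature extraction: freeze the pretrained base, replace the top layer.
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False

inputs = tf.keras.layers.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)  # keep batch norm in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(101, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

model.fit(train_data, epochs=5, validation_data=test_data)
```

Only the final Dense layer's ~130k parameters get updated, which is why feature extraction trains so much faster than a whole network from scratch.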
Machine | Chip | Num Epochs | Total time (seconds) | Time per epoch (average, seconds) |
---|---|---|---|---|
2019 MacBook Pro 16-inch* | Intel | 5 | 3032 | 606 |
2020 MacBook Pro 13-inch | M1 | 5 | 3387 | 677 |
2021 MacBook Pro 14-inch | M1 Pro | 5 | 1486 | 297 |
2021 MacBook Pro 14-inch | M1 Max | 5 | 959 | 192 |
Google Colab | Nvidia K80 GPU | 5 | 2122 | 424 |
Custom Deep Learning PC | Nvidia TITAN RTX | 5 | 464 | 93 |
*The code run on the 16-inch MacBook Pro used the SGD optimizer instead of the Adam optimizer, due to an unresolved issue with `tensorflow-macos` running on an Intel-based Mac that I couldn't find a fix for.

What stands out most to me here is how performant the M1 Pro and M1 Max are when scaling up to larger experiments.
Experiment 3 saw similar results across all Macs but when the amount of data is increased (both in image size and number of images), the M1 Pro and M1 Max pull far ahead of the other Macs.
The M1 Pro and M1 Max even outperform Google Colab with a dedicated Nvidia GPU (~1.5x faster on the M1 Pro and ~2x faster on the M1 Max).
This means you could run machine learning experiments on your local machine faster than with an online Colab notebook, giving you all of the benefits of running locally. Though Google Colab does give you the fantastic ability to share notebooks via a link.
And of course, the TITAN RTX performs the best but the M1 Max isn't far behind, which is very impressive for a portable device.
Activity Monitor showed plenty of GPU usage across all devices.

GPU usage during training with `tensorflow-metal`. It would be cool to see the Neural Engine come into play here though; training with CreateML was very fast and I wonder if that could be leveraged for custom code. Screenshot from M1 Max.

Which one should you buy?
I've been using a 13-inch M1 MacBook Pro every day for the past year. I use it for small-scale machine learning experiments, video editing and as a browser for Google Colab.
However, I'm now considering upgrading to a 14-inch MacBook Pro with the M1 Pro and doing everything locally (no more Google Colab), scaling up to the TITAN RTX or the cloud when I need to.
From the tests, it's clear the 13-inch M1 MacBook Pro is still an incredible laptop to start out on (and although not tested here, the M1 MacBook Air should perform close to the M1 MacBook Pro's results).
Though if you've got the budget, the M1 Pro will give you a noticeable performance boost.
As for the M1 Max, my tests struggle to justify the $1,100 USD price gap between it and the base M1 Pro. I'd rather spend the money on extra local storage or RAM, or put it toward a dedicated GPU.
In short, these are my recommendations:
- 13-inch M1 MacBook Pro/MacBook Air — Still a fantastic laptop for beginner machine learning and data science.
- 14-inch M1 Pro MacBook Pro — Noticeable performance boost from M1, worth it if you enjoy the new design or have the budget.
- 14-inch M1 Max MacBook Pro — Noticeable performance boost from M1 Pro only at large-scale model training, many other performance benchmarks hit the same levels as the M1 Pro. This may be an option if you find yourself often editing multiple streams of 4K video.
Conclusion
I'd like to see where and when Apple's dedicated chips are being used.
For example, it'd be cool to see when the Neural Engine comes into play, such as when using CreateML (all the M1 machines sped through training there like there was no tomorrow).
Or even better, to learn how you could write code to leverage the Neural Engine yourself.
Last year, I said "how about a 16-inch MacBook Pro with an M2?" and Apple delivered an M1 Max. Perhaps the M2 is still on the way in the form of a Mac mini or new iMac.
If anything is clear, it's that dedicated chips for specific purposes are the trend. And the performance benefits speak for themselves.
Resources
- See all of the code used for the testing and evaluation on GitHub.
- See my video on how to set up your new M1 machine for machine learning and data science.
- Watch a video version of this blog post on YouTube.