*Note: This article covers the Google Cloud Professional Data Engineer Certification exam as it was before March 29, 2019. The exam has been updated since then, and I have included those updates in the Extras section.*
So you want to get a fresh hoodie like the one I have in the cover photo?
Or you’ve been looking at getting Google Cloud Professional Data Engineer Certified and you’re wondering how to do it.
Over the past few months, I’ve been taking courses alongside using Google Cloud to prepare for the Professional Data Engineer exam. Then I took it. And I passed. And a few weeks later my hoodie arrived. The certificate came quicker.
This article lists a few things you may want to know and the steps I took to acquire the Google Cloud Professional Data Engineer Certification.
Why would you want to do a Google Cloud Professional Data Engineer Certification?
Data is everywhere. And knowing how to build systems which can handle and utilise data is in demand. Google Cloud provides the infrastructure to build these systems.
You may already have the skills to use Google Cloud, but how do you demonstrate this to a future employer or client? Two ways. Through a portfolio of projects or a certification.
A certificate says to future clients and employers, ‘Hey, I’ve got the skills and I’ve put in the effort to get accredited.’
Google’s one-liner sums it up.
Demonstrate your proficiency to design and build data processing systems and create machine learning models on Google Cloud Platform.
If you don’t have the skills already, going through the learning materials for the certification means you’ll learn all about how to build world-class data processing systems on Google Cloud.
Who would want to do a Google Cloud Professional Data Engineer Certification?
You’ve seen the figures. The cloud is growing. And it’s here to stay. If you haven’t seen the figures, trust me, the cloud is growing.
If you’re already a data scientist, a data engineer, data analyst, machine learning engineer or looking for a career change into the world of data, the Google Cloud Professional Data Engineer Certification is for you.
Being able to use cloud technologies is becoming a requirement for any kind of data focused role.
Do you need the certificate to be a good data engineer/data scientist/machine learning engineer?
No. You can still use Google Cloud to work on data solutions without the certificate.
A certificate is only one validation method of existing skills.
How much does it cost?
Sitting the certification exam costs $200 USD. If you fail, you will have to pay the fee again to resit.
There are costs associated with the preparation courses and using the platform itself.
Platform costs are what you’ll be charged for using Google Cloud’s services. If you are an avid user, you’ll be well aware of these. If not, and you’re only going through the training materials in this article, you could create a new Google Cloud account and complete them all well within the $300 credits Google offers on sign up.
We’ll get to course costs in a second.
How long does the certification last?
Two years. After that, you’ll need to take the exam again.
And since Google Cloud is evolving every day, it’s likely what’s required for the certificate has changed (as I found out was the case when I started writing this article).
What do you need to get ready for the exam?
Google recommends 3+ years of industry experience and 1+ years designing and managing solutions using GCP for professional level certifications.
I didn’t have either of these.
It was more like 6 months of each.
To supplement this, I utilised a combination of online training resources.
What courses did I take?
If you’re like me and don’t have the recommended requirements, you may want to look into some of the following courses to upskill yourself.
The following courses are what I used to prepare for the certification. They’re listed in order of completion.
I’ve listed the costs, timelines and helpfulness towards passing the certification exam for each.
Cost: $49 USD per month (after 7-day free trial)
Time: 1–2 months, 10+ hours per week
The Data Engineering on Google Cloud Platform Specialization on Coursera is made in collaboration with Google Cloud.
It’s broken into five sub-courses, each of which takes about 10 hours per week of study time.
If you’re unfamiliar with Data Processing on Google Cloud, this Specialization is like a 0 to 1. You’ll go through a range of practical exercises using an iterative platform called QwikLabs. Before each of these, there are lectures led by Google Cloud practitioners on how to use different services such as Google BigQuery, Cloud Dataproc, Dataflow and Bigtable.
Time: 1 week, 4–6 hours
Don’t take the low helpfulness score as this course being useless. It’s far from it. The only reason it gets a lower score is it’s not focused on the Professional Data Engineer Certification (this could be gathered from the title).
I took this as a refresher after completing the Coursera Specialization because I’d only been using Google Cloud for a few specialised use cases.
If you’re coming from another cloud service provider or have never used Google Cloud before, you may want to take this course. It’s a great introduction to Google Cloud Platform as a whole.
Note: As of 2021, Linux Academy is now part of A Cloud Guru, the link has been updated to reflect this.
Cost: $49 USD per month (after 7-day free trial)
Time: 1–4 weeks, 4+ hours per week
After completing the exam and reflecting back on the courses I’d done, the Linux Academy Google Certified Professional Data Engineer was the most helpful.
The videos, along with the Data Dossier eBook (a great free learning resource which came with the course) and the practice exams made the course one of the best learning resources I’ve ever used.
I even recommended it as the go-to resource in some Slack notes to the team after the exam.
• Some things on the exam weren’t in the Linux Academy (now A Cloud Guru), A Cloud Guru or Google Cloud practice exams (expected)
• 1 question with a graph of data points and what equation you’d need to cluster them (e.g. cos(X) or X²+Y²)
• Knowing the difference between Dataflow, Dataproc, Datastore, Bigtable, BigQuery, Pub/Sub and how they can each be used is a must
• The two case studies in the exam are the exact same as the ones in the practice, though I didn’t read the studies at all during the exam (the questions gave enough insight)
• Knowing some basic SQL query syntax is very helpful, especially for the BigQuery questions
• The practice exams provided by Linux Academy (now A Cloud Guru) and GCP have very similar style questions to the exam. I’d do each of these multiple times and use them to figure out where you’re weak
• A little rhyme to help with Dataproc: “Dataproc the croc and Hadoop the elephant plan to Spark a fire and cook a Hive of Pigs” (Dataproc deals with Hadoop, Spark, Hive and Pig)
• “Dataflow is a flowing Beam of light” (Dataflow deals with Apache Beam)
• “Everyone around the world can relate to a well-made ACID washed Spanner.” (Cloud Spanner is a DB designed for the cloud from the ground up, it’s ACID compliant and globally available)
• Handy to know the names of old-school equivalents of relational and non-relational database options (e.g. MongoDB, Cassandra)
• IAM roles are slightly different for each service but understanding how to separate users from being able to see data versus design workflows is helpful (e.g. Dataflow Worker role can design workflows but not see the data)
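To make the clustering note above a bit more concrete, here is a minimal sketch of why a feature like X²+Y² can matter. I don’t remember the exact graph from the exam, so the data here is an assumption: two noisy concentric rings, which can’t be separated by a straight line in X and Y but fall apart cleanly once you look at the squared distance from the origin. The function names (`make_rings`, `radial_feature`, `split_by_largest_gap`) are made up for illustration, and the largest-gap split is a simple stand-in for a real clustering algorithm.

```python
import math
import random


def make_rings(n_per_ring=50, radii=(1.0, 3.0), noise=0.1, seed=0):
    """Generate (x, y, label) points on noisy concentric rings (assumed data)."""
    rng = random.Random(seed)
    points = []
    for label, radius in enumerate(radii):
        for _ in range(n_per_ring):
            angle = rng.uniform(0, 2 * math.pi)
            r = radius + rng.uniform(-noise, noise)
            points.append((r * math.cos(angle), r * math.sin(angle), label))
    return points


def radial_feature(x, y):
    """The X^2 + Y^2 feature: squared distance from the origin."""
    return x * x + y * y


def split_by_largest_gap(values):
    """Return a threshold in the middle of the largest gap between sorted values."""
    ordered = sorted(values)
    gaps = [(b - a, (a + b) / 2) for a, b in zip(ordered, ordered[1:])]
    return max(gaps)[1]


points = make_rings()
features = [radial_feature(x, y) for x, y, _ in points]
threshold = split_by_largest_gap(features)

# Points above the threshold are assigned to the outer ring (label 1).
predicted = [int(f > threshold) for f in features]
accuracy = sum(p == label for p, (_, _, label) in zip(predicted, points)) / len(points)
print(f"threshold={threshold:.2f}, accuracy={accuracy:.2f}")
```

The takeaway for the exam is the shape-to-feature match: ring-shaped groups suggest a radial feature like X²+Y², while wave-shaped data might suggest something periodic like cos(X).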
This is probably enough for now. Mileage will probably vary from exam to exam. Linux Academy’s (now A Cloud Guru) course will supply 80% of the knowledge.
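On the SQL point in the notes, this is roughly the level of query syntax I mean. It’s only a sketch: it runs against an in-memory SQLite database from Python’s standard library as a stand-in, not against BigQuery itself, and the `page_views` table and its columns are made up for illustration. The clauses shown (`WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`) exist in BigQuery’s SQL as well.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, user_id INTEGER, seconds REAL)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("AU", 1, 30.0),
        ("AU", 2, 50.0),
        ("AU", 1, 40.0),
        ("US", 3, 20.0),
        ("US", 4, 0.0),
        ("NZ", 5, 10.0),
        ("NZ", 6, 30.0),
    ],
)

# Filter rows, aggregate per country, filter the groups, then sort.
query = """
SELECT country, COUNT(*) AS views, AVG(seconds) AS avg_seconds
FROM page_views
WHERE seconds > 0
GROUP BY country
HAVING COUNT(*) >= 2
ORDER BY views DESC
"""
rows = conn.execute(query).fetchall()
for country, views, avg_seconds in rows:
    print(country, views, avg_seconds)
```

If you can read a query like this and say what `WHERE` filters versus what `HAVING` filters, you’re in decent shape for the SQL-flavoured questions.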
Time: 1–2 hours
These were recommended on the A Cloud Guru forums. Many of them weren’t related to the Professional Data Engineer Certification; however, I cherry-picked some of the ones I recognised.
Some of the services can seem complex when going through a course, so it was good to hear a particular service described in a minute.
Cost: $49 USD for the certificate or free (no certificate)
Timeline: 1–2 weeks, 6+ hours per week
I found this resource the day before my exam was scheduled. I didn’t do it due to time restrictions, hence the lack of helpfulness rating.
However, after going through the course overview page it looks like a great resource to bring together all the things you’ve been learning about Data Engineering on Google Cloud and to highlight any weak points.
I sent this course as a resource to one of my colleagues who’s preparing for the certification.
This was another resource I stumbled upon after the exam. I took a look at it and it’s comprehensive yet concise. Plus, it’s free. This could be used as something to read over in between practice exams or even after the certification to remind yourself.
Get Google Cloud Certified by Sam Lee
Cost: $39 per course ($49 for all 3)
Sam is a big data engineer and web developer who has taken his knowledge of Google Cloud and put it into course form. There are three different courses: the Professional Cloud Architect, the Professional Data Engineer and the Associate Cloud Engineer. This is a one-stop shop for all the Google Cloud Certifications you need.
What did I do after the courses?
After getting close to completing the courses, I booked the exam with a week’s notice.
Having a deadline is a great motivation for going over what you’ve learned.
I went through the practice exams from Linux Academy and Google Cloud multiple times each until I could complete them at 95%+ accuracy every time.
The quizzes from each platform are similar but I found going over the answers I kept getting wrong and writing down why I got them wrong helped fix my weak points.
The exam I took was themed around designing data processing systems on Google Cloud, using two case studies (this has changed since March 29, 2019). It was multiple choice the whole way through.
It took me about 2 hours. And was about 20% harder than any of the practice exams I’d taken.
I can’t stress the value of the practice exams enough.
What would I change if I went to do it again?
More practice exams. More practical knowledge.
Of course, there’s always more preparation you could do.
The recommended requirements do list 3+ years of industry experience and 1+ years of using GCP. But I didn’t have this so I had to deal with what I had.
The exam was updated on March 29, 2019. The materials in this article will still give you a good foundation; however, it’s important to note some changes.
Different sections of the Google Cloud Professional Data Engineer Exam (Version 1)
1. Designing data processing systems
2. Building and maintaining data structures and databases
3. Analysing data and enabling machine learning
4. Modelling business processes for analysis and optimisation
5. Ensuring reliability
6. Visualizing data and advocating policy
7. Designing for security and compliance
Different sections of the Google Cloud Professional Data Engineer Exam (Version 2)
1. Designing data processing systems
2. Building and Operationalizing Data Processing Systems
3. Operationalizing Machine Learning Models (most of the changes have happened here) [NEW]
4. Ensuring Solution Quality
Version 2 has combined sections 1, 2, 4 and 6 of Version 1 into sections 1 and 2. It has also combined sections 5 and 7 of Version 1 into section 4. And section 3 of Version 2 has been expanded to encompass all of Google Cloud’s new machine learning capabilities.
Because these changes have occurred so recently, many training materials have not had a chance to be updated.
However, going through the materials in this article should be enough to cover 70% of what you need. I’d combine it with some of your own research on the following (these were introduced in Version 2 of the exam).
- Google Machine Learning (ML) APIs
- Google Cloud Machine Learning Engine
- Google Cloud TPUs (a custom piece of hardware Google has built specifically for ML training)
- Google Glossary of ML terms
As you can see the latest update to the exam had a big focus on Google Cloud’s ML capabilities.
Update 29/04/2019: a message from the Linux Academy (now A Cloud Guru) course instructor Matthew Ulasien.
Just an FYI, we are planning on updating the Data Engineer course on Linux Academy to reflect the new objectives starting sometime in mid/late May.
Update 1/6/2019: another message from the Linux Academy (now A Cloud Guru) course instructor Matthew Ulasien.
We are in the planning stages now. I’m guesstimating it will take about a month to fully update it.
After the exam
When you complete the exam, you’ll only receive a pass or fail result. The advice is to aim for at least 70%, which is why I aimed for a minimum of 90% on the practice exams.
Once you’ve passed, you’ll be emailed a redemption code alongside your official Google Cloud Professional Data Engineer certificate. Congratulations!
You can use the redemption code on an exclusive Google Cloud Professional Data Engineer store which is packed with swag. There are t-shirts, backpacks and hoodies (these may vary in stock when you get there). I chose the hoodie.
Now you’re certified, you can show off your skillset (officially) and get back to doing what you do best, building.
See you in two years to get recertified.
PPS a big thank you to all the amazing instructors throughout the above courses and Max Kelsen for providing resources and time to study and prepare for the exam.