The 10 Commandments of Self-Taught Machine Learning Engineers
Principles for using math, code and data to seduce Mother Nature into revealing her secrets.
The terms commandments and self-taught are not to be taken lightly. One must be in charge of their own education and enlightenment. Ignore this and someone else will choose it for you.
1. Math, Code and Data are your holy trinity
Any effective machine learning pipeline leverages the crossover of mathematics, code and data. Each is only as effective as the others.
If your data is of poor quality, it does not matter how elegant your mathematics or efficient your code is.
If your data is of the highest quality but your mathematics is off, expect your results to disappoint, or worse, harm.
If your data and mathematics are world-class but your code is inefficient, you will fail to reap the benefits of scale.
Data provides you with a mining site containing the gems of nature. Mathematics is your pickaxe. And code allows you to create a fleet of pickaxe-wielding machines.
The trinity is the bridge between the inputs (what you have) and outputs (what you want) of your system.
Note: The mathematics branch of the trinity also contains statistics and probability. If you do not like this, think of the trinity as a starfish.
2. The only exception to the trinity
Other than failing to balance the trinity, there is only one greater sin: forgetting who the trinity is set to serve. Even the most well-executed code, powered by the most graceful math, deriving insights from the most bountiful data means nothing if it does not serve your customer.
Too often an engineer will find themselves lost in a process, forgetting the outcome. Despite proceeding with the best intentions, they forget that intentions do not matter as much as actions.
A less performant manifestation of the trinity which provides benefits to a customer is better than a performant solution which offers nothing at all.
To be clear, if your state-of-the-art model takes 47x longer to run for a 1% boost in accuracy, is it delivering the best experience?
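One way to keep that question honest is to put the cost and the benefit side by side. The sketch below is only an illustration: the helper names `time_call` and `tradeoff` are hypothetical, and the accuracy and timing numbers are made up to mirror the 47x and 1% figures above.

```python
import time


def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def tradeoff(acc_small, secs_small, acc_big, secs_big):
    """Summarise what the bigger model costs (slowdown) and buys (accuracy)."""
    return {
        "slowdown": secs_big / secs_small,
        "accuracy_gain": acc_big - acc_small,
    }


# Made-up numbers: a small model at 90% accuracy and 0.1 s per request,
# a larger one at 91% accuracy and 4.7 s per request.
report = tradeoff(acc_small=0.90, secs_small=0.1, acc_big=0.91, secs_big=4.7)
print(f"{report['slowdown']:.0f}x slower for "
      f"{report['accuracy_gain']:.2%} more accuracy")
```

Whether that trade is worth it is a question for your customer, not for the benchmark.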
3. Do not be fooled by the trinity
However much you worship the trinity, you should not be blinded by your love.
The self-taught machine learning engineer is their own biggest sceptic.
They know data cannot prove, only disprove (all it takes is one data point in a billion to overturn a concept previously thought important), that a little bit of poor mathematics can have extreme consequences (nature is not linear), and that code is only as efficient as its weakest point.
However holy the trinity, a gut feeling should not be ignored. If a result seems too good to be true, unless you’ve gotten lucky, it probably is.
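Scepticism can be turned into a couple of cheap checks. The sketch below assumes a classification problem; the function names are hypothetical and these two checks are examples, not a complete audit. The first asks how well you would score by always guessing the most common label, the second looks for rows that leaked from the training split into the test split.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def overlapping_rows(train_rows, test_rows):
    """Rows present in both splits, a common cause of inflated scores."""
    return set(map(tuple, train_rows)) & set(map(tuple, test_rows))


# Toy data for illustration only.
labels = [0, 0, 0, 1]
train = [[1, 2], [3, 4]]
test = [[3, 4], [5, 6]]

print(majority_baseline_accuracy(labels))  # a model must beat this to matter
print(overlapping_rows(train, test))       # anything here deserves a second look
```

If your model barely beats the baseline, or the overlap is not empty, the result that seemed too good probably was.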
4. Keep your relationship with those who you seek to serve holy
Let the machines do what they’re good at (repeating processes over and over again). All the while you do what you’re good at (caring, empathising, questioning, listening, leading, teaching).
Your customers do not care about the trinity as much as you do. They care about whether or not their needs are being met.
Complicated questions, such as those surrounding data ethics, should be answered with the Silver Rule: do not do to others what you would not like done to you.
5. Pay respects to those who have laid the foundations for you
When you think of the fields of computing, machine learning, artificial intelligence, mathematics, whose names come to mind?
Ada Lovelace, Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Alan Turing, Fei-Fei Li, Grace Hopper, Andrew Ng, John von Neumann, Alan Kay, Stuart Russell, Peter Norvig?
Of course, for all the names you hear or remember, there are thousands who have contributed but have slipped out of the history books.
The up-and-comer should recognize the monumental efforts of those who have come before them, but should also recognize that each and every one of them would tell the newly minted machine learning engineer the same thing: the future of the field depends on your work.
6. Do not underestimate the power of a complete rewrite
You shall aim to build things reliably the first time. But as your skill improves, you may revisit old creations, tear them down and recreate them with a new perspective.
The self-taught machine learning engineer understands that, like nature, software and machine learning projects are never done; they are constantly in motion. Data changes, code gets executed on new hardware, a genius discovers a computationally efficient optimizer with low memory requirements, suitable for large datasets, and calls it Adam.
You should not only be open to these changes, you should welcome them. And once they arrive, use your best judgement as to whether or not they are worth implementing into your system — just because something is new, does not mean it is required.
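For the curious, here is roughly what that optimizer does. This is a minimal sketch of the Adam update as it is commonly described, written for a single scalar parameter; the function name `adam_step` and the toy objective are assumptions for illustration, not code from any particular library.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t counts steps from 1."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # correct the bias toward zero
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective: minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * (x - 3), m, v, t, lr=0.01)

print(x)  # should end up close to 3
```

The only state it keeps per parameter is `m` and `v`, which is where the low memory footprint comes from.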
7. Avoid being a tool whore
A common anecdote in the programming world is painting the bike shed. It speaks of a programmer, or team of programmers, worrying about what colour a bike shed should be rather than asking important questions such as whether or not the shed can actually store bikes.
Of course, the bike shed can be subbed out with a computer program which serves some purpose.
In the machine learning world, you will hear an endless debate between R or Python, TensorFlow or PyTorch, books or courses, math or code first (both, remember the trinity), Spark or Hadoop, Amazon Web Services or Google Cloud Platform, VSCode or Jupyter, Nvidia or... actually there’s no real alternative here.
All valid comparisons but none worth arguing with the other side over.
The real question you should be answering is: what allows me to build my ideas in the fastest, most reliable manner?
And once you ask yourself this, you’ll realise everyone else is asking themselves the same question.
The curse of the engineer is starting with a tool and searching for a problem instead of starting with a problem, then searching for a tool. Then, and only then, if the right tool doesn’t exist, should it be built.
The same can be said for educational resources. The holy trinity of math, code and data is invariant to where you learn it; what matters is how you put it to use.
Do not forget: many problems can be solved without machine learning.
8. Your ideas are commodities
Do not confuse someone acting upon a good idea with someone stealing your idea. Your ideas have far more value in the hands of others than in your head.
Your role as an engineer is not only to build your ideas but to communicate and show others how they may benefit from them. If you lack such communication abilities, you should partner with someone who has them or seek to develop them.
In a world where no one knows what to believe, you can differentiate yourself by being authentic. Be honest about what your creations have to offer and what you do not know. An ability to admit what one does not know is a strength, not a weakness.
Good technology always wins, lying never wins. Build technology. Don’t lie.
9. Your neighbour, colleagues, classmates and comrades are figuring it out too
Do you see the progress of others and get jealous? Or do you see it as inspiration for what you could potentially do?
How you feel about the success of others is how you feel about yourself.
10. You shall not covet
You should seek to build your skill at applying the trinity and answering the questions of those you seek to serve but you should not do so with desire. Desire places a curse on you to forever take the future too seriously rather than enjoying what you have right now.
The cure for the desire of improved skill is to develop a love for learning.
The self-taught machine learning engineer is fast to learn the concepts required to harness the power of math, code and data but is never in a rush. They understand learning any worthwhile craft takes time and if this is the case, they might as well enjoy it.
Recall from the beginning: you are the one responsible for your own enlightenment and education. Knowing this, you should pick projects in which you will come out ahead no matter what kind of luck you have. Does the project satisfy your curiosity? Does it challenge your skillset? Does it allow you to adhere to the commandments? If so, it is enough.
Finally, all the while dancing along their own path, the self-taught machine learning engineer keeps fresh in their mind:
- No certifications without knowledge.
- No (over)thinking without doing.
- No learning without enjoyment.
- No creations without style.
- No skill without practice.
- No tools without purpose.
- No showboating without shipping.
- No assumptions without scepticism.
- No consumption without contribution.
- No desire for the future without love for the present.
And above all, no machine learning without the trinity.
This article originally appeared as an issue of Eat, Move, Learn, Make — a newsletter for hungry, active, curious creators. If the previous sentence describes you, you should sign up.