In my previous article, I shared a few lessons that would have made my ML journey smoother. Writing that article started as a reflection while lying on a beach somewhere along the Mediterranean Sea, away from the noise of daily work. It turns out, space, silence, and sea have a way of bringing up a list of things I wish I had known before starting ML.
This article is part two of that list. To recap, part one discussed that (1) doing ML primarily means preparing, (2) papers are like sales pitches, (3) bug fixing is the way forward, and (4) most works (including mine) won’t make that breakthrough.
The present article covers slightly broader principles—less about specific pain points in ML, more about mindsets.
5. You Need (Flexible) Boundaries
Machine learning moves fast. New papers are published every day. Some are quietly uploaded to preprint servers like arXiv, while others come with press releases and fancy demos. It’s natural to want to stay on top of it all—to keep up with the latest trends and breakthroughs.
But there’s a problem: if you try to keep up with everything, you’ll end up keeping up with nothing. The field is simply too big, too fragmented, too fast.
Think of the recent Nobel laureates Geoffrey Hinton, Demis Hassabis, and John Jumper. All were awarded (shares of) Nobel Prizes for bringing the field of AI forward. They did not earn these highly sought-after prizes by staying on top of every trend. In fact, like many other famed researchers, they went deep into their own corner of the world.
Richard Feynman, another Nobel winner, famously avoided fads. He deliberately stepped back from mainstream physics to explore areas that interested him deeply, to make “real good physics.”
It’s understandable to want to stay on the cutting edge. But the cutting edge is, by definition, a constantly moving area: like the waves that form on a pond once you throw in a stone. If you’re always surfing the outermost wave, you’ll lose connection to the innermost area.
Instead, what you need are boundaries. Not fences, but guardrails. They keep you headed in the right direction. They let you go deep while still allowing space for surprising departures. Within your chosen focus area, you’ll still encounter new problems, new papers, new angles—but they will all connect back to your core field.
Guardrails allow you to apply a filter to all the things that you see: yes, no, yes, yes, no.
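To make that filter concrete, here’s a toy sketch in Python. The keywords and titles are made up, and a real setup would be more nuanced than simple string matching:

```python
# Toy illustration of "guardrails as a filter": keep only papers that
# fall inside a chosen focus area. Keywords and titles are hypothetical.
FOCUS_KEYWORDS = {"continual learning", "catastrophic forgetting", "replay"}

def passes_guardrails(title: str) -> bool:
    """Return True if a paper title falls within the focus area."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in FOCUS_KEYWORDS)

feed = [
    "Prompt-Based Continual Learning for Vision Transformers",
    "A New Tokenizer for Low-Resource LLMs",
    "Replay Buffers Revisited: Memory-Efficient Rehearsal",
]
for title in feed:
    print("yes" if passes_guardrails(title) else "no", "-", title)
```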
Take my own field—continual learning—as an example. It’s already overwhelming. Just looking at the recent papers curated on GitHub shows how much gets published at each major conference. And that’s only within CL! Now imagine trying to stay on top of CL and GenAI. And LLMs. And …
Impossible.
6. Research Code Is Just That: Research Code
Writing ML algorithms is an essential part of machine learning work. But not all code is created equal. There’s production code—the kind used in apps, services, and end-user systems—and then there’s research code.
Research code has a different goal. It doesn’t need to be cleanly abstracted, deeply modularized, or prepared for long-term maintenance. It needs to work, help you test your hypotheses, and let you iterate fast.
When I started, I often worried about whether my code was “elegant” enough. I spent precious coding hours refactoring, restructuring, and forcing research projects into object-oriented design patterns. But a lot of the time, that was unnecessary.
Of course, code should be readable, documented (for your future self, if nobody else), and decently structured. But it does not have to be perfect. It doesn’t need to be “production-grade.” Most of the time, you’re the only user (which is perfectly fine, see my previous post). And in many cases, the code won’t live past the end of the project.
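For illustration, here’s a sketch of what “good enough” research code might look like: a flat, seeded script with a few comments, nothing more. The model, data, and hyperparameters are stand-ins, not a recommendation for any particular setup.

```python
# A deliberately simple research script: no class hierarchy, no config
# framework. Readable, reproducible, and easy to throw away.
import torch
from torch import nn

def run_experiment(lr: float = 1e-3, epochs: int = 10, seed: int = 0):
    torch.manual_seed(seed)  # reproducibility beats elegance
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    # Dummy data stands in for the real dataset/loader.
    x = torch.randn(256, 32)
    y = torch.randint(0, 2, (256,))

    for epoch in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")

if __name__ == "__main__":
    run_experiment()
```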
So, if your code does what it should: fine. Keep it as-is and move on to the next project.
7. Read Broadly, Read Deeply
In November 2002, an unassuming math paper was uploaded to arXiv. Its title: The entropy formula for the Ricci flow and its geometric applications. The author was a reclusive Russian mathematician, Grigory Perelman.
That paper—and the two follow-ups he posted over the following year—later* turned out to contain the long-awaited proof of the Poincaré conjecture, one of the most famous then-unsolved problems in mathematics. In the years after, Perelman declined both the Fields Medal and the $1 million Millennium Prize for his work, further adding to his image as a one-of-a-kind mathematician.**
What struck me about this story, apart from the appeal that stories of scientific breakthroughs naturally have, is that it all began with a simple arXiv submission.
In the last two decades, the way scholarly work is shared has changed dramatically. arXiv, as the best-known preprint platform, has made research more accessible and faster to spread. According to arXiv’s own stats, computer science (CS) submissions have exploded in volume over the years.
There’s more to read than ever before. And if you try to read everything, you’ll end up understanding very little. In my experience, you’re better off choosing a focus area, reading deeply within it, and supplementing that with occasional reads from adjacent fields.
For example, my main area is continual learning. There’s far too much being published for me to read everything—even just within CL. But I can read around it.
Continual learning is about adapting a model to new domains over time, without forgetting previous ones. That naturally connects to other fields:
- Domain adaptation (DA), which focuses on adapting to new domains—though often without caring about old domains
- Test-time adaptation (TTA), which adapts models on the fly, using only test data
- Optimization methods, especially those that help balance stability and plasticity—exactly the trade-off we care about in CL
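To make the stability-plasticity trade-off from that last point concrete, here’s a minimal sketch: an L2 penalty pulls the weights toward the previous-task solution while the task loss pulls them toward the new data. It’s a simplified cousin of methods like EWC, not any particular published algorithm; the function name and the `lambda_stability` knob are illustrative.

```python
# Minimal stability-plasticity sketch: the task loss encourages
# plasticity (fit the new task), while an L2 penalty toward the old
# weights encourages stability (don't forget previous tasks).
import torch
from torch import nn

def train_on_new_task(model: nn.Module, loader, lambda_stability: float = 1.0):
    # Snapshot the weights learned on previous tasks.
    old_params = [p.detach().clone() for p in model.parameters()]
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for x, y in loader:
        task_loss = loss_fn(model(x), y)  # plasticity: fit the new task
        stability_loss = sum(             # stability: stay near old weights
            ((p - p_old) ** 2).sum()
            for p, p_old in zip(model.parameters(), old_params)
        )
        loss = task_loss + lambda_stability * stability_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Setting `lambda_stability = 0` recovers plain fine-tuning (maximum plasticity); cranking it up freezes the model in place (maximum stability).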
Reading in those areas sparks new ideas. But having a deep foundation in CL gives me the context to judge what’s useful and how it might transfer.
So yes, read broadly. But don’t do it at the cost of depth. The good ideas often come not from reading more, but from seeing connections more clearly. And that requires going deep, which connects nicely to my 6.5-year lookback article.
Footnotes
* later: simply because the problem was so complex, and the proof so complicated, that it took several brilliant minds to verify it. Wikipedia has good coverage of the story, as interesting as mathematics can get.
** Another one-of-a-kind mathematician: Paul Erdős.