In recent years, there has been an explosion of innovation in the field of machine learning. Systems like DALL-E 2 and Whisper AI have allowed us to do things with machine learning algorithms that we couldn’t even imagine before: incredibly accurate transcription, text-to-image generation, and even text-to-video generation. It’s an exciting time to be a practitioner of artificial intelligence and machine learning!
But what if you’re just starting out? Seeing a new innovative model every day can be intimidating, and it can feel like you’ll never catch up. All new journeys are intimidating, but we should remember the proverb: "The journey of a thousand miles begins with one step."
To get to the cutting edge of machine learning, we need to have a deep understanding and appreciation for its foundation. This guide lays out this foundation by exploring several important algorithms, so you can carefully plan, study and know that you’re making progress on your journey.
Let’s get into the algorithms.
Linear Regression
Linear regression is often the first machine learning algorithm that students learn about. It’s easy to dismiss linear regression because it seems simplistic, but its simplicity is what makes it so widely used. A linear regression model looks like the following:
$$
Y = \beta_0 + \beta_1 X + \epsilon
$$
The model is actually a reflection of how we think as humans! If you’ve ever thought, "The more I study, the better my grades should be," then you’ve mentally used linear regression!
Linear regression models are often our first introduction to machine learning because they use one variable to predict another in an intuitive way: each one-unit increase in $X$ changes $Y$ by the same amount, $\beta_1$, on average, while the error term $\epsilon$ captures the variation that $X$ cannot explain.
For example, we might want to predict insurance costs based on characteristics of the patient. Insurance costs are hard to know ahead of time, but patient characteristics are easier to see and measure, so linear regressions allow us to connect these two. If you want to know more about linear regression, you can refer to our Linear Regression course!
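To make this concrete, here is a minimal plain-Python sketch that fits $\beta_0$ and $\beta_1$ by ordinary least squares, using the closed-form formulas for a single feature. The ages and insurance costs are made-up toy numbers chosen to lie exactly on a line so the fitted coefficients are easy to check; real data would be noisy.

```python
def fit_simple_linear_regression(x, y):
    """Return (beta_0, beta_1) minimizing the sum of squared errors for one feature."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x (the 1/n factors cancel)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    beta_1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    beta_0 = y_mean - beta_1 * x_mean
    return beta_0, beta_1


# Hypothetical toy data: patient age (X) and yearly insurance cost (Y)
ages = [20, 30, 40, 50]
costs = [1500, 2500, 3500, 4500]

beta_0, beta_1 = fit_simple_linear_regression(ages, costs)
print(beta_0, beta_1)        # -500.0 100.0
print(beta_0 + beta_1 * 45)  # predicted cost for a 45-year-old: 4000.0
```

Here each additional year of age adds $\beta_1 = 100$ to the predicted cost, which is exactly the constant change per unit of $X$ that the model assumes.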
Logistic Regression
Logistic regression is similar to linear regression, but instead of trying to predict a number, this model is trying to predict a binary class. A binary class has values of either 0 or 1, which typically translate to "no" and "yes." For example, trying to predict disease status based on laboratory tests is a classification problem. Classification is one of the central problems of machine learning, and logistic regression is our first solution to this particular problem.
Logistic regression has a similar form to linear regression, with a slight tweak:
$$
P(Y = 1 \mid X) = \text{sigmoid}(\beta_0 + \beta_1 X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
$$
The features ($X$) still enter in a linear form, but the result is transformed, or "squashed", to lie between 0 and 1 by the sigmoid function, so it can be read as the probability that the observation belongs to class 1. If this probability is below 0.5, the model classifies the observation as 0, and as 1 otherwise.
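A minimal plain-Python sketch of the idea: `sigmoid` does the squashing, batch gradient descent on the average log-loss fits $\beta_0$ and $\beta_1$, and `predict_class` applies the 0.5 threshold. The lab values and labels are invented toy data, and plain gradient descent with these step sizes is just one simple fitting choice; libraries typically use faster optimizers.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(x, y, lr=0.1, n_iters=2000):
    """Fit beta_0 and beta_1 by batch gradient descent on the average log-loss."""
    beta_0, beta_1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        grad_0 = grad_1 = 0.0
        for xi, yi in zip(x, y):
            # Gradient of the log-loss: predicted probability minus the true label
            error = sigmoid(beta_0 + beta_1 * xi) - yi
            grad_0 += error
            grad_1 += error * xi
        beta_0 -= lr * grad_0 / n
        beta_1 -= lr * grad_1 / n
    return beta_0, beta_1


def predict_class(x, beta_0, beta_1, threshold=0.5):
    """Classify as 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(beta_0 + beta_1 * x) >= threshold else 0


# Hypothetical standardized lab-test values and disease status (0 = no, 1 = yes)
lab_values = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
disease = [0, 0, 0, 1, 1, 1]

beta_0, beta_1 = fit_logistic_regression(lab_values, disease)
print([predict_class(v, beta_0, beta_1) for v in lab_values])  # [0, 0, 0, 1, 1, 1]
```

Because these toy classes are perfectly separable, the coefficients keep growing slowly the longer gradient descent runs; library implementations usually add a regularization penalty that keeps them finite.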
Linear and logistic regression are important because they lay out the central goals of machine learning: prediction of values, both continuous and categorical. Knowing this, we can start to learn other algorithms and see how they approach these goals differently. If you'd like to learn more about logistic regression, consider looking at our own course or reading more about how to quickly implement it in Python.
K-Means
The linear and logistic models are considered the introductory algorithms for supervised learning in regression and classification. Supervised learning is a branch of machine learning in which we've observed the outcome $Y$ alongside the features $X$ and can use these paired observations to train the model. There's also a branch of machine learning in which we don't observe the outcome and instead try to learn about the inherent patterns in the data. This is called unsupervised learning.
A great introductory model for unsupervised learning is the K-means algorithm. Given a dataset and a number of clusters $K$, the K-means algorithm tries to assign each observation to one of the clusters. It starts with an initial guess for the centers, or means, of the clusters and assigns each observation to the center it's closest to. Then, the algorithm recalculates each center as the mean of the observations assigned to it and repeats these two steps until the assignments no longer change.
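Below is a minimal plain-Python sketch of those steps for 2D points. Initializing the centers with randomly chosen data points is an assumption (one common choice among several, such as k-means++), and the points are made-up toy data forming two obvious groups.

```python
import math
import random


def kmeans(points, k, n_iters=100, seed=0):
    """Cluster points (tuples of floats) into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization: k distinct data points
    labels = None
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels = kmeans(points, k=2)
print(labels)
print(sorted(centers))  # roughly [(1.17, 1.5), (8.5, 8.33)]
```

Which cluster gets label 0 or 1 depends on the random initialization; only the grouping itself is meaningful.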
In other words, the K-means algorithm seeks to find the underlying groups in the data, based on the idea that observations close together are also related. This idea comes up repeatedly in machine learning and is crucial to understanding more sophisticated algorithms. Dataquest’s own CEO, Vik Paruchuri, has created an in-depth video on implementing K-Means in Python, and we encourage you to have a look.
Support Vector Machines
Support Vector Machines (SVMs) are another algorithm that can perform either classification or regression; in this article, we'll focus on classification. Logistic regression uses a sigmoid function to classify observations, whereas a support vector machine tries to draw the line that best separates the classes, meaning the line with the widest possible margin between itself and the nearest points of each class. Each side of the line corresponds to one of the two classes, although SVMs can handle multiclass tasks as well.
In the diagram above, the red line divides the two classes. An SVM works by constructing such a line for your dataset. One advantage SVMs have over simpler models like logistic regression is that this "separating line" approach extends naturally to higher dimensions: instead of a separating line, we get a hyperplane that separates the classes in the same way as in the diagram.
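The sketch below trains a linear SVM in plain Python by subgradient descent on the regularized hinge loss, one simple way to find a wide-margin separating line; dedicated SVM solvers typically use other optimization methods. The learning rate, regularization strength `lam`, epoch count, and data are illustrative assumptions, and labels are encoded as -1 and +1, as is conventional for SVMs.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, n_epochs=1000):
    """Fit a soft-margin linear SVM by full-batch subgradient descent.

    Labels must be -1 or +1. Minimizes
        lam / 2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(n_epochs):
        # Gradient of the regularizer, which favors a wide margin (small ||w||)
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the separating line x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical two-feature data with two well-separated classes
X = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [-1, -1, -1, 1, 1, 1]
```

With more features, `w` simply gets more entries and the same code describes a separating hyperplane instead of a line.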
In addition, both linear and logistic regression assume a linear relationship between the features and the outcome, and this assumption is too simplistic for many complex ML problems. Support vector machines are also important because, by using kernel functions, they let us move beyond this linearity assumption. They highlight that as problems get more complex, we may need to approach them with more flexible models.
Random Forest
The next class of algorithms that is important to learn in machine learning is the random forest. Before we can understand the random forest, we must understand decision trees. A simple example of a decision tree is shown below:
The first decision in this tree looks at a column named A. If A is not red, then the decision tree predicts that the observation has a class of 0. If A has the value red, we look at the B column and make a second decision. If B is greater than 100, then we predict the class to be 1, and 0 otherwise. Decision trees can incorporate as many branches as needed to improve model performance.
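The tree just described is small enough to write as nested conditionals. This sketch assumes each observation is a Python dict with a color string under "A" and a number under "B"; that representation is illustrative, not part of the original example.

```python
def example_tree(row):
    """Hypothetical encoding of the two-split tree: A is a color, B is a number."""
    if row["A"] != "red":  # first decision: anything not red is class 0
        return 0
    if row["B"] > 100:     # second decision, only reached when A is red
        return 1
    return 0


print(example_tree({"A": "blue", "B": 500}))  # 0
print(example_tree({"A": "red", "B": 150}))   # 1
print(example_tree({"A": "red", "B": 100}))   # 0
```

A learned tree works the same way at prediction time; training is the process of choosing which columns and thresholds to test at each branch.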
However, a single decision tree is usually not a great machine learning model by itself, which is why we haven't dedicated an entire section to it. We can extend decision trees by training many different trees at the same time, each on a random sample of the data and therefore with a slightly different set of decision rules. This collection of decision trees forms our random forest. To make a prediction, a random forest collects the predictions of its individual decision trees and uses the "wisdom of the crowd", choosing the prediction that the majority pick.
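Here is a simplified plain-Python sketch of the idea. As a hypothetical simplification, each "tree" is a one-split decision stump trained on a bootstrap sample (rows drawn with replacement) using one randomly chosen feature; real random forests grow deeper trees and sample features at every split. Prediction is a majority vote across the stumps.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that minimizes training errors."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        left_class = Counter(left).most_common(1)[0][0]
        right_class = Counter(right).most_common(1)[0][0] if right else left_class
        errors = sum(y != left_class for y in left) + sum(y != right_class for y in right)
        if best is None or errors < best[0]:
            best = (errors, (feature, threshold, left_class, right_class))
    return best[1]


def predict_stump(stump, row):
    feature, threshold, left_class, right_class = stump
    return left_class if row[feature] <= threshold else right_class


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        feature = rng.randrange(n_features)         # random feature for this stump
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual stumps ("wisdom of the crowd")."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric features, two classes
rows = [(1, 2), (2, 1), (2, 3), (3, 2), (7, 8), (8, 7), (8, 9), (9, 8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (10, 10)))  # 0 1
```

Individual stumps can disagree on borderline points because each saw a different sample; the vote smooths out those individual mistakes.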
Random forests are an example of what we call ensemble models: models that are constructed from many simpler models. Random forests are often highly accurate predictors, and they demonstrate the power of using multiple simple models at once. We've gone beyond using single models to using groups of models instead. An interested reader can refer to our random forests course to learn more!
Regularized Models
The ultimate goal of any machine learning model is to "learn" the underlying relationship between the features and the outcome. However, if we aren't careful, a model can simply learn how to predict on the data that it was trained on and perform terribly on new data. This is the problem of overfitting, and it's the reason that we've included regularized models on this list. Regularized models extend our knowledge beyond the previous models because they allow us to start tackling higher-dimensional problems.
One example of a regularized model is the LASSO, a type of constrained linear regression. Regularization prevents the coefficients, such as $\beta_1$, from becoming too large, which keeps them from being tuned too precisely to the training data (the intercept $\beta_0$ is usually left unpenalized). In LASSO, the coefficients of unimportant features can be shrunk all the way to zero, meaning that it can be used to perform feature selection. To perform this regularization, LASSO adds a penalty term, proportional to the sum of the absolute values of the coefficients, to the cost function for linear regression. When we minimize this new cost function, consisting of the mean squared error plus the penalty term, we get smaller coefficients.
$$
\text{Cost} = \text{MSE Loss} + \text{LASSO Penalty}
$$
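As a sketch, the LASSO penalty is commonly written as $\lambda \sum_j |\beta_j|$, where $\lambda \ge 0$ controls its strength. The plain-Python example below fits LASSO by cyclic coordinate descent, one standard solver, assuming the features and target are already centered (so there is no intercept) and using the $\frac{1}{2n}$ scaling of the squared-error term that scikit-learn's `Lasso` also uses. The data are hypothetical: `y` depends strongly on the first feature and only weakly on the second.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize (1 / (2n)) * sum of squared errors + lam * sum(|beta_j|).

    Assumes the columns of X and the target y are centered, so no intercept is fit.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Hypothetical centered data: y = 2 * x1 + 0.1 * x2
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y = [-2.9, -1.1, 0.9, 3.1]

print(lasso_coordinate_descent(X, y, lam=0.0))  # close to [2.0, 0.1]
print(lasso_coordinate_descent(X, y, lam=0.5))  # close to [1.6, 0.0]
```

With the penalty switched on, the weak second feature's coefficient becomes exactly zero, which is the feature-selection behavior described above, while the strong feature's coefficient is only shrunk.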
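Another example of a regularized model is ridge regression. Like LASSO, ridge regression limits how large the coefficients of a linear regression can be. Unlike LASSO, however, it shrinks coefficients toward zero without setting them exactly to zero, because its penalty uses the squared values of the coefficients rather than their absolute values.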
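For a single feature, ridge regression has a closed-form slope: minimizing $\sum_i (y_i - \beta_0 - \beta_1 x_i)^2 + \lambda \beta_1^2$ gives $\beta_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2 + \lambda}$. The sketch below, on made-up data, shows the slope shrinking as $\lambda$ grows without reaching exactly zero. The unpenalized intercept and this particular scaling of $\lambda$ are assumptions of the sketch.

```python
def ridge_slope(x, y, lam):
    """Ridge estimate of beta_1 for a single feature, with beta_0 left unpenalized."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    # lam only inflates the denominator, so the slope shrinks but stays nonzero
    # whenever sxy is nonzero
    return sxy / (sxx + lam)


x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
for lam in [0, 5, 95, 9995]:
    print(lam, ridge_slope(x, y, lam))
# 0 2.0
# 5 1.0
# 95 0.1
# 9995 0.001
```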
Regularized models are also important when we start veering into high-dimensional data, where the number of features exceeds the number of observations. Ordinary linear regression breaks down here because the least-squares solution is no longer unique, so regularized methods step in.
Neural Networks
The final machine learning model that's integral for the ML practitioner is the neural network. Loosely inspired by the neurons in the brain, neural networks are built from interconnected layers of units, which allows them to learn complicated, non-linear functions between inputs and outputs. Most famously, neural networks perform well at identifying handwritten digits from the MNIST dataset. On complex, large-scale tasks such as image recognition, neural networks typically outperform the other algorithms on this list.
For simplicity, we'll talk about a dense neural network, where every unit in one layer is connected to every unit in the next layer, as shown in the diagram below. A unit holds a single numerical value. Neural networks can be divided into three sections: an input layer, a set of hidden layers, and an output layer. The input layer contains the original data that we want to use to predict the outcome. The hidden layers in the middle are what enable neural networks to capture non-linear relationships. Each unit in a hidden layer computes a linear combination of the units in the previous layer and passes it through an activation function, which determines the unit's final value.
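To show what a dense layer computes, here is a plain-Python forward pass through a tiny network with two inputs, one hidden layer of two units, and one output unit. The weights are hand-picked (not learned) so that the network computes XOR, which outputs 1 only when exactly one input is 1, a pattern no single linear model can represent. ReLU as the hidden activation and an identity output are assumed choices; other activations such as the sigmoid are also common.

```python
def relu(z):
    """Activation: pass positive values through, replace negatives with 0."""
    return max(0.0, z)


def identity(z):
    return z


def dense_layer(inputs, weights, biases, activation):
    """Each output unit = activation(linear combination of ALL inputs + bias)."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hypothetical hand-picked weights: hidden units compute relu(x1 + x2) and relu(x1 + x2 - 1)
hidden_weights = [[1.0, 1.0], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -2.0]]
output_biases = [0.0]


def xor_network(x):
    hidden = dense_layer(x, hidden_weights, hidden_biases, relu)
    return dense_layer(hidden, output_weights, output_biases, identity)[0]


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_network(x))
# (0, 0) 0.0
# (0, 1) 1.0
# (1, 0) 1.0
# (1, 1) 0.0
```

Removing the ReLU from the hidden layer would collapse the whole network into a single linear function, which cannot produce this output pattern; the non-linear activation is what makes the hidden layer useful.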
Neural networks also serve as a jumping-off point into deep learning. Their structure can take many forms suited to different tasks, such as remembering sequences. Understanding these different structures is crucial for understanding cutting-edge algorithms, such as the ones used in DALL-E 2. But to really understand everything behind a neural network, we must first understand the concepts and principles behind its simpler cousins.
So what now?
Throughout this post, we've highlighted the different areas of machine learning by introducing different algorithms. By getting an overview of the different areas of the field, we can take a principled approach to learning that will give us a deep, seasoned understanding of machine learning and will serve us much better than diving blindly into the most modern algorithms. Studying and mastering these concepts takes time, dedication, and grit. At Dataquest, we believe that mastery comes from working with code and testing your knowledge via projects. Projects force you to think about what you've learned and challenge you to use it in a new context.
Employers are always looking for demonstrated skills rather than empty words on a resume. Machine learning projects are the perfect platform for showing off your skills and coding prowess. If you can go further by explaining and sharing your projects with others and developing a robust portfolio, you'll shine and stand out among competing applicants. If you'd like to practice more, here are some ML projects that could give you inspiration. If you prefer tutorials you can follow along with, you can also learn how to predict the stock market or the weather on YouTube.
Dataquest provides a career path of courses for those interested in learning and applying machine learning. You’ll create many projects along the way and quickly gain the knowledge needed to be competitive. Dataquest students have been hired at companies like Accenture and SpaceX, among many others.
If you’re excited and eager to learn, we’re happy to go on that journey with you! If you’re curious to learn more, feel free to explore our online community and see what other students have done.