Three Essential Hyperparameter Tuning Techniques for Better Machine Learning Models

A machine learning (ML) model should not memorize the training data. Instead, it should learn from the training data well enough to generalize to new, unseen data.

The default settings of an ML model may not work well for every problem we try to solve. We often need to adjust these settings manually for better results. Here, “settings” refers to hyperparameters.

What is a hyperparameter in an ML model?

A hyperparameter is a value that the user defines manually before the training process begins; the model does not learn it from the data during training. Once defined, its value remains fixed until the user changes it.

We need to distinguish between a hyperparameter and a parameter. 

A parameter learns its value from the given data, and its value depends on the values of hyperparameters. A parameter value is updated during the training process.

Here is an example of how different hyperparameter values affect the Support Vector Machine (SVM) model.

from sklearn.svm import SVC

clf_1 = SVC(kernel='linear')
clf_2 = SVC(C=1.0, kernel='poly', degree=3)
clf_3 = SVC(C=1.0, kernel='poly', degree=1)

Both clf_1 and clf_3 perform linear classification (a polynomial kernel of degree 1 is linear), while clf_2 performs non-linear classification. The user can switch between linear and non-linear classification simply by changing the values of the ‘kernel’ and ‘degree’ hyperparameters in the SVC() class.
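To make the effect of the ‘kernel’ and ‘degree’ hyperparameters concrete, here is a minimal runnable sketch. It assumes a toy dataset generated with scikit-learn’s make_classification (not part of the original example) and simply compares test accuracy across the three settings.

```python
# Hypothetical sketch: comparing SVC hyperparameter settings on a toy dataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Generate a small synthetic classification dataset (assumption for illustration)
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same model class, different hyperparameter values
for kernel, degree in [('linear', 3), ('poly', 3), ('poly', 1)]:
    clf = SVC(C=1.0, kernel=kernel, degree=degree)  # 'degree' is ignored for 'linear'
    clf.fit(X_train, y_train)
    print(kernel, degree, round(clf.score(X_test, y_test), 3))
```

Note that only the hyperparameters change here; the model parameters (the support vectors and their coefficients) are learned from the data each time fit() is called.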

What is hyperparameter tuning?

Hyperparameter tuning is an iterative process of optimizing a model’s performance by finding the optimal values for hyperparameters without causing overfitting. 

Sometimes, as in the above SVM example, the selection of some hyperparameters depends on the type of problem (regression or classification) that we want to solve. In that case, the user can simply set ‘linear’ for linear classification and ‘poly’ for non-linear classification. It is a simple selection.

However, for a hyperparameter such as ‘degree’, there is no obvious choice, and the user needs more systematic search methods to select a good value.

Before discussing searching methods, we need to understand two important definitions: hyperparameter search space and hyperparameter distribution.

Hyperparameter search space

The hyperparameter search space contains a set of possible hyperparameter value combinations defined by the user. The search will be limited to this space. 

The search space can be n-dimensional, where n is a positive integer.

The number of dimensions in the search space equals the number of hyperparameters (e.g., a 3-dimensional search space means 3 hyperparameters).

The search space is defined as a Python dictionary that contains hyperparameter names as keys and lists of candidate values as values.

search_space = {'hyparam_1':[val_1, val_2],
                'hyparam_2':[val_1, val_2],
                'hyparam_3':['str_val_1', 'str_val_2']}
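As a concrete illustration, here is a hypothetical search space for the SVC() class from the earlier example. The hyperparameter names ('C', 'kernel', 'degree') are real SVC hyperparameters; the candidate values are assumptions chosen for illustration.

```python
# A hypothetical 3-dimensional search space for SVC()
search_space = {'C': [0.1, 1, 10],
                'kernel': ['linear', 'poly'],
                'degree': [1, 2, 3]}

# This space has 3 dimensions and 3 x 2 x 3 = 18 possible combinations
n_combinations = 1
for values in search_space.values():
    n_combinations *= len(values)
print(n_combinations)  # → 18
```

Note how quickly the number of combinations grows as we add hyperparameters or candidate values; this is what makes exhaustive methods expensive.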

Hyperparameter distribution

The underlying distribution of a hyperparameter is also important because it decides how each value will be tested during the tuning process. There are four types of popular distributions.

  • Uniform distribution: All possible values within the search space will be equally selected.
  • Log-uniform distribution: A logarithmic scale is applied to uniformly distributed values. This is useful when the range of hyperparameters is large. 
  • Normal distribution: Values are concentrated around a mean; for the standard normal distribution, the mean is 0 and the standard deviation is 1. 
  • Log-normal distribution: A logarithmic scale is applied to normally distributed values. This is useful when the range of hyperparameters is large.

The choice of the distribution also depends on the type of value of the hyperparameter. A hyperparameter can take discrete or continuous values. A discrete value can be an integer or a string, while a continuous value always takes floating-point numbers.

from scipy.stats import randint, uniform, loguniform

# Define the parameter distributions
param_distributions = {
    'hyparam_1': randint(low=50, high=75),
    'hyparam_2': uniform(loc=0.01, scale=0.19),
    'hyparam_3': loguniform(0.1, 1.0)
}
  • randint(low=50, high=75): Selects random integers between 50 and 74 (the upper bound is exclusive)
  • uniform(loc=0.01, scale=0.19): Selects floating-point numbers evenly between 0.01 and 0.2, i.e., loc to loc + scale (continuous uniform distribution)
  • loguniform(0.1, 1.0): Selects values between 0.1 and 1.0 on a log scale (log-uniform distribution)
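We can draw samples from these distribution objects with their rvs() method to see what values a search algorithm would actually test. This is a quick sketch; the random_state values are assumptions for reproducibility.

```python
# Sampling from the three distributions defined above
from scipy.stats import randint, uniform, loguniform

int_samples = randint(50, 75).rvs(size=5, random_state=0)
uni_samples = uniform(loc=0.01, scale=0.19).rvs(size=5, random_state=0)
log_samples = loguniform(0.1, 1.0).rvs(size=5, random_state=0)

print(int_samples)  # integers in [50, 74]
print(uni_samples)  # floats in [0.01, 0.2]
print(log_samples)  # floats in [0.1, 1.0], log-uniformly spread
```

Sampling a few values like this is a useful sanity check that the defined range and scale match what you intended before launching a long tuning run.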

Hyperparameter tuning methods

There are many different types of hyperparameter tuning methods. In this article, we will focus on only three methods that fall under the exhaustive search category. In an exhaustive search, the search algorithm exhaustively searches the entire search space. There are three methods in this category: manual search, grid search and random search.

Manual search

There is no search algorithm behind a manual search. The user simply sets some values based on instinct and inspects the results. If the result is not good, the user tries another value, and so on. The user learns from previous attempts and sets better values in future attempts. Therefore, manual search falls under the informed search category. 

There is no clear definition of the hyperparameter search space in manual search. This method can be time-consuming, but it may be useful when combined with other methods such as grid search or random search.

Manual search becomes difficult when we have to search two or more hyperparameters at once. 

As an example of manual search, the user can simply set ‘linear’ for linear classification and ‘poly’ for non-linear classification in an SVM model.

from sklearn.svm import SVC

linear_clf = SVC(kernel='linear')
non_linear_clf = SVC(C=1.0, kernel='poly')

Grid search

In grid search, the search algorithm tests all possible hyperparameter combinations defined in the search space. Therefore, this method is a brute-force method. This method is time-consuming and requires more computational power, especially when the number of hyperparameters increases (curse of dimensionality).

To use this method effectively, we need to have a well-defined hyperparameter search space. Otherwise, we will waste a lot of time testing unnecessary combinations.

However, the user does not need to specify the distribution of hyperparameters. 

The search algorithm does not learn from previous attempts (iterations) and therefore does not try better values in future attempts. Therefore, grid search falls under the uninformed search category.
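Grid search can be sketched with scikit-learn’s GridSearchCV, which exhaustively fits the model for every combination in the search space. The dataset and candidate values below are assumptions for illustration; the hyperparameter names are real SVC hyperparameters.

```python
# A minimal grid search sketch with GridSearchCV (assumed toy data)
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=42)

search_space = {'C': [0.1, 1, 10],
                'kernel': ['linear', 'poly'],
                'degree': [1, 2, 3]}

# Tests all 3 x 2 x 3 = 18 combinations, each with 3-fold cross-validation
grid = GridSearchCV(SVC(), param_grid=search_space, cv=3)
grid.fit(X, y)

print(grid.best_params_)
print(round(grid.best_score_, 3))
```

Even this small example performs 18 × 3 = 54 model fits, which illustrates why grid search becomes expensive as the search space grows.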

Random search

In random search, the search algorithm randomly tests hyperparameter values in each iteration. Like in grid search, it does not learn from previous attempts and therefore does not try better values in future attempts. Therefore, random search also falls under uninformed search.

Grid search vs random search (Image by author)

Random search is much better than grid search when there is a large search space and we have no idea about the hyperparameter space. It is also considered computationally efficient. 

When grid search and random search are given a search space of the same size, there is little difference between the two. We have to define a larger search space to take advantage of random search over grid search. 

There are two ways to increase the size of the hyperparameter search space. 

  • By increasing the dimensionality (adding new hyperparameters)
  • By widening the range of hyperparameters

It is recommended to define the underlying distribution for each hyperparameter. If not defined, the algorithm will use the default one, which is the uniform distribution in which all combinations will have the same probability of being chosen. 

There are two important hyperparameters in the random search method itself!

  • n_iter: The number of iterations or the size of the random sample of hyperparameter combinations to test. Takes an integer. This trades off runtime vs quality of the output. We need to define this to allow the algorithm to test a random sample of combinations.
  • random_state: We need to define this hyperparameter to get the same output across multiple function calls.
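Random search can be sketched with scikit-learn’s RandomizedSearchCV, combining the distribution objects from earlier with the n_iter and random_state settings just described. The dataset, ranges, and n_iter value are assumptions for illustration.

```python
# A minimal random search sketch with RandomizedSearchCV (assumed toy data)
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# Distributions (or lists) for each hyperparameter
param_distributions = {'C': loguniform(0.01, 100),
                       'kernel': ['linear', 'poly'],
                       'degree': randint(1, 4)}

rs = RandomizedSearchCV(SVC(),
                        param_distributions=param_distributions,
                        n_iter=10,        # test only 10 random combinations
                        random_state=42,  # same output across multiple calls
                        cv=3)
rs.fit(X, y)

print(rs.best_params_)
```

Unlike grid search, only n_iter combinations are fitted here regardless of how wide the ranges are, which is where the computational savings come from.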

The major disadvantage of random search is that its results can vary considerably across runs with different random_state values (high variance). 


This is the end of today’s article.

Please let me know if you’ve any questions or feedback.

See you in the next article. Happy learning to you!

Designed and written by:
Rukshan Pramoditha

2025-08-22
