In the previous two parts of this series, we explored trend, seasonality, and residuals using temperature data as our example. We started by uncovering patterns in the data with Python’s seasonal_decompose
method. Next, we made our first temperature forecasts using standard baseline models like the seasonal naive.
From there, we went deeper and learned how seasonal_decompose
actually computes the trend, seasonality and residual components.
We extracted those pieces to build a decomposition-based baseline model and then experimented with custom baselines tailored to our data.
Finally, we evaluated each model using Mean Absolute Percentage Error (MAPE) to see how well our approaches performed.
In the first two parts, we worked with temperature data, a relatively simple dataset where the trend and seasonality were clear and seasonal_decompose
did a good job of capturing those patterns.
However, in many real-world datasets, things aren’t always so straightforward. Trends and seasonal patterns can shift or get messy, and in these cases, seasonal_decompose
may not capture the underlying structure as effectively.
This is where we turn to a more advanced decomposition method to better understand the data: STL — Seasonal-Trend decomposition using LOESS.
LOESS stands for Locally Estimated Scatterplot Smoothing.
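Before applying it to real data, it helps to see what LOESS does on its own. The short sketch below smooths a toy noisy series with the lowess function from statsmodels; the toy data and the frac value are illustrative choices, not part of our retail dataset.
Code (sketch):
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess
# Toy data: a smooth curve plus random noise (illustrative only)
rng = np.random.default_rng(42)
x = np.linspace(0, 4 * np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.4, size=x.size)
# LOESS fits a small weighted regression around each point;
# frac controls how much of the data each local fit uses
smoothed = lowess(y, x, frac=0.2, return_sorted=True)
plt.scatter(x, y, s=8, alpha=0.4, label='Noisy data')
plt.plot(smoothed[:, 0], smoothed[:, 1], color='crimson', label='LOESS fit')
plt.legend()
plt.show()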
To better understand this in action, we’ll use the Retail Sales of Department Stores dataset from FRED (Federal Reserve Economic Data).
Here’s what the data looks like:
The dataset we’re working with tracks monthly retail sales from U.S. department stores and comes from FRED, a trusted source of economic data.
It has just two columns:
Observation_Date – the beginning of each month
Retail_Sales – total sales for that month, in millions of dollars
The time series runs from January 1992 all the way to March 2025, giving us over 30 years of sales data to explore.
Note: Even though each date marks the start of the month (like 01-01-1992
), the sales value represents the total sales for the entire month.
But before jumping into STL, we will run the classic seasonal_decompose
on our dataset and take a look at what it shows us.
Code:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Load the dataset
df = pd.read_csv("C:/RSDSELDN.csv", parse_dates=['Observation_Date'], dayfirst=True)
# Set the date column as index
df.set_index('Observation_Date', inplace=True)
# Set monthly frequency
df = df.asfreq('MS') # MS = Month Start
# Extract the series
series = df['Retail_Sales']
# Apply classical seasonal decomposition
result = seasonal_decompose(series, model='additive', period=12)
# Plot with custom colors
fig, axs = plt.subplots(4, 1, figsize=(12, 8), sharex=True)
axs[0].plot(result.observed, color='olive')
axs[0].set_title('Observed')
axs[1].plot(result.trend, color='darkslateblue')
axs[1].set_title('Trend')
axs[2].plot(result.seasonal, color='darkcyan')
axs[2].set_title('Seasonal')
axs[3].plot(result.resid, color='peru')
axs[3].set_title('Residual')
plt.suptitle('Classical Seasonal Decomposition (Additive)', fontsize=16)
plt.tight_layout()
plt.show()
Plot:

The observed series shows a gradual decline in overall sales. However, the seasonal component remains fixed across time — a limitation of classical decomposition, which assumes that seasonal patterns do not change, even when real-world behavior evolves.
In Part 2, we explored how seasonal_decompose
computes trend and seasonal components under the assumption of a fixed, repeating seasonal structure.
However, real-world data doesn’t always follow a fixed pattern. Trends may change gradually and seasonal behaviors can vary year to year. This is why we need a more adaptable approach, and STL decomposition offers exactly that.
We will apply STL decomposition to the data to observe how it handles shifting trends and seasonality.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL
# Load the dataset
df = pd.read_csv("C:/RSDSELDN.csv", parse_dates=['Observation_Date'], dayfirst=True)
df.set_index('Observation_Date', inplace=True)
df = df.asfreq('MS') # Ensure monthly frequency
# Extract the time series
series = df['Retail_Sales']
# Apply STL decomposition
stl = STL(series, seasonal=13)
result = stl.fit()
# Plot the STL components
fig, axs = plt.subplots(4, 1, figsize=(10, 8), sharex=True)
axs[0].plot(result.observed, color='sienna')
axs[0].set_title('Observed')
axs[1].plot(result.trend, color='goldenrod')
axs[1].set_title('Trend')
axs[2].plot(result.seasonal, color='darkslategrey')
axs[2].set_title('Seasonal')
axs[3].plot(result.resid, color='rebeccapurple')
axs[3].set_title('Residual')
plt.suptitle('STL Decomposition of Retail Sales', fontsize=16)
plt.tight_layout()
plt.show()
Plot:

Unlike classical decomposition, STL allows the seasonal component to change gradually over time. This flexibility makes STL a better fit for real-world data where patterns evolve, as seen in the adaptive seasonal curve and cleaner residuals.
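One quick way to see this adaptivity for ourselves, reusing the result object from the code above, is to look at the seasonal value STL assigns to the same calendar month in different years. Under classical decomposition these values would be identical every year; under STL they drift gradually. This is an illustrative check, not part of the original walkthrough.
Code (sketch):
# December's seasonal effect, year by year (it changes gradually under STL)
december_effect = result.seasonal[result.seasonal.index.month == 12]
print(december_effect.head())
print(december_effect.tail())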
Now that we have a feel for what STL does, we will dive into how it figures out the trend and seasonal patterns behind the scenes.
To better understand how STL decomposition works, we will consider a sample from our dataset spanning from January 2010 to December 2023.

To understand how STL decomposition works on this data, we first need rough estimates of the trend and seasonality.
Since STL is a smoothing-based technique, it requires an initial idea of what should be smoothed, such as where the trend lies and how the seasonal patterns behave.
We will begin by visualizing the retail‐sales series (Jan 2010–Dec 2023) and use Python’s STL routine to extract its trend, seasonal, and remainder parts.
Code:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL
# Load the dataset
df = pd.read_csv("C:/STL sample data.csv", parse_dates=['Observation_Date'], dayfirst=True)
df.set_index('Observation_Date', inplace=True)
df = df.asfreq('MS') # Ensure monthly frequency
# Extract the time series
series = df['Retail_Sales']
# Apply STL decomposition
stl = STL(series, seasonal=13)
result = stl.fit()
# Plot the STL components
fig, axs = plt.subplots(4, 1, figsize=(10, 8), sharex=True)
axs[0].plot(result.observed, color='sienna')
axs[0].set_title('Observed')
axs[1].plot(result.trend, color='goldenrod')
axs[1].set_title('Trend')
axs[2].plot(result.seasonal, color='darkslategrey')
axs[2].set_title('Seasonal')
axs[3].plot(result.resid, color='rebeccapurple')
axs[3].set_title('Residual')
plt.suptitle('STL Decomposition of Retail Sales (2010-2023)', fontsize=16)
plt.tight_layout()
plt.show()
Plot:

To understand how STL derives its components, we first estimate the data’s long-term trend using a centered moving average.
To demonstrate how a centered moving average is calculated, we will work through a single month: July 2010.

Because our data is monthly, the natural cycle covers twelve points, which is an even number. Averaging January 2010 through December 2010 produces a value that falls halfway between June and July.
To adjust for this, we form a second window from February 2010 through January 2011, whose twelve-month mean lies halfway between July and August.
We then compute each window’s simple average and average those two results.
In the first window, July is the seventh of twelve points, so that window’s mean falls between June and July.
In the second window, July is the sixth of twelve points, so that window’s mean falls between July and August.
Averaging both estimates pulls the result back onto July 2010 itself, yielding a true centered moving average for that month.


This is how we compute the initial trend using a centered moving average.
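To make this concrete in code, here is a small sketch of the July 2010 calculation, reusing the series variable loaded earlier; the date slices simply spell out the two windows described above.
Code (sketch):
# First window: Jan 2010 - Dec 2010 (July is its 7th point)
window_1 = series.loc["2010-01-01":"2010-12-01"].mean()
# Second window: Feb 2010 - Jan 2011 (July is its 6th point)
window_2 = series.loc["2010-02-01":"2011-01-01"].mean()
# Averaging the two window means centers the estimate on July 2010
cma_july_2010 = (window_1 + window_2) / 2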
At the very start and end of our series, we simply don’t have six months on both sides to average—so there’s no “natural” centered MA for Jan–Jun 2010 or for Jul–Dec 2023.
Rather than drop those points, we carry the first valid July 2010 value backwards to fill Jan–Jun 2010, and carry the last valid June 2023 value forward to fill Jul–Dec 2023.
That way, every month has a baseline trend before we move on to the LOESS refinements.
Next, we will use Python to compute the initial trend for each month.
Code:
import pandas as pd
# Load and prepare the data
df = pd.read_csv("C:/STL sample data for part 3.csv",
parse_dates=["Observation_Date"], dayfirst=True,
index_col="Observation_Date")
df = df.asfreq("MS") # ensure a continuous monthly index
# Extract the series
sales = df["Retail_Sales"]
# Compute the two 12-month moving averages
n = 12
ma1 = sales.rolling(window=n, center=False).mean().shift(-n//2 + 1)
ma2 = sales.rolling(window=n, center=False).mean().shift(-n//2)
# Center them by averaging
T0 = (ma1 + ma2) / 2
# Fill the edges so every month has a value
T0 = T0.bfill().ffill()
# Attach to the DataFrame
df["Initial_Trend"] = T0
Table:

We have extracted the initial trend using a centered moving average; now let’s see how it actually looks.
We will plot it along with the original time series and STL’s final trend line to compare how each one captures the overall movement in the data.
Plot:

Looking at the plot, we can see that the trend line from the moving average almost overlaps with the STL trend for most of the years.
But around Jan–Feb 2020, there’s a sharp dip in the moving average line. This drop was due to the sudden impact of COVID on sales.
STL handles this better: it doesn’t treat the dip as a long-term trend change but instead assigns it to the residual.
That’s because STL sees this as a one-time unexpected event, not a repeating seasonal pattern or a shift in the overall trend.
To understand how STL does this and how it handles changing seasonality, let’s continue building our understanding step by step.
We now have the initial trend using moving averages, so let’s move on to the next step in the STL process.
Next, we subtract our centered MA trend from the original sales to obtain the detrended series.
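In code, this is a single subtraction on the DataFrame we built above; the sketch assumes the Initial_Trend column from the previous step.
Code (sketch):
# Detrended series = observed sales minus the initial centered-MA trend
df["Detrended"] = df["Retail_Sales"] - df["Initial_Trend"]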

We have removed the long-term trend from our data, so the remaining series shows just the repeating seasonal swings and random noise.
Let’s plot it to see the regular ups and downs and any unexpected spikes or dips.

The above plot shows what remains after we remove the long-term trend. You can see the familiar annual rise and fall and that deep drop in January 2020 when COVID hit.
When we average all the January values, including the 2020 crash, that single event blends in and hardly affects the January average.
This helps us ignore rare shocks and focus on the true seasonal pattern. Now we will group the detrended values by month and take their averages to create our first seasonal estimate.
This gives us a stable estimate of seasonality, which STL will then refine and smooth in later iterations to capture any gradual shifts over time.
Next, we will repeat the approach we used with seasonal_decompose: we’ll group the detrended values by calendar month to extract the raw monthly seasonal offsets.
Let’s focus on January and gather all the detrended values for that month.

Now, we compute the average of the detrended values for January across all years to obtain a rough seasonal estimate for that month.

This process is repeated for all 12 months to form the initial seasonal component.

Now that we have the average detrended value for each month, we map these averages across the entire time series to construct the initial seasonal component.
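Here is a pandas sketch of this step, continuing from the Detrended column above; the column name Avg_Detrended_by_Month is the one used later in this walkthrough.
Code (sketch):
# Average the detrended values for each calendar month and broadcast that
# average back to every row of the same month (the initial seasonal estimate)
df["Avg_Detrended_by_Month"] = (
    df.groupby(df.index.month)["Detrended"].transform("mean")
)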

After grouping the detrended values by month and calculating their averages, we obtain a new series of monthly means. Let’s plot this series to observe how the data look after applying this averaging step.

In the above plot, we grouped the detrended values by month and took the average for each one.
This helped us reduce the effect of that big dip in January 2020, which was likely due to the COVID pandemic.
By averaging all the January values together, that sudden drop gets blended in with the rest, giving us a more stable picture of how January usually behaves each year.
However, if we look closely, we can still see some sudden spikes and dips in the line.
These might be caused by things like special promotions, strikes or unexpected holidays that don’t happen every year.
Since seasonality is meant to capture patterns that repeat regularly each year, we don’t want these irregular events to stay in the seasonal curve.
But how do we know those spikes or dips are just one-off events and not real seasonal patterns? It comes down to how often they happen.
A big spike in December shows up because every December has high sales, so the December average stays high year after year.
We see a small increase in March, but that’s mostly because one or two years were unusually strong.
The average for March doesn’t really shift much. When a pattern shows up almost every year in the same month, that’s seasonality. If it only happens once or twice, it’s probably just an irregular event.
To handle this, we use a low-pass filter. While averaging helps us get a rough idea of seasonality, the low-pass filter goes one step further.
It smooths out those remaining small spikes and dips so that we are left with a clean seasonal pattern that reflects the general rhythm of the year.
This smooth seasonal curve will then be used in the next steps of the STL process.
Next, we will smooth out that rough seasonal curve by running a low-pass filter over every point in our monthly-average series.
To apply the low-pass filter, we start by computing a centered 13-month moving average.
For example, consider September 2010. The 13-month average at this point (from March 2010 to March 2011) would be:

We repeat this 13-month averaging for every point in our monthly average series. Because the pattern repeats every year, the value for September 2010 will be the same as for September 2011.
For the first and last six months, we don’t have enough data to take a full 13-month average, so we just use whatever months are available around them.
Let’s take a look at the averaging windows used for the months where a full 13-month average isn’t possible.

Now we’ll use Python to calculate the 13-month average values.
Code:
import pandas as pd
# Load the seasonal estimate series
df = pd.read_csv("C:/stl_with_monthly_avg.csv", parse_dates=['Observation_Date'], dayfirst=True)
# Apply 13-month centered moving average on the Avg_Detrended_by_Month column
# Handle the first and last 6 values with partial windows
seasonal_estimate = df[['Observation_Date', 'Avg_Detrended_by_Month']].copy()
lpf_values = []
for i in range(len(seasonal_estimate)):
    start = max(0, i - 6)
    end = min(len(seasonal_estimate), i + 7)  # non-inclusive
    window_avg = seasonal_estimate['Avg_Detrended_by_Month'].iloc[start:end].mean()
    lpf_values.append(window_avg)
# Add the result to DataFrame
seasonal_estimate['LPF_13_Month'] = lpf_values
With this code, we get the 13-month moving average for the full time series.

After completing the first step of applying the low-pass filter by calculating the 13-month averages, the next step is to smooth those results further using a 3-point moving average.
Now, let’s see how the 3-point average is calculated for September 2010.

For January 2010, we calculate the average using January and February values, and for December 2023, we use December and November.
This approach is used for the endpoints where a full 3-month window isn’t available. In this way, we compute the 3-point moving average for each data point in the series.
Now, we use Python again to calculate the 3-month window averages for our data.
Code:
import pandas as pd
# Load CSV file
df = pd.read_csv("C:/seasonal_13month_avg3.csv", parse_dates=['Observation_Date'], dayfirst=True)
# Calculate the 3-point moving average
lpf_values = df['LPF_13_Month'].values
moving_avg_3 = []
for i in range(len(lpf_values)):
    if i == 0:
        avg = (lpf_values[i] + lpf_values[i + 1]) / 2
    elif i == len(lpf_values) - 1:
        avg = (lpf_values[i - 1] + lpf_values[i]) / 2
    else:
        avg = (lpf_values[i - 1] + lpf_values[i] + lpf_values[i + 1]) / 3
    moving_avg_3.append(avg)
# Add the result to a new column
df['LPF_13_3'] = moving_avg_3
Using the code above, we get the 3-month moving average values.

We’ve calculated the 3-month averages on the 13-month smoothed values. Next, we’ll apply another 3-month moving average to further refine the series.
Code:
import pandas as pd
# Load the dataset
df = pd.read_csv("C:/5seasonal_lpf_13_3_1.csv")
# Apply 3-month moving average on the existing LPF_13_3 column
lpf_column = 'LPF_13_3'
smoothed_column = 'LPF_13_3_2'  # twice-smoothed series; reused later in the weighted blend
smoothed_values = []
for i in range(len(df)):
    if i == 0:
        avg = df[lpf_column].iloc[i:i+2].mean()
    elif i == len(df) - 1:
        avg = df[lpf_column].iloc[i-1:i+1].mean()
    else:
        avg = df[lpf_column].iloc[i-1:i+2].mean()
    smoothed_values.append(avg)
# Add the new smoothed column to the DataFrame
df[smoothed_column] = smoothed_values
From the above code, we have now calculated the 3-month averages once again.

With all three levels of smoothing complete, the next step is to calculate a weighted average at each point to obtain the final low-pass filtered seasonal curve.
It’s like taking an average, but a smarter one: we combine three versions of the seasonal pattern, each one smoother than the last.
The first is a simple 13-month moving average, which applies light smoothing.
The second takes this result and applies a 3-month moving average, making it smoother.
The third repeats this step, resulting in the most stable version. Since the third one is the most reliable, we give it the most weight.
The first version still contributes a little, and the second plays a moderate role.
By combining them with weights of 1, 3, and 9, we calculate a weighted average that gives us the final seasonal estimate.
This result is smooth and steady, yet flexible enough to follow real changes in the data.
Here’s how we calculate the weighted average at each point.
For example, let’s take September 2010.

We divide by 23 to apply an additional shrink factor and ensure the weighted average stays on the same scale.
Code:
import pandas as pd
# Load the dataset
df = pd.read_csv("C:/7seasonal_lpf_13_3_2.csv")
# Calculate the weighted average using 1:3:9 across LPF_13_Month, LPF_13_3, and LPF_13_3_2
df["Final_LPF"] = (
1 * df["LPF_13_Month"] +
3 * df["LPF_13_3"] +
9 * df["LPF_13_3_2"]
) / 23
By using the code above, we calculate the weighted average at each point in the series.

These final smoothed values represent the seasonal pattern in the data. They highlight the recurring monthly fluctuations, free from random noise or outliers, and provide a clearer view of the underlying seasonal rhythms over time.
But before moving to the next step, it’s important to understand why we used a 13-month average followed by two rounds of 3-month averaging as part of the low-pass filtering process.
First, we calculated the average of detrended values by grouping them by month. This gave us a rough idea of the seasonal pattern.
But as we saw earlier, this pattern still has some random spikes and dips. Since we’re working with monthly data, it might seem like using a 12-month average would make sense.
But STL actually uses a 13-month average. That’s because 12 is an even number, so the average isn’t centered on a single month — it falls between two months. This can slightly shift the pattern.
Using 13, which is an odd number, keeps the smoothing centered right on each month. It helps us smooth out the noise while keeping the true seasonal pattern in place.
Let’s take a look at how the 13-month average transforms the series with the help of a plot.

The orange line, representing the 13-month average, smooths the sharp fluctuations seen in the raw monthly averages (blue), helping to expose a clearer and more consistent seasonal pattern by filtering out random noise.
You might notice that the peaks in the orange line don’t perfectly line up with the blue ones anymore.
For example, a spike that appeared in December before might now show up slightly earlier or later.
This happens because the 13-month average looks at the surrounding values, which can shift the curve a little to the side.
This shifting is a normal effect of moving averages. To fix it, the next step is centering.
We group the smoothed values by calendar month, putting all the January values together and so on, and then take the average.
This brings the seasonal pattern back into alignment with the correct month, so it reflects the real timing of the seasonality in the data.
After smoothing the pattern with a 13-month average, the curve looks much cleaner, but it can still have small spikes and dips. To smooth it a little more, we use a 3-month average.
But why 3, and not something bigger like 5 or 6? A 3-month window works well because it smooths gently without making the curve too flat. If we used a larger window, we might lose the natural shape of the seasonality.
Using a smaller window like 3, and applying it twice, gives a nice balance between cleaning the noise and keeping the real pattern.
Now let’s see what this looks like on a plot.

This plot shows how our rough seasonal estimate becomes smoother in steps.
The blue line is the result of the 13-month average, which already softens out many of the random spikes.
Then we apply a 3-month average once (orange line) and again (green line). Each step smooths the curve a bit more, especially removing tiny bumps and jagged noise.
By the end, we get a clean seasonal shape that still follows the repeating pattern but is much more stable and easier to work with for forecasting.
We now have three versions of the seasonal pattern: one slightly rough, one moderately smooth and one very smooth. It might seem like we could simply choose the smoothest one and move on.
After all, seasonality repeats every year, so the cleanest curve should be enough. But in real-world data, seasonal behavior is rarely that perfect.
December spikes might show up a little earlier in some years, or their size might vary depending on other factors.
The rough version captures these small shifts, but it also carries noise. The smoothest one removes the noise but can miss those subtle variations.
That’s why STL blends all three. It gives more weight to the smoothest version because it is the most stable, but it also keeps some influence from the medium and rougher ones to retain flexibility.
This way, the final seasonal curve is clean and reliable, yet still responsive to natural changes. As a result, the trend we extract in later steps stays true and does not absorb leftover seasonal effects.
We use weights of 1, 3, and 9 when blending the three seasonal curves because each version gives us a different perspective.
The roughest version picks up small shifts and short-term changes but also includes a lot of noise. The medium version balances detail and stability, while the smoothest version gives a clean and steady seasonal shape that we can trust the most.
That is why we give the smoothest one the highest weight. These specific weights are recommended in the original STL paper because they work well in most real-world cases.
We might wonder why not use something like 1, 4, and 16 instead. While that would give even more importance to the smoothest curve, it could also make the seasonal pattern too rigid and less responsive to natural shifts in timing or intensity.
Real-life seasonality is not always perfect. A spike that usually happens in December might come earlier in some years.
The 1, 3, 9 combination helps us stay flexible while still keeping things smooth.
After blending the three seasonal curves using weights of 1, 3, and 9, we might expect to divide the result by 13, the sum of the weights, as we would in a regular weighted average.
But here we divide it by 23 (13+10) instead. This scaling factor gently shrinks the seasonal values, especially at the edges of the series where estimates tend to be less stable.
It also helps keep the seasonal pattern reasonably scaled, so it doesn’t overpower the trend or distort the overall structure of the time series.
The result is a seasonal curve that is smooth, adaptive, and does not interfere with the trend.
Now let’s plot the final low-pass filtered values that we obtained by calculating the weighted averages.

This plot shows the final seasonal pattern we got by blending three smoothed versions using weights 1, 3, and 9.
The result keeps the repeating monthly pattern clear while reducing random spikes. It’s now ready to be centered and subtracted from the data to find the trend.
The final low-pass filtered seasonal component is ready. The next step is to center it to ensure the seasonal effects average to zero over each cycle.
We center the seasonal values by making their average (mean) zero. This is important because the seasonal part should only show repeating patterns, like regular ups and downs each year, not any overall increase or decrease.
If the average isn’t zero, the seasonal part might wrongly include part of the trend. By setting the mean to zero, we make sure the trend shows the long-term movement, and the seasonal part just shows the repeating changes.
To perform the centering, we first group the final low-pass filter seasonal components by month and then calculate the average.
After calculating the average, we subtract it from the actual final low-pass filtered value. This gives us the centered seasonal component, completing the centering step.
Let’s walk through how centering is done for a single data point.
For Sep 2010
Final LPF value (September 2010) = −71.30
Monthly average of all September LPF values = −48.24
Centered seasonal value = Final LPF – Monthly Average
= −71.30 − (−48.24) = −23.06
In this way, we calculate the centered seasonal component for each data point in the series.
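Here is a pandas sketch of the centering step, assuming the Final_LPF column computed earlier sits in a DataFrame with a monthly datetime index.
Code (sketch):
# Subtract each calendar month's average Final_LPF value so the seasonal
# component averages to (approximately) zero within every month
monthly_mean = df.groupby(df.index.month)["Final_LPF"].transform("mean")
df["Centered_Seasonal"] = df["Final_LPF"] - monthly_mean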

Now we’ll plot these values to see how the centered seasonality curve looks.
Plot:

The above plot compares the monthly average of detrended values (blue line) with the centered seasonal component (orange line) obtained after low-pass filtering and centering.
We can observe that the orange curve is much smoother and cleaner, capturing the repeating seasonal pattern without any long-term drift.
This is because we’ve centered the seasonal component by subtracting the monthly average, ensuring its mean is zero across each cycle.
Importantly, we can also see that the spikes in the seasonal pattern now align with their original positions.
The peaks and dips in the orange line match the timing of the blue spikes, showing that the seasonal effect has been properly estimated and re-aligned with the data.
In this part, we discussed how to calculate the initial trend and seasonality in the STL process.
These initial components are essential because STL is a smoothing-based decomposition method, and it needs a structured starting point to work effectively.
Without an initial estimate of the trend and seasonality, applying LOESS directly to the raw data could lead to the smoothing of noise and residuals or even fitting patterns to random fluctuations. This would result in unreliable or misleading components.
That’s why we first extract a rough trend using moving averages and then isolate seasonality using a low-pass filter.
These provide STL with a reasonable approximation to begin its iterative refinement process, which we will explore in the next part.
In the next part, we begin by deseasonalizing the original series using the centered seasonal component. Then, we apply LOESS smoothing to the deseasonalized data to obtain an updated trend.
This marks the starting point of the iterative refinement process in STL.
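As a rough preview, that first refinement step might look like the sketch below; the frac value is an illustrative choice, and it assumes series and the Centered_Seasonal column share the same datetime index. The exact LOESS settings STL uses are covered in the next part.
Code (sketch):
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess
# Remove the centered seasonal component, then smooth what remains
deseasonalized = series - df["Centered_Seasonal"]
x = np.arange(len(deseasonalized))
updated_trend = lowess(deseasonalized.values, x, frac=0.3, return_sorted=False)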
Note: All images, unless otherwise noted, are by the author.
Dataset: This blog uses publicly available data from FRED (Federal Reserve Economic Data). The series Advance Retail Sales: Department Stores (RSDSELD) is published by the U.S. Census Bureau and can be used for analysis and publication with appropriate citation.
Official citation:
U.S. Census Bureau, Advance Retail Sales: Department Stores [RSDSELD], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/RSDSELD, July 7, 2025.
Thanks for reading!