
# Introduction
Data is a company’s most significant resource, and insights drawn from it can make the difference between profit and loss. However, raw data is hard to interpret, so we visualize it in dashboards that non-technical people can navigate.
Building a dashboard is not straightforward, especially when working with JSON data. Luckily, many Python libraries can be combined to create a helpful tool.
In this article, we will learn how to develop a dashboard using Streamlit and Plotly to visualize DuckDB queries on data from a JSON file.
Curious? Let’s get into it.
# Dashboard Development
Before developing our dashboard, let’s learn a bit about the tools we will use.
First, JSON, or JavaScript Object Notation, is a text-based format for storing and transmitting data using key-value pairs and arrays. It is a commonly used format for APIs and data interchange between systems.
Next, DuckDB is an open-source RDBMS (Relational Database Management System) designed for analytical workloads. It is an in-process online analytical processing (OLAP) SQL database that runs directly in the Python process without the need to manage a separate server. It’s also optimized for fast execution, ideal for data analysis with large datasets.
Streamlit is often used for dashboard development. It is an open-source framework for developing interactive data web applications using Python. To develop the dashboard, we do not need to understand HTML, CSS, or JavaScript.
We will also use pandas, a powerful library for data manipulation and analysis in Python.
Lastly, Plotly is an open-source library for developing interactive graphs and charts. It can be integrated with dashboard development libraries such as Streamlit.
That’s the basic explanation of the tools we will use. Let’s start developing our JSON dashboard. We will use the following project structure, so create the files and folders as shown.
```
JSON_Dashboard/
├── data/
│   └── sample.json
├── app.py
└── requirements.txt
```
Next, let’s fill the files with the required content. First, add the example JSON data below to `data/sample.json`; you can always substitute your own data.
```json
[
  {"id": 1, "category": "Electronics", "region": "North", "sales": 100, "profit": 23.5, "date": "2024-01-15"},
  {"id": 2, "category": "Furniture", "region": "South", "sales": 150, "profit": 45.0, "date": "2024-01-18"},
  {"id": 3, "category": "Electronics", "region": "East", "sales": 70, "profit": 12.3, "date": "2024-01-20"},
  {"id": 4, "category": "Clothing", "region": "West", "sales": 220, "profit": 67.8, "date": "2024-01-25"},
  {"id": 5, "category": "Furniture", "region": "North", "sales": 130, "profit": 38.0, "date": "2024-02-01"},
  {"id": 6, "category": "Clothing", "region": "South", "sales": 180, "profit": 55.2, "date": "2024-02-05"},
  {"id": 7, "category": "Electronics", "region": "West", "sales": 90, "profit": 19.8, "date": "2024-02-10"},
  {"id": 8, "category": "Furniture", "region": "East", "sales": 160, "profit": 47.1, "date": "2024-02-12"},
  {"id": 9, "category": "Clothing", "region": "North", "sales": 200, "profit": 62.5, "date": "2024-02-15"},
  {"id": 10, "category": "Electronics", "region": "South", "sales": 110, "profit": 30.0, "date": "2024-02-20"}
]
```
Next, we will fill the `requirements.txt` file with the libraries we will use for our dashboard development.
```
streamlit
duckdb
pandas
plotly
```
Then, run the following code to install the required libraries. It is recommended to use a virtual environment when setting up the environment.
```bash
pip install -r requirements.txt
```
Once everything is ready, we will develop our dashboard. We will explore the application code step-by-step so you can follow the logic.
Let’s start by importing the necessary libraries for our dashboard.
```python
import streamlit as st
import duckdb
import pandas as pd
import plotly.express as px
```
Next, we will set up the connection we need to DuckDB.
```python
@st.cache_resource
def get_conn():
    return duckdb.connect()
```
The code above caches the DuckDB connection so the dashboard does not open a new connection every time Streamlit reruns the script, avoiding unnecessary overhead.
Then, we prepare the code to read the JSON data using the following code.
```python
@st.cache_data
def load_data(path):
    df = pd.read_json(path, convert_dates=["date"])
    return df
```
In the code above, we transform the JSON file into a pandas `DataFrame` and cache the result so the file is not re-read every time a filter changes.
After the data loading and connection are ready, we will connect to DuckDB to store the JSON data. You can always change the data location and table name.
```python
conn = get_conn()
df_full = load_data("data/sample.json")
conn.execute("CREATE OR REPLACE TABLE sales AS SELECT * FROM df_full")
```
In the code above, we register the DataFrame as a SQL table named `sales` inside DuckDB. The table is rebuilt from memory on every rerun, as we are not setting up persistence in a separate script.
That’s all for the backend; let’s prepare the Streamlit dashboard. First, let’s prepare the dashboard title and the sidebar filters.
```python
st.title("From JSON to Dashboard: DuckDB SQL Visualizer")

st.sidebar.header("Filter Options")
category = st.sidebar.multiselect("Select Category:", df_full['category'].unique())
region = st.sidebar.multiselect("Select Region:", df_full['region'].unique())
date_range = st.sidebar.date_input("Select Date Range:", [df_full['date'].min(), df_full['date'].max()])
```
The sidebar widgets above act as dynamic filters for the loaded data; we will build the SQL query from their selections.
We then build the SQL query according to the filters with the following code.
```python
query = "SELECT * FROM sales WHERE TRUE"
if category:
    # Build the IN list manually; a single-element Python tuple would
    # render as ('Electronics',), which is invalid SQL.
    query += " AND category IN ({})".format(", ".join(f"'{c}'" for c in category))
if region:
    query += " AND region IN ({})".format(", ".join(f"'{r}'" for r in region))
if len(date_range) == 2:  # date_input returns a single date while a range is mid-selection
    query += f" AND date BETWEEN '{date_range[0]}' AND '{date_range[1]}'"
```
The query above is built dynamically from the user’s selections. Starting with a `WHERE TRUE` condition lets us append each additional filter with `AND` without special-casing the first one.
With the query generation ready, we will show the query and the resulting data with the following code.
```python
st.subheader("Generated SQL Query")
st.code(query, language="sql")

df = conn.execute(query).df()
st.subheader(f"Query Results: {len(df)} rows")
st.dataframe(df)
```
The code above displays the SQL query used to retrieve data from DuckDB, converts the result back into a pandas `DataFrame`, and renders the filtered table.
Lastly, we will prepare the Plotly visualizations using the filtered data.
```python
if not df.empty:
    col1, col2 = st.columns(2)

    with col1:
        st.markdown("### Scatter Plot: Sales vs Profit by Region")
        scatter_fig = px.scatter(df, x="sales", y="profit", color="region", hover_data=["category", "date"])
        st.plotly_chart(scatter_fig, use_container_width=True)

    with col2:
        st.markdown("### Bar Chart: Total Sales by Category")
        bar_fig = px.bar(df.groupby("category", as_index=False)["sales"].sum(), x="category", y="sales", text_auto=True)
        st.plotly_chart(bar_fig, use_container_width=True)

    st.markdown("### Line Chart: Daily Sales Trend")
    line_fig = px.line(df.groupby("date", as_index=False)["sales"].sum(), x="date", y="sales")
    st.plotly_chart(line_fig, use_container_width=True)
else:
    st.warning("No data found for the selected filters.")
```
In the code above, we create three different plots: a scatter plot, a bar chart, and a line chart. You can always switch the chart type according to your needs.
With all the code ready, launch the Streamlit dashboard by running `streamlit run app.py` from the project directory.
You can now access the dashboard in your browser. It shows the generated SQL query, the filtered table, and the three charts. Since the visualizations are built with Plotly, you can explore them interactively, hovering, zooming, and panning on each chart.
That’s all you need to know. You can always add more complexity to the dashboard and even deploy it in your business.
# Conclusion
Data is the most valuable resource a company can have, and visualizing it in a dashboard is a way for business people to gain insights. In this article, we learned how to develop a simple dashboard with Streamlit and Plotly while connecting to data from a JSON file stored in DuckDB.
I hope this has helped!
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.