
Why We Should Focus on AI for Women

The story began with a conversation I had with my girlfriend last Sunday. Interested in medical research, she mentioned that women are often underdiagnosed for stroke. False negatives are common among women because early stroke research was conducted mainly on male subjects. As a result, the symptoms seen in women—often different from those observed in men—went unrecognized clinically.

A similar issue has been observed in skin cancer diagnosis: individuals with darker skin tones are less likely to be diagnosed correctly.

Examples like these show how bias in data collection and research design can lead to harmful outcomes. We are living in an era where AI is present in nearly every domain — and it’s inevitable that biased data is fed into these systems. I’ve even witnessed doctors using chatbot tools as medical assistants while writing prescriptions.

Seen from this angle, applying findings that have not been fully studied across different groups—such as groups defined by gender or race—to AI systems carries significant risks, both scientifically and ethically. AI systems not only tend to inherit existing human cognitive biases; they may also unintentionally amplify and entrench those biases within their technical structures.

In this post, I will walk through a case study from my personal experience: defining the optimal temperature in an office building, considering the different thermal comfort levels of men and women.

Case study: Thermal comfort

Two years ago, I worked on a project to optimize the energy efficiency of a building while maintaining thermal comfort. This raised an essential question: what exactly is thermal comfort? In many office buildings and commercial centers, the answer is a fixed temperature. However, research has shown that women report significantly more dissatisfaction than men under similar thermal conditions (Indraganti & Humphreys, 2015). And beyond the formal scientific evidence, my female colleagues and I have all reported feeling cold during office hours.

We will now design a simulation experiment to show just how important gender inclusivity is when defining thermal comfort, and, by extension, in other real-world scenarios.

Image by Author: Experimental Flowchart

Simulation setup

We simulate two populations—male and female—with slightly different thermal preferences. The difference may seem minor at first glance, but it becomes significant in the following sections, where we introduce a reinforcement learning (RL) agent to learn the optimal temperature and then see how well it satisfies female occupants when it has been trained only on males.

We begin by defining an idealized thermal comfort model inspired by the Predicted Mean Vote (PMV) framework. Each temperature is assigned a comfort score, max(0, 1 – dist / zone), based on how close it is to the center of the gender-specific comfort range:

  • Males: 21–23°C (centered at 22°C)
  • Females: 23–25°C (centered at 24°C)

By definition, the further the temperature moves from the center of this range, the lower the comfort score.
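To make this concrete, a comfort function along these lines could look like the sketch below. The zone width of 2°C and the function name are assumptions for illustration; the post only fixes the formula max(0, 1 – dist / zone) and the two range centers.

def comfort_score(temp, sex='male', zone=2.0):
    # Center of the gender-specific comfort range: 22°C for males, 24°C for females
    center = 22.0 if sex == 'male' else 24.0
    # Assumed zone width of 2.0°C: score is 1 at the center and decays linearly to 0
    dist = abs(temp - center)
    return max(0.0, 1.0 - dist / zone)

With these numbers, a male occupant scores 1.0 at 22°C, 0.5 at 21°C or 23°C, and 0 outside 20–24°C.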

Next, we simulate a simplified room-like environment in which an agent controls the temperature. The agent has three possible actions:

  • Decrease the temperature by 1°C
  • Maintain the temperature
  • Increase the temperature by 1°C

The environment updates the temperature accordingly and returns a comfort-based reward.

The agent’s goal is to maximize this reward over time, and it learns the optimal temperature setting for the occupants. See the code below for the environment simulation.
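A minimal sketch of such an environment, consistent with how TempControlEnv is used in the training code further down and reusing the comfort_score helper above, might look like this. The 18–28°C temperature bounds and the random starting temperature are assumptions.

import random

class TempControlEnv:
    def __init__(self, sex='male', min_temp=18, max_temp=28):  # bounds are assumed
        self.sex = sex
        self.min_temp = min_temp
        self.max_temp = max_temp
        self.state_space = list(range(min_temp, max_temp + 1))  # discrete temperatures in °C
        self.action_space = [-1, 0, 1]                          # cool, hold, heat
        self.temp = None

    def reset(self):
        # Start each episode at a random temperature inside the allowed range (assumption)
        self.temp = random.choice(self.state_space)
        return self.temp

    def step(self, action):
        # Apply the temperature change, clipped to the allowed range
        self.temp = min(max(self.temp + action, self.min_temp), self.max_temp)
        reward = comfort_score(self.temp, sex=self.sex)  # comfort-based reward
        done = False                                     # episodes are fixed-length during training
        return self.temp, reward, done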

RL agent: Q-learning

We implement a Q-learning method, letting the agent interact with the environment.

It learns an optimal policy by updating a Q-table, which stores the expected comfort reward for each state-action pair. The agent balances exploration (trying random actions) and exploitation (choosing the best-known action) as it learns a temperature-control strategy that maximizes reward.

import numpy as np
import random

class QLearningAgent:
    def __init__(self, state_space, action_space, alpha=0.1, gamma=0.9, epsilon=0.2):
        self.states = state_space
        self.actions = action_space
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        # Initialize Q-table with zeros: states x actions
        self.q_table = np.zeros((len(state_space), len(action_space)))

    def choose_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit
        if random.random() < self.epsilon:
            return random.choice(range(len(self.actions)))
        else:
            return np.argmax(self.q_table[state])

    def learn(self, state, action, reward, next_state):
        # Standard Q-learning update toward the bootstrapped target
        predict = self.q_table[state, action]
        target = reward + self.gamma * np.max(self.q_table[next_state])
        self.q_table[state, action] += self.alpha * (target - predict)

We update the Q-table by letting the agent choose either the best-known action for the current state or a random action. The trade-off is controlled by a small epsilon—here, 0.2—which is the probability of taking a random, exploratory action at each step.

Biased training and testing

As promised, we train the agent using only the male comfort model.

We let the agent interact with the environment for 1,000 episodes of 20 steps each. It gradually learns to associate temperature settings with high comfort scores for men.

def train_agent(episodes=1000):
    env = TempControlEnv(sex='male')  # train on the male comfort model only
    agent = QLearningAgent(state_space=env.state_space, action_space=env.action_space)
    rewards = []

    for ep in range(episodes):
        state = env.reset()
        total_reward = 0
        for step in range(20):
            # States are indexed by their offset from the minimum temperature
            action_idx = agent.choose_action(state - env.min_temp)
            action = env.action_space[action_idx]
            next_state, reward, done = env.step(action)
            agent.learn(state - env.min_temp, action_idx, reward, next_state - env.min_temp)
            state = next_state
            total_reward += reward
        rewards.append(total_reward)
    return agent, rewards

The code above follows a standard Q-learning training loop. Here is a plot of the learning curve.

Image by Author: Learning curve

We can now evaluate how well the male-trained agent performs when placed in a female comfort environment. The test is done in the same environmental setting, only with a slightly different comfort scoring model reflecting female preferences.
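A simple way to run this test is to let the trained agent act greedily in a female-preference environment, as in the sketch below. The function name, the 100 test episodes, and setting epsilon to 0 for greedy behaviour are assumptions for illustration.

def evaluate_agent(agent, sex='female', episodes=100, steps=20):  # episode count is assumed
    env = TempControlEnv(sex=sex)
    agent.epsilon = 0.0  # no exploration at test time
    episode_rewards = []
    for _ in range(episodes):
        state = env.reset()
        total_reward = 0
        for _ in range(steps):
            action_idx = agent.choose_action(state - env.min_temp)
            action = env.action_space[action_idx]
            state, reward, _ = env.step(action)
            total_reward += reward
        episode_rewards.append(total_reward)
    # Average comfort reward per episode
    return sum(episode_rewards) / len(episode_rewards)

Calling this with sex='male' and sex='female' on the same trained agent yields the kind of per-episode comparison reported below.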

Result

The experiment shows the following results:

On male comfort, the agent achieved an average comfort reward of 16.08 per episode; it successfully learned to maintain temperatures around the male-optimal comfort range (21–23°C).

On female comfort, its performance dropped to an average reward of 0.24 per episode. The male-trained policy, unfortunately, does not generalize to female comfort needs.

Image by Author: Reward Difference

We can thus say that such a model, trained only on one group, may not perform well when applied to another, even when the difference between groups appears small.

Conclusion

This is only a small and simple example.

But it highlights a bigger issue: when AI models are trained on data from only one group, or a few, they risk failing to meet the needs of others—even when the differences between groups seem small. The male-trained agent's failure to satisfy female comfort shows how bias in training data carries directly through to the outcomes.

This goes well beyond office temperature control. In domains such as healthcare, finance, and education, models trained on non-representative data can be expected to produce unfair or harmful results for underrepresented groups.

For readers, this means questioning how AI systems around us are built and pushing for transparency and fairness in their design. It also means recognizing the limitations of “one-size-fits-all” solutions and advocating for approaches that consider diverse experiences and needs. Only then can AI truly serve everyone equitably.

However, I always feel that empathy is genuinely difficult in our society. Differences in race, gender, wealth, and culture make it hard for most of us to put ourselves in others' shoes. AI, as a data-driven technology, can easily inherit existing human cognitive biases and may also embed those biases in its technical structures. Groups that are already less recognized may thus receive even less attention or, worse, be further marginalized.
