Why AI in Healthcare Is Scaling the Gender Bias Nobody Wants to Talk About

By Rachel Miller, MD, FACOG, ACC | Founder, Pocket Bridges

I gave a talk last week at the Women in Life Sciences Leadership Summit called “The Risk No One Sees.” And the risk I was talking about isn’t the one most people in healthtech are focused on.

It’s not regulatory. It’s not reimbursement. It’s not competition.

It’s what happens when AI-powered health products are built on clinical data that was never designed to represent the patients they’re supposed to serve.

The Data Problem That Isn’t Getting Fixed

Most people working in healthcare know, at least in the abstract, that clinical research has historically skewed male. What’s less understood is how directly that history flows into the AI tools being built right now.

Machine learning models learn from data. In clinical AI, that data comes largely from studies and datasets where women were underrepresented, enrolled later, or excluded entirely. The downstream effects are specific and measurable. Heart attacks present differently in women, but most cardiac AI models were trained on male-pattern symptom data. Pain presents differently. Autoimmune conditions, which disproportionately affect women, are underrepresented in training sets. And hormonal health (the entire menstrual cycle, perimenopause, menopause) is barely a footnote in most clinical datasets.

That skewed data isn’t just history. It’s the foundation companies are building on today.
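And checking for the skew isn’t hard. Here’s a minimal sketch of the kind of cohort audit that rarely happens before training, in Python with pandas; the "sex" column name, the target share, and the tolerance are my illustrative assumptions, not a standard:

```python
# A minimal sketch of a pre-training cohort audit. Assumes a pandas
# DataFrame of patient records with a hypothetical "sex" column; the
# column name, target share, and tolerance are illustrative, not a standard.
import pandas as pd

def audit_cohort(df: pd.DataFrame, target_female_share: float = 0.50) -> None:
    """Print the sex composition of a cohort and flag underrepresentation."""
    shares = df["sex"].value_counts(normalize=True)
    female_share = shares.get("F", 0.0)
    print("Cohort composition:")
    print(shares.to_string())
    print(f"Female share: {female_share:.1%} (reference: {target_female_share:.0%})")
    if female_share < target_female_share - 0.05:  # arbitrary 5-point tolerance
        print("WARNING: women are underrepresented relative to the target population.")

# Example: a deliberately skewed cohort, 70% male
cohort = pd.DataFrame({"sex": ["M"] * 700 + ["F"] * 300})
audit_cohort(cohort)
```

A check this crude won’t catch symptom-level gaps, but it makes the skew visible before it gets baked into a model.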

During Q&A at the summit, someone asked me directly whether biotech and pharmaceutical companies are fixing this. Whether they’re cleaning up the biased training data.

My honest answer: I haven’t seen it happening at any meaningful scale. Not in the companies I work with. Not in the products I evaluate. The conversation is happening. The correction isn’t, or at least it isn’t happening fast enough.

What This Looks Like in Practice

The tricky thing about biased training data is that the products built on top of it can look like they’re working. The validation metrics check out. The accuracy numbers are solid. The problem shows up not in the aggregate performance but in whose outcomes the model gets wrong.

A symptom checker that works well for the average patient in the training set but misses atypical presentations that are actually typical for women. A risk stratification tool that underweights hormonal factors because the training data didn’t include them. A diagnostic algorithm that was never tested against the specific population it’s now being marketed to.

These failures are the predictable result of building AI on incomplete data without the clinical expertise to catch what’s missing.
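To make that concrete, here’s a minimal, self-contained sketch of the evaluation pattern that catches it. It uses Python with scikit-learn on synthetic data, which are my assumptions for illustration; I’m not describing any specific product. A single aggregate number looks publishable while the same model is a coin flip for women:

```python
# A minimal sketch of disaggregated evaluation on synthetic data: aggregate
# accuracy looks strong while the model fails women. The data, model, and
# the 20% female share are illustrative assumptions, not any real product.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 4000
sex = (rng.random(n) < 0.2).astype(int)   # 1 = female; women are ~20% of the cohort
symptom = rng.normal(0.0, 1.0, n)         # a "male-pattern" symptom score

# For men the outcome tracks the recorded symptom; for women it depends on a
# signal the dataset never captured, so their labels look like noise here.
outcome = np.where(sex == 0, symptom > 0, rng.integers(0, 2, n) == 1).astype(int)

X = symptom.reshape(-1, 1)                # the model never sees sex at all
pred = LogisticRegression().fit(X, outcome).predict(X)

print(f"Aggregate accuracy: {accuracy_score(outcome, pred):.2f}")         # ~0.90
for label, mask in [("men", sex == 0), ("women", sex == 1)]:
    print(f"  {label}: {accuracy_score(outcome[mask], pred[mask]):.2f}")  # ~1.00 vs ~0.50
```

The only change from a standard validation script is the final loop, reporting the metric per subgroup instead of once overall. Any engineering team can run that slice. Knowing which subgroups and which presentations to slice on is where the clinical expertise comes in.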

The Second Problem That Makes the First One Worse

AI bias is a data problem. But it’s also a process problem, and the process failure I see most often is simpler than anyone wants to admit.

Most healthtech companies building in women’s health have never had a practicing physician (someone currently seeing patients in the relevant specialty) sit with their product team and walk through how the tool would actually be used in clinical care.

Not an advisor who reviewed a slide deck. Not a KOL who endorsed the vision. A physician who looked at the product in the context of a real clinical workflow and said, “Here’s where this doesn’t fit. Here’s the claim that doesn’t hold up. Here’s the patient population you’re missing.”

That conversation is where gender bias in AI products gets caught before it reaches patients. Without it, the bias launches with the product.

I see this pattern consistently. A founder who is genuinely passionate about women’s health builds a product driven by personal experience and solid technology. The product looks clinically sound from the inside. But no one with current clinical expertise in the relevant specialty has stress-tested it against the reality of how women actually present, how their symptoms differ from the data the model learned from, or how the product fits into the workflow of a physician who treats this population every day.

Why the Usual Models of Physician Input Don’t Catch This

Companies aren’t ignoring clinical input entirely. Most have some version of physician engagement. The problem is that the most common models aren’t designed to catch the kind of bias we’re talking about.

The single-advisor model, one physician giving informal feedback, doesn’t have the breadth. That advisor might be brilliant, but they represent one practice setting, one patient population, one set of clinical experiences. They can’t see what a reproductive endocrinologist would see, or what a community OB/GYN managing a high-volume practice would flag, or what a urogynecologist would catch about pelvic floor assumptions baked into the product.

The KOL endorsement model, a prominent name lending credibility, isn’t designed to interrogate the product at all. It’s designed to validate the company’s positioning. The KOL is endorsing the vision, not pressure-testing the clinical foundation.

Neither model is structured to ask the questions that surface gender bias in AI: What data was this trained on? Whose symptoms does this model not account for? What happens when a patient presents the way women commonly present rather than the way the training data expects?

What Actually Catches This

The companies I’ve seen get this right do something different. They bring in multiple physicians who are currently practicing in the relevant specialties, not just one perspective, but a panel that represents the clinical complexity of the problem they’re trying to solve.

They engage those physicians early, before the product is locked, before the claims are written, before the training data decisions are finalized. And they ask questions designed to surface the gaps, not confirm the assumptions.

That’s what I built the Physician Advisory Network to do. Not to give companies a name for their website. To give them a structured process for pressure-testing their product, their data, and their clinical claims against the reality of how women’s health actually works in practice, across specialties, across practice settings, across the patient populations that most datasets have historically left out.

The gender bias in clinical AI isn’t going to fix itself. The data won’t clean itself up. Someone has to be in the room asking the uncomfortable questions early enough for the answers to matter. And that someone needs to be a physician who knows this space, and ideally more than one.

 

I’m going deeper on this in my newsletter: specifically, what questions investors should be asking about training data before they fund AI-powered health products, and what I’m seeing (and not seeing) from companies that claim to be addressing gender bias in their models. That piece is coming to subscribers first. Sign up here.

 

Rachel Miller, MD, FACOG, MSCP, ACC is a board-certified OB/GYN, physician advisor, executive coach, and founder of Pocket Bridges. She leads the Physician Advisory Network (PAN), bringing clinical credibility to healthtech and women’s health brands through a curated group of physician collaborators. Connect on LinkedIn.
