Data Science

Drug Reviews NLP

Mining 215,063 patient reviews to identify which medications outperform alternatives — and quantifying the business impact of drug non-adherence.

Role: Data Analyst
Stack: Python, SQL, NLTK, Streamlit
Timeline: 2025
Reviews Analyzed: 215K+
Drugs: 3,436
Conditions: 885
Revenue Risk Identified: $27M+

Overview

The Question

Patients frequently switch or stop medications due to dissatisfaction. Which drugs significantly underperform their alternatives, and what's the financial impact of this non-adherence?

The Approach

Combined SQL analysis, NLP sentiment extraction, and topic modeling on 215K patient reviews from Drugs.com (2008-2017) to surface rating gaps, sentiment discrepancies, and recurring side effect themes.

Key Findings

5+ Point Rating Gap

The best- and worst-rated drugs for the same condition differed by 5 or more points on the 1-10 scale, so patients on the lowest-rated options report a much worse experience than those on the top-rated alternatives.
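A within-condition gap like this can be computed in SQLite by averaging ratings per drug and then taking the spread per condition. The snippet below is a minimal sketch, not the project's actual query: the `reviews` table, its column names, and the toy rows are assumptions made for illustration.

```python
import sqlite3

# Toy rows made up for illustration; they are not taken from the dataset.
# Assumed schema: reviews(drug_name, condition, rating).
ROWS = [
    ("Drug A", "Condition X", 9),
    ("Drug A", "Condition X", 10),
    ("Drug B", "Condition X", 3),
    ("Drug B", "Condition X", 4),
    ("Drug C", "Condition Y", 7),
    ("Drug D", "Condition Y", 6),
]

GAP_QUERY = """
WITH drug_avg AS (
    -- One row per (condition, drug) with that drug's mean rating.
    SELECT condition, drug_name, AVG(rating) AS avg_rating
    FROM reviews
    GROUP BY condition, drug_name
)
SELECT condition,
       MAX(avg_rating) - MIN(avg_rating) AS rating_gap
FROM drug_avg
GROUP BY condition
HAVING COUNT(*) >= 2          -- a gap needs at least two drugs
ORDER BY rating_gap DESC
"""


def rating_gaps(rows):
    """Return (condition, best-minus-worst mean rating), largest gap first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reviews (drug_name TEXT, condition TEXT, rating REAL)"
    )
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)
    result = conn.execute(GAP_QUERY).fetchall()
    conn.close()
    return result


gaps = rating_gaps(ROWS)
```

On real data, a minimum review count per drug would likely be needed so that a single one-star review does not create an artificial gap.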

8 Underperforming Conditions

Identified conditions with average ratings below 6.0, indicating systematic unmet patient needs across multiple medications.
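The 6.0 cutoff maps naturally onto a `GROUP BY ... HAVING` filter. The sketch below assumes a `reviews(drug_name, condition, rating)` table, which is a hypothetical schema rather than the one used in the project, and runs against a few made-up rows.

```python
import sqlite3


def low_rated_conditions(rows, threshold=6.0):
    """Return (condition, avg_rating, n_drugs) for conditions averaging below threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reviews (drug_name TEXT, condition TEXT, rating REAL)"
    )
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)
    query = """
        SELECT condition,
               ROUND(AVG(rating), 2)     AS avg_rating,
               COUNT(DISTINCT drug_name) AS n_drugs
        FROM reviews
        GROUP BY condition
        HAVING AVG(rating) < ?
        ORDER BY avg_rating
    """
    result = conn.execute(query, (threshold,)).fetchall()
    conn.close()
    return result


# Made-up rows for illustration only.
toy_rows = [
    ("Drug A", "Condition X", 4),
    ("Drug B", "Condition X", 5),
    ("Drug C", "Condition Y", 8),
    ("Drug D", "Condition Y", 9),
]
flagged = low_rated_conditions(toy_rows)
```

Returning the number of distinct drugs alongside the average helps separate conditions where every option rates poorly from those with only one reviewed drug.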

Birth Control: 18% of Reviews

Highest volume category despite below-average satisfaction — a major opportunity for pharmaceutical improvement.

Sentiment vs Rating Mismatch

VADER analysis revealed reviews where explicit ratings didn't match expressed sentiment — hidden dissatisfaction signals.
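One way to detect such mismatches is to put the rating and the VADER compound score on the same scale and flag large differences. The sketch below is an assumption about how this could be done, not the project's actual rule: the linear mapping in `rating_to_polarity`, the `is_mismatch` helper, and the `tolerance` value are all hypothetical. The VADER call uses NLTK's `SentimentIntensityAnalyzer`, which requires a one-time `nltk.download("vader_lexicon")`.

```python
def vader_compounds(texts):
    """Compound polarity in [-1, 1] for each text, using NLTK's VADER.

    Imported inside the function so the rest of the sketch works without
    NLTK installed. Requires nltk and the "vader_lexicon" resource.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return [analyzer.polarity_scores(text)["compound"] for text in texts]


def rating_to_polarity(rating):
    """Map a 1-10 rating linearly onto [-1, 1] (1 -> -1.0, 10 -> 1.0)."""
    return (rating - 5.5) / 4.5


def is_mismatch(rating, compound, tolerance=1.0):
    """Flag a review whose text polarity is far from what its rating implies.

    `tolerance` is a hypothetical cutoff on the [-1, 1] scale and would need
    tuning against real reviews.
    """
    return abs(rating_to_polarity(rating) - compound) >= tolerance
```

For example, `is_mismatch(9, -0.6)` is true (a high rating with clearly negative text), while `is_mismatch(9, 0.8)` is false.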

Methodology

Data: UCI Dataset (215K reviews)
SQL Analysis: SQLite (rankings & trends)
NLP: VADER + TF-IDF (sentiment & topics)
Insights: Findings ($27M impact)
Product: Streamlit (dashboard)

Technical Highlights

1. Sentiment Analysis at Scale
Approach: Applied VADER sentiment analysis across 215K reviews to extract polarity scores and compare against explicit 1-10 ratings.
Insight: Found significant rating-sentiment mismatches revealing hidden patient dissatisfaction.

2. Topic Modeling for Side Effects
Approach: Used TF-IDF vectorization with NMF (Non-negative Matrix Factorization) to extract recurring themes from negative reviews.
Insight: Surfaced common side effect complaints and unmet patient needs by condition.

3. Business Impact Quantification
Approach: Combined rating data with market research on drug adherence costs and patient switching behavior.
Insight: Estimated $27M+ in revenue risk from preventable non-adherence.

The Dashboard

Built a Streamlit app ("Drug Alternative Finder") that lets users explore the data — compare drugs within conditions, view rating distributions, and surface better-rated alternatives.

Features: condition-based filtering, drug comparison charts, rating distributions, alternative recommendations
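The lookup behind an alternative recommendation could be as simple as ranking drugs for the same condition by mean rating. The function below is a hedged sketch of that idea in plain Python, not the app's actual code: `better_alternatives`, its `min_reviews` parameter, and the input tuple layout are assumptions.

```python
from collections import defaultdict
from statistics import mean


def better_alternatives(reviews, condition, current_drug, min_reviews=2):
    """List drugs for `condition` whose mean rating beats `current_drug`'s.

    `reviews` is an iterable of (drug_name, condition, rating) tuples.
    Returns (drug_name, mean_rating, n_reviews) sorted best first.
    """
    ratings = defaultdict(list)
    for drug, cond, rating in reviews:
        if cond == condition:
            ratings[drug].append(rating)

    if current_drug not in ratings:
        return []

    baseline = mean(ratings[current_drug])
    candidates = [
        (drug, round(mean(vals), 2), len(vals))
        for drug, vals in ratings.items()
        # Skip the current drug and drugs with too few reviews to trust.
        if drug != current_drug
        and len(vals) >= min_reviews
        and mean(vals) > baseline
    ]
    return sorted(candidates, key=lambda c: c[1], reverse=True)
```

In a Streamlit app, `condition` and `current_drug` could come from `st.selectbox` widgets, with the result displayed through `st.dataframe`.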

Impact

Revenue Risk: $27M+
Reviews Analyzed: 215K
Conditions Flagged: 8