APN (Healthcare Network) Case
Date: 00.00.00
Problem:
Practitioner data (specialties, contact info) scattered across 5+ directories with inconsistent formatting.
Manual entry caused a 40% error rate in their central directory.
Goal:
Automate data scraping, cleaning, and structuring for a centralized directory by:
Extracting data from multiple source directories (names withheld for confidentiality).
Standardizing entries using AI models.
Technologies Used:
Python: Web scraping (Scrapy, Selenium).
OpenAI: Data cleaning (GPT-4 for rewriting bios, standardizing specialties).
Google Sheets: Staging area for QA.
Result: Reduced data entry errors from 40% to 2% and cut directory update time from 2 weeks to 2 days.
Automated Scraping: Deployed Scrapy bots to extract practitioner names, locations, and specialties.
AI-Powered Standardization: Used GPT-4 to rewrite bios into consistent formats (e.g., “[Name] specializes in [X] with [Y] years of experience”).
Data Structuring: Exported cleaned data as CSV/JSON for seamless integration into APN’s directory.
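The standardization and export steps above can be sketched roughly as follows. This is a minimal illustration, not APN's actual pipeline: the sample records are made up, the field names and the `SPECIALTY_ALIASES` table are hypothetical, and a plain lookup table stands in for the GPT-4 call so the example runs offline with only the standard library.

```python
import csv
import io
import json

# Hypothetical alias table. The case study used GPT-4 for this normalization;
# a static lookup stands in for it here so the sketch is runnable offline.
SPECIALTY_ALIASES = {
    "physio": "Physiotherapy",
    "physiotherapist": "Physiotherapy",
    "physical therapy": "Physiotherapy",
    "psych": "Psychology",
    "psychologist": "Psychology",
}

# Assumed output schema; the real directory fields are not given in the source.
FIELDS = ["name", "location", "specialty", "bio"]


def collapse_spaces(text):
    """Trim the string and collapse internal runs of whitespace."""
    return " ".join(text.split())


def standardize_specialty(raw):
    """Map a free-text specialty to a canonical label; unknown values are title-cased."""
    key = collapse_spaces(raw).lower()
    return SPECIALTY_ALIASES.get(key, key.title())


def build_bio(name, specialty, years):
    """Fill the bio template quoted in the case study."""
    return f"{name} specializes in {specialty} with {years} years of experience."


def clean_record(raw):
    """Turn one scraped record (all string values) into a standardized record."""
    name = collapse_spaces(raw["name"])
    specialty = standardize_specialty(raw["specialty"])
    return {
        "name": name,
        "location": collapse_spaces(raw["location"]),
        "specialty": specialty,
        "bio": build_bio(name, specialty, int(raw["years"])),
    }


def to_csv(records):
    """Serialize cleaned records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records):
    """Serialize cleaned records to pretty-printed JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


# Made-up sample input, shaped like what a scraper might emit.
scraped = [
    {"name": "  Jane   Smith ", "location": "Sydney ", "specialty": "PHYSIO", "years": "8"},
    {"name": "Alex Lee", "location": "Perth", "specialty": "sports medicine", "years": "3"},
]
cleaned = [clean_record(r) for r in scraped]
print(to_csv(cleaned))
print(to_json(cleaned))
```

In a setup like this, the CSV output could be pasted into the Google Sheets staging area for QA, while the JSON output feeds the directory import. The template does not handle singular forms such as "1 year", which a model-based rewrite would normally smooth over.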
A scalable data pipeline that:
Enabled APN to onboard 200+ new practitioners/month.
Improved search accuracy for end-users by 60%.