AI-Powered News Sentiment ETL Pipeline
Category: Data Engineering
Client: Self-Directed Research & Development
The Challenge
Developing a pipeline to reliably track Microsoft stock (MSFT) performance required solving two key challenges:
Integrating disparate data sources (historical price data, real-time news headlines) into a cohesive structure, and
Converting high-volume, unstructured text (news articles) into a standardized numerical sentiment score in real time.
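Integrating the two sources largely comes down to choosing a shared key. The sketch below is minimal and assumes daily closing prices and timestamped headlines that have already been scored. The record types `PriceBar`, `ScoredHeadline`, and `DailyRow` are hypothetical and not taken from the project. Headlines are grouped by calendar date, averaged, and attached to the price bar for that date.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean
from typing import Optional


# Hypothetical record types for illustration; the project's real schema is not shown.
@dataclass(frozen=True)
class PriceBar:
    day: date
    close: float


@dataclass(frozen=True)
class ScoredHeadline:
    published_at: datetime  # assumed to be in the exchange's local time zone
    title: str
    sentiment: float  # assumed already scored on the -1..+1 scale


@dataclass(frozen=True)
class DailyRow:
    day: date
    close: float
    mean_sentiment: Optional[float]  # None when no headlines fell on that date
    headline_count: int


def align_by_day(prices: list[PriceBar], headlines: list[ScoredHeadline]) -> list[DailyRow]:
    """Join price bars and headline scores on calendar date, one row per trading day."""
    # Bucket headline scores by the date they were published.
    scores_by_day: dict[date, list[float]] = defaultdict(list)
    for h in headlines:
        scores_by_day[h.published_at.date()].append(h.sentiment)

    rows = []
    for bar in sorted(prices, key=lambda b: b.day):
        # .get() avoids inserting empty lists into the defaultdict for days without news.
        scores = scores_by_day.get(bar.day, [])
        rows.append(
            DailyRow(
                day=bar.day,
                close=bar.close,
                mean_sentiment=mean(scores) if scores else None,
                headline_count=len(scores),
            )
        )
    return rows
```

Headlines published on weekends or market holidays have no matching price bar, so this sketch drops them. Rolling them forward to the next trading session is a reasonable alternative.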
The Solution
A Full-Stack ETL Pipeline with Gemini AI
We designed and built an Extract, Transform, Load (ETL) pipeline in Python to automate data ingestion. The transformation layer used the Gemini API for Natural Language Processing (NLP), converting raw news articles into a quantifiable sentiment score on a scale from -1 (negative) to +1 (positive). All data was persisted in a PostgreSQL database, which preserves the full history and protects data integrity. The results are visualized in a custom Streamlit dashboard.
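The transform and load steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code. The Gemini request is passed in as a plain callable (`score_fn`, hypothetical), so the sketch does not depend on a particular SDK version. The prompt text and the table `headline_sentiment` are also hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL. Because model output is free text, each reply is parsed and clamped to [-1, +1], and unparseable replies are skipped rather than stored as 0.

```python
import re
import sqlite3
from typing import Callable, Optional

# Hypothetical prompt; the project's actual prompt is not published.
PROMPT_TEMPLATE = (
    "Rate the sentiment of this news headline toward Microsoft (MSFT) "
    "as a single number between -1 (very negative) and 1 (very positive). "
    "Reply with the number only.\n\nHeadline: {headline}"
)

# First signed decimal number in the reply, e.g. "0.7", "-0.35", ".5".
_NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def parse_score(raw: str) -> Optional[float]:
    """Extract the first number from a model reply and clamp it to [-1, 1]."""
    match = _NUMBER.search(raw)
    if match is None:
        return None  # no usable number: skip instead of storing a misleading 0
    return max(-1.0, min(1.0, float(match.group())))


def load_headlines(
    conn: sqlite3.Connection,
    headlines: list[tuple[str, str]],  # (ISO date string, headline text)
    score_fn: Callable[[str], str],  # hypothetical wrapper around the LLM call
) -> int:
    """Score each headline and upsert it; returns the number of rows written."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS headline_sentiment (
            published_date TEXT NOT NULL,
            headline       TEXT NOT NULL,
            score          REAL NOT NULL CHECK (score BETWEEN -1 AND 1),
            PRIMARY KEY (published_date, headline)
        )
        """
    )
    written = 0
    for published_date, headline in headlines:
        score = parse_score(score_fn(PROMPT_TEMPLATE.format(headline=headline)))
        if score is None:
            continue
        # Upsert: rerunning over the same headlines updates scores instead of duplicating rows.
        conn.execute(
            "INSERT INTO headline_sentiment (published_date, headline, score) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT (published_date, headline) DO UPDATE SET score = excluded.score",
            (published_date, headline, score),
        )
        written += 1
    conn.commit()
    return written
```

The upsert keeps reruns idempotent, which matters if the pipeline reprocesses overlapping time windows. With a PostgreSQL driver such as psycopg, the `?` placeholders become `%s`. The `ON CONFLICT ... DO UPDATE` clause is valid PostgreSQL as written, although `published_date` would more naturally be a `DATE` column there.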
The Result
The project delivered a fully operational, self-updating dashboard that supports real-time comparison of MSFT price action against overall news sentiment. Just as importantly, the exercise served as a comprehensive, curiosity-driven bootcamp in core data engineering practices, demonstrating proficiency in data ingestion, database schema design, and putting AI models into production as a data-transformation step.