AI-Powered News Sentiment ETL Pipeline


Category: Data Engineering

Client: Self-Directed Research & Development

The Challenge

Developing a pipeline to reliably track Microsoft stock (MSFT) performance required solving two key challenges:

  1. Integrating disparate data sources (historical price data, real-time news headlines) into a cohesive structure, and

  2. Converting high-volume, unstructured text data (news articles) into a standardized numerical sentiment score in real time.
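The first challenge comes down to aligning two time series on a shared key, the calendar day. Below is a minimal sketch of that join. It assumes each headline has already been scored and timestamped in the same time zone as the trading calendar, and that prices arrive as one closing price per trading day. The record shapes and the function name `build_daily_view` are hypothetical and are not taken from the project.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean


def build_daily_view(closes, scored_headlines):
    """Join daily closing prices with the mean sentiment of that day's headlines.

    closes: dict mapping date -> closing price (hypothetical shape)
    scored_headlines: iterable of (datetime, score) pairs, score in [-1, 1]
    Returns a list of (date, close, mean_sentiment or None), sorted by date.
    """
    # Bucket headline scores by calendar day (timestamps assumed to share one time zone).
    by_day = defaultdict(list)
    for published_at, score in scored_headlines:
        by_day[published_at.date()].append(score)

    rows = []
    for day in sorted(closes):
        scores = by_day.get(day)
        # Trading days without headlines keep their price row but get None for sentiment;
        # headlines from non-trading days (e.g. weekends) are simply not joined.
        rows.append((day, closes[day], mean(scores) if scores else None))
    return rows


# Illustrative inputs only, not real MSFT data.
closes = {date(2024, 1, 2): 100.0, date(2024, 1, 3): 102.0}
headlines = [
    (datetime(2024, 1, 2, 9, 30), 0.5),
    (datetime(2024, 1, 2, 15, 0), -0.25),
    (datetime(2024, 1, 4, 8, 0), 0.8),
]
print(build_daily_view(closes, headlines))
```

Keeping days with missing sentiment as `None`, rather than dropping them, leaves the price series intact so gaps in news coverage stay visible downstream.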


The Solution

A Full-Stack ETL Pipeline with Gemini AI

We architected an Extract, Transform, Load (ETL) pipeline in Python to automate data ingestion. The transformation layer used the Gemini API for Natural Language Processing (NLP), converting each raw news article into a quantifiable sentiment score from -1 (most negative) to +1 (most positive). All data was persisted in a PostgreSQL database to safeguard data integrity and retain a full history. The results are visualized in a custom Streamlit dashboard.
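The page does not show the transform and load code. Here is a minimal sketch of those two stages under stated assumptions. The Gemini call is treated as an injected function `ask_model` (a hypothetical name) that takes a prompt string and returns the model's raw text reply, so no particular SDK signature is assumed. The parser rejects replies without a number and clamps the result to the -1 to +1 range. To keep the example self-contained and runnable, Python's built-in `sqlite3` stands in for PostgreSQL. The `INSERT ... ON CONFLICT (url) DO NOTHING` clause is also valid PostgreSQL, although drivers such as psycopg use `%s` placeholders instead of `?`. The prompt, table name, and column names are all hypothetical.

```python
import re
import sqlite3

# Hypothetical prompt; the project's actual prompt is not shown.
PROMPT = (
    "Rate the sentiment of this news headline toward Microsoft (MSFT) "
    "as a single number from -1 (very negative) to 1 (very positive). "
    "Reply with the number only.\n\nHeadline: {headline}"
)

# Optional sign, then digits with an optional fraction, or a bare fraction like ".5".
_NUMBER = re.compile(r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)")


def parse_score(reply):
    """Extract the first number from the model's reply and clamp it to [-1, 1]."""
    match = _NUMBER.search(reply)
    if match is None:
        raise ValueError(f"no numeric score in model reply: {reply!r}")
    return max(-1.0, min(1.0, float(match.group())))


def transform(headlines, ask_model):
    """Score each (url, published_at, headline) tuple via the injected model call."""
    return [
        (url, published_at, headline, parse_score(ask_model(PROMPT.format(headline=headline))))
        for url, published_at, headline in headlines
    ]


def load(conn, scored_rows):
    """Insert scored rows; the PRIMARY KEY on url makes re-runs idempotent."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS news_sentiment (
               url TEXT PRIMARY KEY,
               published_at TEXT NOT NULL,
               headline TEXT NOT NULL,
               score REAL NOT NULL CHECK (score BETWEEN -1 AND 1)
           )"""
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO news_sentiment VALUES (?, ?, ?, ?) ON CONFLICT (url) DO NOTHING",
            scored_rows,
        )


# Usage with a stub in place of the real Gemini call.
conn = sqlite3.connect(":memory:")
rows = transform([("https://example.com/a", "2024-01-02T09:30:00", "Headline A")],
                 lambda prompt: "Score: 0.75")
load(conn, rows)
```

Injecting the model call keeps the scoring logic testable without network access. The `CHECK` constraint and the key on `url` let the database, rather than the application, enforce the score range and prevent duplicate articles when the pipeline re-runs.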


The Result

The project delivered a fully operational, self-updating dashboard for real-time comparison of MSFT price action against aggregate news sentiment. Just as importantly, the exercise served as a hands-on, curiosity-driven bootcamp in core data engineering practices, demonstrating practical skill in data ingestion, database schema design, and productionizing AI models for data transformation.
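The page does not say how the dashboard quantifies the comparison beyond displaying the two series side by side. One simple option, assumed here rather than taken from the project, is the Pearson correlation between each day's mean sentiment and that day's simple price return. The sketch below computes both from scratch; the function names are hypothetical.

```python
from math import sqrt


def daily_returns(closes):
    """Simple returns between consecutive closing prices: (today - yesterday) / yesterday."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Illustrative values only: four closes give three returns,
# paired with the mean sentiment of days 2 through 4.
closes = [100.0, 102.0, 101.0, 104.0]
sentiment = [0.4, -0.2, 0.6]
print(pearson(sentiment, daily_returns(closes)))
```

With closes for n days, `daily_returns` yields n - 1 values, so they must be paired with the sentiment of days 2 through n. Pairing them with the previous day's sentiment instead is a quick way to ask whether sentiment tends to lead price.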


View the project on GitHub.
