Leo Chang
Published Research

Stock Price Prediction ML

Analyzing the Impact of Tweet Sentiment on Stock Price Prediction using LSTM Models

Lead Researcher LSTM NLP Python
80K+ Tweets
JEI 2025
3 Stocks
3-Fold CV

Research Overview

This study investigates whether incorporating Twitter sentiment analysis improves LSTM neural network models for stock price prediction. Testing three major technology stocks (Apple, Tesla, Microsoft) with 3-fold time series cross-validation, we found that sentiment-enhanced models consistently underperformed baseline models, with RMSE 48.5-104% higher, contradicting our initial hypothesis.

The research combined 13 technical indicators with 3 sentiment metrics, used LSTM architectures with dropout regularization (a single-layer baseline and a two-layer sentiment-enhanced model), and evaluated both with 3-fold time series cross-validation. Paired t-tests showed that the performance gap was statistically significant only for Microsoft (p = 0.021); for Apple and Tesla the sentiment-enhanced models also had higher RMSE, but the differences were not significant.
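As a rough illustration of the technical-indicator side of the feature set, the sketch below computes three of the indicator families named in the methodology (a moving average, rolling volatility, and RSI) from a closing-price series. The function name `add_indicators`, the 14-day window, and the simple-rolling-mean form of RSI are assumptions made for this example; the study's exact indicator definitions are not stated on this page.

```python
import pandas as pd


def add_indicators(close: pd.Series, window: int = 14) -> pd.DataFrame:
    """Return a few illustrative technical indicators for a closing-price series."""
    delta = close.diff()
    # Average gain and average loss over the window (simple rolling means).
    avg_gain = delta.clip(lower=0).rolling(window).mean()
    avg_loss = delta.clip(upper=0).abs().rolling(window).mean()
    # RSI = 100 * gain / (gain + loss), which equals 100 - 100 / (1 + RS).
    rsi = 100 * avg_gain / (avg_gain + avg_loss)
    returns = close.pct_change()
    return pd.DataFrame(
        {
            "sma": close.rolling(window).mean(),           # simple moving average
            "volatility": returns.rolling(window).std(),   # rolling std of daily returns
            "rsi": rsi,
        }
    )


# Toy input (not market data): a steadily rising price, so every change is a gain.
prices = pd.Series([float(p) for p in range(1, 31)])
features = add_indicators(prices)
print(features.tail(3))
```

The first rows of each column are NaN until a full window is available, so in practice they would be dropped before building LSTM input windows.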

Authors & Affiliations

Author | Affiliation | Role
Leo Chang | Princeton Day School, Princeton, NJ | Lead Researcher
Aditya Saraf | Cornell University, Ithaca, NY | Co-Researcher
Jenjen Chen | Yardley, Pennsylvania | Research Collaborator

Key Research Findings

69% Avg RMSE Increase

Performance Degradation

Sentiment-enhanced models showed a 48.5-104% RMSE increase across all tested stocks.

p=0.021 Statistical Sig.

Microsoft Significance

Only the Microsoft comparison reached statistical significance, indicating that adding sentiment worsened predictions for that stock.

3-Fold Cross-Validation

Overfitting Evidence

Time series cross-validation revealed consistent overfitting in sentiment-enhanced models.

80K+ Tweets Analyzed

Noise vs Signal

Twitter sentiment contained more noise than predictive information for large-cap stocks.
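The time series cross-validation behind these findings can be illustrated with scikit-learn's `TimeSeriesSplit`, which the methodology names. This is a minimal sketch on twelve placeholder samples, not the study's data; it only shows how three expanding-window folds keep every training index earlier than every test index.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered placeholder rows stand in for the daily feature matrix.
X = np.arange(12).reshape(-1, 1)

splitter = TimeSeriesSplit(n_splits=3)
folds = [(train.tolist(), test.tolist()) for train, test in splitter.split(X)]

for i, (train_idx, test_idx) in enumerate(folds, start=1):
    # Each training window ends before its test window begins.
    print(f"fold {i}: train={train_idx} test={test_idx}")
```

Because later folds train on more history, comparing training and test error fold by fold is one way overfitting of the kind reported above can become visible.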

Technical Implementation

Machine Learning

LSTM Networks TensorFlow/Keras Time Series CV Feature Engineering

Data Processing

Python Pandas NumPy Scikit-learn

Sentiment Analysis

TextBlob NLTK Twitter API NLP Preprocessing

Financial Data

Yahoo Finance API Technical Indicators OHLCV Data Statistical Testing

Research Methodology

Step | Phase | Description
1 | Data Collection | Gathered historical stock data for AAPL, TSLA, and MSFT from Yahoo Finance API covering September 2021-2022, plus 80,000+ labeled tweets from Kaggle datasets.
2 | Feature Engineering | Implemented 13 technical indicators (RSI, moving averages, volatility) and 3 sentiment metrics (polarity, subjectivity, tweet volume) with proper normalization.
3 | Model Architecture | Designed baseline LSTM (single layer, 32 units) and sentiment-enhanced LSTM (2 layers: 64/32 units) with dropout regularization.
4 | Rigorous Validation | Applied 3-fold time series cross-validation using TimeSeriesSplit to prevent data leakage, with paired t-tests for statistical significance.
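The two architectures in step 3 could be written in Keras roughly as follows. Only the layer counts and unit sizes (32, and 64/32) and the use of dropout come from the table above; the 30-day lookback, the 0.2 dropout rate, the single-unit output, and the assumption that the baseline sees only the 13 technical indicators are placeholders chosen for this sketch.

```python
from tensorflow import keras
from tensorflow.keras import layers

LOOKBACK = 30      # assumed number of past days per input window
N_TECHNICAL = 13   # technical indicators (from the methodology)
N_SENTIMENT = 3    # polarity, subjectivity, tweet volume (from the methodology)
DROPOUT = 0.2      # assumed dropout rate


def build_baseline() -> keras.Model:
    """Single LSTM layer with 32 units; assumed to use technical indicators only."""
    return keras.Sequential(
        [
            keras.Input(shape=(LOOKBACK, N_TECHNICAL)),
            layers.LSTM(32),
            layers.Dropout(DROPOUT),
            layers.Dense(1),  # next-step price estimate
        ]
    )


def build_sentiment_model() -> keras.Model:
    """Two stacked LSTM layers (64 then 32 units) on technical + sentiment features."""
    return keras.Sequential(
        [
            keras.Input(shape=(LOOKBACK, N_TECHNICAL + N_SENTIMENT)),
            layers.LSTM(64, return_sequences=True),  # pass the full sequence to the next LSTM
            layers.Dropout(DROPOUT),
            layers.LSTM(32),
            layers.Dropout(DROPOUT),
            layers.Dense(1),
        ]
    )


baseline = build_baseline()
enhanced = build_sentiment_model()
for model in (baseline, enhanced):
    model.compile(optimizer="adam", loss="mse")

print(baseline.count_params(), enhanced.count_params())
```

Under these assumptions the sentiment-enhanced model has several times more trainable parameters than the baseline, which is worth keeping in mind when reading the overfitting results.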

Publication & Impact

Journal of Emerging Investigators

PEER-REVIEWED

"Analyzing the Impact of Tweet Sentiment on Stock Price Prediction using Long Short-Term Memory Models"

Authors: Leo Chang, Aditya Saraf, Jenjen Chen
Published: 2025
Keywords: LSTM, sentiment, Twitter, prediction, finance

Research Significance

This study provides rigorous evidence challenging the widespread assumption that social media sentiment enhances financial prediction models. By employing proper time series validation and statistical testing, our findings contribute valuable negative results to the computational finance literature, highlighting important limitations in sentiment-based approaches for large-cap stock prediction.

Detailed Results

Stock | Sentiment RMSE | Baseline RMSE | Degradation | P-Value
Apple Inc. (AAPL) | 7.909 ± 3.854 | 5.103 ± 2.273 | 55.0% | p = 0.245
Tesla Inc. (TSLA) | 24.045 ± 19.036 | 16.191 ± 5.432 | 48.5% | p = 0.409
Microsoft Corp. (MSFT) | 18.174 ± 9.236 | 8.911 ± 4.534 | 104.0% | p = 0.021 *
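The degradation column is the relative RMSE increase of the sentiment-enhanced model over the baseline, (sentiment - baseline) / baseline. The short check below recomputes it from the mean RMSE values in the table; the helper name `pct_increase` is ours, introduced only for this example.

```python
from statistics import mean

# Mean RMSE per stock, copied from the results table: (sentiment, baseline).
rmse = {
    "AAPL": (7.909, 5.103),
    "TSLA": (24.045, 16.191),
    "MSFT": (18.174, 8.911),
}


def pct_increase(sentiment: float, baseline: float) -> float:
    """Relative RMSE increase of the sentiment model over the baseline, in percent."""
    return (sentiment - baseline) / baseline * 100


increase = {ticker: pct_increase(s, b) for ticker, (s, b) in rmse.items()}
average = mean(increase.values())

for ticker, value in increase.items():
    print(f"{ticker}: {value:.1f}%")
print(f"average: {average:.0f}%")
```

This reproduces the 55.0%, 48.5%, and 104.0% figures in the table and the 69% average RMSE increase reported in the key findings.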

Research Visualizations

Comprehensive visual analysis of model performance, statistical significance, and prediction accuracy across three major technology stocks.

Model Performance Comparison

RMSE comparison showing sentiment models performed 48.5-104% worse than baseline across all stocks.

AAPL Price Predictions

Apple stock predictions showing model lag and dampened volatility capture.

TSLA Price Predictions

Tesla's high volatility makes it the most challenging stock to predict.

MSFT Price Predictions

Microsoft shows consistent prediction lag despite more stable price behavior.

Statistical Significance

Only Microsoft reached significance (p=0.021), indicating that sentiment degraded performance for that stock; the Apple and Tesla differences were not significant.
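A paired t-test of this kind compares the two models on matched measurements. The sketch below assumes the pairs are per-fold RMSE values for the sentiment-enhanced and baseline models; the numbers are hypothetical and chosen only to make the arithmetic easy to follow, they are not the study's fold results.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-fold RMSE values, for illustration only (not the study's data).
sentiment_rmse = [10.0, 12.0, 14.0]
baseline_rmse = [6.0, 7.0, 8.0]


def paired_t_statistic(a, b):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample std / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


t_stat, dof = paired_t_statistic(sentiment_rmse, baseline_rmse)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

With three folds there are only two degrees of freedom, so a large and consistent gap is needed to reach significance. `scipy.stats.ttest_rel` computes the same statistic together with its p-value.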

Feature Importance

Permutation importance showing technical indicators outperform sentiment features.
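Permutation importance measures how much a model's error grows when one feature's values are shuffled across samples, which breaks that feature's link to the target. The sketch below is a generic version on a 2-D feature matrix with a toy stand-in for a trained model; the function names and the toy data are assumptions, and for an LSTM the same shuffle would be applied to one feature across all time steps of the 3-D input.

```python
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean RMSE increase when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    base = rmse(y, predict(X))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle only column j, leaving every other feature untouched.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(rmse(y, predict(X_perm)) - base)
        scores[j] = np.mean(increases)
    return scores


def toy_predict(X):
    # Toy stand-in for a trained model: uses column 0 and ignores column 1.
    return 2.0 * X[:, 0]


data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))
y = 2.0 * X[:, 0]
importance = permutation_importance(toy_predict, X, y)
print(importance)
```

A feature the model ignores scores zero, while a feature it relies on scores a positive RMSE increase; this is the sense in which technical indicators are described as outperforming sentiment features.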

Directional Accuracy

Directional accuracy was around 50-60%, barely better than random chance.
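One common way to compute directional accuracy is to check, for each day, whether the predicted move from the previous actual close has the same sign as the actual move. That definition is an assumption here, since the page does not spell out the one used in the study, and the prices below are made up for illustration.

```python
def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def directional_accuracy(actual, predicted):
    """Share of days where the predicted move has the same sign as the actual move."""
    hits = 0
    for i in range(1, len(actual)):
        actual_move = actual[i] - actual[i - 1]
        predicted_move = predicted[i] - actual[i - 1]
        if sign(actual_move) == sign(predicted_move):
            hits += 1
    return hits / (len(actual) - 1)


# Made-up prices for illustration only.
actual = [100.0, 102.0, 101.0, 103.0, 104.0]
predicted = [100.0, 101.0, 102.0, 102.0, 105.0]
print(directional_accuracy(actual, predicted))
```

A model that guesses direction at random would score about 0.5 on this measure, which is the reference point for the 50-60% range above.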

Code & Resources

Complete codebase including data processing, model implementation, and statistical analysis available on GitHub.

GitHub Repository Dataset Access