Stock Price Prediction ML
Accepted for publication in the Journal of Emerging Investigators
Overview
This research investigates whether incorporating Twitter sentiment analysis improves LSTM-based stock price prediction. The study tested three major stocks — Apple (AAPL), Tesla (TSLA), and Microsoft (MSFT) — comparing baseline technical-indicator models against sentiment-enhanced variants over a one-year period.
Contrary to a common assumption in the financial machine learning literature, the study found that sentiment features increased prediction error for all three stocks, although the increase was statistically significant only for Tesla. This provides empirical evidence against the naive integration of social media sentiment into price prediction models.
The hypothesis was tested by comparing a one-layer baseline LSTM trained on technical indicators against a three-layer sentiment-augmented LSTM that added daily Twitter sentiment metrics (mean polarity, polarity dispersion, and tweet count). Both models used early stopping and dropout, and were validated through five-fold time series cross-validation that preserved chronological ordering. Averaged across the three equities, RMSE increased by 32.1% when sentiment features were added, although only Tesla showed statistically significant degradation (t = 6.50, p = 0.003). The sentiment models showed signs of overfitting (lower training losses but higher validation losses), and permutation importance analysis indicated that sentiment features contributed less than 5% of total predictive importance. These findings suggest that publicly available tweet-level sentiment data may carry too little information to improve predictions for highly traded, large-capitalization technology companies, and may instead reduce model performance by adding noise.
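As an illustration of the permutation importance idea mentioned above, here is a minimal sketch in plain Python. It is not the study's code: the function names, the tabular row format, the number of repeats, and the way shares are normalized are all assumptions made for the example. The idea is to shuffle one feature column at a time, measure how much RMSE rises, and express each feature's rise as a share of the total.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean RMSE increase when each feature column is shuffled.

    predict: hypothetical callable mapping a list of rows to predictions.
    X: list of rows, each row a list of feature values.
    """
    rng = random.Random(seed)
    base = rmse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, predict(X_perm)) - base)
        importances.append(sum(increases) / n_repeats)
    return importances

def importance_shares(importances):
    """Normalize importances to shares of the total (negative values clipped to 0)."""
    clipped = [max(v, 0.0) for v in importances]
    total = sum(clipped)
    return [v / total if total else 0.0 for v in clipped]

# Toy check: the "model" uses only feature 0, so feature 1 should get no share.
X = [[float(i), float(i % 3)] for i in range(30)]
y = [2.0 * row[0] for row in X]
toy_predict = lambda rows: [2.0 * row[0] for row in rows]
print(importance_shares(permutation_importance(toy_predict, X, y)))
```

For an LSTM, each input is a window of days, not a single row, so a real implementation would shuffle a feature across samples within the windowed array; the sketch above only shows the principle on tabular data.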
Key Finding
Sentiment-enhanced models underperformed the baseline, with RMSE about 32% higher on average
Across all three stocks tested (80,793 tweets analyzed, Sep 2021 – Sep 2022), adding Twitter sentiment features to LSTM models consistently worsened prediction accuracy compared to technical-indicator-only baselines.
Authors
Leo Chang
Lead Researcher & Developer
Princeton Day School
Aditya Saraf
Co-Researcher
Cornell University
Jenjen Chen
Co-Researcher
Yardley, PA
Methodology
Data Collection
Gathered daily stock prices for AAPL, TSLA, and MSFT from September 2021 to September 2022 via Yahoo Finance, alongside a total of 80,793 labeled tweets mentioning these ticker symbols, taken from a publicly available Kaggle dataset.
Feature Engineering
Constructed 13 technical features (log returns, intraday high-low range, close-to-open change, 5/10/20-day SMAs, price-to-SMA ratios, 14-day RSI, volume moving average, volume ratio, rolling volatility) plus 3 sentiment metrics (mean polarity, polarity standard deviation, tweet count) for each trading day.
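A few of these features can be sketched in plain Python as follows. This is an illustrative sketch, not the study's pipeline: the function names and input formats are hypothetical, the RSI uses simple averages of gains and losses (Wilder's smoothed variant is also common, and the source does not say which was used), and the polarity dispersion is computed as a population standard deviation, which is likewise an assumption.

```python
import math
from statistics import mean, pstdev

def log_returns(closes):
    """Daily log returns ln(P_t / P_{t-1}); the first day has no return."""
    return [None] + [math.log(closes[i] / closes[i - 1]) for i in range(1, len(closes))]

def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def rsi(closes, period=14):
    """RSI from simple averages of gains and losses over `period` daily changes."""
    out = [None] * len(closes)
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    for i in range(period, len(closes)):
        window = changes[i - period:i]  # the `period` changes ending on day i
        avg_gain = sum(c for c in window if c > 0) / period
        avg_loss = sum(-c for c in window if c < 0) / period
        if avg_loss == 0:
            out[i] = 100.0  # convention used here when there are no losses
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out

def daily_sentiment(tweets):
    """Aggregate (date, polarity) pairs into {date: (mean, std, tweet count)}."""
    by_day = {}
    for day, polarity in tweets:
        by_day.setdefault(day, []).append(polarity)
    return {day: (mean(p), pstdev(p), len(p)) for day, p in sorted(by_day.items())}

# Made-up prices and tweets, for illustration only.
closes = [10.0, 11.0, 12.0, 11.0, 13.0]
print(sma(closes, 3))
print(rsi(closes, period=2))
print(daily_sentiment([("2021-09-30", 0.5), ("2021-09-30", -0.5), ("2021-10-01", 0.2)]))
```

The price-to-SMA ratios and the volume ratio follow the same pattern: divide the day's value by the corresponding moving average once a full window is available.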
Model Architecture
Designed two LSTM architectures: a baseline model with a single 50-unit layer using dropout (0.2), and a sentiment-enhanced model with three stacked LSTM layers (128/64/32 units) plus batch normalization, L2 regularization, and dropout (0.2–0.3). Both used the Adam optimizer with early stopping.
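To get a feel for the relative size of the two architectures, the sketch below counts the trainable weights in the LSTM layers alone. It rests on several assumptions not stated in the source: the baseline sees the 13 technical features and the sentiment model sees all 16, each layer is a standard LSTM with one bias vector per gate (the Keras convention), and the output head and batch normalization parameters are ignored.

```python
def lstm_params(input_dim, units):
    """Weights in one standard LSTM layer: four gates, each with an input
    kernel, a recurrent kernel, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def stacked_lstm_params(n_features, layer_units):
    """Total LSTM weights for a stack; each layer's output feeds the next layer."""
    total, input_dim = 0, n_features
    for units in layer_units:
        total += lstm_params(input_dim, units)
        input_dim = units
    return total

baseline = stacked_lstm_params(13, [50])             # single 50-unit layer
sentiment = stacked_lstm_params(16, [128, 64, 32])   # three stacked layers
print(baseline, sentiment, round(sentiment / baseline, 1))  # 12800 136064 10.6
```

Under these assumptions the sentiment-enhanced stack has roughly ten times as many recurrent weights as the baseline, which is worth keeping in mind when reading the training and validation losses of the two models.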
Validation
Applied five-fold time series cross-validation respecting temporal ordering. Statistical significance was assessed via paired t-tests comparing fold-level RMSE between the baseline and sentiment models.
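The validation scheme can be sketched as below. This is a minimal illustration under stated assumptions, not the study's code: the splitter uses an expanding training window with equal-sized test blocks (the same scheme as scikit-learn's TimeSeriesSplit), the function names are hypothetical, and the fold-level RMSE values in the usage example are made up.

```python
import math
from statistics import mean, stdev

def time_series_splits(n_samples, n_splits=5):
    """Yield (train_indices, test_indices) with every test block after its training data."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    first_test_start = n_samples - n_splits * test_size
    for k in range(n_splits):
        start = first_test_start + k * test_size
        yield range(0, start), range(start, start + test_size)

def paired_t(baseline_rmse, sentiment_rmse):
    """Paired t statistic on per-fold RMSE differences (sentiment minus baseline).

    Returns (t, degrees_of_freedom). Requires the differences to vary across folds.
    """
    diffs = [s - b for b, s in zip(baseline_rmse, sentiment_rmse)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Fold boundaries for about one year of trading days.
for train, test in time_series_splits(252):
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")

# Made-up fold-level RMSE values, for illustration only.
t, df = paired_t([1.0, 2.0, 3.0, 4.0, 5.0], [1.5, 2.5, 3.0, 5.0, 6.0])
print(round(t, 3), df)
```

With five folds the test has four degrees of freedom, so the two-sided p-value comes from a t distribution with df = 4; scipy.stats.ttest_rel computes the statistic and p-value in a single call.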
Results by Stock
AAPL
Sentiment features introduced noise that worsened predictions, though the difference was not statistically significant (t = 1.16, p = 0.316).
TSLA
Tesla showed statistically significant degradation (t = 6.50, p = 0.003) when sentiment features were added, despite its reputation as a sentiment-driven stock.
MSFT
Sentiment features worsened predictions with a sizable effect, though the difference was not statistically significant (t = 1.33, p = 0.300).
Research Figures
Model Performance Comparison
AAPL Price Predictions
TSLA Price Predictions
MSFT Price Predictions
Statistical Significance Analysis
Permutation Feature Importance
Directional Accuracy Comparison