
From NLP to NYSE: Evaluating Transformer Models for Stock Market Forecasting

Abstract


This report presents a comparative study of Transformer and Vision Transformer (ViT) models applied to time series forecasting, specifically in the context of stock price prediction. Transformers, originally designed for natural language processing tasks, have shown remarkable adaptability to various domains, including time series analysis. This study explores their potential in financial forecasting, comparing their performance against a baseline LSTM model.


Transformer models excel at capturing long-range dependencies in sequential data, making them theoretically well-suited for time series forecasting. The standard Transformer model processes time series data directly as a sequence, while the Vision Transformer approach innovatively treats the time series as a 2D image, potentially capturing different patterns in the data.


Our experiments evaluate these models on stock price prediction tasks, assessing their performance through multiple metrics including MSE, RMSE, MAE, R2, and MAPE. The results provide insights into the strengths and limitations of each approach, highlighting areas for potential improvement and future research directions in applying transformer architectures to time series forecasting tasks.


Performance Comparison




·   LSTM outperforms both the Transformer and Vision Transformer (ViT) models across all metrics, as expected for a well-established baseline estimator.

·   The Transformer model performs significantly better than the Vision Transformer (a sketch of how these metrics are computed follows this list):

o   MSE: 11.8297 (Transformer) vs 112.1335 (ViT)

o   RMSE: 3.4394 (Transformer) vs 10.5893 (ViT)

o   MAE: 2.6844 (Transformer) vs 7.7738 (ViT)

o   R2: 0.9880 (Transformer) vs 0.8931 (ViT)

o   MAPE: 2.0664% (Transformer) vs 6.3428% (ViT)
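For reference, here is a minimal sketch of how these metrics can be computed from actual and predicted prices using NumPy and scikit-learn; the function and array names are illustrative, not identifiers from the experiment code:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_forecast(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute the five metrics reported above for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
        # MAPE in percent; assumes no true price is zero
        "MAPE": np.mean(np.abs((y_true - y_pred) / y_true)) * 100,
    }

# Usage: metrics = evaluate_forecast(actual_prices, predicted_prices)
```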





Analysis of Transformer vs Vision Transformer Performance

1. Data Representation:

o   Transformers process sequential data directly, which aligns well with time series data.

o   ViT treats time series as 2D images, which may not capture temporal dependencies as effectively.


2. Model Complexity:

o   The ViT model may be overly complex for this task, leading to overfitting.

o   The Transformer model's architecture seems more suitable for the time series nature of stock data.


3. Training Stability: 

o   The ViT model's higher error rates suggest potential issues with training stability or convergence.


4. Feature Extraction: 

o   Transformers can directly capture temporal relationships in the data.

o   ViT's patch-based approach may not be optimal for extracting relevant features from time series data (see the reshaping sketch below).
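To make the patch-based concern concrete, here is a minimal sketch of the kind of reshaping a ViT-style model performs: a univariate price window is arranged as a 2D "image" and then cut into patches. The window size, image shape, and patch size below are illustrative, not the values used in the experiment:

```python
import torch
import torch.nn.functional as F

# Illustrative sizes: a 64-step window arranged as an 8x8 "image",
# then split into four 4x4 patches for the ViT encoder.
window = torch.randn(64)           # one normalized price window
image = window.view(1, 1, 8, 8)    # (batch, channels, height, width)

patch_size = 4
patches = F.unfold(image, kernel_size=patch_size, stride=patch_size)
patches = patches.transpose(1, 2)  # (batch, num_patches, patch_dim) = (1, 4, 16)

# Vertically adjacent pixels within a patch are 8 time steps apart in
# the original series, which is how temporal ordering gets blurred.
```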


Suggestions for Improving ViT Performance

1. Adjust Patch Size: Experiment with different patch sizes to better capture temporal patterns in the data.


2. Enhance Data Augmentation: Implement time series-specific augmentation techniques to improve generalization (see the sketch after this list).


3. Modify Model Architecture:

o   Reduce model complexity to prevent overfitting.

o   Incorporate temporal attention mechanisms specific to time series data.


4. Hyperparameter Tuning: Conduct more extensive hyperparameter optimization, focusing on learning rate, number of layers, and attention heads.


5. Pre-training: Implement pre-training on a larger dataset of stock prices to improve feature extraction capabilities.


6. Ensemble Methods: Combine ViT predictions with other models to leverage its unique feature extraction capabilities.
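As a concrete starting point for suggestion 2, here is a minimal sketch of two common time series augmentations, jittering and magnitude scaling; the noise levels are illustrative defaults, not tuned values:

```python
import numpy as np

def jitter(window: np.ndarray, sigma: float = 0.01) -> np.ndarray:
    """Add small Gaussian noise to every time step."""
    return window + np.random.normal(0.0, sigma, size=window.shape)

def magnitude_scale(window: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Multiply the whole window by a random factor close to 1."""
    return window * np.random.normal(1.0, sigma)

# Applied on the fly during training, e.g.:
# augmented = magnitude_scale(jitter(price_window))
```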


Reasons to Persist with Transformer and ViT Models

By addressing the issues identified above and leveraging the unique strengths of Transformer-based models, there is significant potential to improve their performance in stock price forecasting, possibly surpassing traditional methods such as LSTM.


1. Potential for Improvement: Both models, especially ViT, have room for optimization and could potentially outperform LSTM with further refinement.


2. Handling Long-term Dependencies: Transformer-based models are theoretically better at capturing long-term dependencies in data compared to LSTM.


3. Parallelization: Transformer and ViT models allow for more efficient parallelization, potentially leading to faster training and inference times on appropriate hardware.


4. Flexibility: These models can easily incorporate additional features or external data sources, which could be valuable for stock price prediction.


5. Interpretability: Attention mechanisms in these models can provide insights into which parts of the input data are most influential for predictions (illustrated in the sketch after this list).


6. State-of-the-art in Other Domains: Transformers and ViT have shown exceptional performance in various domains, suggesting untapped potential in time series forecasting.


7. Active Research Area: Continued work on these models aligns with current research trends, potentially leading to breakthrough improvements.
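To illustrate the interpretability point, here is a minimal sketch of reading attention weights out of PyTorch's nn.MultiheadAttention; in a full forecasting model these weights show which past time steps each position attends to. The tensor shapes are illustrative:

```python
import torch
import torch.nn as nn

seq_len, d_model, n_heads = 30, 64, 4
mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

x = torch.randn(1, seq_len, d_model)  # one encoded price window
# need_weights=True returns weights averaged over heads by default,
# with shape (batch, target_len, source_len)
_, attn_weights = mha(x, x, x, need_weights=True)

# For the final position (the step the forecast is made from),
# list the earlier time steps that received the most attention.
top = attn_weights[0, -1].topk(5)
print("most attended steps:", top.indices.tolist())
```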


Code Overview


The implementation of this experiment consists of several Python scripts that work together to perform the time series forecasting tasks:


1. Combined_forecasting_coordinator.py: This script serves as the main coordinator for the experiment. It imports functions from other scripts and runs all models for comparison.


2. Transformer_ts_forecasting_core.py: Contains the core functionality for the Transformer model, including data preparation, model definition, training, and evaluation functions.


3. Transformer_ts_forecasting_main.py: Handles the main execution logic for the Transformer model, including argument parsing and calling the core functions.


4. vision_transformer_ts_forecasting_v32.py: Implements the Vision Transformer model. It includes data preparation specific to ViT, model definition, and training routines.


5. simple_lstm_ts_forecasting.py: Contains the implementation of the baseline LSTM model.


Key features of the code include:

·   Use of PyTorch for model implementation

·   Data fetching from Yahoo Finance using the yfinance library (see the sketch after this list)

·   Data preprocessing and scaling

·   Implementation of custom PyTorch modules for each model type

·   Training loops with early stopping and learning rate scheduling

·   Evaluation metrics calculation and result visualization

·   Flexibility to adjust hyperparameters and model configurations
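As an illustration of the data pipeline described above, here is a minimal sketch of fetching prices with yfinance and preparing scaled sliding windows; the ticker, date range, and window length are illustrative, not the experiment's actual settings:

```python
import numpy as np
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler

# Fetch daily prices for an illustrative ticker and date range
data = yf.download("AAPL", start="2018-01-01", end="2023-12-31")
prices = data["Close"].values.reshape(-1, 1)

# Scale to [0, 1]; in a real experiment the scaler should be fit on
# the training split only, to avoid look-ahead leakage.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(prices)

# Build sliding windows: 60 past steps predict the next step
window = 60
X = np.array([scaled[i:i + window] for i in range(len(scaled) - window)])
y = scaled[window:]
print(X.shape, y.shape)  # (num_samples, 60, 1), (num_samples, 1)
```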


The code is designed to be run in a Google Colab environment, with a separate scaffolding script handling the execution setup and GPU allocation.
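Inside the model scripts, a standard PyTorch idiom (shown here as an assumption, not a quote from the scaffolding script) picks up whichever device Colab has allocated:

```python
import torch

# Use the GPU when Colab has allocated one, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")
```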

