The STRIDE Database: Workers’ Compensation Claims and Adjuster Notes  
Author Chelsea M. Zuvieta

 

Co-Author(s) Richard Bauder; Brian White; Taghi M. Khoshgoftaar

 

Abstract The Structured and Textual Records for Injury Data Exploration (STRIDE) database consists of two datasets, one tabular and one textual, describing workers’ compensation claims that were compiled, cleaned, and anonymized for the analysis of workplace injuries. The database provides insight into work-related injury types, costs, and outcomes, with the potential to inform safety improvements, cost-reduction strategies, and fair workers’ compensation practices. The tabular data encompasses 230,833 workers’ compensation claims, and the text data contains notes written by insurance adjusters for 25,691 of those claims. We present baseline results for a medical cost classification model that shows promise for future machine learning experiments. Primary research uses of the STRIDE database are expected to include predicting claim costs, injury severity, claim length, and legal involvement.

 

Keywords Workplace Injury, Workers’ Compensation, Machine Learning, Dataset, Data Analysis, Natural Language Processing
   
Article #: RQD2025-177
 

Proceedings of the 30th ISSAT International Conference on Reliability & Quality in Design
August 6-8, 2025