Volume 151
Published on November 2025
Volume title: Proceedings of CONF-CIAP 2026 Symposium: Applied Mathematics and Statistics
Predicting and interpreting sprint-freestyle performance requires models that are both precise and actionable for coaches and athletes. This study presents an interpretable machine learning approach applied to lap-by-lap metrics from A-final 100-yard freestyle swims (n = 67). We construct a 12-dimensional feature vector from three technical metrics (mean stroke rate, cycle count, and breakout distance) across four laps, and define both a regression task (continuous race time prediction) and a binary classification task (fast/slow, threshold at 41.4 s). Several algorithms were compared (Linear Regression, Random Forest, k-Nearest Neighbors (kNN), and support vector methods) across multiple train/test splits, using R², MAPE, accuracy, and F1 score as evaluation measures. Although regression R² values were low (best mean R² ≈ −0.042 for Random Forest), MAPE was nonetheless small (~0.011), indicating modest absolute error but little explained variance. Classification fared better: kNN recorded the best mean accuracy (≈0.727) and F1 (≈0.717). Notably, SHAP (Shapley Additive Explanations) identified Lap2_Stroke_Rate and Lap4_Breakout_Dist as two of the top features. Feature-selection tests showed that models trained on only the top-ranked features achieve identical MAPE with significantly fewer inputs, pointing toward useful, interpretable, and data-efficient methods for performance monitoring and coaching decisions.
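The combination of a negative R² with a small MAPE can look contradictory, so a minimal sketch may help. The race times and predictions below are hypothetical values chosen for illustration only; they are not data from the study, and the helper names `r2_score` and `mape` are our own. The sketch uses the standard definitions R² = 1 − SS_res / SS_tot and MAPE = mean(|y − ŷ| / |y|).

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed as a fraction."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical race times in seconds (assumption: NOT taken from the study).
# The times span a very narrow range, as elite final swims typically would.
y_true = [41.0, 41.2, 41.4, 41.6, 41.8]
# Hypothetical predictions that are each off by at most half a second.
y_pred = [41.5, 41.0, 41.9, 41.2, 41.5]

r2 = r2_score(y_true, y_pred)
err = mape(y_true, y_pred)

# R^2 is negative: these predictions do worse than always guessing the mean.
# MAPE stays below 1% because every error is tiny relative to a ~41 s race.
print(f"R^2  = {r2:.3f}")
print(f"MAPE = {err:.4f}")
```

In this toy case the total variance of the targets is so small that even sub-second errors exceed it, which drives R² below zero while the percentage error stays under 1%. This matches the pattern the abstract reports, though the actual cause in the study's data may differ.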