Real-time biomechanical feedback system for swimming turn analysis based on convolutional neural networks and temporal attention mechanism
Abstract
This paper presents a deep learning framework that combines convolutional neural networks (CNNs) with a temporal attention mechanism for real-time analysis of swimming turns. The architecture uses a hybrid spatial-temporal design with multi-scale feature fusion and adaptive normalization to remain robust in challenging underwater environments. The system reaches 96.2% accuracy under standard conditions and 91.8% accuracy under low-light conditions, a reported 15% improvement over existing methods. With reduced computational complexity, the framework processes 32 frames per second, attains a 99.99% error recovery rate, and improves resource utilization efficiency by 23%. Validation covers varying water qualities, lighting conditions, and motion scenarios. The framework also introduces an adaptive error handling mechanism and hierarchical state machines alongside the hybrid deep learning architecture, giving stable operation with a mean time between failures (MTBF) of 8760 h (one year of continuous operation) and a mean time to recovery (MTTR) of 1.2 s. Tested in Olympic-standard facilities, the system delivers precise biomechanical feedback to athletes and coaches. Future research will extend the system to multi-object detection, integrate acoustic sensing for zero-visibility conditions, and explore federated learning for privacy-preserving model updates. The results set a benchmark for underwater motion analysis, with applications in both athletic training and aquatic research.
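The throughput and reliability figures quoted in the abstract can be put into more familiar units with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the steady-state availability formula MTBF / (MTBF + MTTR) is a standard reliability definition assumed here, not one stated in the paper. The derived values are not results reported by the authors, and the 99.99% error recovery rate is a separate metric that this sketch does not model.

```python
# Figures quoted in the abstract.
FPS = 32.0            # frames processed per second
MTBF_HOURS = 8760.0   # mean time between failures
MTTR_SECONDS = 1.2    # mean time to recovery


def frame_budget_ms(fps: float) -> float:
    """Average processing time available per frame, in milliseconds."""
    return 1000.0 / fps


def steady_state_availability(mtbf_s: float, mttr_s: float) -> float:
    """Assumed definition (not from the paper): MTBF / (MTBF + MTTR)."""
    return mtbf_s / (mtbf_s + mttr_s)


def steady_state_unavailability(mtbf_s: float, mttr_s: float) -> float:
    """Complement of availability, computed directly to avoid cancellation."""
    return mttr_s / (mtbf_s + mttr_s)


if __name__ == "__main__":
    mtbf_s = MTBF_HOURS * 3600.0  # convert hours to seconds

    print(f"Per-frame budget at {FPS:.0f} fps: {frame_budget_ms(FPS)} ms")  # 31.25 ms
    print(f"MTBF in days: {MTBF_HOURS / 24.0}")                             # 365.0
    print(f"Availability: {steady_state_availability(mtbf_s, MTTR_SECONDS):.10f}")
    print(f"Unavailability: {steady_state_unavailability(mtbf_s, MTTR_SECONDS):.3e}")
```

Under these assumptions, each frame must be handled in 31.25 ms on average to sustain 32 frames per second, and an MTBF of 8760 h corresponds to 365 days between failures.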
Copyright (c) 2025 Author(s)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright on all articles published in this journal and grant the publisher the right to publish each article as its original publisher.
Articles published in this journal are licensed under a Creative Commons Attribution 4.0 International License, which means they can be shared, adapted, and distributed provided that the original published version is cited.