Sport plays an important role in society, influencing physical health, entertainment, and community engagement. As artificial intelligence advances, accurate classification of sport images is becoming increasingly important, because it supports applications such as performance analysis, athlete tracking, and fan engagement. Current methods, however, are limited by scarce labeled datasets and by misalignment between text and image features. This paper introduces a novel Contrastive Language-Image Pre-training (CLIP) based framework designed specifically for sport image classification. Data augmentation techniques address data sparsity and enrich the diversity of image-text pairings, reducing the need for extensive manual annotation. Feature alignment strategies, in turn, address the text-image misalignment that degrades classification accuracy. The approach fills a significant research gap and offers practical means of improving classification performance in sport image analysis. Extensive experiments validate the effectiveness of the framework, demonstrating its potential to advance sports analytics and to provide more precise and scalable sport image classification.
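As a rough illustration of the general CLIP-style classification idea that the abstract builds on, the sketch below scores one image embedding against several text-prompt embeddings by cosine similarity and converts the scores to probabilities with a temperature-scaled softmax. It is not the framework proposed in the paper: the embeddings are hand-made toy vectors rather than outputs of a real encoder, and the prompt strings, the function names (`l2_normalize`, `softmax`, `classify`), and the temperature of 0.07 are all assumptions chosen for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_embedding, text_embeddings, temperature=0.07):
    """Return a probability per text prompt for one image embedding.

    `temperature` is an assumed value for this sketch; smaller values
    make the resulting distribution sharper.
    """
    image = l2_normalize(image_embedding)
    labels = list(text_embeddings)
    similarities = []
    for label in labels:
        text = l2_normalize(text_embeddings[label])
        similarities.append(sum(a * b for a, b in zip(image, text)))
    probabilities = softmax([s / temperature for s in similarities])
    return dict(zip(labels, probabilities))


# Toy embeddings invented for illustration; a real system would obtain
# these from trained image and text encoders.
TOY_TEXT_EMBEDDINGS = {
    "a photo of basketball": [0.9, 0.1, 0.0],
    "a photo of swimming": [0.0, 0.2, 0.9],
    "a photo of tennis": [0.1, 0.9, 0.1],
}
toy_image_embedding = [0.8, 0.2, 0.1]

probabilities = classify(toy_image_embedding, TOY_TEXT_EMBEDDINGS)
predicted = max(probabilities, key=probabilities.get)
print(predicted)
```

In this picture, misalignment between text and image features would appear as low similarity between an image and its correct prompt, which is the kind of problem the feature alignment strategies described above are meant to reduce.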