To address the low detection accuracy caused by heavy background interference, small targets, and the multi-scale distribution of cracks in road images captured in complex environments, a pavement crack detection model with multi-module coordinated optimization, based on You Only Look Once version 5 small (YOLOv5s), is proposed. First, the backbone of the baseline model is replaced with EfficientViT to improve the detection accuracy of small targets while achieving significant improvements in memory efficiency and channel communication. Second, the lightweight Spatial and Channel reconstruction Convolution (SCConv) is introduced in place of the standard convolution, which reduces channel-wise feature redundancy and the computational cost of the model. Finally, a flexible attention module based on a residual network and soft thresholding is designed and incorporated into the neck to remove noise and improve the model's robustness to interference in complex environments. Experimental results show that the proposed model achieves an F1-score of 79.2% and a mean average precision (mAP) of 82.5% on the custom dataset, and 65.4% and 69.6%, respectively, on the public dataset. These results indicate that the proposed model delivers strong detection performance and provides a technical reference for efficient pavement crack detection in complex environments.
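
For readers unfamiliar with soft thresholding, the operation at the core of the noise-removal module can be illustrated with a minimal sketch. This is not the proposed attention module: the function names, the `scale` parameter, and the rule that sets the threshold to a fraction of the mean absolute activation are assumptions made for illustration, and the residual connection and any learned parameters are omitted.

```python
def soft_threshold(x: float, tau: float) -> float:
    """Shrink x toward zero by tau; values with |x| <= tau become exactly 0."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def denoise_channel(features: list[float], scale: float = 0.5) -> list[float]:
    """Apply soft thresholding to one channel of activations.

    Assumption (hypothetical rule): tau = scale * mean(|x|). In an attention
    module, `scale` would typically be predicted per channel by a small
    network; here it is a fixed constant.
    """
    if not features:
        return []
    tau = scale * sum(abs(v) for v in features) / len(features)
    return [soft_threshold(v, tau) for v in features]


if __name__ == "__main__":
    # Mean |x| is 2.0, so tau = 1.0: small activations are zeroed,
    # large ones are shrunk by 1.0.
    print(denoise_channel([4.0, -0.5, 0.5, -3.0]))  # [3.0, 0.0, 0.0, -2.0]
```

Activations whose magnitude falls below the threshold are treated as noise and set to zero, while stronger responses are kept with a reduced magnitude, which is the behaviour the abstract relies on for suppressing background interference.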