Accurate estimation of shipping CO2 emissions is important for developing regulations to combat the greenhouse effect. Many shipping CO2 emission models have been proposed over the past decades. However, most of them have been validated on only a few specific ships, and large-scale, data-driven validation and comparison of these models is lacking. To fill this gap, this study proposes a general evaluation framework to quantitatively validate and compare different emission models. The framework is built on the integration of three types of data: ship technical details, AIS trajectories, and weather. Together with the emission models, these data are fed into three carefully designed modules that perform analysis at both the grid and trajectory levels and that use annually aggregated fuel consumption as ground truth. Extensive experiments on one month of data from 1,571 ships passing through Danish waters demonstrate the utility of the framework and provide insights into the accuracy of five popular CO2 emission models.