Greece-based maritime tech company DeepSea has published a new research paper outlining a process for verifying the accuracy of AI-generated vessel models when they are applied in real-world conditions.
The research was carried out by seven of DeepSea’s thirteen-strong team of research scientists, headed up by Dr Antonis Nikitakis.
“This research is an important step in helping our customers and the wider market to understand the true power, while alleviating the limitations, of an AI-based approach,” said Dr Nikitakis.
“Coupled with the daily real-world impact we’re seeing on fuel consumption and CII ratings, we believe this sort of information is key to popularising this incredible technology throughout the industry.”
The research notes that most current AI models estimate their accuracy through testing on data drawn from the same distribution as the data used to train the model (i.e., data representative of similar conditions and containing similar biases).
For example, if a model is trained on data from the vessel’s historical behaviour within a narrow range of common wind speeds or drafts, it is then also tested on data within those same ranges. As a result, such tests cannot properly assess whether the model reproduces biases present in the training data, or how it might perform in conditions it has never previously encountered.
As an alternative, the researchers proposed an evaluation methodology built around a specifically designed dataset partitioning scheme that probes a model’s robustness to large distributional shifts. The methodology also analyses results through the lens of ‘predictive uncertainty’ to assess the model’s fitness in handling uncertain and noisy regions of the modelled dataset.
In testing the proposed method, the group found that partitioning the dataset as described successfully exposed a drop in model performance when moving from in-domain to out-of-domain splits. Predictive uncertainty was also found to correlate well with these drops, making it possible to assess a model’s performance after deployment, even without access to the true target values.
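The general idea behind this kind of evaluation can be sketched with synthetic data. The snippet below is an illustrative toy example, not DeepSea’s actual method: the variable names, the wind-speed threshold used to create the out-of-domain split, and the bootstrap-ensemble uncertainty estimate are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for vessel data: fuel consumption as a noisy
# nonlinear function of wind speed (values are purely illustrative).
wind = rng.uniform(0, 25, 2000)
fuel = 10 + 0.5 * wind + 0.05 * wind**2 + rng.normal(0, 1, wind.size)

# Covariate-based partition: common wind speeds form the in-domain
# split; rarer high-wind conditions are held out as out-of-domain.
in_dom = wind < 15
x_in, y_in = wind[in_dom], fuel[in_dom]
x_out, y_out = wind[~in_dom], fuel[~in_dom]

def fit_ensemble(x, y, n_members=20):
    """Bootstrap ensemble of (deliberately underfit) linear models."""
    coefs = []
    for _ in range(n_members):
        idx = rng.integers(0, x.size, x.size)
        coefs.append(np.polyfit(x[idx], y[idx], deg=1))
    return coefs

def predict(coefs, x):
    """Ensemble mean as the prediction; ensemble spread as a crude
    predictive-uncertainty estimate."""
    preds = np.stack([np.polyval(c, x) for c in coefs])
    return preds.mean(axis=0), preds.std(axis=0)

coefs = fit_ensemble(x_in, y_in)
mu_in, sd_in = predict(coefs, x_in)
mu_out, sd_out = predict(coefs, x_out)

rmse_in = np.sqrt(np.mean((mu_in - y_in) ** 2))
rmse_out = np.sqrt(np.mean((mu_out - y_out) ** 2))

print(f"in-domain RMSE:     {rmse_in:.2f}  uncertainty: {sd_in.mean():.3f}")
print(f"out-of-domain RMSE: {rmse_out:.2f}  uncertainty: {sd_out.mean():.3f}")
```

Run on this toy data, the error is substantially larger on the out-of-domain split, and the ensemble’s predictive uncertainty rises there too, mirroring the correlation the paper reports: the uncertainty signal flags degraded performance without needing the true target values.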
The full research paper is available to download here.