Predicting into unknown space? Estimating the area of applicability of spatial prediction models

Meyer, H; Pebesma, E

Research article (journal)

Abstract

1. Machine learning algorithms have become very popular for spatial mapping of the environment due to their ability to fit nonlinear and complex relationships. However, this ability comes with the disadvantage that they can only be applied to new data if these are similar to the training data. Since spatial mapping requires predictions into new geographic space, which in many cases goes along with new predictor properties, a method is required to assess the area to which a prediction model can be reliably applied.

2. Here, we suggest a methodology that delineates the "area of applicability" (AOA), which we define as the area where we enabled the model to learn about relationships based on the training data, and where the estimated cross-validation performance holds. We first propose a "dissimilarity index" (DI) based on the minimum distance to the training data in the multidimensional predictor space, with predictors weighted by their respective importance in the model. The AOA is then derived by applying a threshold: the (outlier-removed) maximum DI of the training data derived via cross-validation. We further use the relationship between the DI and the cross-validation performance to map the estimated performance of predictions. We illustrate the approach in a simulated case study chosen to mimic ecological realities and test its credibility using a large set of simulated data.

3. The simulation studies showed that the prediction error within the AOA is comparable to the cross-validation error of the trained model, while the cross-validation error does not apply outside the AOA. This holds for models trained with randomly distributed training data, as well as when training data are clustered in space and spatial cross-validation is applied. Using the relationship between DI and cross-validation performance showed potential to limit predictions to the area where a user-defined performance applies.

4. We suggest adding the AOA computation to the modeller's standard toolkit and presenting predictions for the AOA only. We further suggest reporting a map of DI-dependent performance estimates alongside prediction maps, complementary to (cross-)validation performance measures and common uncertainty estimates.
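The DI and AOA threshold described in point 2 can be sketched in code. The following is a minimal NumPy illustration under simplifying assumptions (Euclidean distance in standardised, importance-weighted predictor space; the mean pairwise training distance as normalisation; boxplot-style outlier removal for the threshold); it is not the authors' reference implementation, which is provided in the R package CAST.

```python
import numpy as np

def dissimilarity_index(train, new, weights):
    """DI sketch: weighted minimum distance of new points to the training data,
    normalised by the mean pairwise distance among training points."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    tw = (train - mu) / sd * weights          # scaled, importance-weighted training data
    nw = (new - mu) / sd * weights            # same transform for new (prediction) data
    # mean pairwise distance among training points (normalisation factor)
    d_train = np.sqrt(((tw[:, None, :] - tw[None, :, :]) ** 2).sum(-1))
    d_bar = d_train[np.triu_indices(len(tw), k=1)].mean()
    # minimum distance of each new point to any training point
    d_min = np.sqrt(((nw[:, None, :] - tw[None, :, :]) ** 2).sum(-1)).min(axis=1)
    return d_min / d_bar

def aoa_threshold(di_train):
    """Outlier-removed maximum of the training DI values
    (values above the boxplot upper whisker are discarded)."""
    q1, q3 = np.percentile(di_train, [25, 75])
    upper_whisker = q3 + 1.5 * (q3 - q1)
    return di_train[di_train <= upper_whisker].max()
```

A prediction location would then fall inside the AOA when its DI does not exceed `aoa_threshold(di_train)`, where `di_train` holds the DI values of the training data computed via cross-validation (each point's distance to training points outside its own fold).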

Publication details

Year of publication: 2021
Language of publication: English
Link to full text: https://doi.org/10.1111/2041-210X.13650