Cross-validation for geospatial data: Estimating generalization performance in geostatistical problems

Jing Wang; Laurel Hopkins; Tyler Hallman; W Douglas Robinson; Rebecca Hutchinson

Research output: Contribution to journal › Article › peer-review

Cross-validation for geospatial data: Estimating generalization performance in geostatistical problems. / Wang, Jing; Hopkins, Laurel; Hallman, Tyler et al.
In: Transactions on Machine Learning Research, 04.10.2023.

RIS

TY - JOUR

T1 - Cross-validation for geospatial data: Estimating generalization performance in geostatistical problems

AU - Wang, Jing

AU - Hopkins, Laurel

AU - Hallman, Tyler

AU - Robinson, W Douglas

AU - Hutchinson, Rebecca

PY - 2023/10/4

Y1 - 2023/10/4

N2 - Geostatistical learning problems are frequently characterized by spatial autocorrelation in the input features and/or the potential for covariate shift at test time. These realities violate the classical assumption of independent, identically distributed data, upon which most cross-validation algorithms rely in order to estimate the generalization performance of a model. In this paper, we present a theoretical criterion for unbiased cross-validation estimators in the geospatial setting. We also introduce a new cross-validation algorithm to evaluate models, inspired by the challenges of geospatial problems. We apply a framework for categorizing problems into different types of geospatial scenarios to help practitioners select an appropriate cross-validation strategy. Our empirical analyses compare cross-validation algorithms on both simulated and several real datasets to develop recommendations for a variety of geospatial settings. This paper aims to draw attention to some challenges that arise in model evaluation for geospatial problems and to provide guidance for users.

AB - Geostatistical learning problems are frequently characterized by spatial autocorrelation in the input features and/or the potential for covariate shift at test time. These realities violate the classical assumption of independent, identically distributed data, upon which most cross-validation algorithms rely in order to estimate the generalization performance of a model. In this paper, we present a theoretical criterion for unbiased cross-validation estimators in the geospatial setting. We also introduce a new cross-validation algorithm to evaluate models, inspired by the challenges of geospatial problems. We apply a framework for categorizing problems into different types of geospatial scenarios to help practitioners select an appropriate cross-validation strategy. Our empirical analyses compare cross-validation algorithms on both simulated and several real datasets to develop recommendations for a variety of geospatial settings. This paper aims to draw attention to some challenges that arise in model evaluation for geospatial problems and to provide guidance for users.

M3 - Article

JO - Transactions on Machine Learning Research

JF - Transactions on Machine Learning Research

ER -
