DeGaP: A Deep Gaussian Process Surrogate Model for Cleaning Data from Spatially Distributed Sensor Networks

Status: Completed

Start Date: 2024-08-07

End Date: 2025-02-06

Description: Spatiotemporal data from space- and ground-based instruments are used throughout NASA fundamental space science, both in research and as a validation source for physical models. Measurement types and uses vary widely, but all such data contain non-physical errors that must be identified and removed before the data are used in science and operations. The resulting gaps should be filled with scientifically derived proxies automatically, and with as little latency as possible, to support timely hazard detection. Our proposed innovation is a Deep Gaussian Process surrogate model (DeGaP) that fills gaps caused by bad or missing data. It retains the precision and fidelity of the original measurements and provides uncertainty quantification (UQ) for the filled-in values. Deep Gaussian Processes are a powerful technique for modeling complex system behavior, and their stochastic structure builds uncertainty directly into the model. In Phase I, we will a) develop a prototype of the DeGaP model using a ground-based magnetic field sensor network, MagStar; b) quantify the accuracy and uncertainty of the model predictions as a function of spatial and temporal proximity to other magnetometers in the network; and c) host a web service for continuous dissemination of the DeGaP output, in which gaps and transient disturbances in the magnetic field data are replaced and UQ is provided for the filled-in data. Potential applications of the Phase I development include the Space Weather Modeling Framework (SWMF) running at CCMC, the World Magnetic Model at NOAA, and space-based magnetometers. Potential private-sector markets for the product are the U.S. bulk power industry, entities that support U.S. power utilities in regulatory response, and entities that need machine-learning (ML) ready data sets from sensor networks for ML models in production.
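
The core operation behind this kind of gap filling is Gaussian Process regression. A GP prior is conditioned on the valid samples, and the posterior mean (the fill value) and posterior standard deviation (its uncertainty) are read off at the missing times. The sketch below is a minimal, single-layer GP in pure Python, not the DeGaP model itself. A Deep GP stacks several GP layers and typically requires approximate (for example, variational) inference. The squared-exponential kernel, the fixed hyperparameters, the function name `gp_fill`, and the synthetic sinusoidal series are hypothetical choices for illustration. The example also shows the temporal-proximity effect behind item b): uncertainty grows with distance from the nearest valid sample.

```python
import math


def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between two scalar inputs (times in minutes)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular L with matrix = L L^T (matrix must be symmetric positive definite)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_transposed(L, b):
    """Back substitution: solve L^T x = b, using the lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fill(times, values, query_times, length_scale=10.0, variance=1.0, noise=1e-2):
    """Zero-mean GP posterior (mean, std) of the latent signal at each query time.

    `noise` is an assumed measurement-noise variance added to the diagonal.
    """
    n = len(times)
    K = [[rbf_kernel(times[i], times[j], length_scale, variance) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_transposed(L, solve_lower(L, values))  # alpha = K^-1 y
    means, stds = [], []
    for tq in query_times:
        k_star = [rbf_kernel(tq, t, length_scale, variance) for t in times]
        v = solve_lower(L, k_star)  # v = L^-1 k_*, so v.v = k_*^T K^-1 k_*
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        stds.append(math.sqrt(max(variance - sum(vi * vi for vi in v), 0.0)))
    return means, stds


# Hypothetical 1-minute series with a 20-minute gap (t = 50..69); not MagStar data.
all_times = list(range(120))
signal = [math.sin(2 * math.pi * t / 60.0) for t in all_times]
obs_t = [t for t in all_times if not 50 <= t < 70]
obs_y = [signal[t] for t in obs_t]
baseline = sum(obs_y) / len(obs_y)  # remove the mean so a zero-mean prior is reasonable

means, stds = gp_fill(obs_t, [y - baseline for y in obs_y], [40, 60])
means = [m + baseline for m in means]
for tq, m, s in zip([40, 60], means, stds):
    print(f"t={tq:3d} min  fill={m:+.3f}  sigma={s:.3f}")
```

In this toy run the standard deviation at t = 60, in the middle of the gap, is larger than at t = 40, where data are dense. That difference is what the UQ attached to each filled value is meant to convey. A production surrogate would also learn its hyperparameters from data and draw on neighboring stations rather than a single series.
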
Benefits: Phase I immediately supports the following models and missions:
• Space Weather Modeling Framework (SWMF) running at CCMC. SWMF is a magnetohydrodynamics (MHD) model of the geospace environment that can simulate geomagnetic storm conditions, including magnetic variations on the ground. As MHD models advance, they can be used not only to specify the magnetic field but also in a predictive capacity. Ground measurements of the magnetic field provide validation data for the simulation's footprint on the ground. A cleaned, annotated, and gap-filled data set is important for effectively characterizing model-versus-measurement error.
• World Magnetic Model (NOAA). The World Magnetic Model specifies the changing characteristics of the magnetic field over decades and provides local estimates of the magnetic field strength, orientation, and declination at any given point on the globe. It supplies baseline magnetic field values to multiple scientific disciplines, including geohazards, space science, geo-orientation, and near-Earth satellite environment assessment. When this model is compared with local measurements of the geomagnetic field, it is critical to know whether observed differences are real or due to environmental errors.
• Space-based magnetometers. Magnetometers are commonly flown on satellite missions, both as scientific sensors and for orientation and navigation. In Phase II, after validation on the simpler ground-based magnetic field measurements, the product will be extended to identify and fill erroneous measurements from space-based magnetic sensors.
The private-sector market for the product is the U.S. bulk power industry, including generation, transmission, and distribution. The power transmission sector comprises 7,584 businesses with $380 billion in total annual revenue.
Power transmission companies with transformer assets above 200 kV are subject to the FERC-approved NERC reliability standards TPL-007-1 and EOP-010-1, which makes them our primary addressable market for the Phase I data set. Entities that support U.S. power utilities in regulatory response also have a strong need for tools that help them respond to the space weather hazard. Planning and coordination organizations such as ERCOT, ISO-NE, WECC, and BPA, which are made up of or serve power utilities, often support regulatory response for planning and operations. NOAA-SWPC is responsible for predicting space weather events and issuing alerts to power grid operators, and would also benefit from improved, more reliable data to ingest into its hazard models. CPI has had a relationship with NOAA-SWPC since 2019 under a Cooperative Research and Development Agreement (CRADA). This product will improve our magnetic field data delivery and strengthen the potential for long-term use in operational settings. There is also a new potential market in the growing artificial intelligence / machine learning (AI/ML) community, which needs validated, labeled, and cleaned data sets. Although the volume of data produced by sensor networks has grown exponentially, considerable processing is still required to make the data ready for AI/ML applications. The DeGaP model, deployed within a continuous data processing framework, is a scalable way to provide high-volume, high-fidelity, validated, and cleaned data with UQ, meeting the growing need for ML-ready data sets for AI/ML applications in production.
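
One way to make "validated, labeled, and cleaned data sets" with UQ concrete is an output record that carries each sample's value, an uncertainty, and a label indicating whether the value was measured or filled. Downstream ML code can then mask or down-weight filled values. This is a minimal sketch under stated assumptions. The field names, the `merge_filled` and `to_csv` helpers, the CSV layout, and the single fixed instrument uncertainty for measured samples are all hypothetical. The actual DeGaP data product format is not specified here.

```python
import csv
import io
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional


@dataclass
class Sample:
    time_min: int     # minutes since start of the (hypothetical) record
    value_nt: float   # magnetic field component, nanotesla
    sigma_nt: float   # one-sigma uncertainty, nanotesla
    source: str       # label: "measured" or "filled"


def merge_filled(times: List[int], measured: List[Optional[float]],
                 fill_mean: Dict[int, float], fill_sigma: Dict[int, float],
                 instrument_sigma: float = 0.1) -> List[Sample]:
    """Combine raw data (None = bad or missing) with model fills into labeled samples.

    instrument_sigma is an assumed, fixed uncertainty for directly measured values.
    """
    records = []
    for t, y in zip(times, measured):
        if y is not None:
            records.append(Sample(t, y, instrument_sigma, "measured"))
        else:
            records.append(Sample(t, fill_mean[t], fill_sigma[t], "filled"))
    return records


def to_csv(records: List[Sample]) -> str:
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["time_min", "value_nt", "sigma_nt", "source"])
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()


# Illustrative values only: minutes 2 and 3 were flagged as bad and filled by a surrogate model.
times = [0, 1, 2, 3, 4]
raw = [21.5, 21.7, None, None, 22.4]
records = merge_filled(times, raw, fill_mean={2: 21.9, 3: 22.1}, fill_sigma={2: 0.8, 3: 0.9})
print(to_csv(records))
```

Keeping the label and sigma alongside each value lets a training pipeline, for example, exclude filled samples from validation metrics or weight them by 1/sigma^2.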

Lead Organization: Computational Physics, Inc.