Open-Source Pipeline for Large-Scale Data Processing, Analysis and Collaboration
Status: Completed
Start Date: 2018-04-24
End Date: 2020-04-23
Description: NASA's observational products generate petabytes of scientific data that remain highly underutilized because of computational requirements, disjoint data access protocols, and task-specific, non-reusable code development. Our overall objective is to accelerate NASA science by developing an open-source, Python-based Pipeline for Observational Data Processing, Analysis, and Collaboration (PODPAC). The PODPAC software framework will enable widespread exploitation of earth science data by supporting multi-scale and multi-windowed access, exploration, and integration of available earth science datasets for both analysis and analytics; automatically accounting for geospatial data formats, projections, and resolutions; simplifying implementation and parallelization of geospatial data processing routines; unifying sharing of data and algorithms; and enabling seamless transition from local development to cloud processing. To achieve these objectives, we will work with NASA Science Team members involved with the SMAP (Soil Moisture Active Passive) and EOSDIS (Earth Observing System Data and Information System) programs, and with the wider scientific community, to define technical specifications for the software and plan a list of prioritized enhancements for each quarterly release cycle; further develop the core Python library based on user feedback, using agile development practices; develop integrations with cloud computing resources, specifically targeting Amazon Web Services; develop and demonstrate best-available remotely sensed soil moisture, high-resolution downscaled soil moisture, and flood/drought monitoring applications to promote infusion into NASA programs; and engage with the scientific community through conferences, meetings, and webcasts, and by providing support, in order to promote adoption of the software.
Benefits: We are initially targeting the SMAP program for PODPAC transition by supporting, publishing, and exploiting its observational soil moisture data products, while also developing value-added products. Creare has teamed with the NASA Science Team Leader for the SMAP satellite mission and will use PODPAC to derive global high-resolution data products from the SMAP radiometer data to support applications in hydrology, agriculture, and humanitarian response. We will also target other NASA earth science observational programs, such as AIRS, AMSR-E, AMSR2, GMI, MODIS, and VIIRS, for further deployment and transition of PODPAC. PODPAC will also be made available as open-source software to any NASA scientist performing geospatial data analysis.
We envision primary non-NASA applications for high-resolution soil moisture prediction and data analytics in the areas of agriculture, forestry, disaster and humanitarian response, and recreation. In particular, detailed knowledge of near-surface and root-zone soil moisture conditions would benefit the agriculture industry by enabling more efficient irrigation and fertilization.
Lead Organization: Creare, LLC