SparkRS - Spark for Remote Sensing

Status: Completed

Start Date: 2015-06-17

End Date: 2015-12-17

Description: The proposed innovation is Spark-RS, an open source software project that enables GPU-accelerated remote sensing workflows in an Apache Spark distributed computing cluster. Current state-of-the-art parallel systems like Hadoop and Spark offer horizontally scalable analytics and reduced costs for enterprises, but were not built to natively consume and process large remote sensing raster datasets. GPUs, for their part, can vastly accelerate image processing operations. Some open source projects have arisen that showcase hybrid Hadoop/GPU computing. However, there are no mature open source projects that utilize GPUs within Spark (an eventual replacement for MapReduce), and none that were built to process large remote sensing imagery. Filling this gap is the primary role of the proposed innovation, Spark-RS. Spark-RS contains three primary components. The first is a parallel image loading component that quickly loads large multi-band imagery into a Spark cluster. The second is a remote sensing library for Spark applications. It provides an API for reading and writing large images and wraps many common image operations from existing open source and NASA-built remote sensing libraries. The third is a GPU management library for Spark. It simplifies and abstracts the use of GPUs within a Spark application.

Benefits: Each of the datasets listed in this SBIR's description, along with its corresponding applications, is a potential candidate for use with Spark-RS, since all involve large multi-spectral and hyper-spectral raster-based observations. These include HyspIRI, JPSS-1, NPP, SDO, MRO, MERRA, MERRA2, and Landsat, among many others. Thus, any NASA datacenter that has a Hadoop-based cluster stands to benefit from the proposed innovation, Spark-RS.

Spark-RS can equally be applied to many other remote sensing and GIS applications across U.S. government agencies moving towards big data platforms like Hadoop and Spark. There are at least 28 different U.S. government agencies that utilize or produce geospatial data. Not all utilize raster datasets, but many do. In particular, the Department of Defense (Army, Air Force, Navy, USMC) and the Intelligence Community (NSA, CIA, NGA, DIA, etc.) both produce and consume large amounts of image-based data. Spark-RS could also play a critical role in handling the growing volume of non-orthorectified oblique imagery from aerial sensors (WAMI/FMV). In addition, domestic agencies such as USGS, FBI, EPA, and FEMA also hold vast quantities of raster-based datasets. Lastly, industrial users, including GIS mapping, aerial photography, and medical imaging companies, can all benefit from and, importantly, contribute back to the sustained success of Spark-RS.

Lead Organization: Spiritus, Inc