ApacheCon NA 2015 has ended
Monday, April 13 • 11:45am - 12:35pm
Applying Apache Hadoop to NASA’s Big Climate Data - Glenn Tamkin, NASA


The NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance analytics because it makes efficient use of computer clusters and combines distributed storage of large data sets with parallel computation. We have built a platform for developing new climate analysis capabilities with Hadoop.

Hadoop is well known for text-based problems, but our scenario involves binary data, so we created custom Java applications to read and write data during the MapReduce process. Our solution is unique because it: a) uses a custom composite key design for fast data access, and b) utilizes the Hadoop Bloom filter, a data structure designed to test quickly and memory-efficiently whether an element may be present in a set.
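
To make the two ideas above concrete, here is a minimal sketch in plain Java. It is not the NCCS implementation and does not use Hadoop's own Bloom filter class. The key fields (variable, timeStep, level), the class names, and the hashing scheme are all assumptions chosen for illustration. The sketch shows how a composite key can be ordered and encoded, and how a Bloom filter answers "definitely absent" or "possibly present" for such a key.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class CompositeKeyBloomSketch {

    /** Hypothetical composite key; the real NCCS key layout is not given in the abstract. */
    static final class CompositeKey implements Comparable<CompositeKey> {
        final String variable;
        final int timeStep;
        final int level;

        CompositeKey(String variable, int timeStep, int level) {
            this.variable = variable;
            this.timeStep = timeStep;
            this.level = level;
        }

        /** Fixed-order encoding so that equal keys always produce the same bytes. */
        byte[] toBytes() {
            return (variable + '|' + timeStep + '|' + level).getBytes(StandardCharsets.UTF_8);
        }

        /** Sort by variable, then time step, then level. */
        @Override
        public int compareTo(CompositeKey other) {
            int c = variable.compareTo(other.variable);
            if (c != 0) return c;
            c = Integer.compare(timeStep, other.timeStep);
            if (c != 0) return c;
            return Integer.compare(level, other.level);
        }
    }

    /** Illustrative Bloom filter: no false negatives, but false positives are possible. */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(byte[] item) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(item, i));
            }
        }

        /** false means definitely absent; true means possibly present. */
        boolean mightContain(byte[] item) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(item, i))) return false;
            }
            return true;
        }

        /** Derives the i-th bit position from two seeded hashes (double hashing). */
        private int index(byte[] item, int i) {
            int h1 = fnv1a(item, 0x811C9DC5);
            int h2 = fnv1a(item, 0x5BD1E995);
            return Math.floorMod(h1 + i * h2, numBits);
        }

        /** FNV-1a style mixing with a caller-supplied seed. */
        private static int fnv1a(byte[] data, int seed) {
            int h = seed;
            for (byte b : data) {
                h ^= (b & 0xff);
                h *= 0x01000193;
            }
            return h;
        }
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
        CompositeKey stored = new CompositeKey("T", 42, 7);
        filter.add(stored.toBytes());
        // A key that was added is always reported as possibly present.
        System.out.println(filter.mightContain(stored.toBytes()));
    }
}
```

In a design like this, a reader would consult the filter before touching storage and skip the lookup whenever the filter reports that a key is definitely absent. A "possibly present" answer still has to be confirmed against the stored data.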

This presentation, which touches on motivation, use cases, and lessons learned, will explore the software architecture, including all Apache contributions (Avro, Maven, etc.).

Speakers

Glenn Tamkin

NASA
Mr. Tamkin is the lead software engineer and architect for the NASA Center for Climate Simulation’s (NCCS) Climate Informatics project. Recently, he has built a Hadoop-based system designed to perform analytics across NASA’s Big Climate Data. Prior endeavors extended from spacecraft...


Monday April 13, 2015 11:45am - 12:35pm CDT
Texas II
