Data Intensive Scientific Computing on Petabyte Scalable Infrastructure Project

Published By National Aeronautics and Space Administration

Issued over 9 years ago

US

Summary

Type of release
a one-off release of a single dataset

Data Licence
Not Applicable

Content Licence
Creative Commons CCZero

Verification
automatically awarded

Description

The infrastructure and programming paradigm used for petabyte-level data processing at companies like Google and Yahoo shed promising light on data-intensive scientific computing. Open source software and inexpensive commodity hardware bring these formerly proprietary technologies within the grasp of academic communities. By leveraging these commercially proven and publicly available technologies, we will develop a suite of novel data management and analysis libraries that extend the existing primitive algorithms originally designed for web search. These libraries take advantage of the underlying petabyte-scalable data infrastructure, parallelize computation transparently, and allow scientists and future commercial users to perform complex tasks (data mining, data visualization, and machine learning) in a data-intensive environment.