New Public Datasets Added To AWS
Written by Kay Ewbank   
Wednesday, 06 February 2019

Amazon has announced nine new AWS public datasets for researchers and developers interested in machine learning, environmental science, geospatial, astronomy, cybersecurity, and housing.

The AWS Public Dataset Program covers the cost of storage for publicly available high-value cloud-optimized datasets. The datasets within it can be used for analysis on AWS, and the aim is also to develop new cloud-native techniques, formats, and tools that lower the cost of working with data.
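Because these datasets are hosted in publicly readable Amazon S3 buckets, a first look at one can be as simple as listing a few object keys. The following is a minimal sketch using only the Python standard library and the S3 ListObjectsV2 REST call. The bucket name `example-public-dataset` and the helper names are placeholders, not taken from the announcement. The sketch also assumes the bucket permits anonymous listing and is reachable through the default `s3.amazonaws.com` endpoint.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(bucket, prefix="", max_keys=10):
    """Build an unsigned ListObjectsV2 URL for a public bucket."""
    query = urlencode({"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_keys(xml_text):
    """Extract the object keys from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{S3_NS}Key")]


def list_public_keys(bucket, prefix="", max_keys=10):
    """Fetch and parse one page of keys (performs a network request)."""
    with urlopen(list_url(bucket, prefix, max_keys)) as resp:
        return parse_keys(resp.read())


# Hypothetical usage; replace the placeholder with a real bucket name
# taken from the dataset's registry page:
# print(list_public_keys("example-public-dataset", prefix="tiles/", max_keys=5))
```

Some buckets live in other regions or require the requester to pay for transfer, in which case an authenticated client such as the AWS CLI or an AWS SDK is the more practical route.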

 

The machine learning dataset is a massively multilingual image dataset from the University of Pennsylvania. The dataset contains images paired with the words they represent in 100 languages, and the dataset is doubly parallel: for each language, words are stored parallel to images that represent the word, as well as parallel to the word's translation into English. The image below shows five images for the Indonesian word "kucing", a word with high predicted concreteness, along with its top 4 ranked translations using CNN features:

[Image: five images for the Indonesian word "kucing" with its top-ranked translations]
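To make the "doubly parallel" idea concrete, here is a small illustrative sketch of how one entry might be modelled in Python. The field names and image paths are hypothetical and do not reflect the dataset's actual file layout. The only point is that each foreign word links both to images and to English translations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordEntry:
    """One word, linked in parallel to its images and its English translations."""
    word: str
    language: str
    english: List[str] = field(default_factory=list)  # word -> translations
    images: List[str] = field(default_factory=list)   # word -> image files


def images_for_english(entries, english_word):
    """Collect images of every foreign word that translates to english_word."""
    found = []
    for entry in entries:
        if english_word in entry.english:
            found.extend(entry.images)
    return found


# Hypothetical sample data; the paths are placeholders.
entries = [
    WordEntry("kucing", "Indonesian", ["cat"], ["kucing/01.jpg", "kucing/02.jpg"]),
    WordEntry("anjing", "Indonesian", ["dog"], ["anjing/01.jpg"]),
]
cat_images = images_for_english(entries, "cat")
```

Keeping both links on the same record is what lets images act as a bridge between a foreign word and its English counterpart.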

 

There are three environmental datasets. The first is a set of atmospheric deterministic and probabilistic forecasts from the UK Meteorological Office. This data was previously available, but it is now updated daily.

The second environmental dataset is a collection of scientific information for land owners from the Queensland Government. The database is made up of Australian climate data from 1889 to the present.

The third collection of environmental data is air quality and radiation data from Safecast. Safecast was started after the Fukushima Daiichi Nuclear Power Plant meltdown, when volunteers began monitoring radiation levels. Air quality measurements were added later, and the project has spread around the world.

There are two new geospatial datasets: the USGS 3D elevation data, which contains elevation data in the form of light detection and ranging (LiDAR) data over the conterminous United States, Hawaii, and the U.S. territories, with data acquired over an 8-year period; and a set of images collected by the China-Brazil Earth Resources Satellite, provided by AMS Kepler.
 
In the astronomy sector, there's data from the Transiting Exoplanet Survey Satellite (TESS), a two-year survey looking for exoplanets in orbit around bright stars.
 
The Open City Model data has also been made available. This is an initiative to provide CityGML data for all the buildings in the United States. By using other open datasets in conjunction with the researchers' own code and algorithms, the intention is to provide 3D geometries for every US building.
 

The final addition is a collection of datasets from QIIME 2. The Microbiome Research User Tutorial Datasets contain the user documents and datasets for QIIME 2. QIIME 2 is an extensible and decentralized microbiome analysis package with a focus on data and analysis transparency. It enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.

 

More Information

Massively Multilingual Image Dataset

Learning Translations via Images with a Massively Multilingual Image Dataset

Atmospheric Deterministic and Probabilistic Forecasts

Scientific Information for Land Owners

Safecast Air Quality and Radiation data

USGS 3DEP LiDAR Point Clouds 

China-Brazil Earth Resources Satellite

Transiting Exoplanet Survey Satellite

Open City Model

Microbiome Research User Tutorial Datasets

Related Articles

Amazon Releases Managed Message Broker Service for ActiveMQ

AWS Lambda for the Impatient Part 1

AWS Lambda for the Impatient Part 2

AWS Lambda for the Impatient Part 3

Amazon Adds Game Dev Options To AWS

Amazon Strengthens Data Offerings

New Amazon Elasticsearch Service

Amazon Introduces Quicksight - Cloud BI

New AWS Managed Services

 


Last Updated ( Wednesday, 06 February 2019 )