AWS Data Exchange now includes datasets from Shutterstock.AI’s 400M asset library for autonomous vehicles, ecommerce and more.
Artificial intelligence and machine learning have become ubiquitous, powering everything from the online chat bots that allow users to get immediate answers without ever picking up a phone, to the product recommendations that enable you to quickly find exactly what you’re looking for.
A growing but less widely known field within machine learning is computer vision: training computers to identify visual objects as well as the human eye can. Computer vision is changing the ways companies operate, making the world smarter and safer, and advancing the scope of human possibility. With computer vision technology:
- Self-driving cars can operate safely by understanding their specific surroundings – including other cars, people, roads, and stop signs.
- Social media companies can rapidly identify, review and remove content that is inappropriate or extreme in nature.
- People can easily search images on their smartphone library by entering a keyword like “cat” or “sunset” to find all relevant photos.
Now, we’re excited to work with AWS Data Exchange to offer our data in support of companies doing this innovative work.
The problem we’re solving
To train computer vision models with precision, companies need massive volumes of very specific content and the metadata that goes along with it. Organizations often struggle to find quality content datasets with the breadth and variety of imagery and metadata required.
Imagine, for example, a researcher at a hypothetical company developing technology for self-driving cars.
In order to ensure that a car can identify a stop sign every time, they would need LOTS of photos of stop signs.
They’d want images with stop signs in the foreground and in the background.
They’d want pictures of bright, shiny new red stop signs, and old faded stop signs.
They’d want stop signs slightly obscured by trees, and stop signs drenched in rain.
And if the car is going to be driving around different countries, they’d probably want variations of stop signs with “STOP!” in different languages.
Then consider every other sign, traffic light, tree, pedestrian and object a car would have to recognize and navigate — and you can imagine that the image needs would start to pile up quickly.
Plus, cars move through a 3D world. So, in addition to 2D images, 3D models of objects and environments could be used to even more precisely train a self-driving car to gauge distances and other spatial dynamics when speeding over roads at 60+ miles per hour.
Finally, the metadata, or descriptive labels, associated with the images and 3D models, would help the computer algorithm to quickly process the visual information and analyze it for context and relevance. For example, the image metadata would indicate the type of object(s) pictured (“tree”), its location in the image (“left”), and other descriptive information (“green”).
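To make the role of metadata concrete, here is a minimal sketch of how labels like the ones described above could be represented and searched. The class names, field names, and image IDs are hypothetical and chosen only for illustration; they are not the schema of any Shutterstock.AI dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectLabel:
    object_type: str   # what is pictured, e.g. "tree"
    location: str      # where it sits in the frame, e.g. "left"
    attributes: List[str] = field(default_factory=list)  # e.g. ["green"]


@dataclass
class ImageRecord:
    image_id: str      # hypothetical identifier
    labels: List[ObjectLabel]


def find_images(records: List[ImageRecord], object_type: str,
                attribute: Optional[str] = None) -> List[str]:
    """Return IDs of images that contain the object, optionally with an attribute."""
    matches = []
    for record in records:
        for label in record.labels:
            if label.object_type != object_type:
                continue
            if attribute is None or attribute in label.attributes:
                matches.append(record.image_id)
                break  # one matching label is enough for this image
    return matches


# Made-up sample records for illustration only.
records = [
    ImageRecord("img-001", [ObjectLabel("tree", "left", ["green"])]),
    ImageRecord("img-002", [ObjectLabel("stop sign", "foreground", ["red", "faded"])]),
    ImageRecord("img-003", [ObjectLabel("stop sign", "background", ["red"])]),
]

faded_signs = find_images(records, "stop sign", attribute="faded")
```

With labels structured this way, a researcher could pull every faded stop sign, or every stop sign in the background, without looking at the images themselves.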
Building visual content libraries
AWS needed large volumes of specific, quality visual content when building Amazon Rekognition, a product that simplifies adding image and video analysis to applications without machine learning expertise. They found a solution in Shutterstock.AI’s asset library of 400 million diverse images, videos, and 3D models shared by more than 1.7 million contributors from around the world.
“We often turn to Shutterstock.AI for access to one of the largest repositories of stock imagery, vector graphics and illustrations. Shutterstock.AI’s user features, ranging from tagging and search algorithms that enable us to quickly find the relevant images for our project, to the ability to organize images into collections for each project and then curate these images for future reference, all save us valuable project time. Shutterstock.AI’s quality, variety, and user experience make it one of the best places for us to find great content for our projects,” said Roger Barga, General Manager, AWS Machine Learning – Computer Vision.
After experiencing success using content from the Shutterstock.AI library to train its own computer vision models, the AWS team had an idea: Why not offer collections of Shutterstock.AI content to other businesses through the AWS Data Exchange?
Curated Shutterstock.AI collections through the AWS Data Exchange
The AWS Data Exchange provides services that make it easy to find, subscribe to, and use third-party data in the cloud. And now, AWS customers can find collections of Shutterstock.AI content to train their computer vision models.
“It’s really a natural relationship,” said Alex Reynolds, Vice President and General Manager of Platform Solutions at Shutterstock.AI. “AWS provides scale and expertise to support businesses building machine learning applications. AWS Data Exchange is the perfect service to distribute our computer vision datasets to help companies large and small build the next big thing.”
The AWS and Shutterstock.AI team worked together to scope a special set of collections aligned with the most common computer vision use cases. The datasets span multiple industry categories and applications, including image and 3D model collections for ecommerce (e.g., clothing and apparel, food and beverage, furniture and home), travel and tourism, consumer electronics (e.g., smart home, internet of things) — and yes, autonomous vehicles (e.g., self-driving car safety, driving simulations). Each image or model includes robust metadata, including a descriptive title and an optimal 7-50 keywords for training machine learning models.
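As a rough illustration of that metadata guideline, the sketch below checks whether a record carries a title and between 7 and 50 distinct keywords. The function name and the decision to ignore blanks and case-insensitive duplicates are assumptions made for this example, not a description of Shutterstock.AI's actual review process.

```python
from typing import Iterable

# Keyword range taken from the text above.
MIN_KEYWORDS = 7
MAX_KEYWORDS = 50


def has_usable_metadata(title: str, keywords: Iterable[str]) -> bool:
    """Check for a non-empty title and 7-50 distinct, non-blank keywords."""
    # Assumption: blank entries and case-insensitive duplicates do not count.
    unique = {k.strip().lower() for k in keywords if k.strip()}
    return bool(title.strip()) and MIN_KEYWORDS <= len(unique) <= MAX_KEYWORDS
```

A check like this could run before training to filter out sparsely labeled assets.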
In addition to the curated datasets available, AWS subscribers can contact Shutterstock.AI to develop their own bespoke collections for their unique project needs. Shutterstock.AI’s library of 400M+ images, videos, and 3D models grows every week through the submissions of 1.7M+ contributors from 150 countries, ensuring that content is always fresh and up-to-date. The diversity of the content can help to minimize implicit bias in machine learning models by ensuring that diverse individuals and perspectives are represented. Additionally, all content is rigorously checked through both AI-powered and human review processes to ensure the relevance and consistency of metadata.
Just the beginning
Shutterstock.AI is piloting Shutterstock datasets on AWS Data Exchange in the summer of 2021 and will use feedback from customers to expand the product offerings over time. It’s our hope that these datasets will address a critical market need by helping connect companies, from small tech startups to large vehicle manufacturers, to quality content for training models with accuracy, precision, and, in the case of self-driving cars and other high-impact applications, confidence and safety.
Additionally, with its data products now available on AWS Data Exchange, Shutterstock.AI will work to contribute to the computer vision community by producing thought leadership and professional development content for practitioners, conducting research to improve computer vision technology, and developing products for expanded computer vision applications.
To learn more about computer vision from Shutterstock.AI and AWS experts, tune in to a webinar featuring Alessandra Sala, Director of AI and Data Science at Shutterstock.AI, and Oliver Myers, Principal Worldwide Business Development Manager for Amazon Rekognition.
To try the datasets yourself, check out this step-by-step tutorial for training an image recognition model with a free Shutterstock.AI dataset (ideal for AI researchers, developers, and hobbyists).
And finally, you can learn more about computer vision at Shutterstock.AI by visiting our main product hub.