The RoboTurk Real Robot Dataset

We collected a large-scale dataset on three real-world tasks: Laundry Layout, Tower Creation, and Object Search. All three datasets were collected remotely by crowdsourced workers using the RoboTurk platform. The dataset consists of 2144 demonstrations from 54 unique users. We provide both the complete dataset for training and smaller subsamples of the dataset for exploration.

We provide this comprehensive and diverse dataset, which, rather than largely 2D manipulation tasks, includes tasks with complex 3D motions and can be used for similar 3D manipulation tasks. Furthermore, our tasks are long-horizon, so prediction models must be able to reason about different parts of the task given some context or history. In addition, our dataset can be used for action-conditioned video prediction for model-based predictive control [1] or for action-free video prediction [2].

We will describe the structure of the dataset in the sections below and demonstrate video prediction results.

Laundry Layout (17.2 GB) | Object Search (18.8 GB) | Tower Creation (18.8 GB)

We have provided a GitHub repository that includes scripts for exploring the dataset and for training video prediction models:

After You Download

After unzipping the dataset, the following items can be found within each task directory. Every directory has the same structure, described below:

{task_name}_aligned_dataset.hdf5: A postprocessed, aligned set of data that contains control data from the user and joint data from the robot, along with the corresponding timestamps. Its structure is described in the next section.

{task_name}.zip: A postprocessed, aligned set of folders that contains, for each demonstration, three MP4 videos: depth videos from the Kinect (kinect_depth_aligned), which contain monochromatic data from the Kinect depth sensor; RGB videos from the Kinect (kinect_rgb_aligned); and RGB videos from the webcam (usb_aligned).
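As a quick way to get oriented in the aligned HDF5 file, the following minimal sketch lists every dataset it contains with its shape and dtype. It makes no assumption about the group or dataset names inside the file; the function names describe_node and describe_file are illustrative and not part of the RoboTurk scripts. Reading a real file requires the third-party h5py package.

```python
def describe_node(node, path=""):
    """Return sorted (path, shape, dtype) rows for every dataset under node.

    Works on any object that mimics h5py's interface: groups expose
    .items(), datasets expose .shape and .dtype.
    """
    rows = []
    for name, child in node.items():
        child_path = f"{path}/{name}"
        if hasattr(child, "items"):
            # Group-like node: recurse into its children.
            rows.extend(describe_node(child, child_path))
        else:
            # Dataset-like leaf: record its metadata only, not its contents.
            rows.append((child_path, tuple(child.shape), str(child.dtype)))
    return sorted(rows)


def describe_file(hdf5_path):
    """Open an HDF5 file read-only and list its datasets (requires h5py)."""
    import h5py  # imported lazily so describe_node has no dependencies

    with h5py.File(hdf5_path, "r") as f:
        return describe_node(f)
```

Calling describe_file on the path of a downloaded {task_name}_aligned_dataset.hdf5 file and printing each row gives an overview of the contents without loading any arrays into memory.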

The file structure of the video data is as follows, and we provide scripts below that map the demos to the video files:
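Independently of the provided scripts, a rough way to index the extracted videos is sketched below. It is only a sketch under two assumptions that should be checked against the actual archive: each demonstration's videos sit together in their own folder, and each MP4 file name contains one of the three stream names (kinect_depth_aligned, kinect_rgb_aligned, usb_aligned). The function names index_videos and incomplete_demos are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Stream names as given in the dataset description.
STREAMS = ("kinect_depth_aligned", "kinect_rgb_aligned", "usb_aligned")


def index_videos(root):
    """Map each demonstration folder to {stream_name: video_path}.

    Assumption: one folder per demonstration, and every MP4 file name
    contains the name of the stream it belongs to.
    """
    root = Path(root)
    index = defaultdict(dict)
    for video in sorted(root.rglob("*.mp4")):
        # Use the folder path relative to root as the demonstration key.
        demo = video.parent.relative_to(root).as_posix()
        for stream in STREAMS:
            if stream in video.name:
                index[demo][stream] = video
    return dict(index)


def incomplete_demos(index):
    """Return demonstration folders missing at least one of the three streams."""
    return sorted(d for d, vids in index.items() if set(vids) != set(STREAMS))
```

Running incomplete_demos on the result is a cheap sanity check after extraction: an empty list means every indexed demonstration has all three videos.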

Video Prediction

We show results from our experiments with Stochastic Variational Video Prediction to demonstrate the use of our dataset in action-conditioned video prediction, which has possible uses in model predictive control. The scripts for running the same experiments are explained in the GitHub wiki.
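To make the action-conditioned setup concrete, here is a minimal sketch of how one demonstration could be sliced into training examples: a few observed context frames, the actions taken over the window, and the future frames to predict. This is not the preprocessing used in our experiments. The function name make_windows, the default window sizes, and the assumption that there is exactly one action per frame transition are all illustrative.

```python
def make_windows(frames, actions, context=2, horizon=3):
    """Slice one demonstration into action-conditioned examples.

    Assumption: actions[t] is the control applied between frames[t] and
    frames[t + 1], so len(actions) == len(frames) - 1.
    Each example holds `context` observed frames, the actions spanning the
    whole window, and the next `horizon` frames as prediction targets.
    """
    if len(actions) != len(frames) - 1:
        raise ValueError("expected one action per frame transition")
    span = context + horizon
    examples = []
    for start in range(len(frames) - span + 1):
        examples.append({
            "context_frames": frames[start:start + context],
            "actions": actions[start:start + span - 1],
            "target_frames": frames[start + context:start + span],
        })
    return examples
```

Because the tasks are long-horizon, a single demonstration yields many overlapping windows; dropping the "actions" entry from each example gives the action-free variant of the same setup.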

Qualitative Results

[Videos: predictions and ground truth on Laundry Layout; predictions and ground truth on Tower Creation]


          title={Scaling robot supervision to hundreds of hours with roboturk: Robotic manipulation dataset through human reasoning and dexterity},
          author={Mandlekar, Ajay and Booher, Jonathan and Spero, Max and Tung, Albert and Gupta, Anchit and Zhu, Yuke and Garg, Animesh and Savarese, Silvio and Fei-Fei, Li},
          booktitle={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},