Workloads repository

SWIMProjectUCB edited this page May 4, 2012 · 9 revisions

The SWIM workload repository is expanding. We will add more workloads here as we obtain approval to release them.

Currently there are four previously synthesized day-long workloads and one short-duration test workload:

Each of these workloads is one day in duration and contains 24 historical trace samples, each 1 hour long.

FB-2009 comes from historical Hadoop traces on a 600-machine cluster at Facebook. The original trace spans 6 months from May 2009 to October 2009, and contains roughly 1 million jobs.

FB-2010 comes from historical Hadoop traces on the same cluster at Facebook, now grown to 3000 machines. The original trace spans 1.5 months from October 2010 to November 2010, and also contains roughly 1 million jobs.

File format: tab-separated fields, one row per job. Each row contains the fields in the following order:

1. new_unique_job_id
2. submit_time_seconds (relative to the start of the workload)
3. inter_job_submit_gap_seconds
4. map_input_bytes
5. shuffle_bytes
6. reduce_output_bytes
7. anonymized_input_path (present only in FB-2010_samples_24_times_1hr_withInputPaths_0.tsv)

This is the same format as the output of WorkloadSynthesis.pl.
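
As an illustration of the field order above, here is a minimal Python sketch of a reader for these files. It is not part of SWIM: the names `JobRecord`, `parse_workload`, and `_to_int` are hypothetical, the sample rows are made up, and the code assumes the five numeric fields are written as plain numbers and that the file has no header row.

```python
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class JobRecord(NamedTuple):
    """One job row, with fields in the order listed above."""
    job_id: str
    submit_time_seconds: int
    inter_job_submit_gap_seconds: int
    map_input_bytes: int
    shuffle_bytes: int
    reduce_output_bytes: int
    input_path: Optional[str]  # None when the file has no seventh column


def _to_int(text: str) -> int:
    # Assumption: values are integers; fall back to float notation if not.
    try:
        return int(text)
    except ValueError:
        return int(float(text))


def parse_workload(stream: TextIO) -> Iterator[JobRecord]:
    """Yield one JobRecord per non-empty, tab-separated line."""
    for line_number, line in enumerate(stream, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) not in (6, 7):
            raise ValueError(
                f"line {line_number}: expected 6 or 7 fields, got {len(fields)}"
            )
        numbers = [_to_int(value) for value in fields[1:6]]
        path = fields[6] if len(fields) == 7 else None
        yield JobRecord(fields[0], *numbers, path)


# Made-up rows for illustration only, not taken from any released workload.
SAMPLE = "job0\t0\t0\t1000\t500\t200\njob1\t15\t15\t2048\t0\t64\tpath_a\n"
records = list(parse_workload(io.StringIO(SAMPLE)))
total_input = sum(r.map_input_bytes for r in records)
print(len(records), total_input)
```

To read a real workload, pass an open file handle instead of the `io.StringIO` sample, for example `open("FB-2010_samples_24_times_1hr_withInputPaths_0.tsv")`.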