Workloads repository
The SWIM workload repository is expanding. We will add more workloads here as we obtain approval to release them.
Currently there are four previously synthesized day-long workloads and one short-duration test workload:
- FB-2009_samples_24_times_1hr_0.tsv
- FB-2009_samples_24_times_1hr_1.tsv
- FB-2010_samples_24_times_1hr_0.tsv
- FB-2010_samples_24_times_1hr_withInputPaths_0.tsv
- FB-2009_samples_24_times_1hr_0_first50jobs.tsv (for testing)
The day-long workloads each contain 24 historical trace samples, each 1 hour long.
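As a quick check on the arithmetic, 24 samples of 1 hour each cover 24 × 3600 = 86,400 seconds. The Python sketch below maps a job's `submit_time_seconds` to the hourly sample it falls in. It assumes the samples are laid out back to back starting at time 0, so sample k covers [3600k, 3600(k+1)). The helper names (`sample_index`, `jobs_per_sample`) are hypothetical and not part of SWIM.

```python
# Sketch: bucket job submit times (seconds from workload start) into the
# 24 one-hour samples. ASSUMPTION: samples are laid out back to back, so
# sample k covers [3600*k, 3600*(k+1)).
SECONDS_PER_SAMPLE = 3600
NUM_SAMPLES = 24


def sample_index(submit_time_seconds: float) -> int:
    """Return the 0-based hourly sample that a submit time falls into."""
    if not 0 <= submit_time_seconds < SECONDS_PER_SAMPLE * NUM_SAMPLES:
        raise ValueError(
            f"submit time {submit_time_seconds} is outside the day-long workload"
        )
    return int(submit_time_seconds // SECONDS_PER_SAMPLE)


def jobs_per_sample(submit_times):
    """Count how many jobs are submitted in each hourly sample."""
    counts = [0] * NUM_SAMPLES
    for t in submit_times:
        counts[sample_index(t)] += 1
    return counts


if __name__ == "__main__":
    print(SECONDS_PER_SAMPLE * NUM_SAMPLES)  # 86400 seconds in the full workload
    print(jobs_per_sample([0, 59, 3600, 7199, 86399])[:3])  # [2, 2, 0]
```

The strict range check is a deliberate choice for the sketch: a submit time outside the day signals either a malformed file or a wrong assumption about the sample layout.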
FB-2009 comes from historical Hadoop traces on a 600-machine cluster at Facebook. The original trace spans 6 months, from May 2009 to October 2009, and contains roughly 1 million jobs.
FB-2010 comes from historical Hadoop traces on the same cluster at Facebook, which had grown to 3000 machines. The original trace spans 1.5 months, from October 2010 to November 2010, and also contains roughly 1 million jobs.
File format: tab-separated fields, one row per job. Each row contains the following fields in order:
1. new_unique_job_id
2. submit_time_seconds (relative to the start of the workload)
3. inter_job_submit_gap_seconds
4. map_input_bytes
5. shuffle_bytes
6. reduce_output_bytes
7. anonymized_input_path (present only in FB-2010_samples_24_times_1hr_withInputPaths_0.tsv)
This is the same format as the output of WorkloadSynthesis.pl.
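The following Python sketch reads a workload file in the format above using only the standard library. It is not part of SWIM: the `SwimJob` class and `read_workload` function are hypothetical names, and the sketch assumes byte counts are stored as plain integers and that the optional seventh field appears only in the withInputPaths file.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass
class SwimJob:
    """One row of a SWIM workload file (hypothetical container class)."""
    job_id: str
    submit_time_seconds: float  # relative to the start of the workload
    inter_job_submit_gap_seconds: float
    map_input_bytes: int
    shuffle_bytes: int
    reduce_output_bytes: int
    input_path: Optional[str] = None  # only in the withInputPaths file


def read_workload(f: TextIO) -> Iterator[SwimJob]:
    """Yield one SwimJob per tab-separated row, skipping blank lines."""
    # QUOTE_NONE: treat quote characters literally so paths parse as-is.
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # blank line
        if len(row) not in (6, 7):
            raise ValueError(f"expected 6 or 7 fields, got {len(row)}: {row!r}")
        yield SwimJob(
            job_id=row[0],
            submit_time_seconds=float(row[1]),
            inter_job_submit_gap_seconds=float(row[2]),
            # ASSUMPTION: byte counts are written as plain integers.
            map_input_bytes=int(row[3]),
            shuffle_bytes=int(row[4]),
            reduce_output_bytes=int(row[5]),
            input_path=row[6] if len(row) == 7 else None,
        )


if __name__ == "__main__":
    # Two made-up rows for illustration only; real files come from the repository.
    sample = "job0\t0\t0\t1024\t512\t256\njob1\t13\t13\t2048\t0\t0\n"
    jobs = list(read_workload(io.StringIO(sample)))
    total_shuffle = sum(j.shuffle_bytes for j in jobs)
    print(len(jobs), total_shuffle)  # 2 512
```

To read a real file, open it with `open(path, newline="")`, as the `csv` module recommends, and pass the file object to `read_workload`. Returning a generator keeps memory use flat even for day-long workloads with many jobs.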