This repository has been archived by the owner on Oct 10, 2022. It is now read-only.

Add download script #3

Merged
merged 1 commit into from
May 8, 2019


akreal
Contributor

@akreal akreal commented May 7, 2019

No description provided.

@snakers4
Owner

snakers4 commented May 7, 2019

Many thanks
I will be reviewing shortly

Did you manage to download the whole dataset?
Was the speed good?

@akreal
Contributor Author

akreal commented May 7, 2019

The download is still running (I tested the script on a smaller list of files). The speed looks good:

2019-05-07 17:30:57 (9.14 MB/s) - ‘asr_public_phone_calls_2.csv’ saved [83885346/83885346]
2019-05-07 17:30:58 (13.2 MB/s) - ‘asr_public_stories_1.csv’ saved [6637511/6637511]
2019-05-07 17:30:59 (31.0 MB/s) - ‘asr_public_stories_2.csv’ saved [10237130/10237130]
2019-05-07 17:31:00 (14.9 MB/s) - ‘public_lecture_1.csv’ saved [925515/925515]
2019-05-07 17:31:00 (32.1 MB/s) - ‘public_series_1.csv’ saved [2709024/2709024]
2019-05-07 17:31:07 (19.0 MB/s) - ‘public_youtube700.csv’ saved [104246624/104246624]
2019-05-07 17:31:08 (13.4 MB/s) - ‘ru_RU.csv’ saved [667639/667639]
2019-05-07 17:31:08 (8.73 MB/s) - ‘russian_single.csv’ saved [443268/443268]
2019-05-07 17:31:21 (33.7 MB/s) - ‘tts_russian_addresses_rhvoice_4voices.csv’ saved [288187135/288187135]
2019-05-07 17:50:41 (16.1 MB/s) - ‘asr_public_phone_calls_1.tar.gz’ saved [19464792668/19464792668]
2019-05-07 18:04:41 (24.7 MB/s) - ‘asr_public_phone_calls_2.tar.gz_aa’ saved [21474836480/21474836480]
2019-05-07 18:23:44 (18.0 MB/s) - ‘asr_public_phone_calls_2.tar.gz_ab’ saved [21474836480/21474836480]
2019-05-07 18:36:06 (16.6 MB/s) - ‘asr_public_phone_calls_2.tar.gz_ac’ saved [12612534250/12612534250]
2019-05-07 18:40:13 (15.6 MB/s) - ‘asr_public_stories_1.tar.gz’ saved [4011600315/4011600315]
2019-05-07 18:48:28 (15.8 MB/s) - ‘asr_public_stories_2.tar.gz’ saved [8070243785/8070243785]
2019-05-07 19:03:03 (23.6 MB/s) - ‘audiobooks_2.tar.gz_ac’ saved [21474836480/21474836480]
2019-05-07 19:19:41 (20.6 MB/s) - ‘audiobooks_2.tar.gz_ad’ saved [21474836480/21474836480]
2019-05-07 19:39:32 (17.2 MB/s) - ‘audiobooks_2.tar.gz_ae’ saved [21474836480/21474836480]
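Each completion line in the log above ends with `saved [received/expected]`, so a truncated transfer would show up as a mismatch between the two numbers. A minimal sketch of how such lines could be checked, assuming the log format matches the lines quoted above (the function names `parse_saved_line` and `incomplete_files` are hypothetical, and lines without an expected size are simply ignored):

```python
import re

# Matches the tail of a completion line such as:
#   2019-05-07 17:30:58 (13.2 MB/s) - ‘asr_public_stories_1.csv’ saved [6637511/6637511]
# Assumption: the line carries both a received and an expected byte count.
SAVED_RE = re.compile(
    r"\((?P<speed>[\d.]+) (?P<unit>[KMG]?B/s)\) - "
    r"[‘'](?P<name>.+?)[’'] saved \[(?P<got>\d+)/(?P<total>\d+)\]"
)


def parse_saved_line(line):
    """Return a dict describing one completion line, or None if it does not match."""
    match = SAVED_RE.search(line)
    if match is None:
        return None
    return {
        "name": match["name"],
        "speed": float(match["speed"]),
        "unit": match["unit"],
        "got": int(match["got"]),
        "total": int(match["total"]),
    }


def incomplete_files(lines):
    """List the names of files whose received size differs from the expected size."""
    records = (parse_saved_line(line) for line in lines)
    return [r["name"] for r in records if r is not None and r["got"] != r["total"]]
```

Feeding the saved log to `incomplete_files` line by line would return an empty list if every file arrived in full.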

@akreal
Contributor Author

akreal commented May 7, 2019

Two files (asr_public_phone_calls_2.tar.gz and tts_russian_addresses_rhvoice_4voices.tar) could not be downloaded, so I removed them from the list. Everything else has been downloaded successfully.

Thanks a lot for the dataset!

@snakers4
Owner

snakers4 commented May 8, 2019

#2

I will merge the script, it's very clever, many thanks!
Hopefully someone will also improvise something similar in Python.

> Two files (asr_public_phone_calls_2.tar.gz and tts_russian_addresses_rhvoice_4voices.tar) could not be downloaded, so I removed them from the list. Everything else has been downloaded successfully.

The files are in the bucket. Could you please try downloading again and report which particular link is broken (those are multi-part links)?
We have also occasionally found some of the files unavailable; retrying helped.
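Since retrying is reported to help with temporarily unavailable files, a Python downloader could wrap each fetch in a small retry loop. The following is only a sketch under stated assumptions, not the merged script: `with_retries` and `download` are hypothetical helper names, and the linear backoff and attempt count are arbitrary choices.

```python
import time
import urllib.request


def with_retries(action, attempts=5, delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Only OSError is retried; urllib.error.URLError is a subclass of OSError,
    so network failures raised by urllib are covered.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as error:
            last_error = error
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff: delay, 2*delay, ...
    raise last_error


def download(url, destination):
    """Fetch url into the file destination and return the destination path."""
    urllib.request.urlretrieve(url, destination)
    return destination
```

A call would look like `with_retries(lambda: download(url, filename))`, where `url` and `filename` come from whatever file list is in use.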


@snakers4 snakers4 merged commit e59f8ca into snakers4:master May 8, 2019
@akreal
Contributor Author

akreal commented May 8, 2019

The multi-part links are fine. Initially I also had the whole (unsplit) versions of the same files in the list, because I copy-pasted it from the MD5 digest table, but there are actually no links to those whole files anywhere.
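The mismatch described here comes from the MD5 table naming whole archives while the links point to parts such as `asr_public_phone_calls_2.tar.gz_aa`. A minimal sketch of how the parts could be regrouped, joined, and hashed, assuming the parts carry two-letter suffixes as in the log above and are meant to be concatenated in alphabetical order (the helper names `group_parts`, `join_parts`, and `md5_of` are hypothetical):

```python
import hashlib
import re
import shutil
from collections import defaultdict

# Assumption: a part is named "<whole file name>_<two lowercase letters>".
PART_RE = re.compile(r"^(?P<base>.+)_(?P<suffix>[a-z]{2})$")


def group_parts(filenames):
    """Map each whole-file name to its sorted list of part names."""
    groups = defaultdict(list)
    for name in filenames:
        match = PART_RE.match(name)
        if match:
            groups[match["base"]].append(name)
    return {base: sorted(parts) for base, parts in groups.items()}


def join_parts(part_paths, out_path):
    """Concatenate the parts, in the given order, into out_path."""
    with open(out_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest of the joined file is what would be compared against the whole-file entry in the MD5 table.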

@snakers4
Owner

snakers4 commented May 8, 2019

Yes, that's because I deleted them)
