Hansard(s) #47
Comments
Looks like the Canadian Hansard is about 5 GB in .csv format and wouldn't require too much cleanup. If the UK Hansard is of a similar size and would require tooling to scrape, it might not be worth it. Can we confirm that this content is indeed public domain?
The UK Hansard is under the Open Parliament Licence, which tracks pretty closely with public domain except that it has personal information and national security exemptions. For the Canadian one, there doesn't seem to be any authoritative license statement on the dataset site or the government's. According to the website, the dataset is mostly sourced from government publications, so it should probably come under this. I sent them an email. The UK Hansard is also available as a consolidated dataset and also requires minimal formatting. Just trying to choose between:
or
I would love to help with this. (Looks like several countries do not clearly specify the licensing information for their Hansards.)
Hey! I'm mostly done with the Canadian and UK ones, and yeah, I haven't been able to get much license information for the others. The Australian one is CC BY-NC-ND, which is out of scope. The Singapore one is also under a limited license iirc.
Yes, I noticed both Australia and Singapore were out of scope.
A lot of Commonwealth countries provide official transcripts of parliamentary debates going back many years. The work on the Canadian one already seems to be done (couldn't find a license, can ask them), and the UK Hansard can be easily scraped.