Hansard(s) #47

Open
baberabb opened this issue Jan 9, 2024 · 5 comments
Labels
external project We will be including this data but the work will be done primarily by someone else.

Comments

@baberabb
Contributor

baberabb commented Jan 9, 2024

A lot of Commonwealth countries provide official transcripts of parliamentary debates going back many years. The work on the Canadian one already seems to be done (couldn't find a license, can ask them), and the UK Hansard can be easily scraped.

@craffel craffel added the external project We will be including this data but the work will be done primarily by someone else. label May 6, 2024
@craffel
Collaborator

craffel commented May 6, 2024

Looks like the Canadian Hansard is about 5 GB in .csv format and wouldn't require too much cleanup. If the UK Hansard is of a similar size and would require tooling to scrape, it might not be worth it. Can we confirm that this content is indeed public domain?

@baberabb
Contributor Author

baberabb commented May 13, 2024

The UK Hansard is under the Open Parliament Licence, which tracks pretty closely with public domain except that it has exemptions for personal information and national security.

For the Canadian one, there doesn't seem to be any authoritative source on the dataset site or the government's. The dataset seems to be mostly sourced from government publications according to the website, so should probably come under this. I sent them an email.

The UK Hansard is also available as a consolidated dataset and requires minimal formatting. I'm just trying to choose between:

<member for parliament>

<what they say>

<next speaker>

or

<member for parliament>: <what they say>
<next speaker>
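To make the two candidate layouts concrete, here is a minimal Python sketch that renders the same list of speaker turns both ways. The `(speaker, speech)` record shape, the function names, and the sample turns are all hypothetical and not taken from either dataset; the exact blank-line spacing in the first layout is also an assumption based on the template above.

```python
from typing import Iterable, Tuple

# Hypothetical record shape: (speaker, speech). Neither dataset is
# confirmed to expose its rows in this form.
Turn = Tuple[str, str]


def format_block(turns: Iterable[Turn]) -> str:
    """Option 1: speaker on its own line, a blank line, then the speech.

    Consecutive turns are also separated by a blank line.
    """
    return "\n\n".join(f"{speaker}\n\n{speech}" for speaker, speech in turns)


def format_inline(turns: Iterable[Turn]) -> str:
    """Option 2: 'speaker: speech', one turn per line."""
    return "\n".join(f"{speaker}: {speech}" for speaker, speech in turns)


if __name__ == "__main__":
    # Made-up sample turns, for illustration only.
    sample = [
        ("Member A", "I beg to move the motion."),
        ("Member B", "I rise to oppose it."),
    ]
    print(format_block(sample))
    print("-----")
    print(format_inline(sample))
```

The inline layout is more compact, while the block layout keeps the speaker name clearly separated from speeches that themselves contain colons or run over several paragraphs.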

@KranthiGV

I would love to help with this.

(Looks like several countries do not clearly specify the licensing information of the Hansards)

@baberabb
Contributor Author

> I would love to help with this.
>
> (Looks like several countries do not clearly specify the licensing information of the Hansards)

Hey! I'm mostly done with the Canadian and UK ones, and yeah, I haven't been able to get much license information for the others. The Australian one is CC-BY-ND-NC, which is out of scope. The Singapore one is also under a limited license iirc.

@KranthiGV

> > I would love to help with this.
> > (Looks like several countries do not clearly specify the licensing information of the Hansards)
>
> Hey! I'm mostly done with the Canadian and UK ones, and yeah, I haven't been able to get much license information for the others. The Australian one is CC-BY-ND-NC, which is out of scope. The Singapore one is also under a limited license iirc.

Yes, I noticed both Australia and Singapore were out of scope.
Is there anything else you think I can help you with? (even if it is not this specific Hansard task)


3 participants