pakistan_ppra_*: Parse HTML listings? #1014

Open
yolile opened this issue May 20, 2023 · 5 comments
Labels: existing spider, unavailable

Comments

@yolile
Member

yolile commented May 20, 2023

https://www.ppra.org.pk/api/index.php/api/release,
https://www.ppra.org.pk/api/index.php/api/records and
https://www.ppra.org.pk/api/index.php/api

all return HTTP 503 (Service Unavailable).

@yolile added the existing spider and unavailable labels on May 20, 2023
@yolile
Member Author

yolile commented Oct 20, 2023

Note that if you go to https://www.ppra.org.pk/ and then click:
[screenshot]

There is a list of tenders in OCDS format:
[screenshot]

But the "download all" button doesn't work.

We could scrape https://www.ppra.org.pk/opendata.asp?PageNo=1 to get the list of download links, e.g. https://www.ppra.org.pk/ocds.asp?id=523047
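A minimal sketch of what that scraping could look like, assuming the listing pages are reachable at `opendata.asp?PageNo=N` and each release is linked with a plain `<a href>` pointing at `ocds.asp?id=<number>` (neither assumption has been verified against the live HTML). It uses the standard library's `html.parser` rather than Scrapy, so it is not how a Kingfisher Collect spider would be written; `listing_urls`, `OcdsLinkParser` and `extract_ocds_links` are hypothetical names for illustration. Fetching the pages is left out so the parsing logic can be checked offline.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

BASE_URL = "https://www.ppra.org.pk/"


def listing_urls(last_page):
    """Hypothetical helper: build opendata.asp listing URLs for pages 1..last_page."""
    return [f"{BASE_URL}opendata.asp?PageNo={n}" for n in range(1, last_page + 1)]


class OcdsLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links that point at ocds.asp?id=..."""

    def __init__(self, base_url=BASE_URL):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links such as "ocds.asp?id=523047" against the site root.
        url = urljoin(self.base_url, href)
        parsed = urlparse(url)
        # Assumption: release downloads look like /ocds.asp?id=<number>.
        if parsed.path.lower().endswith("/ocds.asp") and "id" in parse_qs(parsed.query):
            if url not in self.links:  # keep first occurrence, drop duplicates
                self.links.append(url)


def extract_ocds_links(html, base_url=BASE_URL):
    """Return the ocds.asp download URLs found in one listing page's HTML."""
    parser = OcdsLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<a href="ocds.asp?id=523047">OCDS</a> <a href="index.asp">Home</a>'
    print(extract_ocds_links(sample))
```

Running the sample prints `['https://www.ppra.org.pk/ocds.asp?id=523047']`. The page count passed to `listing_urls` would need to be discovered at crawl time, since it grows as new tenders are published.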

@jpmckinney
Member

Looks like they have 87 pages currently: https://www.ppra.org.pk/opendata.asp?PageNo=87

Our criterion: "Retrievable in full by software, either by using an HTML page listing bulk download URLs, or by using machine-readable data as the only input."

They don't really meet this criterion: we mean a single HTML page listing links to bulk downloads, whereas here the links are spread across many paginated listing pages.

If we think there's value, however, we can add it.

@jpmckinney removed the unavailable label on Apr 10, 2024
@jpmckinney jpmckinney changed the title pakistan_ppra_* no longer available pakistan_ppra_*: Parse HTML listings Apr 10, 2024
@jpmckinney
Member

@yolile Should we remove Pakistan? Scraping links from individual HTML pages (https://www.ppra.org.pk/opendata.asp?PageNo=1) seems to be the only way, but that doesn't meet our minimum criteria for inclusion in Collect.

If we remove it from Collect, we can delete the Publication in the registry, since it has never succeeded in obtaining a collection.

@jpmckinney jpmckinney changed the title pakistan_ppra_*: Parse HTML listings pakistan_ppra_*: Parse HTML listings? Aug 19, 2024
@jpmckinney added the unavailable label on Aug 19, 2024
@yolile
Member Author

yolile commented Aug 19, 2024

Sounds good.

@allakulov, could you inform Carey about this so that we can decide whether to reach out to Pakistan and try to make them fix this?

@allakulov

I have informed Carey and we are following up with PPRA. I'll keep you posted.

@allakulov allakulov reopened this Aug 21, 2024