BadZipfile: File is not a zip file #247
Comments
For some additional context, the table below shows the fraction of bills whose bill text was properly extracted, broken down by congress. Roughly 22% of bills failed with the BadZipfile error in each congress, except for the 114th, which had no failures for some reason.
I've been seeing something similar and have a patch that deletes bad files so they are re-downloaded on the next run. That seems to fix it, so the cause may be a lot of temporary download glitches, but I'm not entirely sure. I will share my changes when I can (I'm on vacation and am limiting my screen time).
On December 29, 2019 11:17:38 AM GMT+09:00, dsal1951 wrote:

| Congress | Fraction of bills with text extracted |
|---|---|
| 103 | 0.784871 |
| 104 | 0.784470 |
| 105 | 0.783371 |
| 106 | 0.785635 |
| 107 | 0.772524 |
| 108 | 0.779064 |
| 109 | 0.773925 |
| 110 | 0.784199 |
| 111 | 0.777987 |
| 112 | 0.772009 |
| 113 | 0.777428 |
| 114 | 1.000000 |
| 115 | 0.783106 |
| 116 | 0.767378 |
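
The delete-and-redownload idea mentioned above can be sketched roughly as follows. This is only an illustration, not the actual change in #255: the function names `is_good_zip` and `remove_bad_zips` are hypothetical, and it assumes Python 3 (where the exception is spelled `zipfile.BadZipFile`) and that the downloaded packages are `*.zip` files somewhere under a single directory.

```python
import os
import zipfile


def is_good_zip(path):
    """Return True if path opens as a zip and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first corrupt member, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False


def remove_bad_zips(root):
    """Delete every unreadable *.zip under root (hypothetical helper).

    Returns the list of deleted paths so the caller can log them.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".zip"):
                continue
            path = os.path.join(dirpath, name)
            if not is_good_zip(path):
                os.remove(path)
                removed.append(path)
    return removed
```

Running `remove_bad_zips` on the scraper's output directory before rerunning the `./run govinfo ...` command should leave only readable archives in place, so that only the missing ones need to be fetched again. Whether the scraper actually re-fetches a deleted file depends on its own caching logic, which this sketch does not touch.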
Sorry for the delay. This seems to be working for me: #255
Merged #255, so deeming this resolved.
I'm trying to pull bill text for previous congresses for some data science research I'm working on. When I try to run the following command
./run govinfo --collections=BILLS --extract=mods,text,xml,pdf --congress=114
I get a BadZipfile error on every bill (example for s29 below). If I try to manually open the package.zip, I end up in the never-ending zip -> cpgz cycle.
The strange thing is that if I delete all of the text-versions subdirectories and then rerun the same command, it works fine for the vast majority of bills (~90%). I haven't been able to find any rhyme or reason to this behavior, but I can confirm that I've observed it on both macOS and Ubuntu, and across many congresses.
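
One way to narrow this down is to look at the first bytes of a failing package.zip to see what was actually saved. The snippet below is a diagnostic sketch only; `describe_download` is a hypothetical helper and the categories it reports are my own. It relies on the fact that a non-empty zip archive starts with the local file header signature `PK\x03\x04`, and an empty one with `PK\x05\x06`.

```python
def describe_download(path, peek=64):
    """Guess what a downloaded file is from its first few bytes."""
    with open(path, "rb") as f:
        head = f.read(peek)
    if not head:
        return "empty file"
    if head.startswith(b"PK\x03\x04"):
        # Starts like a zip; if it still fails to open, it may be truncated.
        return "zip"
    if head.startswith(b"PK\x05\x06"):
        return "empty zip"
    if head.lstrip().lower().startswith((b"<!doctype", b"<html", b"<?xml")):
        # Markup instead of an archive, e.g. an error page saved as package.zip.
        return "markup"
    return "unknown"
```

If a failing file reports `"zip"`, the download was probably cut off partway, since the zip central directory lives at the end of the file. If it reports `"markup"`, the server likely returned something other than the archive, and opening the file in a text editor should show what.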