
Releases: dismantl/CaseHarvester

2.0

06 Apr 18:45
6d9efe2

Release notes for Case Harvester version 2.0:

  • New case format parsers:
    • ODYCOSA: Appellate Court of MD (formerly Court of Special Appeals)
    • ODYCOA: Supreme Court of MD (formerly Court of Appeals)
  • Daily collection of newly posted case numbers via the Collector component
  • Added an Orchestrator component (written in Go) and rewrote the Spider and Scraper components in order to bypass the absurd anti-bot protections on Case Search, including DataDome. I'm not going to document their design; if the anti-transparency nerds at the Maryland Judiciary and DataDome want to figure out how it works, they can read the fucking code 🖕🏻🖕🏻
  • Judge info has been added to MDEC formats

Thanks to our current and former sponsors for helping us cover our server costs and funding research and development for bypassing DataDome.

1.2

25 Sep 15:04
a6e2c9b

Release notes for Case Harvester version 1.2:

  • New case format parsers:
    • K: Circuit Court Criminal Cases
    • DSCP: District Court Civil Citations
    • DSTRAF: District Court Traffic Cases
    • PG: Prince George's County Circuit Court Criminal Cases
    • PGV: Prince George's County Circuit Court Civil Cases
    • MCCR: Montgomery County Criminal Cases
    • MCCI: Montgomery County Civil Cases
  • Monthly exports of all tables to S3 for public download
  • Auto-scale scraper service based on size of scraper queue
  • Added column_metadata table to hold info about column meaning and other attributes used in Case Explorer
  • Unredact civil cases
  • Miscellaneous parser fixes

This release also includes a workaround for a new anti-scraping measure that was added to the Maryland Judiciary Case Search. Worryingly, MJCS now also seems to have a half-completed reCAPTCHA implementation, which, if it were to be fully implemented and deployed, would make scraping significantly harder and would thus be a major blow to transparency of the MD court system.
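
The queue-based auto-scaling listed above can be pictured as a small sizing function. The following is only a sketch, not the project's actual scaling rule: the function name `desired_task_count`, the `messages_per_task` ratio, and the task limits are all hypothetical values chosen for illustration.

```python
import math


def desired_task_count(queue_depth: int,
                       messages_per_task: int = 500,
                       min_tasks: int = 0,
                       max_tasks: int = 10) -> int:
    """Return how many scraper tasks to run for a given queue depth.

    All thresholds are illustrative assumptions, not values from the project.
    """
    if queue_depth <= 0:
        # Nothing queued: fall back to the floor (scale to zero by default).
        return min_tasks
    # One task per `messages_per_task` queued items, rounded up.
    needed = math.ceil(queue_depth / messages_per_task)
    # Clamp the result to the configured range.
    return max(min_tasks, min(max_tasks, needed))
```

A scheduler would call this periodically with the current queue size and set the service's task count to the result; the clamp keeps a sudden backlog from launching an unbounded number of tasks.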

1.1.1

20 Jul 15:17
8d0e668

Bug fix release.

  • last_scrape should be null when no successful scrapes have been made
  • Spider fix for timeout splits
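
The `last_scrape` fix describes a simple rule: the field should hold the time of the most recent successful scrape and stay null otherwise. A minimal sketch of that rule, assuming scrape attempts are available as (timestamp, success) pairs; the function name and the input shape are hypothetical, not taken from the codebase:

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple


def last_successful_scrape(
    attempts: Iterable[Tuple[datetime, bool]]
) -> Optional[datetime]:
    """Return the latest successful scrape time, or None if none succeeded."""
    successes = [timestamp for timestamp, ok in attempts if ok]
    # Failed attempts never count, so a case with only failures stays None (null).
    return max(successes) if successes else None
```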

1.1

12 Jul 17:33
5b55615

Release notes for Case Harvester version 1.1:

  • Added new parser for ODYCIVIL cases (~6.5 million)
  • New and improved scraping schedule
  • Improved concurrency
  • Automatic redaction of defendant information
  • Configurable user-agent
  • Parser executes outside VPC to cut costs
  • Removed scraper failed queue
  • Added template ECS spider task definition
  • Upgraded database engine from Postgres 9.6 to 11.5
  • Replaced B-tree indexes with hash indexes
  • Updates/fixes to ODYCRIM, ODYTRAF parsers
  • Uses HTTPS version of MJCS

1.1rc3

06 Jul 21:25
Pre-release

Bug fixes; see the commit log.

1.1rc2

25 May 21:43
Pre-release

Includes bug fixes and changes how scraping is scheduled, as described in the README.

1.0

21 Apr 22:58
961e631

Version: 1.0
Codename: Reaper
Description: First stable release of Case Harvester.
Features:

  • Automated cloud infrastructure deployment via AWS CloudFormation. No servers to maintain.
  • Fully automated and scheduled spidering/scraping
  • Comprehensive command line interface for manual use and testing
  • Tune concurrency and other settings with configuration profiles or environment variables
  • Docker image for easy portability
  • Version history recorded as case details get updated over time
  • Development and production environments
  • Autogenerated database schematic documentation
  • Alembic database versioning
  • Extensible parser class covering six different case formats so far
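
Tuning settings through configuration profiles or environment variables, as listed above, usually means a profile supplies defaults and an environment variable can override them. The sketch below illustrates that pattern only. The profile contents, the `HARVESTER_` prefix, and the `load_setting` helper are all hypothetical; the README documents the real setting names.

```python
import os
from typing import Mapping, Optional, Union

Setting = Union[int, float, str]

# Hypothetical profiles; the real project defines its own names and values.
PROFILES = {
    "development": {"concurrency": 2, "request_timeout": 30.0},
    "production": {"concurrency": 10, "request_timeout": 60.0},
}


def load_setting(name: str,
                 profile: str,
                 environ: Optional[Mapping[str, str]] = None,
                 prefix: str = "HARVESTER_") -> Setting:
    """Resolve a setting: environment variable first, then the profile default."""
    environ = os.environ if environ is None else environ
    default = PROFILES[profile][name]
    raw = environ.get(prefix + name.upper())
    if raw is None:
        return default
    # Cast the override to the same type as the profile default.
    return type(default)(raw)
```

For example, `load_setting("concurrency", "production")` returns the profile value unless the hypothetical variable `HARVESTER_CONCURRENCY` is set, in which case that value is used instead.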