ministryofjustice/bichard7-next-data

bichard7-next-data

This repository contains (mostly) static data that is directly used by the Bichard7 service.

How is this data used?

The data contained within the output-data folder is automatically published as both an npm package and a Maven artefact by the Release GitHub Actions workflow in this repository. These packages are then referenced as dependencies from other repositories within the Bichard code base (such as bichard7-next-core and bichard7-next).

How is this data versioned and deployed?

All merges and commits to the main branch will cause the release GitHub Action to run, which will:

  1. Increment the patch version number in output-data/package.json (using npm version patch)
  2. Replace the pom.xml file with pom.template.xml, substituting the $PACKAGE_JSON_VERSION variable in the template with the new version number (using envsubst)
  3. Commit the changes to output-data/package.json, output-data/package-lock.json and pom.xml (which will be version number bumps) as the @bichard7 user
  4. Create a git tag labelled with the version number that points at the commit that was just made
  5. Push the commit directly to the main branch. The @bichard7 user has been added as an exception for the branch protection/PR requirements in the settings for this repo, which means it can push commits directly to main.
  6. Build and publish the npm and Maven packages
  7. Check out a copy of the bichard7-next-core repo, update the npm standing data dependency to the exact version number from step 1 (using npm install bichard7-next-data@<version>), and create a PR with the results
  8. Check out a copy of the bichard7-next repo, update the Gradle standing data dependency to the exact version number from step 1 (using sed on the bichard-backend/build.gradle file), and create a PR with the results

This means that every time new commits are added to main, new Maven and npm packages will automatically be published.
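
Steps 1 and 2 amount to a patch bump followed by a template substitution. The following is a minimal TypeScript sketch of that idea, not the workflow's actual implementation (which runs npm version patch and envsubst). The function names bumpPatch and renderPom and the one-line sample template are hypothetical.

```typescript
// Hypothetical helpers illustrating steps 1 and 2; the real workflow
// runs `npm version patch` and `envsubst` instead.

// Increment the patch component of a plain MAJOR.MINOR.PATCH version.
function bumpPatch(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version)
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`)
  }
  const [, major, minor, patch] = match
  return `${major}.${minor}.${Number(patch) + 1}`
}

// Replace every $PACKAGE_JSON_VERSION (or ${PACKAGE_JSON_VERSION}) in the template.
function renderPom(template: string, version: string): string {
  return template.replace(/\$\{PACKAGE_JSON_VERSION\}|\$PACKAGE_JSON_VERSION/g, () => version)
}

const nextVersion = bumpPatch("1.2.3")
const renderedPom = renderPom("<version>$PACKAGE_JSON_VERSION</version>", nextVersion)
console.log(nextVersion) // 1.2.4
console.log(renderedPom) // <version>1.2.4</version>
```

Because only the patch component is incremented automatically, a breaking change still needs its major version bumped by hand.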

If breaking changes are introduced to main, it's advisable to manually bump the major version release number as part of those changes. This follows Semantic Versioning principles.

How is this data updated?

With the exception of the offence code data and organisation unit data, the data in this repository is static and will not be updated automatically. Manual changes can be made to this data directly in the output-data folder, and when the PR containing these changes is merged to main, the release GitHub Actions workflow described above will publish the changes.

The offence code data and organisation unit data are the output of a build process that combines data from multiple sources to produce the final version that is packaged and used by Bichard. These data files in the output-data folder should not be edited directly; these files are just the output of the 'build' process.

Offence Code Data

The output offence code data (output-data/data/offence-code.json) is generated by combining a number of different data sources into one set of offence codes. Input data is:

  • b7-overrides. Any offence code that is referenced in this file must have its offence category set to "B7" so that it is ignored by Bichard. NB: This exists for compatibility with the legacy dataset and should be removed in future.
  • cjs-offences. Offence code data exported from the data standards team and published here
  • pnc-ccjs-cjs-offences. Data exported from the PNC
  • pnld-offences. Data exported from the PNLD
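
As a rough illustration of the b7-overrides rule above, the sketch below merges several sources and then forces the category to "B7" for any overridden code. The record shape (code, offenceCategory), the "later source wins" merge order, the function name mergeOffenceCodes and the sample values are all assumptions made for illustration; the real merge-offence-data script may behave differently.

```typescript
// Hypothetical, trimmed-down record shape; the real offence-code.json has more fields.
interface OffenceCode {
  code: string
  offenceCategory: string
}

// Merge sources keyed by offence code (assumption: later sources win),
// then set the category to "B7" for every code listed in the overrides.
function mergeOffenceCodes(sources: OffenceCode[][], b7Overrides: string[]): OffenceCode[] {
  const merged: Record<string, OffenceCode> = {}
  for (const source of sources) {
    for (const offence of source) {
      merged[offence.code] = { ...offence }
    }
  }
  for (const code of b7Overrides) {
    const offence = merged[code]
    if (offence) {
      offence.offenceCategory = "B7"
    }
  }
  // Return the records ordered by code so the output is stable.
  return Object.keys(merged)
    .sort()
    .map((code) => merged[code])
}

// Made-up sample data: the second code is listed in the overrides.
const mergedOffences = mergeOffenceCodes(
  [[{ code: "AA00001", offenceCategory: "CE" }], [{ code: "BB00002", offenceCategory: "CI" }]],
  ["BB00002"]
)
console.log(mergedOffences)
```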

To rebuild the offence code data:

# Download the latest external offence code data sources to input-data/
$ npm run download-offence-code-data

# Combine all sources into the final output in output-data/
$ npm run merge-offence-data

Accessing the PNLD website

We use Puppeteer to interact with the PNLD in the browser. From time to time the site changes (updated HTML or broken links), so it is helpful to open it in a browser yourself when debugging. To do so, go to PNLD and look in the table below to find out where to get the credentials.

The function PnldFileDownloader is called by the script download-offence-code-data.ts and needs four environment variables in order to access the PNLD service.

Environment variable   Description
PNLD_USERNAME          User name; can be found in 1Password in the shared vault
PNLD_PASSWORD          Password; can be found in 1Password in the shared vault
PNLD_LOGIN_URL         https://www.pnld.co.uk/standard-offence-wording-extracts/
PNLD_DOWNLOAD_URL      Depends on which zip file we want to download:
                         • Full extract: https://www.pnld.co.uk/standard-offence-wording-extracts/full-extract
                         • Monthly Update (Current Month): https://www.pnld.co.uk/standard-offence-wording-extracts/monthly-delta-extract
                         • Monthly Updates (Last Month): https://www.pnld.co.uk/standard-offence-wording-extracts/monthly-delta-extract-1-month-prior
                         • Monthly Updates (2 Months Ago): https://www.pnld.co.uk/standard-offence-wording-extracts/monthly-delta-extract-2-months-prior

Here is an example of the command to download the full extract:

PNLD_DOWNLOAD_URL="https://www.pnld.co.uk/standard-offence-wording-extracts/full-extract" PNLD_LOGIN_URL="https://www.pnld.co.uk/standard-offence-wording-extracts/" PNLD_PASSWORD=<PASSWORD> PNLD_USERNAME=<USERNAME> npx ts-node src/download-offence-code-data.ts
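
Since the download needs all four variables, a small pre-flight check can make a missing one obvious before the browser starts. This is only a sketch: missingPnldVariables is a hypothetical helper, not part of PnldFileDownloader, and it takes the environment as a parameter (you could pass process.env) so that it stays self-contained.

```typescript
// The four variables listed in the table above.
const REQUIRED_PNLD_VARIABLES = ["PNLD_USERNAME", "PNLD_PASSWORD", "PNLD_LOGIN_URL", "PNLD_DOWNLOAD_URL"]

// Return the names of any required variables that are unset or empty.
function missingPnldVariables(env: Record<string, string | undefined>): string[] {
  return REQUIRED_PNLD_VARIABLES.filter((variable) => !env[variable])
}

// Example with a deliberately incomplete environment (placeholder values).
const missingVariables = missingPnldVariables({
  PNLD_USERNAME: "example-user",
  PNLD_LOGIN_URL: "https://www.pnld.co.uk/standard-offence-wording-extracts/"
})
if (missingVariables.length > 0) {
  // Reports PNLD_PASSWORD and PNLD_DOWNLOAD_URL for the example above.
  console.log(`Missing environment variables: ${missingVariables.join(", ")}`)
}
```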

Organisation Unit Data

The organisation unit data (output-data/data/organisation-unit.json) is generated by combining Police Service and Court data:

  • Court Organisation Units updated daily by the update standing data GitHub Actions workflow. The source spreadsheet is downloaded from the criminal justice system data standards page. The spreadsheet does not include thirdLevelPsaCode, therefore this data is backfilled from the existing OU records or manually.
  • Police Organisation Units generated from the PNC spreadsheet.
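
The thirdLevelPsaCode backfill described above could look roughly like the sketch below. Only the thirdLevelPsaCode field name comes from this README; the code key used to match records, the function name backfillPsaCodes and the sample values are assumptions for illustration, and the real update script may match records differently.

```typescript
// Hypothetical, trimmed-down record shape. `code` is an illustrative matching key.
interface OrganisationUnit {
  code: string
  thirdLevelPsaCode?: string
}

// Copy thirdLevelPsaCode from the existing record with the same code,
// leaving it unset (for manual completion) when no existing value is found.
function backfillPsaCodes(incoming: OrganisationUnit[], existing: OrganisationUnit[]): OrganisationUnit[] {
  const knownPsaCodes: Record<string, string> = {}
  for (const unit of existing) {
    if (unit.thirdLevelPsaCode) {
      knownPsaCodes[unit.code] = unit.thirdLevelPsaCode
    }
  }
  return incoming.map((unit) => {
    const psaCode = unit.thirdLevelPsaCode || knownPsaCodes[unit.code]
    return psaCode ? { ...unit, thirdLevelPsaCode: psaCode } : { ...unit }
  })
}

// Made-up sample data: the first unit has an existing record, the second does not.
const backfilledUnits = backfillPsaCodes(
  [{ code: "B01EF00" }, { code: "B99ZZ00" }],
  [{ code: "B01EF00", thirdLevelPsaCode: "1234" }]
)
console.log(backfilledUnits)
```

Units that come back without a thirdLevelPsaCode are the ones that would need filling in manually.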

To rebuild the organisation unit data:

# Download the latest external organisation unit data sources to input-data/
# and combine all sources into the final output in output-data/
$ npm run download-organisation-unit-data

Consistently Formatting Data

In order to make differences between versions of data easy to read, the data should be sorted before committing. This can be done by running ./data-formatter/format.sh from the root of the repository. This will sort the arrays of data by alphabetical attribute name and then output them with their attributes sorted.
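
To show why sorting keeps diffs small, here is a minimal sketch of the idea in TypeScript. It is not the logic of ./data-formatter/format.sh: the function names are hypothetical, ordering records by their serialised form is an assumption, and only top-level attributes are sorted.

```typescript
type DataRecord = Record<string, unknown>

// Return a copy of the record with its top-level attributes in alphabetical order.
function sortAttributes(record: DataRecord): DataRecord {
  const sorted: DataRecord = {}
  for (const key of Object.keys(record).sort()) {
    sorted[key] = record[key]
  }
  return sorted
}

// Sort attributes within each record, then order the records by their
// serialised form so the output does not depend on the input order.
function formatData(records: DataRecord[]): string {
  const normalised = records.map(sortAttributes)
  normalised.sort((a, b) => {
    const left = JSON.stringify(a)
    const right = JSON.stringify(b)
    return left < right ? -1 : left > right ? 1 : 0
  })
  return JSON.stringify(normalised, null, 2)
}

// The same data in a different order produces identical output.
const firstOrdering = formatData([{ b: 1, a: 2 }, { a: 1, b: 0 }])
const secondOrdering = formatData([{ b: 0, a: 1 }, { a: 2, b: 1 }])
console.log(firstOrdering === secondOrdering) // true
```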

Importing new data from the PNC

Follow these steps to import an updated data export from the PNC.

  1. Ask Ben to request a new export (note: at the moment this needs sending from Ben's CJSM as that's the only approved address)
  2. Receive the files
  3. Unzip the files and place the xlsx docs in the root of this project (note: the files are normally named the same way each time, but they should end in .CJS.xlsx and FSCODE.xlsx for them to be found automatically. You can ignore the file ending in ACPO.xlsx)
  4. Run npm run import-pnc-data to convert these xlsx files to JSON
  5. Run npm run merge-offence-data to regenerate the standing data based on these new input files
  6. Make a PR
  7. Delete the xlsx files
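
The file naming rule in step 3 can be expressed as a simple filter. This is a sketch only: findPncImportFiles is a hypothetical name, the sample file names are made up, and the real import-pnc-data script may locate files differently.

```typescript
// Pick out the PNC export files described in step 3 from a list of file names:
// keep names ending in .CJS.xlsx or FSCODE.xlsx, which drops the ACPO.xlsx file.
function findPncImportFiles(fileNames: string[]): string[] {
  return fileNames.filter((fileName) => /(\.CJS\.xlsx|FSCODE\.xlsx)$/.test(fileName))
}

// Made-up file names for illustration.
const pncFiles = findPncImportFiles(["EXPORT.CJS.xlsx", "EXPORTFSCODE.xlsx", "EXPORTACPO.xlsx", "README.md"])
console.log(pncFiles)
```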