Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardised col_name check #133

Merged
merged 7 commits into from
Sep 6, 2024
Merged

Standardised col_name check #133

merged 7 commits into from
Sep 6, 2024

Conversation

rmbielby
Copy link
Contributor

@rmbielby rmbielby commented Aug 27, 2024

Brief overview of changes

I've expanded the ethnicity headers check to allow checks of any standardised col_name that comes under harmonisation rules. This runs off a look-up table of common aliases or sub-strings used in publications that we can check on and their related standard col_names. The test does a scan for these aliases / sub-strings and throws an error if it finds them as non-standard col_names.

Why are these changes being made?

End users currently have a confusing collection of different col_names across different data files. We'd like to improve consistency in the data catalogue.

Detailed description of changes

The initial look-up table is provided in data/harmonised_col_names.csv. This can be added to as more themes fall under the standardised bracket. For now we have:

  • ethnicity
  • SEN
  • Establishment types and phases

The ethnicity_headers() function has been made more generic and renamed as standard_filter_headers().

Lots of the test data failed the more generalised test, so I've gone through and updated all the data files that were now failing to meet the latest standards.

Additional information for reviewers

I've not added any tests for standardised filter items as part of this test as it was already breaking lots of the unit tests as it was, so has become quite a substantial update despite it being one small test that's been updated. Feel like this is about right for a single PR and I can add further tests on a second PR.

Issue ticket number/s and link

@rmbielby rmbielby self-assigned this Aug 27, 2024
@cjrace
Copy link
Collaborator

cjrace commented Aug 28, 2024

@cjrace cjrace assigned cjrace and unassigned rmbielby Aug 29, 2024
R/mainTests.r Outdated Show resolved Hide resolved
R/mainTests.r Outdated Show resolved Hide resolved
data/harmonised_col_names.csv Show resolved Hide resolved
global.R Outdated Show resolved Hide resolved
tests/testthat/mainTests/ethnicity_meets_standards.csv Outdated Show resolved Hide resolved
R/mainTests.r Show resolved Hide resolved
@cjrace cjrace assigned rmbielby and unassigned cjrace Aug 29, 2024
@rmbielby rmbielby assigned cjrace and rmbielby and unassigned rmbielby Aug 30, 2024
@rmbielby rmbielby merged commit 0d23027 into main Sep 6, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants