Data Integrity Strategy for Reports #689

Closed
lolaodelola opened this issue Jun 29, 2023 · 5 comments

@lolaodelola

We need a robust data integrity strategy to ensure the accuracy of the data in reports. Some questions to discuss:

  • How do we know the data does not currently include errors?
  • How are we ensuring that an app or database update isn't creating errors in the reports?
  • What queries are performed before and after a change to ensure anomalies are not present?
  • How are we double checking that the data in the reports is internally consistent?
    • Is the total number of assertion verdicts for all reports generated from a given plan consistent?
    • Does the number of failing verdicts + number of passing verdicts = total verdicts? (A sketch of such a check follows this list.)
@lolaodelola added the agenda label (To be added to community group agenda) on Jun 29, 2023
@lolaodelola
Author

lolaodelola commented Aug 21, 2023

Hi @mcking65,

It's been a while since this was discussed, but as a reminder, this came about due to an issue concerning AT & Browser Combos, and you raising with Boaz and me that you wanted to discuss a more robust system for ensuring data integrity.
Many of these questions have now been addressed internally via the app's testing suite. The only question that may still require discussion (if you feel it's necessary) is:

  • What queries are performed before and after a change to ensure anomalies are not present?

This isn't something we currently do, but I know that in the past you've been involved in discussions with other members of the team about the data when database changes are made.

@lolaodelola
Author

@mcking65 Can we make sure this is on tomorrow's agenda please?

@css-meeting-bot
Member

The ARIA-AT Community Group just discussed Issue 689: Data integrity strategy.

The full IRC log of that discussion:
<jugglinmike> Topic: Issue 689: Data integrity strategy
<jugglinmike> github: https://github.com//issues/689
<jugglinmike> Matt_King: One of the things that we have to have a lot of confidence in is that all the tallies and counts and information we present in reports is accurate--and that we don't break it
<jugglinmike> Matt_King: When you run a report, the system is going to count up the number of passes and fails, it's going to calculate percentages, and it's going to capture dates and times
<jugglinmike> Matt_King: There are an awful lot of ways for things to go wrong in that process
<jugglinmike> Matt_King: And as we transfer data to APG in the form of support tables, I wanted to ask: how are we going to approach making sure that the system doesn't produce errors?
<jugglinmike> Lola_Odelola: Through some of the work that we've already done, some of these questions have already been answered
<jugglinmike> Lola_Odelola: An outstanding question is: do we have any ideas for the types of queries we'd like to perform to make sure there are no anomalies in the data?
<jugglinmike> Lola_Odelola: What kind of anomalies would we want to check before and after a deployment?
<jugglinmike> howard-e: For the most part, I'd want to be able to trust that the tests that are being added--that the system catches problems with those
<jugglinmike> Matt_King: I added quite a long list in the V2 format--a list of checks for the format
<jugglinmike> Matt_King: While I was doing that, though, I wasn't thinking about how mistakes in the plan could introduce inconsistencies in the data
<jugglinmike> Matt_King: There are some checks like, "every AT must have at least one command mapped to every assertion" or something like that
<jugglinmike> Matt_King: And I have a separate issue related to being able to specify that an AT has no command
<jugglinmike> Matt_King: But now, I'm thinking more about the data that's on the "reports" site
<jugglinmike> Matt_King: For instance, the number of assertions which have verdicts--that number shouldn't change after a data migration
<jugglinmike> Matt_King: I think it would also be important to check that for subcategories of the data (e.g. the total number of reports generated from recommended test plans, the total number of recommended test plans)
<jugglinmike> James_Scholes: Are we talking about validating user input? What are we validating against?
<jugglinmike> Matt_King: Against the data before the deployment. This is about ensuring that we maintain data integrity during deployment operations
<jugglinmike> Matt_King: Maybe we need to enumerate the scenarios where we believe data integrity could be compromised
<jugglinmike> Matt_King: I'm assuming that when you save and close in the test runner, that some checks are performed
<jugglinmike> James_Scholes: To what extent are checks like that already present?
<jugglinmike> James_Scholes: for example, during a test run, when a Tester provides results for a single Test, I assume that when they save those results, checks are made to verify that the data is correctly saved
<jugglinmike> Lola_Odelola: I think part of the issue here is that (as Matt mentioned earlier) this is an oldish issue, and in the time since it was created, there have been a lot of improvements to the code and the processes
<jugglinmike> Lola_Odelola: Now, what we want to identify is: are there scenarios that could cause inconsistent data? We're asking because we have seen inconsistent data in situations we didn't expect
<jugglinmike> Lola_Odelola: I'm happy for this to be put on the back burner until something happens in the future where we need to readjust
<jugglinmike> Matt_King: Okay, though I'm still interested in building/documenting a fairly rigorous set of checks to ensure data integrity before and after deployment
<jugglinmike> Matt_King: That is, any time we make changes to the data model
<jugglinmike> Matt_King: For instance, when we do the refactor, we want to make sure that the total number of assertions doesn't change, et cetera
<jugglinmike> James_Scholes: I'm not necessarily advocating for tabling this discussion, but I do believe that we need to have the existing checks documented before we can have a constructive conversation on the topic
<jugglinmike> Lola_Odelola: That makes sense. We can put something together for the group

@lolaodelola
Author

The action item for me was to compile a list of checks we might already be doing. After review, our tests are predominantly unit tests for actions that can be performed in the app. The level of testing we'd need is for environment changes, which would include pre- and post-deployment checks.

We don't have any tests that check that the environment is consistent between deploys.

As the definition of this issue has changed since I first wrote it, @mcking65, would you be able to provide or link to an example where the data changed between deploys?

@ccanash
Contributor

ccanash commented Jun 12, 2024

Closing this as no longer valid, @mcking65.

@ccanash closed this as completed on Jun 12, 2024