
Versioning, changelog, notices of updates to the data format #11

Open
JoshData opened this issue Apr 27, 2016 · 6 comments

Comments

@JoshData

A few related questions:

  • Can we assign version numbers to the data format and put those version numbers in the data files so that looking at a file it's possible to tell under what format it was generated?
  • A CHANGELOG.md file at the root of this repo would be a great place to list changes to the data format. Here's an example of a CHANGELOG from one of my projects. Now that the documentation is in Markdown we could just look at the diffs, but a CHANGELOG would be clearer and more explicit.
  • There should be a system in place for handling backward-incompatible data format changes before any more backward-incompatible changes are made, and definitely before THOMAS goes offline.
  • How should data users stay informed about data format changes? For now, you could recommend that they 'watch' this repo, and then you can send out announcements just by creating new GitHub issues (GitHub will email everyone who is watching the repo). That's not ideal for various reasons, but it'll work. A recommendation for how to stay informed should be posted somewhere, e.g. in the README here.

Why to do versioning:

  • I might have a local cache of the bill status files and want to know what data format they conform to / were generated under so that I know how to interpret the XML.
  • A version number might also be the only way to know whether the files provided by GPO have been updated after a data format change. Some changes might not be evident just from looking at the XML, and one would have to guess from each file's last-modification date whether it was regenerated after the format update.
  • The isByRequest field moved. That was a backward-incompatible change. Anyone who ingests new files without being aware of the change is likely to misunderstand whether or not bills were introduced by request. A version number inside the XML would solve this: the data user would know when there's a new version that might require updating their application.
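To make the isByRequest point concrete, here is a minimal sketch of what a consumer could do once files carry a version: refuse to interpret any file whose format version it wasn't written for, instead of silently misreading moved fields. The names `SUPPORTED_VERSIONS`, `UnsupportedFormatError`, and `check_format_version` are hypothetical, and "1.0.0" is only an example value.

```python
# Hypothetical: the data-format versions this application was written against.
SUPPORTED_VERSIONS = {"1.0.0"}


class UnsupportedFormatError(Exception):
    """Raised when a file was generated under a format this code doesn't understand."""


def check_format_version(version):
    """Return the version if it's one we know how to interpret; otherwise fail loudly.

    Failing here is deliberate: silently ingesting a file whose fields have
    moved (as isByRequest did) would produce wrong data downstream.
    """
    if version is None:
        raise UnsupportedFormatError("file has no format version; cannot tell how to interpret it")
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedFormatError(f"unsupported data format version {version!r}")
    return version
```

An application would call this before parsing any fields and route unknown versions to a "needs review" queue rather than into its database.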

How to do versioning:

A date would work well here as a version number. For example, the 2016-04-27 version would represent the bill status data format as it was on April 27, 2016. The date would change only when the data format actually changes. This sidesteps deep questions of whether a change should increment the major version number, the minor version number, and so on.

A simple integer (version 201) would also work, as would semantic versioning (1.2.3).
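One practical difference between these schemes is how a consumer compares two versions. A rough Python sketch (function names are illustrative): zero-padded ISO dates sort correctly even as plain strings, while semantic versions have to be compared field by field as integers.

```python
from datetime import date


def date_version_key(version):
    """Date-style version such as "2016-04-27", parsed into a comparable date."""
    return date.fromisoformat(version)


def semver_key(version):
    """Plain MAJOR.MINOR.PATCH version such as "1.2.3", as a tuple of ints.

    This sketch ignores pre-release/build suffixes like "1.0.0-beta".
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


# Date versions: newest format last.
print(sorted(["2016-08-29", "2016-04-27"], key=date_version_key))

# Semver: numeric comparison gets 1.10.0 vs 1.9.0 right; string comparison does not.
print(semver_key("1.10.0") > semver_key("1.9.0"))
print("1.10.0" > "1.9.0")
```

The last two lines show the usual semver pitfall: a naive string comparison ranks "1.10.0" below "1.9.0", which a date-based version never runs into.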

@llaplant
Member

Great suggestions! We are planning to implement both a CHANGELOG.md and a <version> element in an upcoming release.

@104PL104
Collaborator

@JoshData
Author

Could we re-open this? A change log was just one part of my request.

So, for an example of why versioning would be helpful: Right now ~500 files are published according to a previous schema (see #17). Because of a lag in getting those files regenerated, the data files posted aren't all generated in a consistent format. I don't want to have to guess which schema is in use for which file.

@JoshData
Author

JoshData commented Aug 29, 2016

Tagging #25 here as an example of why we need to talk more about this. The changelog was posted only after files with the new format were published.

@llaplant
Member

A new top-level element <version> is now available in day-forward bill status files. The current value for this field is 1.0.0. Here are two examples: https://www.gpo.gov/fdsys/bulkdata/BILLSTATUS/115/sres/BILLSTATUS-115sres62.xml and https://www.gpo.gov/fdsys/bulkdata/BILLSTATUS/115/hr/BILLSTATUS-115hr2100.xml. The version number is provided to GPO by the Library of Congress, and it is passed through to files on the bulk data repository. The plan is to increment the value when the data format changes. The element will be available in all bill status bulk data files after reprocessing. Check it out and let us know what you think. We will also be updating the documentation, including the Change Log and User Guide.
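For anyone consuming these files, a minimal sketch of reading the new element with Python's standard library. It assumes <version> is a direct child of the document's root element (the sample below uses `billStatus` as the root name, which is an assumption here); files generated before the element existed, or not yet reprocessed, come back as `None`.

```python
import xml.etree.ElementTree as ET


def read_format_version(xml_text):
    """Return the text of the top-level <version> element, or None if it's absent.

    Only direct children of the root are searched, so a <version> nested
    deeper in the document is not picked up by mistake.
    """
    root = ET.fromstring(xml_text)
    element = root.find("version")
    if element is None or element.text is None:
        return None
    return element.text.strip()


# Trimmed, hypothetical document shape for illustration.
sample = "<billStatus><version>1.0.0</version><bill/></billStatus>"
print(read_format_version(sample))
```

For files on disk, `ET.parse(path).getroot()` gives the same root element without reading the text yourself. Grouping a local cache by the returned value (with `None` as its own group) shows which files still need to be re-downloaded after reprocessing.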

@sunilgulabani

@JoshData @llaplant Do we have xsd defined for the BillStatus XMLs?
