Merge pull request #15 from daniel-tran/0.7.0

Version 0.7.0
daniel-tran · Jan 25, 2023 · 36605f2
2 parents 1c54b98 + 352ec5b
commit 36605f2

Showing 60 changed files with 1,006 additions and 170 deletions.
diff --git a/.github/workflows/run_tests.yaml b/.github/workflows/run_tests.yaml
@@ -18,7 +18,7 @@ jobs:
         # Omit macos-latest due to its high cost to run in GitHub Actions.
         # Note that we can also test specific OS versions, e.g. windows-2016
         os: [ubuntu-latest, windows-latest]
-        python-version: ['3.7', '3.8', '3.9']
+        python-version: ['3.8', '3.9', '3.10']
     name: Build for ${{ matrix.os }} with Python ${{ matrix.python-version }}
     steps:
       - name: Checkout
@@ -59,6 +59,8 @@ jobs:
       - name: Run XML unit tests
         run: |
           cd test
+          echo "Running Legacy XML file interface unit tests..."
+          python unit_tests_legacy_xml_file_interface.py
           echo "Running XML file interface unit tests..."
           python unit_tests_xml_file_interface.py
           echo "Running XML Extractor unit tests..."

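The CI matrix above now starts at Python 3.8, in line with the changelog entry that drops Python 3.7. The following is not taken from Meaningless. It is a minimal sketch of how a project could fail fast on an older interpreter at runtime. `check_python_version` and `MINIMUM_PYTHON` are hypothetical names, and packaging metadata such as setuptools' `python_requires` is the more usual place to declare a minimum version.

```python
import sys

# Hypothetical minimum, mirroring the lowest version in the updated CI matrix.
# This constant is an illustration, not part of the Meaningless library.
MINIMUM_PYTHON = (3, 8)


def check_python_version(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Raise RuntimeError if the interpreter is older than `minimum`."""
    # Compare only (major, minor) so patch releases never affect the result.
    if tuple(version_info[:2]) < minimum:
        raise RuntimeError(
            "Python {}.{} is not supported; {}.{} or newer is required".format(
                version_info[0], version_info[1], minimum[0], minimum[1]
            )
        )
    return True


if __name__ == "__main__":
    check_python_version()
    print("Interpreter version is supported")
```
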
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,12 @@
 # Change Log
 
+## 0.7.0
+- Refined xml_file_interface to use a more standard XML document structure that integrates more easily with XSLT
+  - Added `use_legacy_mode` flag to XML Downloader and Extractor to continue using the original behaviour and assist with transitioning to the updated XML file interface
+  - Added legacy_xml_file_interface module for backward compatibility with XML files using the previous (deprecated) document structure
+- Added download timestamp and library version information to output files
+- Dropped library support for Python 3.7
+
 ## 0.6.1
 - Fixed an issue where the JSON file interface was writing Unicode characters incorrectly to output files
 

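The `use_legacy_mode` flag listed in the changelog keeps the updated structure as the default while letting callers request the original behaviour explicitly. Below is a minimal sketch of that dispatch pattern. `Extractor`, `read_current` and `read_legacy` are hypothetical stand-ins that only show the shape of the idea, not the real `XMLExtractor` internals.

```python
def read_current(text):
    """Hypothetical reader for the updated (0.7.0+) document structure."""
    return {"structure": "current", "text": text}


def read_legacy(text):
    """Hypothetical reader for the pre-0.7.0 document structure."""
    return {"structure": "legacy", "text": text}


class Extractor:
    """Sketch of an 'opt out' breaking change: new behaviour is the default,
    and callers pass use_legacy_mode=True to keep the old behaviour."""

    def __init__(self, use_legacy_mode=False):
        self.use_legacy_mode = use_legacy_mode

    def read(self, text):
        # Choose the reader once per call, so both behaviours stay available
        # side by side for as long as the legacy path is supported.
        reader = read_legacy if self.use_legacy_mode else read_current
        return reader(text)


if __name__ == "__main__":
    print(Extractor().read("<root/>"))
    print(Extractor(use_legacy_mode=True).read("<root/>"))
```

An "opt in" variant would simply flip the default, for example a hypothetical `use_new_mode=False` parameter.
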
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -60,6 +60,17 @@ Implementation details specifically geared for performance optimisation, such as
 # Development Guide
 Some parts of the library have a repeatable process for adding on new features and such, which are documented below:
 
+## Introducing breaking changes
+When a change to existing behaviour breaks backward compatibility with the previous release, there are two recommended approaches:
+
+1. Modified behaviour is now the default expectation, so users wanting the old behaviour have to "opt out".
+2. Old behaviour is still the default expectation, so users wanting the modified behaviour have to "opt in".
+
+Support for both the old and new behaviour should be maintained for at least one minor release. Afterwards, contributors have two options:
+
+1. The old behaviour can be formally removed in the next major version.
+2. The old behaviour is maintained along with the new behaviour into the foreseeable future.
+
 ## Supporting a new translations
 - Add a new translation code to the appropriate method under `meaningless\utilities\common.py`.
 - Add a new test case to `system_tests_bible_translations.py` for the new translation. This is used to validate end-to-end correctness.
@@ -101,7 +112,9 @@ This is the dictionary structure that is passed into `write` and returned from `
   },
   "Info": {
     "Language": "Translation Language",
-    "Translation": "Translation Code"
+    "Translation": "Translation Code",
+    "Timestamp": "Timestamp in ISO 8601 format",
+    "Meaningless": "Version of Meaningless the file was downloaded from"
   }
 }
 ```
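
The `Timestamp` and `Meaningless` fields added to the `Info` structure above could be populated along the following lines. This sketch assumes a hypothetical `build_info` helper and a version string supplied by the caller. It is not the library's own implementation. It calls `datetime.isoformat(timespec="microseconds")` so the output always keeps the six-digit fraction and the `+00:00` offset shown in the README placeholders, even when the microsecond value happens to be zero.

```python
from datetime import datetime, timezone


def build_info(language, translation, library_version, now=None):
    """Build an 'Info' dictionary shaped like the documented structure.

    Hypothetical helper: the keys follow CONTRIBUTING.md, but this is not
    how Meaningless itself produces the section.
    """
    if now is None:
        # Use an aware UTC datetime so isoformat() includes the offset.
        now = datetime.now(timezone.utc)
    return {
        "Language": language,
        "Translation": translation,
        # timespec="microseconds" always emits six fractional digits.
        "Timestamp": now.isoformat(timespec="microseconds"),
        "Meaningless": library_version,
    }


if __name__ == "__main__":
    print(build_info("English", "NIV", "0.7.0"))
```

Passing `now` explicitly keeps the helper deterministic in tests. Omitting it stamps the current UTC time.
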
diff --git a/README.md b/README.md
@@ -99,15 +99,17 @@ if __name__ == '__main__':
 ```
 Output:
 
-Running the above code would produce a file called `Ecclesiastes.yaml` in the current working directory with the following contents:
-```
+Running the above code would produce a file called `Ecclesiastes.yaml` in the current working directory with the approximate contents:
+```yaml
 Ecclesiastes:
   1:
     2: "² “Meaningless! Meaningless!”\n    says the Teacher.\n“Utterly meaningless!\n\
       \    Everything is meaningless.”"
 Info:
   Language: English
   Translation: NIV
+  Timestamp: '0000-00-00T00:00:00.000000+00:00'
+  Meaningless: 0.0.0
 ```
 
 ## YAML Extractor
@@ -144,15 +146,17 @@ if __name__ == '__main__':
 ```
 Output:
 
-Running the above code would produce a file called `Ecclesiastes.yaml` in the current working directory with the following contents:
-```python
+Running the above code would produce a file called `Ecclesiastes.yaml` in the current working directory with the approximate contents:
+```yaml
 Ecclesiastes:
   1:
     2: "² “Meaningless! Meaningless!”\n    says the Teacher.\n“Utterly meaningless!\n\
       \    Everything is meaningless.”"
 Info:
   Language: English
   Translation: NIV
+  Timestamp: '0000-00-00T00:00:00.000000+00:00'
+  Meaningless: 0.0.0
   Customised?: true
 ```
 
@@ -167,8 +171,8 @@ if __name__ == '__main__':
 ```
 Output:
 
-Running the above code would produce a file called `Ecclesiastes.json` in the current working directory with the following contents:
-```
+Running the above code would produce a file called `Ecclesiastes.json` in the current working directory with the approximate contents:
+```json
 {
   "Ecclesiastes": {
     "1": {
@@ -177,6 +181,8 @@ Running the above code would produce a file called `Ecclesiastes.json` in the cu
   },
   "Info": {
     "Language": "English",
+    "Meaningless": "0.0.0",
+    "Timestamp": "0000-00-00T00:00:00.000000+00:00",
     "Translation": "NIV"
   }
 }
@@ -216,8 +222,8 @@ if __name__ == '__main__':
 ```
 Output:
 
-Running the above code would produce a file called `Ecclesiastes.json` in the current working directory with the following contents:
-```python
+Running the above code would produce a file called `Ecclesiastes.json` in the current working directory with the approximate contents:
+```json
 {
   "Ecclesiastes": {
     "1": {
@@ -227,6 +233,8 @@ Running the above code would produce a file called `Ecclesiastes.json` in the cu
   "Info": {
     "Customised?": true,
     "Language": "English",
+    "Meaningless": "0.0.0",
+    "Timestamp": "0000-00-00T00:00:00.000000+00:00",
     "Translation": "NIV"
   }
 }
@@ -243,13 +251,105 @@ if __name__ == '__main__':
 ```
 Output:
 
-Running the above code would produce a file called `Ecclesiastes.xml` in the current working directory with the following contents:
+Running the above code would produce a file called `Ecclesiastes.xml` in the current working directory with the approximate contents:
+```xml
+<?xml version="1.0" encoding="utf-8"?>
+<root>
+  <info>
+    <language>English</language>
+    <translation>NIV</translation>
+    <timestamp>0000-00-00T00:00:00.000000+00:00</timestamp>
+    <meaningless>0.0.0</meaningless>
+  </info>
+  <book name="Ecclesiastes" tag="_Ecclesiastes">
+    <chapter number="1" tag="_1">
+      <passage number="2" tag="_2">² “Meaningless! Meaningless!”
+    says the Teacher.
+“Utterly meaningless!
+    Everything is meaningless.”</passage>
+    </chapter>
+  </book>
+</root>
+```
+
+## XML Extractor
+Much like the YAML Extractor, the XML Extractor uses the generated files from the XML Downloader to find passages.
+```python
+from meaningless import XMLExtractor
+
+if __name__ == '__main__':
+    bible = XMLExtractor()
+    passage = bible.get_passage('Ecclesiastes', 1, 2)
+    print(passage)
+```
+Output:
+
+Assuming the XML downloader has already generated an XML file in the current directory called `Ecclesiastes.xml` which contains the book of Ecclesiastes in XML format:
+```
+² “Meaningless! Meaningless!”
+    says the Teacher.
+“Utterly meaningless!
+    Everything is meaningless.”
+```
+
+## XML File Interface
+The XML File Interface is a set of helper methods used to read and write XML files. Unlike the other file interfaces, this is more geared towards the specific document format used by the XML Downloader and Extractor, so you may observe some strange behaviour if you try using this for general purpose XML file interactions.
+```python
+from meaningless import XMLDownloader, xml_file_interface
+
+if __name__ == '__main__':
+    downloader = XMLDownloader()
+    downloader.download_passage('Ecclesiastes', 1, 2)
+    bible = xml_file_interface.read('./Ecclesiastes.xml')
+    bible['Info']['Customised'] = True
+    xml_file_interface.write('./Ecclesiastes.xml', bible)
+```
+Output:
+
+Running the above code would produce a file called `Ecclesiastes.xml` in the current working directory with the approximate contents:
+```xml
+<?xml version="1.0" encoding="utf-8"?>
+<root>
+  <info>
+    <language>English</language>
+    <translation>NIV</translation>
+    <timestamp>0000-00-00T00:00:00.000000+00:00</timestamp>
+    <meaningless>0.0.0</meaningless>
+    <customised>true</customised>
+  </info>
+  <book name="Ecclesiastes" tag="_Ecclesiastes">
+    <chapter number="1" tag="_1">
+      <passage number="2" tag="_2">² “Meaningless! Meaningless!”
+    says the Teacher.
+“Utterly meaningless!
+    Everything is meaningless.”</passage>
+    </chapter>
+  </book>
+</root>
+```
+
+**Note that you are allowed to write badly formed XML documents using this file interface, but they will cause runtime errors in your code upon trying to read and process them.**
+
+## Legacy XML Downloader
+The Legacy XML Downloader is effectively the same as the XML Downloader prior to version 0.7.0.
+```python
+from meaningless import XMLDownloader
+
+if __name__ == '__main__':
+    downloader = XMLDownloader(use_legacy_mode=True)
+    downloader.download_passage('Ecclesiastes', 1, 2)
+```
+Output:
+
+Running the above code would produce a file called `Ecclesiastes.xml` in the current working directory with the approximate contents:
 ```xml
 <?xml version="1.0" encoding="utf-8"?>
 <root>
   <Info>
     <Language>English</Language>
     <Translation>NIV</Translation>
+    <Timestamp>0000-00-00T00:00:00.000000+00:00</Timestamp>
+    <Meaningless>0.0.0</Meaningless>
   </Info>
   <Ecclesiastes>
     <_1>
@@ -268,47 +368,49 @@ Note that the following adjustments are made to the downloaded contents to ensur
 2. All tag names starting with a number are prefixed.
 3. Tags corresponding to book names use a placeholder character for spaces.
 
-## XML Extractor
-Much like the YAML Extractor, the XML Extractor uses the generated files from the XML Downloader to find passages.
+## Legacy XML Extractor
+The Legacy XML Extractor is effectively the same as the XML Extractor prior to version 0.7.0. As such, it only supports processing XML files produced by versions prior to 0.7.0 or by the Legacy XML File Interface.
 ```python
 from meaningless import XMLExtractor
 
 if __name__ == '__main__':
-    bible = XMLExtractor()
+    bible = XMLExtractor(use_legacy_mode=True)
     passage = bible.get_passage('Ecclesiastes', 1, 2)
     print(passage)
 ```
 Output:
 
-Assuming the XML downloader has already generated a XML file in the current directory called `Ecclesiastes.xml` which contains the book of Ecclesiastes in XML format:
+Assuming the Legacy XML Downloader has already generated an XML file in the current directory called `Ecclesiastes.xml` which contains the book of Ecclesiastes in XML format:
 ```
 ² “Meaningless! Meaningless!”
     says the Teacher.
 “Utterly meaningless!
     Everything is meaningless.”
 ```
 
-## XML File Interface
-The XML File Interface is a set of helper methods used to read and write XML files. Unlike the other file interfaces, this is more geared towards the XML document format used by the XML Downloader and Extractor, so you may observe some strange behaviour if you try using this for general purpose XML file interactions.
+## Legacy XML File Interface
+The Legacy XML File Interface is a set of helper methods used to read and write XML files using the document structure prior to version 0.7.0. You may observe some strange behaviour if you try using this for general purpose XML file interactions, so it is only recommended for use with files produced by the Legacy XML Downloader.
 ```python
-from meaningless import XMLDownloader, xml_file_interface
+from meaningless import XMLDownloader, legacy_xml_file_interface
 
 if __name__ == '__main__':
-    downloader = XMLDownloader()
+    downloader = XMLDownloader(use_legacy_mode=True)
     downloader.download_passage('Ecclesiastes', 1, 2)
-    bible = xml_file_interface.read('./Ecclesiastes.xml')
+    bible = legacy_xml_file_interface.read('./Ecclesiastes.xml')
     bible['Info']['Customised'] = True
-    xml_file_interface.write('./Ecclesiastes.xml', bible)
+    legacy_xml_file_interface.write('./Ecclesiastes.xml', bible)
 ```
 Output:
 
-Running the above code would produce a file called `Ecclesiastes.xml` in the current working directory with the following contents:
+Running the above code would produce a file called `Ecclesiastes.xml` in the current working directory with the approximate contents:
 ```xml
 <?xml version="1.0" encoding="utf-8"?>
 <root>
   <Info>
     <Language>English</Language>
     <Translation>NIV</Translation>
+    <Timestamp>0000-00-00T00:00:00.000000+00:00</Timestamp>
+    <Meaningless>0.0.0</Meaningless>
     <Customised>true</Customised>
   </Info>
   <Ecclesiastes>
@@ -335,13 +437,13 @@ if __name__ == '__main__':
 ```
 Output:
 
-Running the above code would produce a file called `Ecclesiastes.csv` in the current working directory with the following contents:
+Running the above code would produce a file called `Ecclesiastes.csv` in the current working directory with the approximate contents:
 ```
-Book,Chapter,Passage,Text,Language,Translation
+Book,Chapter,Passage,Text,Language,Translation,Timestamp,Meaningless
 Ecclesiastes,1,2,"² “Meaningless! Meaningless!”
     says the Teacher.
 “Utterly meaningless!
-    Everything is meaningless.”",English,NIV
+    Everything is meaningless.”",English,NIV,0000-00-00T00:00:00.000000+00:00,0.0.0
 ```
 
 ## CSV Extractor
@@ -365,7 +467,7 @@ Assuming the CSV downloader has already generated a CSV file in the current dire
 ```
 
 ## CSV File Interface
-The CSV File Interface is a set of helper methods used to read and write CSV files. Like the XML File Interface, this is geared towards the CSV document format used by the CSV Downloader and Extractor and cannot be used to add custom attributes to the output file when writing CSV data.
+The CSV File Interface is a set of helper methods used to read and write CSV files. This is geared towards the CSV document format used by the CSV Downloader and Extractor and cannot be used to add custom attributes to the output file when writing CSV data.
 ```python
 from meaningless import CSVDownloader, csv_file_interface
 
@@ -378,13 +480,13 @@ if __name__ == '__main__':
 ```
 Output:
 
-Running the above code would produce a file called `Ecclesiastes.csv` in the current working directory with the following contents:
+Running the above code would produce a file called `Ecclesiastes.csv` in the current working directory with the approximate contents:
 ```
-Book,Chapter,Passage,Text,Language,Translation
+Book,Chapter,Passage,Text,Language,Translation,Timestamp,Meaningless
 Ecclesiastes,1,2,"² “Meaningless! Meaningless!”
     says the Teacher.
 “Utterly meaningless!
-    Everything is meaningless.”",English (EN),NIV
+    Everything is meaningless.”",English (EN),NIV,0000-00-00T00:00:00.000000+00:00,0.0.0
 ```
 
 ## Text searching within files

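In the 0.7.0 layout, book names and chapter and passage numbers are stored in attributes (`name`, `number`) instead of tag names. This is presumably part of what makes the layout easier to use with generic XML tooling such as XSLT. Below is a minimal sketch of reading a single passage with the standard library's `xml.etree.ElementTree`, assuming a document shaped like the README example. `get_passage` is a hypothetical helper, not `XMLExtractor.get_passage`, and the sample text is shortened.

```python
import xml.etree.ElementTree as ET

# Shortened sample following the 0.7.0 structure from the README.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<root>
  <info>
    <language>English</language>
    <translation>NIV</translation>
  </info>
  <book name="Ecclesiastes" tag="_Ecclesiastes">
    <chapter number="1" tag="_1">
      <passage number="2" tag="_2">Meaningless! Meaningless!</passage>
    </chapter>
  </book>
</root>"""


def get_passage(root, book, chapter, passage):
    """Return the text of one passage, or None if it is not present."""
    # ElementTree's limited XPath supports [@attr='value'] predicates,
    # so the lookup can be written as a single path expression.
    path = "./book[@name='{}']/chapter[@number='{}']/passage[@number='{}']".format(
        book, chapter, passage
    )
    element = root.find(path)
    return None if element is None else element.text


if __name__ == "__main__":
    # Parse bytes so the encoding declaration is honoured by the parser.
    root = ET.fromstring(SAMPLE.encode("utf-8"))
    print(get_passage(root, "Ecclesiastes", 1, 2))
```

Because book names live in an attribute, a name containing spaces can be matched directly. The legacy layout needed a placeholder character in the tag name instead.
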
diff --git a/VERSION.txt b/VERSION.txt
diff --git a/docs/_static/documentation_options.js b/docs/_static/documentation_options.js
@@ -1,6 +1,6 @@
 var DOCUMENTATION_OPTIONS = {
     URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
-    VERSION: '0.6.1',
+    VERSION: '0.7.0',
     LANGUAGE: 'None',
     COLLAPSE_INDEX: false,
     BUILDER: 'html',