Added an example within the documentation for custom readers supporting pandas DataFrames. #707

BenjaminFraser · 2022-08-31T11:15:26Z

Added a new example (Custom_Pandas_Dataloader.py) within the documentation in docs/examples for the definition of custom Readers that support pandas DataFrames.

This allows the wide range of data formats supported by pandas to be used for Ground Truth Readers and Detection Readers, without the need to manually define a custom data ingestion process for each type, e.g. JSON, XML, Parquet, HDF5, .txt, .zip.

Given its similarity to the requirements of the custom reader documentation example (#354), I've linked this pull request to that, which hopefully is not a problem.

These classes do have the disadvantage of requiring the entire dataset to be in memory. However, the ability to use pandas DataFrames directly seems to be a feature that several Stone Soup users have shown interest in, which is understandable given the flexibility and processing functionality this can provide.

The example in Custom_Pandas_Dataloader.py includes the definitions of the DataFrameGroundTruthReader and DataFrameDetectionReader classes. Each of these inherits from the existing GroundTruthReader class, along with a custom-defined _DataFrameReader class.

These classes operate similarly to the existing CSVGroundTruthReader and CSVDetectionReader classes, except that they take as input a pandas DataFrame already read into memory, rather than a path to a .csv file. They also have modified generator functions for producing the time and paths / detections.
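As a rough illustration of the generator idea, here is a minimal sketch that uses only pandas and does not reproduce the classes in this PR. The function name `iter_by_time` and the column names are hypothetical; the assumption is simply that each row holds one state and that rows sharing a timestamp belong to the same time step.

```python
from datetime import datetime

import pandas as pd


def iter_by_time(dataframe, time_field, state_fields):
    """Yield (timestamp, list of state tuples) for each time step, in time order.

    Hypothetical helper for illustration only; not part of Stone Soup.
    """
    frame = dataframe.copy()
    # Accept either strings or values that are already datetimes.
    frame[time_field] = pd.to_datetime(frame[time_field])
    # Group rows that share a timestamp; sort=True gives chronological order.
    for timestamp, group in frame.groupby(time_field, sort=True):
        states = list(group[state_fields].itertuples(index=False, name=None))
        yield timestamp.to_pydatetime(), states


# Any pandas-supported source (JSON, Parquet, ...) ends up as a DataFrame like this.
df = pd.DataFrame({
    "time": ["2022-08-31 11:15:26", "2022-08-31 11:15:26", "2022-08-31 11:15:27"],
    "x": [0.0, 10.0, 1.0],
    "y": [0.0, 5.0, 1.5],
})

steps = list(iter_by_time(df, "time", ["x", "y"]))
for time, states in steps:
    print(time, states)
```

A real reader would wrap each row in the library's own state or detection types rather than plain tuples; the sketch only shows the time-stepping over an in-memory DataFrame.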

These have been useful in my own work using Stone Soup for UAV-based non-cooperative radar research, so hopefully they are also of value to other members of the community!

Update on progression and fixes to aspects of this PR, as of 22 Oct 22:

Added pandas to dev of setup.cfg.
Updated references to Stone Soup to be consistent - two words throughout documentation.
Added demonstration of ground truth reader after initialisation by outputting first iteration to docs.
Added support for fields already in DateTime format.
Added pandas_reader.py within reader directory with the three new classes: _DataFrameReader, DataFrameGroundTruthReader, and DataFrameDetectionReader. Tests are still to be developed for these (hence failing on the draft commits currently).
Added tests in test_pandas_reader.py within stonesoup/reader/tests.

One point to note on the tests: all classes defined in pandas_reader.py currently have full coverage, but Codecov flags the pandas import check (which raises an ImportError if pandas is not installed) as failed.
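One possible way to exercise that branch, sketched here under assumptions rather than taken from this PR, is to simulate the missing dependency in a test. Setting a module's entry in `sys.modules` to `None` makes its import raise ImportError. The helper `load_optional` is hypothetical, and `json` stands in for pandas so the sketch runs anywhere; the guard in pandas_reader.py runs at module import time, so a real test would need to re-import that module under the patch instead.

```python
import importlib
import sys
from unittest import mock


def load_optional(module_name):
    """Import an optional dependency, re-raising with a clearer message.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            f"This reader requires the dependency '{module_name}' to be installed."
        ) from error


# A None entry in sys.modules makes the import fail, which simulates the
# dependency being absent so that the except branch is executed.
with mock.patch.dict(sys.modules, {"json": None}):
    try:
        load_optional("json")
    except ImportError as error:
        message = str(error)
    else:
        message = None

# patch.dict restores sys.modules on exit, so the import works again here.
restored = load_optional("json")
print(message)
```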

To-do / enhancements:

Take advantage of pandas grouping to make code more efficient (as suggested by Steven below).
Link documentation example to the classes defined within pandas_reader.py, using something such as inspect.getsource.
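For reference, a minimal sketch of what `inspect.getsource` returns. The standard-library `csv.DictReader` is used here purely as a stand-in for a reader class such as those in pandas_reader.py, so the snippet does not depend on Stone Soup being installed.

```python
import csv
import inspect

# getsource returns the source text of the class as a single string, read
# from the installed module file, so it always matches the library code.
source_text = inspect.getsource(csv.DictReader)

# In a documentation example, printing the text shows the class definition
# without keeping a second, hand-copied version in the example file.
print(source_text)
```

Because the text is read from the installed module, the example cannot drift out of date when the class is modified.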

… definition of custom Readers that support pandas DataFrames. This has the benefit of being able to take advantage of the range of data formats that pandas supports for ground truth and detection data being read into stonesoup. The example includes the custom definition of a DataFrameGroundTruthReader and DataFrameDetectionReader class. Both of these inherit from the GroundTruthReader class, along with a custom defined _DataFrameReader class. Each of these classes supports reading of pandas dataframes that are already read into memory, in a similar way to the CSVGroundTruthReader and CSVDetectionReader [issue 354].

sdhiscocks · 2022-08-31T14:49:11Z

Thanks for the contribution @BenjaminFraser.

I see the docs are failing to build because pandas is a missing dependency. If you could add pandas to the dev dependencies in setup.py, that should resolve it:

Stone-Soup/setup.py

Lines 31 to 35 in 435883a

    extras_require={
        'dev': [
            'pytest-flake8', 'pytest-cov', 'pytest-remotedata', 'flake8<5',
            'Sphinx', 'sphinx_rtd_theme', 'sphinx-gallery>=0.10.1', 'pillow', 'folium', 'plotly',
        ],

It'd be good to have the readers in the main code base (probably with an optional dependency on pandas) so users can easily access them. It would also be good to keep the example you've created, both to show how to use them and, in reference to #354, to show how to create custom readers. (Minor issue: if they are modified, we'll have to be sure to update in both places, unless the example could do something with inspect.getsource.)

sdhiscocks · 2022-08-31T14:56:58Z

> (Minor issue: if they are modified, we'll have to be sure to update in both places, unless the example could do something with inspect.getsource.)

Or use the Sphinx literalinclude directive, which can also add some syntax highlighting.

…, ref pull request dstl#707 failing to build docs.

BenjaminFraser · 2022-08-31T15:13:58Z

That's no problem at all, and including the Readers within the main code base sounds like a good idea! The only sticking point was including it with pandas as an optional dependency, but I'll look into that, which should hopefully be straightforward enough.

I'll take a look later when I have the chance and put together another PR for those points!

sdhiscocks · 2022-08-31T15:17:59Z

> The only sticking point was including it with pandas as an optional dependency, but I'll look into that, which should hopefully be straightforward enough.

We've done this before by simply raising an error when importing the dependency fails.

Stone-Soup/stonesoup/reader/opensky.py

Lines 5 to 10 in 5276c1b

    try:
        import requests
        from requests.compat import urljoin
    except ImportError as error:
        raise ImportError(
            "Usage of opensky requires the dependency 'requests' is installed. ") from error

codecov · 2022-08-31T15:21:44Z

Codecov Report

Base: 94.81% // Head: 94.84% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (af4c77b) compared to base (f27eaeb).
Patch coverage: 97.33% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #707      +/-   ##
==========================================
+ Coverage   94.81%   94.84%   +0.02%     
==========================================
  Files         169      170       +1     
  Lines        8221     8296      +75     
  Branches     1216     1230      +14     
==========================================
+ Hits         7795     7868      +73     
- Misses        316      318       +2     
  Partials      110      110

Flag	Coverage Δ
integration	`68.50% <0.00%> (-0.63%)`	⬇️
unittests	`92.69% <97.33%> (+0.04%)`	⬆️

Impacted Files	Coverage Δ
stonesoup/reader/pandas_reader.py	`97.33% <97.33%> (ø)`
