Releases: zooniverse/aggregation-for-caesar

Version 4.0.0

10 Aug 09:12
73bd8f4

New feature

Version 4 of the aggregation code brings the ability to extract and reduce data across major workflow versions. Because of this change, the command line interface has a new way to enter workflow versions:

The old way

Previously, the minor workflow version was entered with a separate flag (-m):

panoptes_aggregation config penguin-watch-workflows.csv 6465 -v 52 -m 76

The new way

Now the minor version is entered together with the major workflow version under the -v flag:

panoptes_aggregation config penguin-watch-workflows.csv 6465 -v 52.76

Specifying a version range

If you want to aggregate across all minor versions of a major workflow version, include only the major version in the -v flag:

panoptes_aggregation config penguin-watch-workflows.csv 6465 -v 52

If you want to select a range of versions (even across major versions), use the new --min_version and --max_version flags (both bounds are inclusive):

panoptes_aggregation config penguin-watch-workflows.csv 6465 --min_version 51.3 --max_version 53.5
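The selection rules described above can be sketched in Python. This is only an illustration, not the package's actual code: the helper names parse_version and version_selected are hypothetical, and the sketch assumes that versions compare as (major, minor) integer pairs (so 52.9 sorts before 52.10) and that range bounds are always given in full major.minor form.

```python
def parse_version(text):
    """Split 'major' or 'major.minor' into (major, minor); minor is None if absent."""
    major, _, minor = text.partition(".")
    return int(major), (int(minor) if minor else None)


def version_selected(row_version, version=None, min_version=None, max_version=None):
    """Return True if a row with the given workflow version should be kept.

    Assumption: row_version, min_version and max_version are 'major.minor' strings.
    """
    row = parse_version(row_version)
    if min_version is not None or max_version is not None:
        # A range was given, so the -v value is ignored (as the help text states).
        if min_version is not None and row < parse_version(min_version):
            return False
        if max_version is not None and row > parse_version(max_version):
            return False
        return True
    if version is None:
        # Sketch-only choice: with no flags at all, nothing is filtered out.
        return True
    major, minor = parse_version(version)
    if minor is None:
        # Major version only: every minor version of that major matches.
        return row[0] == major
    return row == (major, minor)


# Mirrors the three command line examples above.
print(version_selected("52.76", version="52.76"))
print(version_selected("52.10", version="52"))
print(version_selected("53.5", min_version="51.3", max_version="53.5"))
```

Comparing integer pairs rather than floats is the design choice that matters here: read as floats, 52.10 and 52.1 would be the same version.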

New documentation string for panoptes_aggregation config

usage: panoptes_aggregation config [-h] [-d DIR] [-v VERSION]
                                   [--min_version MIN_VERSION]
                                   [--max_version MAX_VERSION] [-k KEYWORDS]
                                   [-vv]
                                   workflow_csv workflow_id
Make configuration files for panoptes data extraction and reduction based on a
workflow export
optional arguments:
  -h, --help            show this help message and exit
Load Workflow Files:
  This file can be exported from a project's Data Export tab
  workflow_csv          The csv file containing the workflow data
Workflow ID and version numbers:
  Enter the workflow ID, major version number, and minor version number
  workflow_id           the workflow ID you would like to extract
  -v VERSION, --version VERSION
                        The workflow version to extract. If only a major
                        version is given (e.g. -v 3) all minor versions will
                        be extracted at once. If a minor version is provided
                        (e.g. -v 3.14) only that specific version will be
                        extracted.
  --min_version MIN_VERSION
                        The minimum workflow version to extract (inclusive).
                        This can be provided as either a major version (e.g.
                        --min_version 3) or a major version with a minor
                        version (e.g. --min_version 3.14). If this flag is
                        provided the --version flag will be ignored.
  --max_version MAX_VERSION
                        The maximum workflow version to extract (inclusive).
                        This can be provided as either a major version (e.g.
                        --max_version 3) or a major version with a minor
                        version (e.g. --max_version 3.14). If this flag is
                        provided the --version flag will be ignored.
Other keywords:
  Additional keywords to be passed into the configuration files
  -k KEYWORDS, --keywords KEYWORDS
                        keywords to be passed into the configuration of a task
                        in the form of a json string, e.g. '{"T0":
                        {"dot_freq": "line"} }' (note: double quotes must be
                        used inside the brackets)
Save Config Files:
  The directory to save the configuration files to
  -d DIR, --dir DIR     The directory to save the configuration files to
Other options:
  -vv, --verbose        increase output verbosity
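
The -k flag expects a JSON string with double quotes inside the brackets, which is easy to get wrong by hand. One way to build the argument is shown below as a small Python sketch. The keyword values are the ones from the help text, the file name, workflow ID and version are reused from the earlier examples, and combining -v with -k in one call is an assumption based on the usage line.

```python
import json
import shlex

# Keyword settings taken from the example in the help text.
keywords = {"T0": {"dot_freq": "line"}}

# json.dumps always emits double-quoted strings, as the -k flag requires.
keywords_json = json.dumps(keywords)

# shlex.quote wraps the JSON in single quotes so a POSIX shell passes it through unchanged.
command = " ".join([
    "panoptes_aggregation", "config", "penguin-watch-workflows.csv", "6465",
    "-v", "52.76",
    "-k", shlex.quote(keywords_json),
])
print(command)
```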

Version 3.7.0

27 Jun 12:30
a06c573

This version adds support for new clustering options for drawing tasks:

  • The OPTICS clustering algorithm is now available for all drawing task types
  • The "intersection over union" metric can be used for all closed shape drawing task types

A full explanation of all the clustering options can be found in the documentation: https://aggregation-caesar.zooniverse.org/How_Clustering_Works.html
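
For readers unfamiliar with the "intersection over union" (IoU) metric, here is a minimal Python sketch for the simplest closed shape, an axis-aligned rectangle. The function names and the (x, y, width, height) layout are assumptions made for illustration; the package's own implementation, and how it handles other shapes, may differ.

```python
def rectangle_iou(a, b):
    """IoU of two axis-aligned rectangles given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region (zero if the shapes are disjoint).
    overlap_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = overlap_w * overlap_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def iou_distance(a, b):
    """Turn IoU into a distance: 0 for identical shapes, 1 for no overlap."""
    return 1.0 - rectangle_iou(a, b)


print(iou_distance((0, 0, 2, 2), (1, 0, 2, 2)))
```

A distance of this form lets a clustering algorithm group shapes by how much they overlap rather than by how far apart their centres are.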

Version 3.6.0

12 May 13:57

This version adds support for the new simple-dropdown task that is part of the new classifier v2.0 on the Zooniverse. Changes were also made to the way dropdown task values show up in the auto-config key lookup table: the hash values used in the classification export are now included in the lookup table alongside the text associated with each dropdown option.

Other changes

Various dependency bumps including changing the minimum version of Pandas to 1.0.0.

Version 3.5

17 Feb 10:40

This release mostly focuses on bumping the versions of various dependencies. Most notably, Numpy 1.20 or higher is now required. This ensures that the package installs correctly with pip and that the same version of Numpy is used both to compile the other dependencies (e.g. HDBSCAN) and to run the code.

Minor updates to the OPTICS transcription reducer

  • The default gutter_eps value is now 300.0 to better match the kind of data using this reducer.

Version 3.4.5

22 Sep 08:47

Additions

  • New extractor: all_tasks_empty_extractor to check if every task is empty for a single classification
  • New reducer: first_n_true_reducer for checking if the first N items in a boolean list are True
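
The check performed by first_n_true_reducer can be pictured with the short Python sketch below. It is not the reducer's actual code: the function name first_n_true is hypothetical, and returning False when the list holds fewer than N items is an assumption.

```python
def first_n_true(values, n):
    """Return True if the first n items of a boolean list are all True.

    Assumption: a list shorter than n items does not pass the check.
    """
    head = values[:n]
    return len(head) == n and all(head)


print(first_n_true([True, True, False], 2))
print(first_n_true([True, False, True], 2))
```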

Updates

  • A flagged field is added to transcription reductions that copies the output of the low_consensus field. This is needed for auto line flagging in ALICE
  • Various dependencies have been bumped to their latest versions

Bug fixes

  • The dropdown extractor failed if all classifications were blank; this has been fixed

Version 3.4.4

17 Jun 08:14

New features

  • The ability to handle classifier v2.0 subtasks

Bug fixes

  • Text subtasks now work as expected
  • Userify will create a "blank" user if Panoptes gives a user not found error

Version 3.4.3

18 Mar 10:25

Another bug fix for the text reducer.

Version 3.4.2

18 Mar 09:49

This fixes some bugs with the offline version of the text reducer and brings the output of the text reducer more in line with the output of the transcription reducers.

Version 3.3

08 Jan 13:41

What's New

Multiprocessing

Multiprocessing is now available for the command line tools. The -c or --cpu_count flag can be used with either the panoptes_aggregation extract or the panoptes_aggregation reduce command to set how many CPU cores are used when processing the CSV files (defaults to 1 core). Running with 2 cores gives a significant speed-up, but using 3 or more cores does not improve the running time much further.

This option can also be set in the GUI.

Improvements in the running time of the offline reduction code

Previously, the (slow) filters for repeated classifications by a single volunteer were always run on all subjects. The code now checks whether there are any repeated classifications before applying the filters.
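
The "check first, filter only if needed" pattern can be sketched in Python as follows. Everything here is illustrative: the function names, the user_id and subject_id keys, and the keep-the-first-classification policy are assumptions, not the package's actual filters.

```python
from collections import Counter


def has_repeats(classifications):
    """Cheap check: did any volunteer classify the same subject more than once?"""
    counts = Counter((c["user_id"], c["subject_id"]) for c in classifications)
    return any(n > 1 for n in counts.values())


def keep_first_per_user(classifications):
    """Stand-in for the slow filter: keep each volunteer's first classification per subject."""
    seen = set()
    kept = []
    for c in classifications:
        key = (c["user_id"], c["subject_id"])
        if key not in seen:
            seen.add(key)
            kept.append(c)
    return kept


def filter_if_needed(classifications):
    """Skip the filter entirely when there is nothing for it to remove."""
    if not has_repeats(classifications):
        return classifications
    return keep_first_per_user(classifications)


rows = [
    {"user_id": 1, "subject_id": 10, "value": "a"},
    {"user_id": 1, "subject_id": 10, "value": "b"},
    {"user_id": 2, "subject_id": 10, "value": "c"},
]
print(len(filter_if_needed(rows)))
```

In this toy version the check costs about as much as the filter itself; the pattern pays off when the real filter is considerably slower than the check.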

General bug fixes

Several bug fixes for extractors and reducers based on errors seen in Caesar.

Version 3.2.1

12 Dec 11:40

  • Lock down all packages
  • Fix deprecation warnings