Elisabeth Engl edited this page Jul 28, 2020 · 5 revisions

Introduction to parameters in OCR-D

(as of writing this article, OCR-D/core is at 2.12.6, OCR-D/spec at 3.9.0)

The actual functionality of OCR-D is implemented in the form of processors: command line tools that adhere to the OCR-D CLI spec. For an overview of which processors are available and how to combine them into workflows, see the OCR-D workflow guide.

All OCR-D processors have the same command line interface, meaning they all support the same set of flags and options when invoked. However, processors can define processor-specific settings in their ocrd-tool.json, called parameters. When running a processor, users can specify these parameters with the -p and -P command line options.

Which parameters are supported by a processor?

To find out which parameters are supported by a processor, use the --help flag. For example, for ocrd-tesserocr-recognize, this is the help output:

$ ocrd-tesserocr-recognize --help

Usage: ocrd-tesserocr-recognize [OPTIONS]

  Recognize text in lines with Tesseract (using annotated derived images, or masking and cropping images from coordinate polygons)

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "dpi" [number - -1]
    pixel density in dots per inch (overrides any meta-data in the
    images); disabled when negative
   "raw_lines" [boolean - false]
    Do not attempt additional segmentation
    (baseline+xheight+ascenders/descenders prediction) when using line
    images (i.e. when textequiv_level<region). Can increase accuracy for
    certain workflows. Disable when line segments/images may contain
    components of more than 1 line, or larger gaps/white-spaces.
   "textequiv_level" [string - "word"]
    Lowest PAGE XML hierarchy level to add the TextEquiv results to; when
    below `region`, implicitly adds segmentation below the line level,
    but requires existing line segmentation
    Possible values: ["region", "line", "word", "glyph"]
   "char_whitelist" [string - ""]
    Enumeration of character hypotheses (from the model) to allow
    exclusively; overruled by blacklist if set.
   "model" [string]
    tessdata model to apply (an ISO 639-3 language specification or some
    other basename, e.g. deu-frak or Fraktur)
   "overwrite_words" [boolean - false]
    Remove existing layout and text annotation below the TextLine level
    (regardless of textequiv_level).
   "char_blacklist" [string - ""]
    Enumeration of character hypotheses (from the model) to suppress;
    overruled by unblacklist if set.
   "char_unblacklist" [string - ""]
    Enumeration of character hypotheses (from the model) to allow
    inclusively.
   "padding" [number - 0]
    Number of background-filled pixels to add around the line image (i.e.
    the annotated AlternativeImage if it exists or the higher-level
    image cropped to the bounding box and masked by the polygon
    otherwise) on each side before recognition.

Default Wiring:
  ['OCR-D-SEG-BLOCK', 'OCR-D-SEG-LINE', 'OCR-D-SEG-WORD', 'OCR-D-SEG-GLYPH'] -> ['OCR-D-OCR-TESS']

You can find a description of the parameters in the section Parameters. Every parameter (e.g. overwrite_words) is listed with its name (overwrite_words), its datatype (boolean - so either true or false), its default value (false) and a description of what the parameter does ("Remove existing layout and text annotation below the TextLine level (regardless of textequiv_level)").
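
The same information can also be read programmatically from the tool description that -J/--dump-json prints. The following is a minimal sketch, not OCR-D code: the embedded JSON is a hand-written excerpt modelled on the help output above, and the key names ("parameters", "type", "default", "description") are an assumption about the layout, so compare them with the real output of --dump-json before relying on them.

```python
import json

# Hypothetical excerpt of a tool description, modelled on the help output above.
# The key names used here are an assumption about the JSON layout.
TOOL_JSON = """
{
  "parameters": {
    "overwrite_words": {
      "type": "boolean",
      "default": false,
      "description": "Remove existing layout and text annotation below the TextLine level"
    },
    "textequiv_level": {
      "type": "string",
      "default": "word",
      "description": "Lowest PAGE XML hierarchy level to add the TextEquiv results to"
    },
    "model": {
      "type": "string",
      "description": "tessdata model to apply"
    }
  }
}
"""


def summarize_parameters(tool_json):
    """Return one (name, type, default) tuple per parameter, sorted by name."""
    tool = json.loads(tool_json)
    summary = []
    for name, spec in sorted(tool.get("parameters", {}).items()):
        # Parameters without a default (like "model") are reported as None
        summary.append((name, spec.get("type"), spec.get("default")))
    return summary


for name, datatype, default in summarize_parameters(TOOL_JSON):
    print(f"{name}: {datatype}, default={default!r}")
```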

How can I pass parameters to a processor?

There are three ways to pass parameters to a processor:

  1. -P KEY VALUE: set parameters individually
  2. -p JSON_FILE: as a JSON file JSON_FILE
  3. -p JSON_STRING: as literal JSON

Option 1 was introduced in OCR-D/core v2.11.0 and is currently the recommended way to specify parameters.

Option 2 lets you define the parameters in a JSON file, which may include #-prefixed comments. This is most useful for defining and describing sets of parameters for a processor.

Option 3 was the preferred way to pass parameters until the introduction of -P KEY VALUE. Its advantage over -p JSON_FILE is that the parameters can be defined ad hoc on the command line. A major disadvantage is that quoting can become tricky when there is another level of indirection, such as when running a processor within a Docker container.
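
To see why quoting gets tricky, here is a small Python sketch that builds a -p JSON_STRING command line and then wraps it in a second shell. It only constructs strings and runs nothing. The parameter values are made up for illustration, and "sh -c" is a stand-in for whatever wrapper adds the extra level of indirection.

```python
import json
import shlex

# Illustrative values for two parameters of ocrd-tesserocr-recognize
params = {"model": "Fraktur", "overwrite_words": True}

# json.dumps produces double-quoted keys and lowercase true/false
json_string = json.dumps(params)

# One level of quoting: the JSON string becomes a single shell word
inner = "ocrd-tesserocr-recognize -p " + shlex.quote(json_string)

# Second level: the whole command is passed as one argument to another shell.
# "sh -c" is an assumption standing in for any wrapper.
outer = "sh -c " + shlex.quote(inner)

print(inner)
print(outer)
```

Each additional layer has to escape the quotes of the layer below it, which is what -P KEY VALUE largely avoids for simple values.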

Can I combine parameter options?

You can combine all variants of parameter passing, and both -p and -P are repeatable. This allows for composition. For example, the invocation

ocrd-foo -p defaults.json -P this-param 42

first reads the file defaults.json and parses it as JSON, then overrides the parameter this-param with the value 42 (a number).
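
The override behaviour can be pictured as a left-to-right merge of dictionaries. This is a sketch of the idea only, not the actual OCR-D/core implementation; merge_parameters, the contents of defaults and the key other-param are hypothetical.

```python
import json


def merge_parameters(sources):
    """Merge parameter dicts left to right; later sources win."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


# Stand-in for the contents of defaults.json (hypothetical values)
defaults = json.loads('{"this-param": 1, "other-param": "keep"}')
# Stand-in for "-P this-param 42"
override = {"this-param": 42}

print(merge_parameters([defaults, override]))
# prints {'this-param': 42, 'other-param': 'keep'}
```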

Examples

The following three invocations are functionally equivalent:

echo '{"foo": "bar"}' > param.json
ocrd-foo -p param.json
ocrd-foo -p '{"foo": "bar"}'
ocrd-foo -P foo bar

This illustrates that -P is the most intuitive and therefore recommended way to pass parameters.

Notes on syntax

The -p variants of passing parameters require a well-formed JSON object, that is:

  • Enclosed in {}
  • Keys (parameter name) and values (parameter value) separated with :
  • Keys must be double-quoted ("param-name")
  • Values must be valid JSON data types:
    • string: double-quote (e.g. "some string value")
    • number: the digits of the number, decimal separator is . (e.g. 42, 3.1514)
    • boolean: true or false
    • array: A list of strings, numbers or booleans, separated by , and enclosed in []
    • object: The same syntax as for the whole parameter JSON
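
If you generate the JSON from a script, a JSON library takes care of these rules for you. The sketch below uses Python's standard json module; the parameter names are invented to cover each data type, and is_parameter_object is a hypothetical helper, not part of OCR-D.

```python
import json

# Hypothetical parameter set covering each JSON data type listed above
params = {
    "a-string": "some string value",
    "a-number": 42,
    "a-float": 3.5,
    "a-boolean": True,  # Python True becomes JSON true
    "an-array": ["a", 1, False],
    "an-object": {"nested": "value"},
}

json_string = json.dumps(params)
print(json_string)


def is_parameter_object(text):
    """True if text parses as JSON and the top level is an object ({...})."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


# Single-quoted keys and Python-style True are not valid JSON
print(is_parameter_object("{'foo': 'bar'}"))
print(is_parameter_object('{"foo": True}'))
```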

One extension of JSON that OCR-D supports is #-prefixed comments, so you can annotate the parameter JSON like this:

{
  # This is set to true because we're augmenting existing OCR results
  # which may have words already
  "overwrite_words": true
}
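
Standard JSON parsers reject such comments, so they have to be removed before parsing. A minimal sketch of one way to do that is shown here, assuming comments always occupy a whole line; parse_commented_json is a hypothetical helper, and the actual handling in OCR-D/core may differ.

```python
import json
import re


def parse_commented_json(text):
    """Parse JSON after dropping lines whose first non-blank character is #."""
    # Only whole-line comments are removed, so a # inside a string value survives
    without_comments = re.sub(r"^[ \t]*#.*$", "", text, flags=re.MULTILINE)
    return json.loads(without_comments)


sample = """
{
  # This is set to true because we're augmenting existing OCR results
  # which may have words already
  "overwrite_words": true
}
"""

print(parse_commented_json(sample))
# prints {'overwrite_words': True}
```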

For the -P KEY VALUE variant, these rules apply:

  • KEY must not be quoted
  • VALUE can be any of the JSON data types described above
  • If VALUE is not valid JSON, it is interpreted as a string. This has the advantage that you can write -P param-name string-value instead of -P param-name '"string-value"'.
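
The fallback rule for VALUE can be sketched in a few lines of Python. This illustrates the rule as described above and is not taken from OCR-D/core; parse_value is a hypothetical name.

```python
import json


def parse_value(raw):
    """Interpret raw as JSON if possible, otherwise keep it as a plain string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


for raw in ["42", "true", "bar", '"bar"', '["a", "b"]', "deu-frak"]:
    print(repr(raw), "->", repr(parse_value(raw)))
```

One consequence of this rule: a value that happens to be valid JSON, such as 42 or true, is read as a number or boolean. To pass it as a string, it has to be JSON-quoted, as in '"42"'.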
