Introduction to parameters in OCR-D

(As of writing this article, OCR-D/core is at 2.12.6 and OCR-D/spec at 3.9.0.)

The actual functionality of OCR-D is implemented in the form of processors, command line tools that adhere to the OCR-D CLI spec. For an overview of which processors are available and how to combine them into workflows, see the OCR-D workflow guide.

All OCR-D processors have the same command line interface, meaning they all support the same set of flags and options when invoked. However, processors can define processor-specific settings in their ocrd-tool.json, called parameters. When running a processor, users can specify these parameters with the -p and -P command line options.
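For instance, a minimal sketch of such an invocation (ocrd-foo is a placeholder processor name, some-param a made-up parameter, and the file group names are only illustrative):

ocrd-foo -I OCR-D-IMG -O OCR-D-FOO -P some-param some-value

Here -I and -O select the input and output file groups, and -P sets the parameter some-param to the value some-value.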

Which parameters are supported by a processor?

To find out which parameters are supported by a processor, use the --help flag. For example, for ocrd-tesserocr-recognize, this is the help output:

Usage: ocrd-tesserocr-recognize [OPTIONS]

  Segment and/or recognize text with Tesseract (using annotated derived images, or masking and cropping images from coordinate polygons) on any level of the PAGE hierarchy.

  > Perform layout segmentation and/or text recognition with Tesseract
  > on the workspace.

  > Open and deserialise PAGE input files and their respective images,
  > then iterate over the element hierarchy down to the requested
  > ``textequiv_level`` if it exists and if ``segmentation_level`` is
  > lower (i.e. more granular) or ``none``.

  > Otherwise stop before (i.e. above) ``segmentation_level``. If any
  > segmentation exist at that level already, and ``overwrite_segments``
  > is false, then descend into these segments, else remove them.

  > Set up Tesseract to recognise each segment's image (either from
  > AlternativeImage or cropping the bounding box rectangle and masking
  > it from the polygon outline) with the appropriate mode and
  > ``model``.

  > Next, if there still is a gap between the current level in the PAGE
  > hierarchy and the requested ``textequiv_level``, then iterate down
  > the result hierarchy, adding new segments at each level (as well as
  > reading order references, text line order, reading direction and
  > orientation at the region/table level).

  > Then, at ``textequiv_level``, remove any existing TextEquiv, unless
  > ``overwrite_text`` is false, and add text and confidence results.

  > The special value ``textequiv_level=none`` behaves like ``glyph``,
  > except that no actual text recognition will be performed, only
  > layout analysis (so no ``model`` is needed, and new segmentation is
  > created down to the glyph level).

  > The special value ``segmentation_level=none`` likewise is lowest,
  > i.e. no actual layout analysis will be performed, only text
  > recognition (so existing segmentation is needed down to
  > ``textequiv_level``).

  > Finally, make all higher levels consistent with these text results
  > by concatenation, ordering according to each level's respective
  > readingDirection, textLineOrder, and ReadingOrder, and joining by
  > whitespace as appropriate for each level and according to its
  > Relation/join status.

  > In other words:
  > - If ``segmentation_level=region``, then segment the page into regions
  >   (unless ``overwrite_segments=false``), else iterate existing regions.
  > - If ``textequiv_level=region``, then recognize text in the region,
  >   annotate it, and continue with the next region. Otherwise...
  > - If ``segmentation_level=cell`` or higher, then segment table regions
  >   into text regions (i.e. cells) (unless ``overwrite_segments=false``),
  >   else iterate existing cells.
  > - If ``textequiv_level=cell``, then recognize text in the cell,
  >   annotate it, and continue with the next cell. Otherwise...
  > - If ``segmentation_level=line`` or higher, then segment text regions
  >   into text lines (unless ``overwrite_segments=false``), else iterate
  >   existing text lines.
  > - If ``textequiv_level=line``, then recognize text in the text lines,
  >   annotate it, and continue with the next line. Otherwise...
  > - If ``segmentation_level=word`` or higher, then segment text lines
  >   into words (unless ``overwrite_segments=false``), else iterate
  >   existing words.
  > - If ``textequiv_level=word``, then recognize text in the words,
  >   annotate it, and continue with the next word. Otherwise...
  > - If ``segmentation_level=glyph`` or higher, then segment words into
  >   glyphs (unless ``overwrite_segments=false``), else iterate existing
  >   glyphs.
  > - If ``textequiv_level=glyph``, then recognize text in the glyphs and
  >   continue with the next glyph. Otherwise...
  > - (i.e. ``none``) annotate no text and be done.

  > Note that ``cell`` is an _optional_ level that is only relevant for
  > table regions, not text or other regions.  Also, when segmenting
  > tables in the same run that detects them (via
  > ``segmentation_level=region`` and ``find_tables``), cells will just
  > be 'paragraphs'. In contrast, when segmenting tables that already
  > exist (via ``segmentation_level=cell``), cells will be detected in
  > ``sparse_text`` mode, i.e. as single-line text regions.

  > Thus, ``segmentation_level`` is the entry point level for layout
  > analysis, and setting it to ``none`` makes this processor behave as
  > recognition-only. Whereas ``textequiv_level`` selects the exit point
  > level for segmentation, and setting it to ``none`` makes this
  > processor behave as segmentation-only.

  > All segments above ``segmentation_level`` must already exist, and no
  > segments below ``textequiv_level`` will be newly created.

  > If ``find_tables``, then during region segmentation, also try to
  > detect table blocks and add them as TableRegion, then query the page
  > iterator for paragraphs and add them as TextRegion cells.

  > If ``block_polygons``, then during region segmentation, query
  > Tesseract for polygon outlines instead of bounding boxes for each
  > region. (This is more precise, but due to some path representation
  > errors does not always yield accurate/valid polygons.)

  > If ``sparse_text``, then during region segmentation, attempt to find
  > single-line text blocks in no particular order (Tesseract's page
  > segmentation mode ``SPARSE_TEXT``).

  > Finally, produce new output files by serialising the resulting
  > hierarchy.

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "dpi" [number - 0]
    pixel density in dots per inch (overrides any meta-data in the
    images); disabled when negative
   "padding" [number - 0]
    Extend detected region/cell/line/word rectangles by this many (true)
    pixels, or extend existing region/line/word images (i.e. the
    annotated AlternativeImage if it exists or the higher-level image
    cropped to the bounding box and masked by the polygon otherwise) by
    this many (background/white) pixels on each side before recognition.
   "segmentation_level" [string - "word"]
    Highest PAGE XML hierarchy level to remove existing annotation from
    and detect segments for (before iterating downwards); if ``none``,
    does not attempt any new segmentation; if ``cell``, starts at table
    regions, detecting text regions (cells). Ineffective when lower than
    ``textequiv_level``.
    Possible values: ["region", "cell", "line", "word", "glyph", "none"]
   "textequiv_level" [string - "word"]
    Lowest PAGE XML hierarchy level to re-use or detect segments for and
    add the TextEquiv results to (before projecting upwards); if
    ``none``, adds segmentation down to the glyph level, but does not
    attempt recognition at all; if ``cell``, stops short before text
    lines, adding text of text regions inside tables (cells) or on page
    level only.
    Possible values: ["region", "cell", "line", "word", "glyph", "none"]
   "overwrite_segments" [boolean - false]
    If ``segmentation_level`` is not none, but an element already
    contains segments, remove them and segment again. Otherwise use the
    existing segments of that element.
   "overwrite_text" [boolean - true]
    If ``textequiv_level`` is not none, but a segment already contains
    TextEquivs, remove them and replace with recognised text. Otherwise
    add new text as alternative. (Only the first entry is projected
    upwards.)
   "block_polygons" [boolean - false]
    When detecting regions, annotate polygon coordinates instead of
    bounding box rectangles.
   "find_tables" [boolean - true]
    When detecting regions, recognise tables as table regions
    (Tesseract's ``textord_tabfind_find_tables=1``).
   "sparse_text" [boolean - false]
    When detecting regions, use 'sparse text' page segmentation mode
    (finding as much text as possible in no particular order): only text
    regions, single lines without vertical or horizontal space.
   "raw_lines" [boolean - false]
    When detecting lines, do not attempt additional segmentation
    (baseline+xheight+ascenders/descenders prediction) on line images.
    Can increase accuracy for certain workflows. Disable when line
    segments/images may contain components of more than 1 line, or
    larger gaps/white-spaces.
   "char_whitelist" [string - ""]
    When recognizing text, enumeration of character hypotheses (from the
    model) to allow exclusively; overruled by blacklist if set.
   "char_blacklist" [string - ""]
    When recognizing text, enumeration of character hypotheses (from the
    model) to suppress; overruled by unblacklist if set.
   "char_unblacklist" [string - ""]
    When recognizing text, enumeration of character hypotheses (from the
    model) to allow inclusively.
   "model" [string]
    The tessdata text recognition model to apply (an ISO 639-3 language
    specification or some other basename, e.g. deu-frak or Fraktur).

Default Wiring:
  ['OCR-D-SEG-PAGE', 'OCR-D-SEG-REGION', 'OCR-D-SEG-TABLE', 'OCR-D-SEG-LINE', 'OCR-D-SEG-WORD'] -> ['OCR-D-SEG-REGION', 'OCR-D-SEG-TABLE', 'OCR-D-SEG-LINE', 'OCR-D-SEG-WORD', 'OCR-D-SEG-GLYPH', 'OCR-D-OCR-TESS']

You can find a description of the parameters in the section Parameters. Every parameter (e.g. overwrite_segments) is listed with its name (overwrite_segments), its datatype (boolean, so either true or false), its default value (false) and a description of what the parameter does ("If ``segmentation_level`` is not none, but an element already contains segments, remove them and segment again. Otherwise use the existing segments of that element.").
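The same definitions are also available in machine-readable form: the -J/--dump-json option listed above prints the processor's ocrd-tool.json description, which includes these parameters. For instance (filtering with jq is just one possible approach and assumes jq is installed):

ocrd-tesserocr-recognize --dump-json | jq .parameters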

How can I pass parameters to a processor?

There are three ways to pass parameters to a processor:

  1. -P KEY VALUE: set parameters individually
  2. -p JSON_FILE: as a JSON file JSON_FILE
  3. -p JSON_STRING: as literal JSON

Option 1. was introduced in OCR-D/core v2.11.0 and is currently the recommended way to specify parameters.

Option 2. allows you to define the parameters in a JSON file, including #-prefixed comments. This is most useful for processor developers to define and describe sets of parameters.

Option 3. was the preferred way to pass parameters until the introduction of -P KEY VALUE. Its advantage over -p JSON_FILE is that the parameters can be defined ad-hoc on the command line. A major disadvantage is that quoting can become tricky when there's another level of indirection, such as when running a processor within a Docker container.
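To illustrate the quoting problem, here is a sketch (the Docker image name and volume mapping are placeholders and will differ in your setup):

docker run --rm -v $PWD:/data some-ocrd-image ocrd-foo -p '{"param-name": "some value"}'
docker run --rm -v $PWD:/data some-ocrd-image ocrd-foo -P param-name 'some value'

With -p, the double quotes required by JSON have to survive the host shell and any further layers of indirection; with -P, only ordinary shell quoting of the value is needed.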

Can I combine parameter options?

You can combine all variants of parameter passing, and both -p and -P are repeatable. This allows for composition, e.g. the following invocation

ocrd-foo -p defaults.json -P this-param 42

will first read the file defaults.json and parse it as JSON, then override the parameter this-param with the value 42 (a number).
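As a concrete sketch, suppose the (hypothetical) file defaults.json contains:

{
  "this-param": 23,
  "other-param": "keep me"
}

The effective parameters of the invocation above are then {"this-param": 42, "other-param": "keep me"}: the -P value takes precedence over the file, while keys that are not overridden keep their values from the file.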

Examples

The following three invocations are functionally equivalent (the first line merely creates the parameter file used by the first invocation):

echo '{"foo": "bar"}' > param.json
ocrd-foo -p param.json
ocrd-foo -p '{"foo": "bar"}'
ocrd-foo -P foo bar

This illustrates that -P is the most intuitive and therefore recommended way to pass parameters.

Notes on syntax

The -p variants of passing parameters require a well-formed JSON object, that is:

  • Enclosed in {}
  • Keys (parameter names) and values (parameter values) separated with :
  • Keys must be double-quoted ("param-name")
  • Values must be valid JSON data types:
    • string: double-quoted (e.g. "some string value")
    • number: the digits of the number, decimal separator is . (e.g. 42, 3.1415)
    • boolean: true or false
    • array: a list of strings, numbers or booleans, separated by , and enclosed in []
    • object: the same syntax as for the whole parameter JSON
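Putting these rules together, an illustrative parameter JSON could look like this (model, dpi and find_tables are taken from the ocrd-tesserocr-recognize parameters above; some-list-param is made up to show an array):

{
  "model": "Fraktur",
  "dpi": 300,
  "find_tables": false,
  "some-list-param": ["a", "b", 42]
}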

One extension of JSON we support in OCR-D is #-prefixed comments, i.e. you can describe the parameter JSON with comments like so:

{
  # This is set to true because we're augmenting existing OCR results
  # which may have words already
  "overwrite_segments": true
}

For the -P KEY VALUE variant, these rules apply:

  • KEY must not be quoted
  • VALUE can be any of the JSON data types described above
  • If VALUE is not a valid JSON data type, it is interpreted as a string. That has the advantage that you can write -P param-name string-value instead of -P param-name '"string-value"'.
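For example, using parameters of ocrd-tesserocr-recognize from above (other options such as -I and -O omitted for brevity):

ocrd-tesserocr-recognize -P padding 5 -P find_tables false -P model deu-frak

Here 5 is parsed as a number, false as a boolean, and deu-frak, which is not valid JSON on its own, is interpreted as the string "deu-frak" without any extra quoting.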
