Introduction to parameters in OCR-D

(As of writing this article, OCR-D/core is at 2.12.6 and OCR-D/spec at 3.9.0.)

The actual functionality of OCR-D is implemented in the form of processors, command line tools that adhere to the OCR-D CLI spec. For an overview of which processors are available and how to combine them into workflows, see the OCR-D workflow guide.

All OCR-D processors have the same command line interface, meaning they all support the same set of flags and options when invoked. However, processors can define processor-specific settings in their ocrd-tool.json, called parameters. When running a processor, users can specify these parameters with the -p and -P command line options.
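For instance, a minimal sketch of such an invocation (ocrd-foo is a placeholder processor name, some-param a made-up parameter, and the file group names are only illustrative):

ocrd-foo -I OCR-D-IMG -O OCR-D-FOO -P some-param some-value

Here -I and -O select the input and output file groups, and -P sets the parameter some-param to the value some-value.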

Which parameters are supported by a processor?

To find out which parameters are supported by a processor, use the --help flag. For example, for ocrd-tesserocr-recognize, this is the help output:

Usage: ocrd-tesserocr-recognize [OPTIONS]

  Segment and/or recognize text with Tesseract (using annotated derived images, or masking and cropping images from coordinate polygons) on any level of the PAGE hierarchy.

  > Perform layout segmentation and/or text recognition with Tesseract
  > on the workspace.

  > Open and deserialise PAGE input files and their respective images,
  > then iterate over the element hierarchy down to the requested
  > ``textequiv_level`` if it exists and if ``segmentation_level`` is
  > lower (i.e. more granular) or ``none``.

  > Otherwise stop before (i.e. above) ``segmentation_level``. If any
  > segmentation exist at that level already, and ``overwrite_segments``
  > is false, then descend into these segments, else remove them.

  > Set up Tesseract to recognise each segment's image (either from
  > AlternativeImage or cropping the bounding box rectangle and masking
  > it from the polygon outline) with the appropriate mode and
  > ``model``.

  > Next, if there still is a gap between the current level in the PAGE
  > hierarchy and the requested ``textequiv_level``, then iterate down
  > the result hierarchy, adding new segments at each level (as well as
  > reading order references, text line order, reading direction and
  > orientation at the region/table level).

  > Then, at ``textequiv_level``, remove any existing TextEquiv, unless
  > ``overwrite_text`` is false, and add text and confidence results.

  > The special value ``textequiv_level=none`` behaves like ``glyph``,
  > except that no actual text recognition will be performed, only
  > layout analysis (so no ``model`` is needed, and new segmentation is
  > created down to the glyph level).

  > The special value ``segmentation_level=none`` likewise is lowest,
  > i.e. no actual layout analysis will be performed, only text
  > recognition (so existing segmentation is needed down to
  > ``textequiv_level``).

  > Finally, make all higher levels consistent with these text results
  > by concatenation, ordering according to each level's respective
  > readingDirection, textLineOrder, and ReadingOrder, and joining by
  > whitespace as appropriate for each level and according to its
  > Relation/join status.

  > In other words:
  > - If ``segmentation_level=region``, then segment the page into regions
  >   (unless ``overwrite_segments=false``), else iterate existing regions.
  > - If ``textequiv_level=region``, then recognize text in the region,
  >   annotate it, and continue with the next region. Otherwise...
  > - If ``segmentation_level=cell`` or higher, then segment table regions
  >   into text regions (i.e. cells) (unless ``overwrite_segments=false``),
  >   else iterate existing cells.
  > - If ``textequiv_level=cell``, then recognize text in the cell,
  >   annotate it, and continue with the next cell. Otherwise...
  > - If ``segmentation_level=line`` or higher, then segment text regions
  >   into text lines (unless ``overwrite_segments=false``), else iterate
  >   existing text lines.
  > - If ``textequiv_level=line``, then recognize text in the text lines,
  >   annotate it, and continue with the next line. Otherwise...
  > - If ``segmentation_level=word`` or higher, then segment text lines
  >   into words (unless ``overwrite_segments=false``), else iterate
  >   existing words.
  > - If ``textequiv_level=word``, then recognize text in the words,
  >   annotate it, and continue with the next word. Otherwise...
  > - If ``segmentation_level=glyph`` or higher, then segment words into
  >   glyphs (unless ``overwrite_segments=false``), else iterate existing
  >   glyphs.
  > - If ``textequiv_level=glyph``, then recognize text in the glyphs and
  >   continue with the next glyph. Otherwise...
  > - (i.e. ``none``) annotate no text and be done.

  > Note that ``cell`` is an _optional_ level that is only relevant for
  > table regions, not text or other regions.  Also, when segmenting
  > tables in the same run that detects them (via
  > ``segmentation_level=region`` and ``find_tables``), cells will just
  > be 'paragraphs'. In contrast, when segmenting tables that already
  > exist (via ``segmentation_level=cell``), cells will be detected in
  > ``sparse_text`` mode, i.e. as single-line text regions.

  > Thus, ``segmentation_level`` is the entry point level for layout
  > analysis, and setting it to ``none`` makes this processor behave as
  > recognition-only. Whereas ``textequiv_level`` selects the exit point
  > level for segmentation, and setting it to ``none`` makes this
  > processor behave as segmentation-only.

  > All segments above ``segmentation_level`` must already exist, and no
  > segments below ``textequiv_level`` will be newly created.

  > If ``find_tables``, then during region segmentation, also try to
  > detect table blocks and add them as TableRegion, then query the page
  > iterator for paragraphs and add them as TextRegion cells.

  > If ``block_polygons``, then during region segmentation, query
  > Tesseract for polygon outlines instead of bounding boxes for each
  > region. (This is more precise, but due to some path representation
  > errors does not always yield accurate/valid polygons.)

  > If ``sparse_text``, then during region segmentation, attempt to find
  > single-line text blocks in no particular order (Tesseract's page
  > segmentation mode ``SPARSE_TEXT``).

  > Finally, produce new output files by serialising the resulting
  > hierarchy.

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "dpi" [number - 0]
    pixel density in dots per inch (overrides any meta-data in the
    images); disabled when negative
   "padding" [number - 0]
    Extend detected region/cell/line/word rectangles by this many (true)
    pixels, or extend existing region/line/word images (i.e. the
    annotated AlternativeImage if it exists or the higher-level image
    cropped to the bounding box and masked by the polygon otherwise) by
    this many (background/white) pixels on each side before recognition.
   "segmentation_level" [string - "word"]
    Highest PAGE XML hierarchy level to remove existing annotation from
    and detect segments for (before iterating downwards); if ``none``,
    does not attempt any new segmentation; if ``cell``, starts at table
    regions, detecting text regions (cells). Ineffective when lower than
    ``textequiv_level``.
    Possible values: ["region", "cell", "line", "word", "glyph", "none"]
   "textequiv_level" [string - "word"]
    Lowest PAGE XML hierarchy level to re-use or detect segments for and
    add the TextEquiv results to (before projecting upwards); if
    ``none``, adds segmentation down to the glyph level, but does not
    attempt recognition at all; if ``cell``, stops short before text
    lines, adding text of text regions inside tables (cells) or on page
    level only.
    Possible values: ["region", "cell", "line", "word", "glyph", "none"]
   "overwrite_segments" [boolean - false]
    If ``segmentation_level`` is not none, but an element already
    contains segments, remove them and segment again. Otherwise use the
    existing segments of that element.
   "overwrite_text" [boolean - true]
    If ``textequiv_level`` is not none, but a segment already contains
    TextEquivs, remove them and replace with recognised text. Otherwise
    add new text as alternative. (Only the first entry is projected
    upwards.)
   "block_polygons" [boolean - false]
    When detecting regions, annotate polygon coordinates instead of
    bounding box rectangles.
   "find_tables" [boolean - true]
    When detecting regions, recognise tables as table regions
    (Tesseract's ``textord_tabfind_find_tables=1``).
   "sparse_text" [boolean - false]
    When detecting regions, use 'sparse text' page segmentation mode
    (finding as much text as possible in no particular order): only text
    regions, single lines without vertical or horizontal space.
   "raw_lines" [boolean - false]
    When detecting lines, do not attempt additional segmentation
    (baseline+xheight+ascenders/descenders prediction) on line images.
    Can increase accuracy for certain workflows. Disable when line
    segments/images may contain components of more than 1 line, or
    larger gaps/white-spaces.
   "char_whitelist" [string - ""]
    When recognizing text, enumeration of character hypotheses (from the
    model) to allow exclusively; overruled by blacklist if set.
   "char_blacklist" [string - ""]
    When recognizing text, enumeration of character hypotheses (from the
    model) to suppress; overruled by unblacklist if set.
   "char_unblacklist" [string - ""]
    When recognizing text, enumeration of character hypotheses (from the
    model) to allow inclusively.
   "model" [string]
    The tessdata text recognition model to apply (an ISO 639-3 language
    specification or some other basename, e.g. deu-frak or Fraktur).

Default Wiring:
  ['OCR-D-SEG-PAGE', 'OCR-D-SEG-REGION', 'OCR-D-SEG-TABLE', 'OCR-D-SEG-LINE', 'OCR-D-SEG-WORD'] -> ['OCR-D-SEG-REGION', 'OCR-D-SEG-TABLE', 'OCR-D-SEG-LINE', 'OCR-D-SEG-WORD', 'OCR-D-SEG-GLYPH', 'OCR-D-OCR-TESS']

You can find a description of the parameters in the section Parameters. Every parameter (e.g. overwrite_segments) is listed with its name (overwrite_segments), its datatype (boolean, so either true or false), its default value (false) and a description of what the parameter does ("If ``segmentation_level`` is not none, but an element already contains segments, remove them and segment again. Otherwise use the existing segments of that element.").
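The same definitions are also available in machine-readable form: the -J/--dump-json option listed above prints the processor's ocrd-tool.json description, which includes these parameters. For instance (filtering with jq is just one possible approach and assumes jq is installed):

ocrd-tesserocr-recognize --dump-json | jq .parameters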

How can I pass parameters to a processor?

There are three ways to pass parameters to a processor:

  1. -P KEY VALUE: set parameters individually
  2. -p JSON_FILE: as a JSON file JSON_FILE
  3. -p JSON_STRING: as literal JSON

Option 1. was introduced in OCR-D/core v2.11.0 and is currently the recommended way to specify parameters.

Option 2. allows you to define the parameters in a JSON file, including #-prefixed comments. This is most useful for processor developers to define and describe sets of parameters.

Option 3. was the preferred way to pass parameters until the introduction of -P KEY VALUE. Its advantage over -p JSON_FILE is that the parameters can be defined ad-hoc on the command line. A major disadvantage is that quoting can become tricky when there's another level of indirection, such as when running a processor within a Docker container.
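To illustrate the quoting problem, here is a sketch (the Docker image name and volume mapping are placeholders and will differ in your setup):

docker run --rm -v $PWD:/data some-ocrd-image ocrd-foo -p '{"param-name": "some value"}'
docker run --rm -v $PWD:/data some-ocrd-image ocrd-foo -P param-name 'some value'

With -p, the double quotes required by JSON have to survive the host shell and any further layers of indirection; with -P, only ordinary shell quoting of the value is needed.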

Can I combine parameter options?

You can combine all variants of parameter passing, and both -p and -P are repeatable. This allows for composition, e.g. the following invocation

ocrd-foo -p defaults.json -P this-param 42

will first read the file defaults.json and parse it as JSON, then override the parameter this-param with the value 42 (a number).
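As a concrete sketch, suppose the (hypothetical) file defaults.json contains:

{
  "this-param": 23,
  "other-param": "keep me"
}

The effective parameters of the invocation above are then {"this-param": 42, "other-param": "keep me"}: the -P value takes precedence over the file, while keys that are not overridden keep their values from the file.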

Examples

The following three invocations are functionally equivalent (the first line merely creates the parameter file used by the first invocation):

echo '{"foo": "bar"}' > param.json
ocrd-foo -p param.json
ocrd-foo -p '{"foo": "bar"}'
ocrd-foo -P foo bar

This illustrates that -P is the most intuitive and therefore recommended way to pass parameters.

Notes on syntax

The -p variants of passing parameters require a well-formed JSON object, that is:

  • Enclosed in {}
  • Keys (parameter names) and values (parameter values) separated with :
  • Keys must be double-quoted ("param-name")
  • Values must be valid JSON data types:
    • string: double-quoted (e.g. "some string value")
    • number: the digits of the number, decimal separator is . (e.g. 42, 3.1415)
    • boolean: true or false
    • array: a list of strings, numbers or booleans, separated by , and enclosed in []
    • object: the same syntax as for the whole parameter JSON
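Putting these rules together, an illustrative parameter JSON could look like this (model, dpi and find_tables are taken from the ocrd-tesserocr-recognize parameters above; some-list-param is made up to show an array):

{
  "model": "Fraktur",
  "dpi": 300,
  "find_tables": false,
  "some-list-param": ["a", "b", 42]
}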

One extension of JSON we support in OCR-D is #-prefixed comments, i.e. you can describe the parameter JSON with comments like so:

{
  # This is set to true because we're augmenting existing OCR results
  # which may have words already
  "overwrite_segments": true
}

For the -P KEY VALUE variant, these rules apply:

  • KEY must not be quoted
  • VALUE can be any of the JSON data types described above
  • If VALUE is not a valid JSON data type, it is interpreted as a string. That has the advantage that you can write -P param-name string-value instead of -P param-name '"string-value"'.
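For example, using parameters of ocrd-tesserocr-recognize from above (other options such as -I and -O omitted for brevity):

ocrd-tesserocr-recognize -P padding 5 -P find_tables false -P model deu-frak

Here 5 is parsed as a number, false as a boolean, and deu-frak, which is not valid JSON on its own, is interpreted as the string "deu-frak" without any extra quoting.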
