`ocrd network` CLI syntax and terminology #1032

kba · 2023-03-29T09:32:23Z

Okay, another note that will potentially be raised soon. Currently, the supported processing-worker CLI is still the old one:
# 1.
$ ocrd processing-worker <processor-name> --queue=<queue-address> --database=<database-address>

# 2.
$ <processor-name> --queue=<queue-address> --database=<database-address>
We (me and @joschrew) are aware that in the spec there is a change that is not adapted here, yet:
# 1. Use ocrd CLI bundled with OCR-D/core
$ ocrd server <processor-name> --type=worker --queue=<queue-address> --database=<database-address>

# 2. Use processor name
$ <processor-name> --server --type=worker --queue=<queue-address> --database=<database-address>
Where the --type can be either: worker (processing worker)
ocrd server <processor-name> --type=worker --queue=<queue-address> --database=<database-address>
or server (the REST API wrapper based on #884).
ocrd server <processor-name> --type=server --database=<database-address>
However, it's a bad idea to extend the processing worker code to support the REST API processing server (aka standalone processing worker that has nothing to do with the bigger Processing Server in the spec and does not need the queues).

Now, try to imagine a good way to explain to the OCR-D community without confusing them:

why the <processor-name> is a server, but is not a server when --type=worker and is referred to with processing-worker and no direct requests can be sent

why the <processor-name> is a server, and actual server when --type=server and is referred to with processing-server

why both are grouped together under ocrd server... or ... --server ... but potentially implemented together under processing_worker.py

why Processing Server (PS-big) and processing servers (PS-small), standalone ocrd processors, are different concepts but both referred to with processing server.

Sounds confusing? It's even more when it has to be implemented in a clean way. So there are 3 main reasons to not adapt that yet:

The priority changed a bit and the higher priority now is to deploy working Processing Server, Workflow Server, and Workspace Server together on a live VM instance, so KITODO can use that to continue their development.

Identify and fix problems that arise when combining the 3 servers above from 2 different repositories (Processing-Server #974 and reference WebAPI impl).

To not complicate the current implementation without first trying to think how to separate concepts properly and avoid potential problems. Ideally:

the standalone processing servers (aka server processing workers) should be implemented as a separate class (name suggestions?), sibling of ProcessingWorker, and both will share the common methods.

the CLI for both should be separated with improved naming conventions.

The text was updated successfully, but these errors were encountered:

MehmedGIT · 2023-03-29T09:42:36Z

Thank you @kba! I have decided even against the --type=worker/server option because it shadows the type in Python... Instead, there are --agent_type=worker/server and --agent_address (when type is server) options. Check the #1030 implementation to see the currently used CLI syntax. I would suggest waiting on this topic till finishing #1030 for a review, then the CLI syntax will be revisited/refactored.

MehmedGIT · 2023-04-05T14:11:57Z

Some updates to this topic that are based on the current state of #1030.

Network modules/agents based on the WebAPI spec:

Deployer Agent (currently used by the Processing Server to deploy other modules)
Processing Server (processing proxy between Processing Workers and Processor Servers)
Processing Worker
Processor Server (Standalone REST OCR-D processor)
RabbitMQ Server (The message broker between the Processing Server and the Processing Workers)
MongoDB
Workflow Server/Endpoint
Workspace Server/Endpoint
(Note: The Workspace/Workflow servers are currently available only here, and not part of the ocrd_network yet)
Client

How are the network modules started, steps with example usage:

ocrd network processing-server /path/to/processing-server-config --address <server-address>
RabbitMQ and MongoDB are deployed on the addresses specified in the processing server configuration file with the aid of the Deployer agent which is currently part of the Processing Server.
The Deployer agent, based on the configuration file, deploys:

Processing Workers:
ocrd-* --type worker --database <address> --queue <address>
Processor Servers:
ocrd-* --type server --database <address> --address <server-address>
Of course, the other variants are supported as well, i.e.:
ocrd network processing-worker <processor-name> --database <address> --queue <address>
ocrd network processor-server <processor-name> --database <address> --address <server-address>

How is the client supposed to be used? There is no clear agreement yet, so consider everything that follows as my initial naive ideas! Before continuing with the Client CLI implementation, we need to decide what to support and what arguments/options to provide.

Currently, as part of #1030, only the processing/processor endpoint is (partially) implemented:
ocrd network client processing processor <processor-name> --address <processing-server-address> ... arguments/options ... where the arguments and options are the well-known ones for the specific ocrd-processor. There are also some extensions (such as --result-queue-name and --callback-url) to allow more flexibility (when the client is another machine).

Then the processing status of a job can be checked with:
ocrd network client processing status <job-id>. The job-id here is returned as a response to the previous request.

Similar to the example above, the client can support:
ocrd network client <workspace/workflow/discovery> ... arguments/options ...

To keep the client CLI usage simple, the addresses of the servers can be configurated through environment variables.
My current suggestion is the OCRD_NETWORK_SERVER_ADDR_* pattern where * can be one of PROCESSING/WORKSPACE/WORKFLOW. Clearly, we need more thoughts here for the standalone Processor Servers.

That's all from my side for now.

bertsky · 2023-04-17T14:32:31Z

Regarding the command line client, IMO it should be as consistent with the existing CLIs as possible. And I would prefer names of operations instead of HTTP mnemonics (POST/GET/PUT), for example:

ocrd network client discovery installation|processors|resources|... (or whatever will be on /discovery)
ocrd network client workspace list|download|upload|remove
ocrd network client workflow list|download|upload|run where the latter should look like ocrd process (i.e. block)
ocrd network client processing process <executable> ... looking like ocrd-<executable> ... (i.e. block)

So no status and no asynchronous / polling. (And that's independent of whether we make the Processor Server API itself blocking or not... the client could still implement a polling loop or callback or whatever is necessary.)

MehmedGIT · 2023-04-17T15:18:21Z

it should be as consistent with the existing CLIs as possible. And I would prefer names of operations instead of HTTP mnemonics (POST/GET/PUT)

Agree. I am also not a big fan of the HTTP mnemonics for the CLI.

Worth noting that the list option can get big really fast if the listing is not done just on user level when there are many users.

bertsky · 2023-04-17T15:31:07Z

Worth noting that the list option can get big really fast if the listing is not done just on user level when there are many users.

Yes, but adding user management like fastapi-users should not be difficult, IMO this is out of scope for core.

bertsky · 2023-07-12T11:19:23Z

@MehmedGIT we currently have the ocrd network client processing processor NAME --address ADDR --agent-type=worker (via publish_to_queue asynchronously) and --agent-type=server (via push_to_processor_server synchronously). The latter also uses the Processing Server – but shouldn't the client in that case try to connect to the Processor Server directly? Or would that be another independent client command (say ocrd network client processor ADDR)?

(Background: @joschrew in OCR-D/ocrd_all#386 started implementing the ocrd_all-based deployment with docker compose for servers and the client CLIs – currently for the Processing Server model only. For the Processor Server model it would not make sense to use the Processing Server at all, but we have no client CLI yet.)

MehmedGIT · 2023-07-12T11:29:26Z

Or would that be another independent client command (say ocrd network client processor ADDR)?

It would be rather that to allow both ways.

For the Processor Server model it would not make sense to use the Processing Server at all, but we have no client CLI yet.

Right, the client CLI should be extended to support that.

replace --agent-type with subcommand, #1032, #1085

kba mentioned this issue Mar 29, 2023

Processing-Server #974

Merged

bertsky changed the title ~~ocrd network CLI syntax~~ ocrd network CLI syntax and terminology Mar 29, 2023

MehmedGIT mentioned this issue Apr 4, 2023

Standalone Processor Server (#884 + #974) #1030

Merged

bertsky mentioned this issue Jun 21, 2023

Allow building with thin module Docker containers OCR-D/ocrd_all#69

Open

kba added a commit that referenced this issue Sep 6, 2023

replace --agent-type with subcommand, #1032, #1085

d2ce099

kba added a commit that referenced this issue Sep 11, 2023

Merge pull request #1087 from OCR-D/worker-cli

d718553

replace --agent-type with subcommand, #1032, #1085

This was referenced Jul 31, 2024

Workflow endpoint of the Processing Server #1083

Merged

ocrd network client workflow: not implemented yet #1264

Open

bertsky mentioned this issue Aug 22, 2024

Extend the network client #1269

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`ocrd network` CLI syntax and terminology #1032

`ocrd network` CLI syntax and terminology #1032

kba commented Mar 29, 2023

MehmedGIT commented Mar 29, 2023 •

edited

Loading

MehmedGIT commented Apr 5, 2023

bertsky commented Apr 17, 2023

MehmedGIT commented Apr 17, 2023

bertsky commented Apr 17, 2023

bertsky commented Jul 12, 2023 •

edited

Loading

MehmedGIT commented Jul 12, 2023

ocrd network CLI syntax and terminology #1032

ocrd network CLI syntax and terminology #1032

Comments

kba commented Mar 29, 2023

MehmedGIT commented Mar 29, 2023 • edited Loading

MehmedGIT commented Apr 5, 2023

bertsky commented Apr 17, 2023

MehmedGIT commented Apr 17, 2023

bertsky commented Apr 17, 2023

bertsky commented Jul 12, 2023 • edited Loading

MehmedGIT commented Jul 12, 2023

`ocrd network` CLI syntax and terminology #1032

`ocrd network` CLI syntax and terminology #1032

MehmedGIT commented Mar 29, 2023 •

edited

Loading

bertsky commented Jul 12, 2023 •

edited

Loading