use threading to run jobmanager loop #614

jdries · 2024-09-05T19:01:41Z

I propose to allow running the jobmanager cycle in a separate thread, which gives a cleaner way to interrupt it.
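A minimal sketch of the idea, assuming a hypothetical `LoopRunner` wrapper (not the actual job manager API): the loop body runs in a daemon `threading.Thread`, and a `threading.Event` both signals the stop request and acts as an interruptible sleep between cycles. The `interval` and `timeout` parameters are illustrative assumptions.

```python
import threading
from typing import Callable, Optional


class LoopRunner:
    """Hypothetical helper: run `cycle` repeatedly in a background thread until stopped."""

    def __init__(self, cycle: Callable[[], None], interval: float = 1.0):
        self._cycle = cycle
        self._interval = interval
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def _run(self) -> None:
        while not self._stop.is_set():
            self._cycle()
            # Event.wait returns as soon as stop() sets the event,
            # so interruption does not have to wait for the full interval.
            self._stop.wait(timeout=self._interval)

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self, timeout: Optional[float] = 5.0) -> None:
        self._stop.set()
        if self._thread is not None:
            # Bounded join: don't hang forever if a cycle is stuck in a blocking call.
            self._thread.join(timeout=timeout)
```

Calling `start()` returns immediately, and `stop()` from the main thread (e.g. in a `KeyboardInterrupt` handler) ends the loop after the current cycle finishes.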

soxofaan

Overall, I'm not sure we should do this with asyncio, as everything (HTTP requests, file IO) currently consists of classic blocking calls, so there is not much to gain.

I think in this case working with Python threading will be a bit simpler.
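To illustrate the point about blocking calls, a small sketch where `time.sleep` stands in for a blocking HTTP request or file write (an assumption for illustration). Two coroutines that make the blocking call run one after the other, because the call never yields to the event loop; two plain threads overlap their waits.

```python
import asyncio
import threading
import time


async def blocking_task() -> None:
    # A blocking call inside a coroutine holds the event loop; nothing else runs meanwhile.
    time.sleep(0.2)


async def run_two_blocking() -> float:
    t0 = time.monotonic()
    await asyncio.gather(blocking_task(), blocking_task())
    return time.monotonic() - t0


def run_two_threads() -> float:
    t0 = time.monotonic()
    threads = [threading.Thread(target=time.sleep, args=(0.2,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - t0


if __name__ == "__main__":
    print(f"asyncio + blocking calls: {asyncio.run(run_two_blocking()):.2f}s")
    print(f"two threads: {run_two_threads():.2f}s")
```

The asyncio variant should take roughly twice as long as the threaded one; asyncio would only pay off if the HTTP and file IO were themselves non-blocking.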

soxofaan · 2024-09-12T12:39:38Z

openeo/extra/job_management.py

@@ -252,6 +255,43 @@ def _normalize_df(self, df: pd.DataFrame) -> pd.DataFrame:

        return df

+    def start_job_thread(self,start_job: Callable[[], BatchJob],


Using "thread" in naming and docs might be confusing and set the wrong expectations, as asyncio is about coroutines, not threading.

It's now converted to use an actual `Thread` object, so the confusion should be gone?

@soxofaan if this is fine now, we can merge and continue with the other PRs.

openeo/extra/job_management.py

soxofaan · 2024-09-19T14:39:18Z

tests/extra/test_job_management.py

+            year = int(row["year"])
+            return BatchJob(job_id=f"job-{year}", connection=connection)
+
+        df = manager._normalize_df(df)


Shouldn't this `_normalize_df` be handled automatically by the manager?

We lack a good mechanism to initialize a job db correctly; we'll have to come up with something.

@soxofaan I introduced initialize_job_db to address this issue.

soxofaan · 2024-09-23T10:10:46Z

FYI: I pushed some tweaks: automatic code style cleanups and, more importantly, changes to the sleep/wait logic in 95a4ec7. Feel free to revert.
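Commit 95a4ec7 is described as "job thread: do micro-sleeps for quick exit". A minimal sketch of that general pattern, not the code from that commit: instead of one long `time.sleep(duration)`, the loop sleeps in short slices and checks a stop condition between them, so a stop request is noticed within roughly one slice. The function name `interruptible_sleep` and the `slice_s` parameter are hypothetical.

```python
import time
from typing import Callable


def interruptible_sleep(duration: float, should_stop: Callable[[], bool], slice_s: float = 0.1) -> bool:
    """Sleep for up to `duration` seconds, in slices of at most `slice_s` seconds.

    Returns True if `should_stop()` became true before the full duration elapsed,
    False if the whole duration was slept.
    """
    deadline = time.monotonic() + duration
    while True:
        if should_stop():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Micro-sleep: never block longer than one slice, so a stop request
        # is noticed within roughly `slice_s` seconds.
        time.sleep(min(slice_s, remaining))
```

A smaller `slice_s` gives a quicker exit at the cost of more wake-ups; 0.1 s is an arbitrary default here.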

soxofaan · 2024-09-23T10:17:35Z

tests/extra/test_job_management.py

+            return BatchJob(job_id=f"job-{year}", connection=connection)
+
+        job_db = CsvJobDatabase(output_file)
+        manager.initialize_job_db(job_db, df)


#614 (comment):

@soxofaan I introduced initialize_job_db to address this issue.

I see, but I'd think this initialization is not something the user should have to bother with.
Can't this just be done automatically, as in `run_jobs`:

openeo-python-client/openeo/extra/job_management.py

Lines 358 to 363 in 0c2fdc1

if job_db.exists():
    # Resume from existing db
    _log.info(f"Resuming `run_jobs` from existing {job_db}")
elif df is not None:
    df = self._normalize_df(df)
    job_db.persist(df)
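The resume-or-initialize logic quoted above can be sketched as a standalone helper. Everything here is hypothetical and only mirrors the quoted snippet: `InMemoryJobDb` stands in for a real job database such as `CsvJobDatabase`, rows are plain dicts instead of a pandas DataFrame, and the defaults added by `normalize_rows` (a `"status"` of `"not_started"` and an `"id"` of `None`) are assumptions about what `_normalize_df` does.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


class InMemoryJobDb:
    """Hypothetical stand-in for a persistent job database."""

    def __init__(self) -> None:
        self._rows: Optional[List[Row]] = None

    def exists(self) -> bool:
        return self._rows is not None

    def persist(self, rows: List[Row]) -> None:
        self._rows = [dict(r) for r in rows]

    def read(self) -> List[Row]:
        if self._rows is None:
            raise RuntimeError("job db not initialized")
        return [dict(r) for r in self._rows]


def normalize_rows(rows: List[Row]) -> List[Row]:
    # Add bookkeeping fields only where missing (assumed defaults, for illustration).
    return [{"status": "not_started", "id": None, **row} for row in rows]


def ensure_job_db(job_db: InMemoryJobDb, rows: Optional[List[Row]] = None) -> InMemoryJobDb:
    if job_db.exists():
        # Resume from existing db: any passed rows are ignored.
        pass
    elif rows is not None:
        job_db.persist(normalize_rows(rows))
    else:
        raise ValueError("No existing job db and no initial rows given")
    return job_db
```

Note that when the db already exists, `rows` is silently ignored.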

To initialize a job db, you need a dataframe (once). I'm trying to avoid that, because of the confusion Victor explained here:
#607 (comment)

soxofaan · 2024-09-27T10:57:05Z

Removed the `initialize_job_db` concern from the scope of this PR and merged in ffa7be2.

jdries added 2 commits September 5, 2024 20:59

use asyncio to run jobmanager loop

b8d0059

add clean cancelling

9b624ed

soxofaan reviewed Sep 12, 2024

jdries added 2 commits September 13, 2024 17:17

Merge branch 'master' into async_jobmanager

35e5fa5

preliminary conversion to use python threading instead of asyncio

2d44e7e

jdries changed the title ~~use asyncio to run jobmanager loop~~ use threading to run jobmanager loop Sep 15, 2024

jdries marked this pull request as ready for review September 15, 2024 12:02

job manager: changelog

995e530

soxofaan reviewed Sep 19, 2024

openeo/extra/job_management.py

openeo/extra/job_management.py Outdated

soxofaan reviewed Sep 19, 2024

jdries and others added 6 commits September 19, 2024 18:22

PR improvements

c7e0544

add API for clean initialization of job db

bfd99e3

Merge remote-tracking branch 'origin/master' into async_jobmanager

0c7899b

PR #614 code style cleanup

1e12682

(using darker and isort)

PR #614 some minor clarifications

1e80314

PR #614 job thread: do micro-sleeps for quick exit

95a4ec7

and avoid infinite wait (by default)

soxofaan reviewed Sep 23, 2024

soxofaan added 4 commits September 27, 2024 11:52

Merge branch 'master' into async_jobmanager

d19be05

PR #614 drop initialize_job_db from scope of this PR

219f972

db initialization API needs some more thought, which is for another PR

fixup! PR #614 drop initialize_job_db from scope of this PR

062846c

fixup! PR #614 drop initialize_job_db from scope of this PR

a60e101

soxofaan added a commit that referenced this pull request Sep 27, 2024

PR #614 drop initialize_job_db from scope of this PR

20c453a

db initialization API needs some more thought, which is for another PR

soxofaan closed this Sep 27, 2024

soxofaan deleted the async_jobmanager branch September 27, 2024 10:57

soxofaan mentioned this pull request Sep 27, 2024

Better API for database initialization with job manager #635

Open

soxofaan added a commit that referenced this pull request Sep 27, 2024

job manager: cleaner _job_update_loop

215ebce

related to #614
