Harvester geonode current #8197

Closed


ricardogsilva
Member

@ricardogsilva ricardogsilva commented Oct 4, 2021

This PR adds the GeonodeUnifiedHarvesterWorker harvester worker, which makes it possible to harvest from remote GeoNode versions 3.2+.

The implementation also introduces GeonodeCurrentHarvester, a harvester class that can harvest from recent (3.2+) versions of GeoNode.

GeonodeUnifiedHarvesterWorker is then implemented as a thin layer on top that dispatches to GeonodeLegacyHarvester or GeonodeCurrentHarvester, depending on the detected version of the remote GeoNode deployment. Version detection is done by checking for the existence of the api/v2/datasets endpoint, as this endpoint was only introduced in GeoNode after version 3.2 was released.
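
The dispatch described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the function names `remote_has_v2_api` and `pick_harvester_name`, the use of `urllib`, the timeout value, and the choice to fall back to the legacy harvester on any network error are all hypothetical. Only the two harvester class names and the api/v2/datasets endpoint come from the description.

```python
import urllib.error
import urllib.request
from typing import Callable


def remote_has_v2_api(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if the remote answers api/v2/datasets with a 2xx status."""
    url = f"{base_url.rstrip('/')}/api/v2/datasets"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors such as 404 as well as DNS failures and timeouts.
        # Assumption: any failure is treated as "endpoint not available".
        return False


def pick_harvester_name(
    base_url: str,
    probe: Callable[[str], bool] = remote_has_v2_api,
) -> str:
    """Name the harvester class a unified worker would delegate to."""
    if probe(base_url):
        return "GeonodeCurrentHarvester"
    return "GeonodeLegacyHarvester"
```

The probe is injectable so that the selection logic can be exercised without a network connection, for example `pick_harvester_name("http://example.org", probe=lambda url: True)`.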

This PR also includes a data migration to convert any pre-existing harvesters to use the new unified worker class.

giohappy and others added 30 commits January 20, 2021 09:37
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.2 to 1.26.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/1.26.3/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.2...1.26.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Toni <toni.schoenbuchner@csgis.de>
…6881)

* [Fixes GeoNode#6880] Circle CI upload tests fail irregulary

* CircleCI test fix: sometimes expires due to upload timeout in the test environment

* - Avoid infinite loop on upload testing

* Revert "CircleCI test fix: sometimes expires due to upload timeout in the test environment"

This reverts commit 66139fd.

Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
Co-authored-by: afabiani <alessio.fabiani@gmail.com>
…de#6911)

* get meaningful document filenames on download

* - Strip extension from document title before slugify it (e.g.: image.jpg instead of imagejpg.jpg)

Co-authored-by: afabiani <alessio.fabiani@gmail.com>
Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
…ng slash at the end of GEOSERVER_LOCATION (GeoNode#6913)

* [Fixes GeoNode#6916] gsimporter.api.NotFound caused by missing trailing slash at the end of GEOSERVER_LOCATION

* [Fixes GeoNode#6916] unit test for GEOSERVER_LOCATION
Bumps [django-cors-headers](https://github.com/adamchainz/django-cors-headers) from 3.6.0 to 3.7.0.
- [Release notes](https://github.com/adamchainz/django-cors-headers/releases)
- [Changelog](https://github.com/adamchainz/django-cors-headers/blob/master/HISTORY.rst)
- [Commits](adamchainz/django-cors-headers@3.6.0...3.7.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [amqp](https://github.com/celery/py-amqp) from 5.0.3 to 5.0.5.
- [Release notes](https://github.com/celery/py-amqp/releases)
- [Changelog](https://github.com/celery/py-amqp/blob/master/Changelog)
- [Commits](celery/py-amqp@v5.0.3...v5.0.5)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [pip](https://github.com/pypa/pip) from 21.0 to 21.0.1.
- [Release notes](https://github.com/pypa/pip/releases)
- [Changelog](https://github.com/pypa/pip/blob/master/NEWS.rst)
- [Commits](pypa/pip@21.0...21.0.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [coverage](https://github.com/nedbat/coveragepy) from 5.3.1 to 5.4.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](nedbat/coveragepy@coverage-5.3.1...coverage-5.4)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [pytest](https://github.com/pytest-dev/pytest) from 6.2.1 to 6.2.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/master/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@6.2.1...6.2.2)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [djangorestframework-gis](https://github.com/openwisp/django-rest-framework-gis) from 0.16 to 0.17.
- [Release notes](https://github.com/openwisp/django-rest-framework-gis/releases)
- [Changelog](https://github.com/openwisp/django-rest-framework-gis/blob/master/CHANGES.rst)
- [Commits](openwisp/django-rest-framework-gis@v0.16.0...v0.17.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… it has… (GeoNode#6923)

* [Fixes GeoNode#6922][REST API v2] Expose the curated thumbnail URL if it has been uploaded

* - Add REST APIs test suite to CircleCI
* [Cleanup and Refactor] Remove QGIS server backend dependencies

* [Cleanup and Refactor] Remove QGIS server backend dependencies

* - Fix LGTM issues
…iddleware

Feature#650 basic auth middleware
@lgtm-com

lgtm-com bot commented Oct 19, 2021

This pull request introduces 1 alert when merging a9e112a into 0bf76e1 - view on LGTM.com

new alerts:

  • 1 for Unused import

@lgtm-com

lgtm-com bot commented Oct 19, 2021

This pull request introduces 1 alert when merging 6d47621 into 0bf76e1 - view on LGTM.com

new alerts:

  • 1 for Unused import

@ricardogsilva
Member Author

bump @afabiani ready for your review (and hopefully merge)

@afabiani
Member

thanks @ricardogsilva will review this ASAP

@ricardogsilva ricardogsilva mentioned this pull request Oct 21, 2021
@afabiani
Member

@ricardogsilva I have been testing this PR. While it works quite well with the GeoNode harvester, I'm afraid we broke something in the WMS one.

When harvesting a WMS dataset, I noticed that some information is missing, for instance:

"name": "",
"alternate": "",

The WMS link seems to be wrong too

"links": [
    {
        "extension": "html",
        "link_type": "OGC:WMS",
        "name": "OGC WMS:  Service",
        "mime": "text/html",
        "url": "http://localhost:8080/geoserver/ows"
    }
]

I guess because of this
https://github.com/GeoNode/geonode/pull/8197/files#diff-bfbd859a5f6c4d9fc78fbbefd10f46d5fe34aaa115f3f8641a3e201ff737d955L255-L259

@afabiani
Member

Also this change here https://github.com/GeoNode/geonode/pull/8197/files#diff-2bfc7654963f56a9bfa2e71a0716d21ac1e6ef688fdccd9f6631006123cd6b59L73 causes an exception:

Traceback (most recent call last):
  File "/mnt/c/Work/geonode/geonode/services/serviceprocessors/base.py", line 177, in harvest_resource
    harvest_resources.apply_async(args=([resource_id, ], _h_session.pk))
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/task.py", line 569, in apply_async
    return self.apply(args, kwargs, task_id=task_id or uuid(),
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/task.py", line 790, in apply
    ret = tracer(task_id, args, kwargs, request)
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/trace.py", line 467, in trace_task
    I, R, state, retval = on_error(task_request, exc, uuid)
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/trace.py", line 450, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/mnt/c/Work/geonode/geonode/harvesting/tasks.py", line 173, in harvest_resources
    harvesting_workflow.apply_async()
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/canvas.py", line 1408, in apply_async
    return self.apply(args, kwargs,
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/canvas.py", line 1427, in apply
    args=(tasks.apply(args, kwargs).get(propagate=propagate),),
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/canvas.py", line 1117, in apply
    return app.GroupResult(group_id, [
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/canvas.py", line 1118, in <listcomp>
    sig.apply(args=args, kwargs=kwargs, **options) for sig, _, _ in tasks
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/canvas.py", line 186, in apply
    return self.type.apply(args, kwargs, **options)
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/task.py", line 790, in apply
    ret = tracer(task_id, args, kwargs, request)
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/trace.py", line 467, in trace_task
    I, R, state, retval = on_error(task_request, exc, uuid)
  File "/home/afabiani/.virtualenvs/geonode/lib/python3.8/site-packages/celery/app/trace.py", line 450, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/mnt/c/Work/geonode/geonode/harvesting/tasks.py", line 213, in _harvest_resource
    harvested_resource_info = worker.get_resource(harvestable_resource)
  File "/mnt/c/Work/geonode/geonode/harvesting/harvesters/wms.py", line 315, in get_resource
    distribution=resourcedescriptor.RecordDistribution(
TypeError: __init__() got an unexpected keyword argument 'legend_url'
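
For readers unfamiliar with this kind of failure: Python raises this TypeError whenever a constructor receives a keyword it does not declare. The sketch below reproduces the pattern with two hypothetical stand-in dataclasses. The real RecordDistribution definition is not shown in this thread, so the class names, the `link_url` and `thumbnail_url` fields, and the helper `try_build` are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordDistributionWithoutLegend:
    """Hypothetical stand-in that declares no legend_url field."""
    link_url: Optional[str] = None
    thumbnail_url: Optional[str] = None


@dataclass
class RecordDistributionWithLegend:
    """Hypothetical stand-in that does declare legend_url."""
    link_url: Optional[str] = None
    thumbnail_url: Optional[str] = None
    legend_url: Optional[str] = None


def try_build(cls) -> str:
    """Call the constructor the way the WMS harvester does in the traceback."""
    try:
        cls(legend_url="http://localhost:8080/geoserver/ows")
    except TypeError as exc:
        # The exact message wording varies between Python versions.
        return f"TypeError: {exc}"
    return "ok"
```

Calling `try_build(RecordDistributionWithoutLegend)` returns a string that mentions `legend_url`, while `try_build(RecordDistributionWithLegend)` succeeds. In other words, the caller and the class definition have to agree on the set of keyword arguments.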

@afabiani
Member

@ricardogsilva so, it looks like most of the issues above have been fixed in the arcgis PR. Nevertheless, we did lose something on WMS: it looks like the ows_url is not correctly populated. Taking a look at this.

@afabiani
Member

@ricardogsilva I commented on this in #8229 too. IMHO we should implement a _get_resource_descriptor method that sets the name, alternate and ows_url for the WMSHarvester too.

@afabiani afabiani added this to the 4.0.0 milestone Oct 25, 2021
@afabiani
Member

Superseded by #8229

@afabiani afabiani closed this Oct 25, 2021
Labels
cla-signed CLA Bot: community license agreement signed