Skip to content

Commit

Permalink
[Fixes #7945] Ingest harvested layer data to geonode (#7947)
Browse files Browse the repository at this point in the history
* Bump urllib3 from 1.26.2 to 1.26.3 (#6908)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.2 to 1.26.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/1.26.3/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.2...1.26.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Toni <toni.schoenbuchner@csgis.de>

* [Fixes #6880] Circle CI upload tests fail irregulary (#6881)

* [Fixes #6880] Circle CI upload tests fail irregulary

* CircleCI test fix: sometimes expires due to upload timeout in the test environment

* - Avoid infinite loop on upload testing

* Revert "CircleCI test fix: sometimes expires due to upload timeout in the test environment"

This reverts commit 66139fd.

Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
Co-authored-by: afabiani <alessio.fabiani@gmail.com>

* [Fixes #6914] Remove "add to basket" tool for documents and maps (#6915)

* Added malnajdi as contributor

* [Fixes #6910] meaningful filename for document download (#6911)

* get meaningful document filenames on download

* - Strip extension from document title before slugify it (e.g.: image.jpg instead of imagejpg.jpg)

Co-authored-by: afabiani <alessio.fabiani@gmail.com>
Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>

* - CircleCI Upload Tests: trying to reduce more the risk of infinite loop on "wait_for_progress"

* [Fixes #6916] gsimporter.api.NotFound caused by missing trailing slash at the end of GEOSERVER_LOCATION (#6913)

* [Fixes #6916] gsimporter.api.NotFound caused by missing trailing slash at the end of GEOSERVER_LOCATION

* [Fixes #6916] unit test for GEOSERVER_LOCATION

* Bump django-cors-headers from 3.6.0 to 3.7.0 (#6901)

Bumps [django-cors-headers](https://github.com/adamchainz/django-cors-headers) from 3.6.0 to 3.7.0.
- [Release notes](https://github.com/adamchainz/django-cors-headers/releases)
- [Changelog](https://github.com/adamchainz/django-cors-headers/blob/master/HISTORY.rst)
- [Commits](adamchainz/django-cors-headers@3.6.0...3.7.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump amqp from 5.0.3 to 5.0.5 (#6905)

Bumps [amqp](https://github.com/celery/py-amqp) from 5.0.3 to 5.0.5.
- [Release notes](https://github.com/celery/py-amqp/releases)
- [Changelog](https://github.com/celery/py-amqp/blob/master/Changelog)
- [Commits](celery/py-amqp@v5.0.3...v5.0.5)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump pip from 21.0 to 21.0.1 (#6900)

Bumps [pip](https://github.com/pypa/pip) from 21.0 to 21.0.1.
- [Release notes](https://github.com/pypa/pip/releases)
- [Changelog](https://github.com/pypa/pip/blob/master/NEWS.rst)
- [Commits](pypa/pip@21.0...21.0.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump coverage from 5.3.1 to 5.4 (#6903)

Bumps [coverage](https://github.com/nedbat/coveragepy) from 5.3.1 to 5.4.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](nedbat/coveragepy@coverage-5.3.1...coverage-5.4)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump pytest from 6.2.1 to 6.2.2 (#6907)

Bumps [pytest](https://github.com/pytest-dev/pytest) from 6.2.1 to 6.2.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/master/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@6.2.1...6.2.2)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump djangorestframework-gis from 0.16 to 0.17 (#6902)

Bumps [djangorestframework-gis](https://github.com/openwisp/django-rest-framework-gis) from 0.16 to 0.17.
- [Release notes](https://github.com/openwisp/django-rest-framework-gis/releases)
- [Changelog](https://github.com/openwisp/django-rest-framework-gis/blob/master/CHANGES.rst)
- [Commits](openwisp/django-rest-framework-gis@v0.16.0...v0.17.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* - Algin setup.cfg to requirements.txt

* [Fixes #6922][REST API v2] Expose the curated thumbnail URL if it has… (#6923)

* [Fixes #6922][REST API v2] Expose the curated thumbnail URL if it has been uploaded

* - Add REST APIs test suite to CircleCI

* [Fixes #6918] Removal of QGIS support (#6919)

* [Cleanup and Refactor] Remove QGIS server backend dependencies

* [Cleanup and Refactor] Remove QGIS server backend dependencies

* - Fix LGTM issues

* allow Basic authenticated requests in LOCKDOWN mode

* fix to avoid circular import

* flake8 check fix

* added tests

* [Fixes #6880] Circle CI upload tests fail irregulary (#6881)

* [Fixes #6880] Circle CI upload tests fail irregulary

* CircleCI test fix: sometimes expires due to upload timeout in the test environment

* - Avoid infinite loop on upload testing

* Revert "CircleCI test fix: sometimes expires due to upload timeout in the test environment"

This reverts commit 66139fd.

Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
Co-authored-by: afabiani <alessio.fabiani@gmail.com>

* [Fixes #6914] Remove "add to basket" tool for documents and maps (#6915)

* Added malnajdi as contributor

* Bump pip from 21.0 to 21.0.1 (#6900)

Bumps [pip](https://github.com/pypa/pip) from 21.0 to 21.0.1.
- [Release notes](https://github.com/pypa/pip/releases)
- [Changelog](https://github.com/pypa/pip/blob/master/NEWS.rst)
- [Commits](pypa/pip@21.0...21.0.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* - Algin setup.cfg to requirements.txt

* [Fixes #6922][REST API v2] Expose the curated thumbnail URL if it has… (#6923)

* [Fixes #6922][REST API v2] Expose the curated thumbnail URL if it has been uploaded

* - Add REST APIs test suite to CircleCI

* [Fixes #6918] Removal of QGIS support (#6919)

* [Cleanup and Refactor] Remove QGIS server backend dependencies

* [Cleanup and Refactor] Remove QGIS server backend dependencies

* - Fix LGTM issues

* allow Basic authenticated requests in LOCKDOWN mode

* fix to avoid circular import

* - Align to upstream master branch

* [Fixes #7945] Ingest harvested layer data to geonode

* Refactor celery tasks in order to allow performing harvesting of selected harvestable resources

* Implement copying of remote resources for the GeoNode legacy harvester

* Improve harvesting session and the admin

* Ensure `extension` is present for harvested GeoNode documents

* Add missing default attributes to harvested resources

* fixing tests

* fixing tests

* fix migration files conflict

* Renamed `geonode` harvester module to `geonodeharvester`

Hoping to make lgtm happy

* Add migrations to rename geonode harvester module name

* Improve support for harvesting raster datasets and disable harvesting maps

Co-authored-by: Giovanni Allegri <giohappy@gmail.com>
Co-authored-by: allyoucanmap <bovio.stefano@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Toni <toni.schoenbuchner@csgis.de>
Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
Co-authored-by: afabiani <alessio.fabiani@gmail.com>
Co-authored-by: Florian Hoedt <gannebamm@gmail.com>
Co-authored-by: Mohammed Y. Alnajdi <mohdnagfy@gmail.com>
Co-authored-by: biegan <bieganowski.rev@gmail.com>
Co-authored-by: Ricardo Garcia Silva <ricardo@kartoza.com>
  • Loading branch information
11 people committed Sep 29, 2021
1 parent bf23880 commit 2164b02
Show file tree
Hide file tree
Showing 23 changed files with 919 additions and 398 deletions.
14 changes: 7 additions & 7 deletions geonode/geoserver/manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,7 @@ def ingest(self, files: typing.List[str], /, uuid: str = None, resource_type: ty
user=defaults.get('user', instance.owner),
defaults=defaults,
action_type='create',
importer_session_opts=kwargs)
**kwargs)
return instance

def copy(self, instance: ResourceBase, /, uuid: str = None, owner: settings.AUTH_USER_MODEL = None, defaults: dict = {}) -> ResourceBase:
Expand Down Expand Up @@ -242,8 +242,9 @@ def import_dataset(self, method: str, uuid: str, /, instance: ResourceBase = Non
instance = None
return instance

def _revise_resource_value(self, instance, files: list, user, action_type: str, importer_session_opts: dict = {}):
def _revise_resource_value(self, instance, files: list, user, action_type: str, importer_session_opts: typing.Optional[typing.Dict] = None):
from geonode.upload.files import ALLOWED_EXTENSIONS
session_opts = dict(importer_session_opts) if importer_session_opts is not None else {}

spatial_files_type = get_spatial_files_dataset_type(ALLOWED_EXTENSIONS, files)

Expand All @@ -256,7 +257,7 @@ def _revise_resource_value(self, instance, files: list, user, action_type: str,

_name = instance.get_real_instance().name
if not _name:
_name = importer_session_opts.get('name', None) or os.path.splitext(os.path.basename(spatial_files_type.base_file))[0]
_name = session_opts.get('name', None) or os.path.splitext(os.path.basename(spatial_files_type.base_file))[0]
instance.get_real_instance().name = _name

gs_dataset = None
Expand All @@ -272,8 +273,7 @@ def _revise_resource_value(self, instance, files: list, user, action_type: str,
_workspace = gs_dataset.resource.workspace.name if gs_dataset.resource.workspace else None

if not _workspace:
if importer_session_opts:
_workspace = importer_session_opts.get('workspace', instance.get_real_instance().workspace)
_workspace = session_opts.get('workspace', instance.get_real_instance().workspace)
if not _workspace:
_workspace = instance.get_real_instance().workspace or settings.DEFAULT_WORKSPACE

Expand All @@ -282,7 +282,7 @@ def _revise_resource_value(self, instance, files: list, user, action_type: str,
_dsname = ogc_server_settings.datastore_db['NAME']
_ds = create_geoserver_db_featurestore(store_name=_dsname, workspace=_workspace)
if _ds:
_target_store = importer_session_opts.get('target_store', None) or _dsname
_target_store = session_opts.get('target_store', None) or _dsname

# opening Import session for the selected layer
import_session = gs_uploader.start_import(
Expand Down Expand Up @@ -343,7 +343,7 @@ def _revise_resource_value(self, instance, files: list, user, action_type: str,
store_name=_target_store,
workspace=_workspace
)
transforms = importer_session_opts.get('transforms', None)
transforms = session_opts.get('transforms', None)
if transforms:
task.set_transforms(transforms)
# Starting import process
Expand Down
181 changes: 148 additions & 33 deletions geonode/harvesting/admin.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,11 @@
messages,
)
from django.db import transaction
from django.urls import reverse
from django.utils.html import (
format_html,
mark_safe,
)

from . import (
forms,
Expand All @@ -47,19 +52,26 @@ class HarvesterAdmin(admin.ModelAdmin):
"status",
"name",
"scheduling_enabled",
"remote_url",
"remote_available",
"update_frequency",
"harvester_type",
"get_num_harvestable_resources",
"get_num_harvestable_resources_selected",
"get_worker_specific_configuration",
"show_link_to_selected_harvestable_resources",
"show_link_to_latest_harvesting_session",
)
list_filter = (
"status",
)

readonly_fields = (
"status",
"remote_available",
"last_checked_availability",
"last_checked_harvestable_resources",
"last_check_harvestable_resources_message",
"num_harvestable_resources",
"show_link_to_selected_harvestable_resources",
"show_link_to_latest_harvesting_session",
)

list_editable = (
Expand Down Expand Up @@ -99,7 +111,8 @@ def save_model(self, request, obj: models.Harvester, form, change):
(
"Harvester worker specific configuration has been changed. "
"Updating list of this harvester's harvestable "
"resources asynchronously... "
"resources asynchronously. When this is done the harvester "
"status will be set to `ready`. Refresh this page in order to monitor it."
),
level=messages.WARNING
)
Expand Down Expand Up @@ -154,17 +167,13 @@ def update_harvestable_resources(self, request, queryset):
if len(being_updated) > 0:
self.message_user(
request,
f"Updating harvestable resources asynchronously for {being_updated}..."
(
f"Updating harvestable resources asynchronously for {being_updated}. "
f"This operation can take a while to complete. Check the harvesters' "
f"status for when it becomes `ready`"
)
)

@admin.display(description="Number of selected resources to harvest")
def get_num_harvestable_resources_selected(self, harvester: models.Harvester):
return harvester.harvestable_resources.filter(should_be_harvested=True).count()

@admin.display(description="Number of existing harvestable resources")
def get_num_harvestable_resources(self, harvester: models.Harvester):
return harvester.num_harvestable_resources

@admin.action(description="Perform harvesting on selected harvesters")
def perform_harvesting(self, request, queryset):
being_harvested = []
Expand All @@ -173,7 +182,8 @@ def perform_harvesting(self, request, queryset):
if should_continue:
harvester.status = harvester.STATUS_PERFORMING_HARVESTING
harvester.save()
tasks.harvesting_dispatcher.apply_async(args=(harvester.pk,))
harvesting_session = models.HarvestingSession.objects.create(harvester=harvester)
tasks.harvesting_dispatcher.apply_async(args=(harvester.pk, harvesting_session.pk))
being_harvested.append(harvester)
else:
self.message_user(request, error_msg, level=messages.ERROR)
Expand All @@ -182,37 +192,89 @@ def perform_harvesting(self, request, queryset):
self.message_user(
request, f"Performing harvesting asynchronously for {being_harvested}")

@admin.display(description="Number of selected resources to harvest")
def get_num_harvestable_resources_selected(self, harvester: models.Harvester):
return harvester.harvestable_resources.filter(should_be_harvested=True).count()

def _should_act(harvester: models.Harvester) -> typing.Tuple[bool, str]:
if harvester.status != harvester.STATUS_READY:
error_message = (
f"Harvester {harvester!r} is currently busy. Please wait until its status "
f"is {harvester.STATUS_READY!r} before retrying"
)
result = False
else:
available = utils.update_harvester_availability(harvester)
if not available:
error_message = (
f"harvester {harvester!r} is not available, skipping harvesting...")
result = False
@admin.display(description="Number of existing harvestable resources")
def get_num_harvestable_resources(self, harvester: models.Harvester):
return harvester.num_harvestable_resources

@admin.display(description="current worker-specific configuration")
def get_worker_specific_configuration(self, harvester: models.Harvester):
worker = harvester.get_harvester_worker()
worker_config = worker.get_current_config()
result = "<ul>"
for key, value in worker_config.items():
result += format_html("<li>{}: {}</li>", key, value)
result += "</ul>"
return mark_safe(result)

@admin.display(description="Go to selected harvestable resources")
def show_link_to_selected_harvestable_resources(self, harvester: models.Harvester):
num_selected = models.HarvestableResource.objects.filter(
harvester=harvester, should_be_harvested=True).count()
if num_selected > 0:
changelist_uri = reverse("admin:harvesting_harvestableresource_changelist")
result = mark_safe(
format_html(
f'<a class="button grp-button" href="{changelist_uri}?harvester__id__exact={harvester.id}&should_be_harvested__exact=1">Go</a>'
)
)
else:
result = True
error_message = ""
return result, error_message
result = None
return result

@admin.display(description="Go to latest harvesting session")
def show_link_to_latest_harvesting_session(self, harvester: models.Harvester):
latest = models.HarvestingSession.objects.filter(harvester=harvester).latest("started")
changelist_uri = reverse("admin:harvesting_harvestingsession_change", args=(latest.id,))
return mark_safe(
format_html(
f'<a class="button grp-button" href="{changelist_uri}">Go</a>'
)
)


@admin.register(models.HarvestingSession)
class HarvestingSessionAdmin(admin.ModelAdmin):
list_display = (
"id",
"status",
"started",
"updated",
"ended",
"total_records_found",
"records_to_harvest",
"records_harvested",
"calculate_harvesting_progress",
"harvester",
)
readonly_fields = (
"id",
"status",
"started",
"updated",
"ended",
"records_to_harvest",
"records_harvested",
"calculate_harvesting_progress",
"harvester",
"session_details",
)

def has_change_permission(self, request, obj=None):
return False

def has_add_permission(self, request):
return False

@admin.display(description="progress(%)")
def calculate_harvesting_progress(self, harvesting_session: models.HarvestingSession):
if harvesting_session.records_to_harvest == 0:
result = 0
else:
result = int((harvesting_session.records_harvested / harvesting_session.records_to_harvest) * 100)
return result


@admin.register(models.HarvestableResource)
Expand All @@ -224,7 +286,7 @@ class HarvestableResourceAdmin(admin.ModelAdmin):
"last_harvested",
"unique_identifier",
"title",
"harvester",
"show_link_to_harvester",
"should_be_harvested",
"remote_resource_type",
)
Expand Down Expand Up @@ -255,6 +317,7 @@ class HarvestableResourceAdmin(admin.ModelAdmin):

actions = [
"toggle_should_be_harvested",
"harvest_selected_resources",
]

def delete_queryset(self, request, queryset):
Expand Down Expand Up @@ -287,10 +350,62 @@ def toggle_should_be_harvested(self, request, queryset):
self.message_user(
request, "Toggled harvestable resources' `should_be_harvested` attribute")

@admin.action(description="Harvest selected resources")
def harvest_selected_resources(self, request, queryset):
selected_harvestable_resources = {}
for harvestable_resource in queryset:
harvester_resources = selected_harvestable_resources.setdefault(harvestable_resource.harvester, [])
harvester_resources.append(harvestable_resource.id)
for harvester, harvestable_resource_ids in selected_harvestable_resources.items():
should_continue, error_msg = _should_act(harvester)
if should_continue:
harvester.status = models.Harvester.STATUS_PERFORMING_HARVESTING
harvester.save()
harvesting_session = models.HarvestingSession.objects.create(harvester=harvester)
tasks.harvest_resources.apply_async(args=(harvestable_resource_ids, harvesting_session.pk))
self.message_user(
request,
f"Harvesting {len(harvestable_resource_ids)} resources from {harvester.name!r} harvester..."
)
else:
self.message_user(
request,
error_msg,
level=messages.ERROR
)

@admin.display(description="harvester")
def show_link_to_harvester(self, harvestable_resource: models.HarvestableResource):
harvester = harvestable_resource.harvester
uri = reverse("admin:harvesting_harvester_change", args=(harvester.pk,))
return mark_safe(
format_html(
f'<a grp-button" href="{uri}">{harvester.name}</a>'
)
)


def _should_act(harvester: models.Harvester) -> typing.Tuple[bool, str]:
if harvester.status != harvester.STATUS_READY:
error_message = (
f"Harvester {harvester!r} is currently busy. Please wait until its status "
f"is {harvester.STATUS_READY!r} before retrying"
)
result = False
else:
available = utils.update_harvester_availability(harvester)
if not available:
error_message = (
f"harvester {harvester!r} is not available, skipping harvesting...")
result = False
else:
result = True
error_message = ""
return result, error_message


def _worker_config_changed(form) -> bool:
field_name = "harvester_type_specific_configuration"
# original = eval(form.data[f"initial-{field_name}"], {})
original = json.loads(form.data[f"initial-{field_name}"])
cleaned = form.cleaned_data.get(field_name)
return original != cleaned
3 changes: 2 additions & 1 deletion geonode/harvesting/api/serializers.py
Original file line number Diff line number Diff line change
Expand Up @@ -211,8 +211,9 @@ def update(self, instance: models.Harvester, validated_data):
post_update_task = tasks.update_harvestable_resources.signature(
args=(instance.id,))
elif desired_status == models.Harvester.STATUS_PERFORMING_HARVESTING:
harvesting_session = models.HarvestingSession.objects.create(harvester=instance)
post_update_task = tasks.harvesting_dispatcher.signature(
args=(instance.id,))
args=(instance.pk, harvesting_session.pk))
elif desired_status == models.Harvester.STATUS_CHECKING_AVAILABILITY:
post_update_task = tasks.check_harvester_available.signature(
args=(instance.id,))
Expand Down
2 changes: 1 addition & 1 deletion geonode/harvesting/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@


_DEFAULT_HARVESTERS: typing.Final = [
"geonode.harvesting.harvesters.geonode.GeonodeLegacyHarvester",
"geonode.harvesting.harvesters.geonodeharvester.GeonodeLegacyHarvester",
"geonode.harvesting.harvesters.wms.OgcWmsHarvester",
# "geonode.harvesting.harvesters.geonode.GeonodeCswHarvester",
]
Expand Down
Loading

0 comments on commit 2164b02

Please sign in to comment.