Fix: Metadata extraction of erroneous values and Agent extraction #154

Merged: 4 commits, Aug 22, 2024
22 changes: 11 additions & 11 deletions config/schemes/ontology_submission.yml
@@ -289,7 +289,7 @@ copyrightHolder:
description: [
"SCHEMA: The party holding the legal copyright to the CreativeWork.",
"DCTERMS: A person or organization owning or managing rights over the resource." ]
- extractedMetadata: false
+ extractedMetadata: true

### Description

@@ -503,7 +503,7 @@ hasCreator:
"DOAP: Maintainer of a project, a project leader.",
"SCHEMA:author: The author of this content or rating.",
"SCHEMA:creator: The creator/author of this CreativeWork." ]
- extractedMetadata: false
+ extractedMetadata: true
metadataMappings: [ "omv:hasCreator", "dc:creator", "dcterms:creator", "foaf:maker", "prov:wasAttributedTo", "doap:maintainer", "pav:authoredBy", "pav:createdBy", "schema:author", "schema:creator" ]

#Contributor
@@ -517,7 +517,7 @@ hasContributor:
"OMV: Contributors to the creation of the ontology.",
"PAV: The resource was contributed to by the given agent.",
"DOAP: Project contributor" ]
- extractedMetadata: false
+ extractedMetadata: true
metadataMappings: [ "omv:hasContributor", "dc:contributor", "dcterms:contributor", "doap:helper", "schema:contributor", "pav:contributedBy" ]

#Curator
@@ -528,7 +528,7 @@ curatedBy:
description: [
"PAV: Specifies an agent specialist responsible for shaping the expression in an appropriate format. Often the primary agent responsible for ensuring the quality of the representation.",
"MOD: An ontology that is evaluated by an agent." ]
- extractedMetadata: false
+ extractedMetadata: true
metadataMappings: [ "mod:evaluatedBy", "pav:curatedBy" ]

#Translator
@@ -538,7 +538,7 @@ translator:
helpText: "Organization or person who adapts a creative work to different languages."
description: [
"SCHEMA: Organization or person who adapts a creative work to different languages, regional differences and technical requirements of a target market, or that translates during some event." ]
- extractedMetadata: false
+ extractedMetadata: true
metadataMappings: [ "schema:translator" ]

#Publisher
@@ -550,7 +550,7 @@ publisher:
"DCTERMS: An entity responsible for making the resource available.",
"SCHEMA: The publisher of creative work.",
"ADMS: The name of the agency that issued the identifier." ]
- extractedMetadata: false
+ extractedMetadata: true
metadataMappings: [ "dc:publisher", "dcterms:publisher", "schema:publisher", "adms:schemaAgency" ]

#Funded or sponsored by
@@ -562,7 +562,7 @@ fundedBy:
"MOD: An ontology that is sponsored by and developed under a project.",
"FOAF: An organization funding a project or person.",
"SCHEMA: The organization on whose behalf the creator was working." ]
- extractedMetadata: false
+ extractedMetadata: true
metadataMappings: [ "foaf:fundedBy", "mod:sponsoredBy", "schema:sourceOrganization" ]

#Endorsed by
@@ -573,7 +573,7 @@ endorsedBy:
description: [
"MOD: An ontology endorsed by an agent.",
"OMV: The parties that have expressed support or approval to this ontology." ]
- extractedMetadata: false
+ extractedMetadata: true
metadataMappings: [ "omv:endorsedBy", "mod:endorsedBy" ]

### Community
@@ -1429,7 +1429,7 @@ exampleIdentifier:
description: [
"VOID: Example resource of dataset.",
"IDOT: An example identifier used by one item (or record) from a dataset." ]
- extractedMetadata: false
+ extractedMetadata: true
metadataMappings: [ "void:exampleResource", "idot:exampleIdentifier" ]

#Key classes
@@ -1441,7 +1441,7 @@ keyClasses:
"OMV: Representative classes in the ontology.",
"FOAF: The primary topic of some page or document.",
"SCHEMA: Indicates the primary entity described in some page or other CreativeWork." ]
- extractedMetadata: false
+ extractedMetadata: true
metadataMappings: [ "foaf:primaryTopic", "schema:mainEntity", "omv:keyClasses"]

#Metadata vocabulary used
@@ -1454,7 +1454,7 @@ metadataVoc:
"SCHEMA: Indicates (by URL or string) a particular version of a schema used in some CreativeWork.",
"ADMS: A schema according to which the Asset Repository can provide data about its content, e.g. ADMS.",
"MOD: A vocabulary(ies) that is used and/or referred to create the current ontology." ]
- extractedMetadata: false
+ extractedMetadata: true
enforcedValues: {
"http://w3id.org/nkos/nkostype#classification_schema": "Classification scheme",
"http://www.w3.org/2000/01/rdf-schema#": "RDF Schema (RDFS)",
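Review note: the YAML change above only flips `extractedMetadata` for attributes that already declare `metadataMappings`. As a rough illustration of how such a scheme could be read, the sketch below loads a small inline excerpt and lists the attributes flagged for extraction with their mapped properties. The `notes` entry and the `extractable_mappings` helper are hypothetical and are not part of this PR or of the library.

```ruby
require 'yaml'

# Hypothetical excerpt shaped like config/schemes/ontology_submission.yml.
# The "notes" entry is invented for illustration only.
SCHEME = YAML.safe_load(<<~YML)
  hasCreator:
    extractedMetadata: true
    metadataMappings: [ "omv:hasCreator", "dc:creator" ]
  publisher:
    extractedMetadata: true
    metadataMappings: [ "dc:publisher", "dcterms:publisher" ]
  notes:
    extractedMetadata: false
    metadataMappings: [ "omv:notes" ]
YML

# Hypothetical helper: returns { attribute name => mapped properties }
# for the attributes whose extractedMetadata flag is true.
def extractable_mappings(scheme)
  scheme.select { |_attr, settings| settings['extractedMetadata'] }
        .transform_values { |settings| Array(settings['metadataMappings']) }
end

extractable_mappings(SCHEME).each do |attr, mappings|
  puts "#{attr}: #{mappings.join(', ')}"
end
```

With the flag set to `false`, an attribute such as the invented `notes` is simply skipped, which is the behaviour the eleven flipped flags change for the agent and identifier attributes.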
lib/ontologies_linked_data/services/submission_process/operations/submission_extract_metadata.rb
@@ -13,6 +13,8 @@
ontology_iri = extract_ontology_iri
@submission.version = version_info if version_info
@submission.uri = ontology_iri if ontology_iri
+ @submission.save

if heavy_extraction
begin
# Extract metadata directly from the ontology
@@ -23,7 +25,13 @@
logger.error("Error while extracting additional metadata: #{e}")
end
end
- @submission.save

+ if @submission.valid?
+ @submission.save
+ else
+ logger.error("Error while extracting additional metadata: #{@submission.errors}")
+ @submission = LinkedData::Models::OntologySubmission.find(@submission.id).first.bring_remaining
Codecov (codecov/patch) warning on line 33: added lines #L32-L33 were not covered by tests.
+ end
end

def extract_version
@@ -72,7 +80,7 @@
unless attr_settings[:namespace].nil?
property_to_extract = "#{attr_settings[:namespace].to_s}:#{attr.to_s}"
hash_results = extract_each_metadata(ontology_uri, attr, property_to_extract, logger)
- single_extracted = send_value(attr, hash_results) unless hash_results.empty?
+ single_extracted = send_value(attr, hash_results, logger) unless hash_results.empty?
end

# extracts attribute value from metadata mappings
@@ -82,20 +90,15 @@
break if single_extracted

hash_mapping_results = extract_each_metadata(ontology_uri, attr, mapping.to_s, logger)
- single_extracted = send_value(attr, hash_mapping_results) unless hash_mapping_results.empty?
+ single_extracted = send_value(attr, hash_mapping_results, logger) unless hash_mapping_results.empty?
end

new_value = value(attr, type)

- send_value(attr, old_value) if empty_value?(new_value) && !empty_value?(old_value)
+ send_value(attr, old_value, logger) if empty_value?(new_value) && !empty_value?(old_value)
end
end

- # Set some metadata to default values if nothing extracted
- def set_default_metadata

- end

def empty_value?(value)
value.nil? || (value.is_a?(Array) && value.empty?) || value.to_s.strip.empty?
end
@@ -105,31 +108,45 @@
type.eql?(:list) ? Array(val) || [] : val || ''
end

- def send_value(attr, value)
+ def send_value(attr, new_value, logger)
+ old_val = nil
+ single_extracted = false


if enforce?(attr, :list)
# Add the retrieved value(s) to the attribute if the attribute take a list of objects
- metadata_values = value(attr, :list)
- metadata_values = metadata_values.dup
+ old_val = value(attr, :list)
+ old_values = old_val.dup
+ new_values = new_value.values
+ new_values = new_values.map{ |v| find_or_create_agent(attr, v, logger) }.compact if enforce?(attr, :Agent)


- metadata_values.push(*value.values)
+ old_values.push(*new_values)

- @submission.send("#{attr}=", metadata_values.uniq)
+ @submission.send("#{attr}=", old_values.uniq)
elsif enforce?(attr, :concatenate)
# if multiple value for this attribute, then we concatenate it
# Add the concat at the very end, to easily join the content of the array
- metadata_values = value(attr, :string)
- metadata_values = metadata_values.split(', ')
- new_values = value.values.map { |x| x.to_s.split(', ') }.flatten
+ old_val = value(attr, :string)
+ metadata_values = old_val.split(', ')
+ new_values = new_value.values.map { |x| x.to_s.split(', ') }.flatten

@submission.send("#{attr}=", (metadata_values + new_values).uniq.join(', '))
else
# If multiple value for a metadata that should have a single value: taking one value randomly (the first in the hash)
+ new_value = new_value.values.first

+ new_value = find_or_create_agent(attr, nil, logger) if enforce?(attr, :Agent)

+ @submission.send("#{attr}=", new_value)
+ single_extracted = true
+ end

- @submission.send("#{attr}=", value.values.first)
- return true
+ unless @submission.valid?
+ logger.error("Error while extracting metadata for the attribute #{attr}: #{@submission.errors[attr] || @submission.errors}")
+ new_value&.delete if enforce?(attr, :Agent) && new_value.respond_to?(:delete)
+ @submission.send("#{attr}=", old_val)
end
- false

+ single_extracted
end
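
Review note: the new `send_value` assigns the extracted value, validates the submission, and restores the previous value when validation fails. The sketch below isolates that validate-then-revert step on a stand-in object. `FakeSubmission` and `assign_or_rollback` are hypothetical names used only for illustration; the real code works on `LinkedData::Models::OntologySubmission` and its own validation rules.

```ruby
# Hypothetical stand-in for a model with validation; not the real OntologySubmission.
class FakeSubmission
  attr_accessor :version
  attr_reader :errors

  def initialize(version)
    @version = version
    @errors = {}
  end

  # In this sketch a version is valid only if it is a non-empty String.
  def valid?
    @errors = {}
    ok = version.is_a?(String) && !version.strip.empty?
    @errors[:version] = 'must be a non-empty string' unless ok
    @errors.empty?
  end
end

# Assigns new_value to attr and restores the previous value when validation fails.
# Returns true if the new value was kept, false if it was rolled back.
def assign_or_rollback(model, attr, new_value, log)
  old_value = model.public_send(attr)
  model.public_send("#{attr}=", new_value)
  return true if model.valid?

  log << "Invalid value for #{attr}: #{model.errors[attr]}"
  model.public_send("#{attr}=", old_value)
  false
end

log = []
sub = FakeSubmission.new('1.0')
assign_or_rollback(sub, :version, 42, log)     # rejected, version is restored
assign_or_rollback(sub, :version, '2.0', log)  # accepted
```

Reverting per attribute keeps one bad extracted value from blocking the save of every other attribute extracted in the same pass.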

# Return a hash with the best literal value for an URI
@@ -256,6 +273,16 @@
LinkedData::Models::OntologySubmission.attribute_settings(attr)[:enforce].include?(type)
end

+ def find_or_create_agent(attr, old_val, logger)
+ agent = LinkedData::Models::Agent.where(agentType: 'person', name: old_val).first
+ begin
+ agent ||= LinkedData::Models::Agent.new(name: old_val, agentType: 'person', creator: @submission.ontology.administeredBy.first).save
+ rescue
+ logger.error("Error while extracting metadata for the attribute #{attr}: Can't create Agent #{agent.errors} ")
+ agent = nil
Codecov (codecov/patch) warning on line 282: added lines #L281-L282 were not covered by tests.
+ end
+ agent
+ end
end
end
end
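
Review note: `find_or_create_agent` looks up an agent by name and creates it only when none exists, which is what lets the test assert a single "Vincent Emonet" agent after two runs. The sketch below shows the same find-or-create idea against an in-memory store. `AgentStore`, `Agent`, and `find_or_create` are hypothetical stand-ins, not the `LinkedData::Models::Agent` API. One deliberate difference: in the diff's `rescue` branch `agent` appears to still be `nil` when `save` raises, so this sketch logs the exception rather than calling a method on the agent.

```ruby
# Hypothetical in-memory agent store; the real code uses LinkedData::Models::Agent.
Agent = Struct.new(:name, :agent_type)

class AgentStore
  def initialize
    @agents = []
  end

  # Returns the first stored agent with this name, or nil.
  def find(name)
    @agents.find { |a| a.name == name }
  end

  # Stores and returns a new agent; raises when the name is missing.
  def create(name)
    raise ArgumentError, 'agent name is required' if name.nil? || name.to_s.strip.empty?

    Agent.new(name, 'person').tap { |a| @agents << a }
  end

  def count(name)
    @agents.count { |a| a.name == name }
  end
end

# Returns the existing agent with this name, or creates one.
# Returns nil (and logs) when creation fails.
def find_or_create(store, name, log)
  store.find(name) || store.create(name)
rescue ArgumentError => e
  # No agent object exists when creation raised, so log the exception itself.
  log << "Can't create agent #{name.inspect}: #{e.message}"
  nil
end
```

Callers that map over extracted names can then drop failed creations with `compact`, as the list branch of `send_value` does.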
@@ -38,7 +38,7 @@ def process_submission(logger, options = {})

parsed = @submission.ready?(status: %i[rdf])

- @submission.extract_metadata(logger, user_params: options[:params], heavy_extraction: extract_metadata?(options))
+ @submission = @submission.extract_metadata(logger, user_params: options[:params], heavy_extraction: extract_metadata?(options))

@submission.generate_missing_labels(logger) if generate_missing_labels?(options)
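
Review note: the caller now rebinds `@submission` to the return value of `extract_metadata`, because the operation may replace an invalid in-memory submission with a freshly loaded one. The sketch below shows why the rebinding matters, using hypothetical names (`Record`, `STORE`, `extract`) that are not part of the library.

```ruby
# Hypothetical record type and persisted state, for illustration only.
Record = Struct.new(:id, :title)

STORE = { 1 => Record.new(1, 'persisted title') }.freeze

# Returns the record itself when its title is usable,
# otherwise a fresh copy of the stored record.
def extract(record)
  return record unless record.title.to_s.strip.empty?

  STORE.fetch(record.id).dup
end

record = Record.new(1, '')
record = extract(record) # rebind so later steps see the reloaded record
puts record.title
```

Without the reassignment, later steps such as label generation would keep working on the object that failed validation.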

3 changes: 3 additions & 0 deletions test/data/ontology_files/agrooeMappings-05-05-2016.owl
@@ -30,6 +30,8 @@
<rdf:Description rdf:about="http://lirmm.fr/2015/ontology/agroportal_ontology_example.owl">
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Ontology"/>
<rdf:type rdf:resource="http://omv.ontoware.org/2005/05/ontology#Ontology"/>

+ <dc:identifier>http://lirmm.fr/ontology/agroportal_ontology_example.owl</dc:identifier>

<!-- GENERAL DESCRIPTION -->
<omv:name xml:lang="en"> AgroPortal ontology example</omv:name>
@@ -138,6 +140,7 @@
<doap:maintainer>Huguette Doap</doap:maintainer>

<omv:hasContributor rdf:resource="http://lirmm.fr/2015/resource/vincent"/>
+ <omv:hasCreator rdf:resource="http://lirmm.fr/2015/resource/vincent"/>
<omv:hasContributor rdf:resource="http://lirmm.fr/2015/resource/anne"/>
<dc:contributor>Benjamine Dessay</dc:contributor>
<dcterms:contributor>Léontine Dessaiterm</dcterms:contributor>
17 changes: 12 additions & 5 deletions test/models/test_ontology_submission.rb
@@ -1108,26 +1108,33 @@ def test_submission_metrics
def test_submission_extract_metadata
2.times.each do |i|
submission_parse("AGROOE", "AGROOE Test extract metadata ontology",
- "./test/data/ontology_files/agrooeMappings-05-05-2016.owl", i+1,
+ "./test/data/ontology_files/agrooeMappings-05-05-2016.owl", i + 1,
process_rdf: true, extract_metadata: true, generate_missing_labels: false)
- ont = LinkedData::Models::Ontology.find("AGROOE").first
+ ont = LinkedData::Models::Ontology.find("AGROOE").first
sub = ont.latest_submission
refute_nil sub

sub.bring_remaining
assert_equal false, sub.deprecated
assert_equal '2015-09-28', sub.creationDate.to_date.to_s
assert_equal '2015-10-01', sub.modificationDate.to_date.to_s
- assert_equal "description example, AGROOE is an ontology used to test the metadata extraction, AGROOE is an ontology to illustrate how to describe their ontologies", sub.description
- #assert_equal " LIRMM (default name) ", sub.publisher
+ assert_equal "description example, AGROOE is an ontology used to test the metadata extraction, AGROOE is an ontology to illustrate how to describe their ontologies", sub.description
assert_equal [RDF::URI.new('http://agroportal.lirmm.fr')], sub.identifier
assert_equal ["http://lexvo.org/id/iso639-3/fra", "http://lexvo.org/id/iso639-3/eng"].sort, sub.naturalLanguage.sort
- #assert_equal ["Léontine Dessaiterm", "Anne Toulet", "Benjamine Dessay", "Augustine Doap", "Vincent Emonet"].sort, sub.hasContributor.sort
assert_equal [RDF::URI.new("http://lirmm.fr/2015/ontology/door-relation.owl"), RDF::URI.new("http://lirmm.fr/2015/ontology/dc-relation.owl"),
RDF::URI.new("http://lirmm.fr/2015/ontology/dcterms-relation.owl"),
RDF::URI.new("http://lirmm.fr/2015/ontology/voaf-relation.owl"),
RDF::URI.new("http://lirmm.fr/2015/ontology/void-import.owl")
].sort, sub.ontologyRelatedTo.sort

+ assert_equal ["Agence 007", "Éditions \"La Science en Marche\"", " LIRMM (default name) "].sort, sub.publisher.map { |x| x.bring_remaining.name }.sort
+ assert_equal ["Alfred DC", "Clement Jonquet", "Gaston Dcterms", "Huguette Doap", "Mirabelle Prov", "Paul Foaf", "Vincent Emonet"].sort, sub.hasCreator.map { |x| x.bring_remaining.name }.sort
+ assert_equal ["Léontine Dessaiterm", "Anne Toulet", "Benjamine Dessay", "Augustine Doap", "Vincent Emonet"].sort, sub.hasContributor.map { |x| x.bring_remaining.name }.sort
+ assert_equal 1, LinkedData::Models::Agent.where(name: "Vincent Emonet").count

sub.description = "test changed value"
sub.save
end