Memory usage and Turtle inserting at the ressource parsing #15

Closed
4 tasks done
Tracked by #279
syphax-bouazzouni opened this issue Mar 1, 2022 · 3 comments

syphax-bouazzouni commented Mar 1, 2022

To reproduce

Use case: TAXREF-LD, https://taxref.i3s.unice.fr/~fmichel/taxrefld_singlefile_agropportal.ttl

java -DentityExpansionLimit=2500000 -Xmx10240M -jar /srv/ontoportal/ncbo_cron_deployments/shared/bundle/ruby/2.6.0/bundler/gems/ontologies_linked_data-ac4a68542c33/bin/owlapi-wrapper.jar -m /srv/ontoportal/data/repository/TAXREF-LD/2/taxrefld_singlefile_agropportal.ttl -o /srv/ontoportal/data/repository/TAXREF-LD/2 -r true

We get a java.lang.OutOfMemoryError: Java heap space exception, even with our default max heap size of 10 GB.

With a max heap size of 20 GB, it worked.

Solution

Add a configuration variable for setting owlapi_wrapper java heap size (see ncbo#124)
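
As a rough illustration of what such a configuration variable could look like, here is a minimal sketch that builds the command shown above from a configurable heap size. The method name `owlapi_command`, its parameters, and the `DEFAULT_HEAP_MB` constant are hypothetical and are not taken from ncbo#124; only the java flags come from the command in this issue.

```ruby
# Hypothetical default, matching the 10 GB (-Xmx10240M) used in this issue.
DEFAULT_HEAP_MB = 10_240

# Builds the owlapi-wrapper command line as an argument array.
# All names here are illustrative; the real setting lives in ncbo#124.
def owlapi_command(jar:, input:, output_dir:, heap_mb: DEFAULT_HEAP_MB, reasoner: true)
  unless heap_mb.is_a?(Integer) && heap_mb.positive?
    raise ArgumentError, "heap_mb must be a positive integer"
  end

  [
    "java",
    "-DentityExpansionLimit=2500000",
    "-Xmx#{heap_mb}M",          # max heap size, now configurable
    "-jar", jar,
    "-m", input,
    "-o", output_dir,
    "-r", reasoner.to_s         # "true" or "false", as in the commands in this issue
  ]
end

cmd = owlapi_command(
  jar: "owlapi-wrapper.jar",
  input: "taxrefld_singlefile_agropportal.ttl",
  output_dir: "out",
  heap_mb: 20_480
)
puts cmd.join(" ")
```

Returning an array rather than a single string lets the caller pass it to `system(*cmd)` without shell quoting issues.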

Todo

Questions

  • How large a heap size should we use?
  • Is it normal that 10 GB isn't enough?
@syphax-bouazzouni

After setting the Java heap size to 20 GB, the owlapi out-of-memory error is gone, but 4store now runs out of memory:

Mar  1 12:25:38 agroportal 4store[28359]: httpd.c:598 starting add to http://data.bioontology.org/ontologies/TAXREF-LD/submissions/2 (2179291409 bytes)
Mar  1 12:25:38 agroportal 4s-httpd: 4store[28359]: httpd.c:598 starting add to http://data.bioontology.org/ontologies/TAXREF-LD/submissions/2 (2179291409 bytes)
Mar  1 12:25:43 agroportal 4store[28359]: import.c:167 Fatal error: out of dynamic memory in turtle_lexer__scan_bytes() at 1
Mar  1 12:25:43 agroportal 4s-httpd: 4store[28359]: import.c:167 Fatal error: out of dynamic memory in turtle_lexer__scan_bytes() at 1
Mar  1 12:25:43 agroportal 4store[12682]: httpd.c:1979 child 28359 terminated by signal 11
Mar  1 12:25:43 agroportal 4s-httpd: 4store[12682]: httpd.c:1979 child 28359 terminated by signal 11

syphax-bouazzouni commented Apr 15, 2022

State summary

We couldn't parse TAXREF-LD, at first because of the Java heap size, which we fixed by increasing it. Now the next step fails: appending the triples to 4store.

Metrics

  • TAXREF-LD size: 870.3 MB
  • Parsed file size: 1.71 GB
  • Turtle version appended to the triple store: 2.1 GB
  • Parsing time: 11 min
  • Parsing memory usage: 18 GB

Possible solutions:

Test 1: Disable reasoner

Command used:

java -DentityExpansionLimit=2500000 -Xmx20480M -jar /srv/ontoportal/ncbo_cron_deployments/shared/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-4b46e62d7bb9/bin/owlapi-wrapper.jar -m /srv/ontoportal/data/repository/TAXREF-LD/1/taxrefld_singlefile_agropportal.ttl -o /srv/ontoportal/data/repository/TAXREF-LD/1 -r false

Result: no change.

Test 2: Separate the triple store insertion from the RDF generation

The idea is to split the final append into multiple smaller inserts.

After the RDF generation step, we do a "delete and append to triple store":

      def delete_and_append(triples_file_path, logger, mime_type = nil)
        Goo.sparql_data_client.delete_graph(self.id)
        Goo.sparql_data_client.put_triples(self.id, triples_file_path, mime_type)
        logger.info("Triples #{triples_file_path} appended in #{self.id.to_ntriples}")
        logger.flush
      end

In the append triples step, we transform the XRDF to Turtle in a temporary file.
Then we send a single POST request to the triple store with the Turtle file as the request body. Below are the first lines of this generated file:

<http://data.bioontology.org/metadata/prefixIRI> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002020> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002208> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002209> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002439> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002440> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002441> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002442> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002444> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002445> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002455> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002456> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002457> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002458> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002469> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002632> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002633> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002634> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.obolibrary.org/obo/RO_0002635> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/abstract> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/accrualPeriodicity> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/bibliographicCitation> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/contributor> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/coverage> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/creator> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/description> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/identifier> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/issued> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/language> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/licence> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/license> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/publisher> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/source> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/spatial> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/subject> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/dc/terms/title> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/ontology/bibo/abstract> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://purl.org/ontology/bibo/doi> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rdfs.org/ns/void#dataDump> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rdfs.org/ns/void#linkPredicate> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rdfs.org/ns/void#objectsTarget> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rdfs.org/ns/void#rootResource> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rdfs.org/ns/void#sparqlEndpoint> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rdfs.org/ns/void#subjectsTarget> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rdfs.org/ns/void#subset> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rdfs.org/ns/void#triples> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rdfs.org/ns/void#uriSpace> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rdfs.org/ns/void#vocabulary> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rs.tdwg.org/dwc/terms/family> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rs.tdwg.org/dwc/terms/genus> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rs.tdwg.org/dwc/terms/kingdom> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rs.tdwg.org/dwc/terms/order> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rs.tdwg.org/dwc/terms/phylum> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rs.tdwg.org/dwc/terms/subfamily> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://rs.tdwg.org/ontology/voc/TaxonName#nameComplete> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/about> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/author> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/contentUrl> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/copyrightHolder> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/datePublished> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/encodingFormat> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/identifier> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/image> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/keywords> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/licence> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/license> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/mainEntityOfPage> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/name> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/propertyID> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/publisher> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/sameAs> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/subjectOf> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/thumbnail> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/url> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://schema.org/value> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://taxref.mnhn.fr/lod/property/vernacularNameXL> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
<http://taxref.mnhn.fr/lod/property/vernacularNameXL> <http://purl.org/dc/terms/description> "Relates a taxon to one of its vernacular (common) names in the form of a SKOS extended label"@en .
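
The idea behind Test 2, splitting the final append into multiple smaller inserts, could be sketched as follows. This is only an illustration, not the change made in goo: the helper name `each_triples_chunk` and the default chunk size are hypothetical, and the sketch assumes the generated file holds one complete triple per line, as in the sample above (no @prefix directives and no multi-line literals).

```ruby
require "stringio"

# Hypothetical helper: yields the content of `io` in slices of at most
# `chunk_size` lines, so each slice can be sent as its own request.
# Assumes one complete triple per line.
def each_triples_chunk(io, chunk_size: 50_000)
  io.each_line.each_slice(chunk_size) do |lines|
    yield lines.join
  end
end

# Three triples taken from the sample above, standing in for the temporary file.
sample = StringIO.new(<<~TRIPLES)
  <http://schema.org/thumbnail> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
  <http://schema.org/url> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
  <http://schema.org/value> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#AnnotationProperty> .
TRIPLES

payload_sizes = []
each_triples_chunk(sample, chunk_size: 2) do |chunk|
  # A real implementation would POST `chunk` to the triple store here;
  # this stand-in only records the size of each request body.
  payload_sizes << chunk.bytesize
end
puts "requests: #{payload_sizes.length}, bytes per request: #{payload_sizes.inspect}"
```

Because the file is read line by line, only one chunk is held in memory at a time, and each request body stays far below the 2.1 GB sent in the single append.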

syphax-bouazzouni commented Jul 1, 2022

Fixed with ncbo/goo#122.
Deployed on:

  • Testportal
  • Stageportal
  • Agroportal
  • Bioportal

syphax-bouazzouni changed the title from "java.lang.OutOfMemoryError: Java heap space at ressource parsing" to "Memory usage and Turtle inserting at the ressource parsing" on Jul 1, 2022