
Replace broken Travis CI by GitHub action #168

Merged
merged 3 commits into from
Sep 15, 2023

stweil (Member) commented Sep 8, 2023

No description provided.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
It was replaced by a GitHub action, so replace the build status badge, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil (Member, Author) commented Sep 8, 2023

The CI currently fails for no obvious reason when PageConverter.jar converts an ALTO file (which looks good) to PAGE XML. It fails the same way when run manually with this command:

LANG=C.UTF-8 java -jar ../vendor/JPageConverter/PageConverter.jar -source-xml wetzel_reisebegleiter_1901_0021.alto -target-xml out -convert-to LATEST
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature end of file.
	at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:204)
	at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:178)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1465)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1013)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:542)
	at java.xml/com.sun.org.apache.xerces.internal.impl.xs.opti.SchemaParsingConfig.parse(SchemaParsingConfig.java:640)
	at java.xml/com.sun.org.apache.xerces.internal.impl.xs.opti.SchemaParsingConfig.parse(SchemaParsingConfig.java:696)
	at java.xml/com.sun.org.apache.xerces.internal.impl.xs.opti.SchemaDOMParser.parse(SchemaDOMParser.java:530)
	at java.xml/com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.getSchemaDocument(XSDHandler.java:2227)
	at java.xml/com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.parseSchema(XSDHandler.java:589)
	at java.xml/com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadSchema(XMLSchemaLoader.java:618)
	at java.xml/com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadGrammar(XMLSchemaLoader.java:577)
	at java.xml/com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadGrammar(XMLSchemaLoader.java:543)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.validation.XMLSchemaFactory.newSchema(XMLSchemaFactory.java:281)
	at java.xml/javax.xml.validation.SchemaFactory.newSchema(SchemaFactory.java:612)
	at org.primaresearch.io.xml.XmlValidator.getSchema(XmlValidator.java:55)
	at org.primaresearch.dla.page.io.xml.XmlPageReader.createMainParser(XmlPageReader.java:82)
	at org.primaresearch.dla.page.io.xml.XmlPageReader.parse(XmlPageReader.java:176)
	at org.primaresearch.dla.page.io.xml.XmlPageReader.read(XmlPageReader.java:130)
	at org.primaresearch.dla.page.io.xml.PageXmlInputOutput.readPage(PageXmlInputOutput.java:212)
	at org.primaresearch.dla.page.converter.PageConverter.run(PageConverter.java:230)
	at org.primaresearch.dla.page.converter.PageConverter.main(PageConverter.java:161)
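The trace shows where the exception is raised: `SchemaFactory.newSchema`, called from `XmlValidator.getSchema`, is parsing a *schema* document (`XSDHandler.getSchemaDocument`) when it hits the premature end of file. So the empty document may well be a downloaded XSD rather than the ALTO input. A minimal sketch, with a hypothetical class name and `systemId`, that reproduces the same exception type by compiling a schema from an empty byte stream (as an empty download would look) and prints which document the exception refers to:

```java
import java.io.ByteArrayInputStream;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class EmptySchemaDemo {
    /**
     * Compiles a W3C XML Schema from raw bytes.
     * Returns null on success, or the SAXParseException if the bytes are not a usable schema.
     */
    static SAXParseException compileSchema(byte[] xsdBytes, String systemId) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // The systemId only labels the source (and acts as base URI); the bytes come from the stream.
        StreamSource source = new StreamSource(new ByteArrayInputStream(xsdBytes), systemId);
        try {
            factory.newSchema(source);
            return null;
        } catch (SAXParseException e) {
            return e;
        } catch (SAXException e) {
            throw new IllegalStateException("unexpected schema error", e);
        }
    }

    public static void main(String[] args) {
        // Zero bytes stand in for an empty HTTP response body (hypothetical URL, never fetched).
        SAXParseException e = compileSchema(new byte[0], "http://example.invalid/alto.xsd");
        if (e == null) {
            System.out.println("schema compiled");
            return;
        }
        // getSystemId() may name the document that failed, which helps tell an XSD apart from the input file.
        System.out.println("systemId=" + e.getSystemId()
                + " line=" + e.getLineNumber()
                + " column=" + e.getColumnNumber()
                + " message=" + e.getMessage());
    }
}
```

That an empty schema response is what PageConverter.jar runs into is only a hypothesis consistent with the trace; it is not confirmed in this thread.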

kba (Collaborator) commented Sep 11, 2023

I tried the call directly too, but for me it fails with

java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.loc.gov/standards/alto/alto.xsd

apparently because the LoC uses Cloudflare, which rejects requests carrying the user agent of PageConverter.jar.
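To check the 403 outside PageConverter.jar, one could fetch the schema with an explicit User-Agent and look at the status code and the final URL. A sketch using the JDK 11+ `java.net.http` client; the class name and User-Agent string are hypothetical, and whether Cloudflare accepts any particular User-Agent is not known from this thread. `HttpClient.Redirect.NORMAL` follows redirects except from https to http, so a redirect from http to https would be followed here:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class SchemaFetchCheck {
    // Hypothetical User-Agent; not the value PageConverter.jar sends.
    static final String USER_AGENT = "schema-fetch-check/0.1";

    static HttpClient client() {
        return HttpClient.newBuilder()
                // NORMAL: follow redirects, except from https to http.
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    static HttpRequest request(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", USER_AGENT)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "http://www.loc.gov/standards/alto/alto.xsd";
        HttpResponse<String> response = client().send(request(url), HttpResponse.BodyHandlers.ofString());
        // uri() is the URI the response came from, i.e. after any redirects.
        System.out.println(response.statusCode() + " " + response.uri()
                + " body length=" + response.body().length());
    }
}
```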

If you have an idea how to get past that, I can investigate further; for now I am stuck.
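One general way around remote schema downloads in plain JAXP is to compile the schema from a local copy and validate against that. A sketch with a tiny in-memory stand-in schema (not the ALTO or PAGE schema; class and method names are hypothetical). Whether PageConverter.jar can be pointed at local schema copies is not established here, and a real schema that imports further schemas by URL would need local copies of those as well:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class LocalSchemaValidation {
    // Tiny stand-in for a locally stored schema copy; NOT the real ALTO or PAGE schema.
    static final String LOCAL_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"page\"><xs:complexType>"
            + "<xs:attribute name=\"id\" type=\"xs:string\" use=\"required\"/>"
            + "</xs:complexType></xs:element>"
            + "</xs:schema>";

    /** Compiles the schema from the local string, so no network request is made for it. */
    static Schema loadLocalSchema() throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(LOCAL_XSD)));
    }

    /** Returns true if xml is valid against the schema, false on a validation error. */
    static boolean isValid(Schema schema, String xml) throws IOException {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = loadLocalSchema();
        System.out.println(isValid(schema, "<page id=\"p1\"/>")); // true
        System.out.println(isValid(schema, "<page/>"));          // false: required id is missing
    }
}
```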

One guess would be that the XML declaration <?xml version="1.0" encoding="UTF-8"?> is the reason for the failure at the first character of the first line.
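The guess can be tested in isolation: an XML declaration at the very start of the input is well-formed and parses normally, while an empty input fails. A sketch using the JDK's DOM parser (class and method names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DeclarationCheck {
    /** Parses xml and returns the root element's local name, or null if it is not well-formed. */
    static String rootName(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getLocalName();
        } catch (SAXParseException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootName("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<alto/>")); // alto
        System.out.println(rootName("<alto/>"));                                           // alto
        System.out.println(rootName(""));                                                  // null: empty input
        System.out.println(rootName(" <?xml version=\"1.0\"?><alto/>"));                    // null: declaration not at start
    }
}
```

Whitespace before the declaration is also a fatal error, but it is reported as a different error than "Premature end of file".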

kba (Collaborator) left a comment

Otherwise it looks fine, thanks @stweil.

stweil (Member, Author) commented Sep 11, 2023

Strange. Why do you see an IOException for an http URL, although the ALTO file uses an https URL?
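One way to see which schema URLs the input itself declares is to read `xsi:schemaLocation` from the root element. If PageConverter.jar requests a different URL (such as the http one in the 403 message), it presumably uses a schema location of its own, though that is not confirmed here. A sketch; the class name, sample document, namespace and URL are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SchemaLocationCheck {
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    /** Returns the schema locations (every second token of xsi:schemaLocation) declared on the root element. */
    static List<String> declaredSchemaLocations(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so getAttributeNS can match the xsi namespace
        Element root = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        // The attribute holds "namespace location" pairs; getAttributeNS returns "" when it is absent.
        String value = root.getAttributeNS(XSI_NS, "schemaLocation").trim();
        List<String> locations = new ArrayList<>();
        if (value.isEmpty()) {
            return locations;
        }
        String[] tokens = value.split("\\s+");
        for (int i = 1; i < tokens.length; i += 2) {
            locations.add(tokens[i]);
        }
        return locations;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample root element; namespace and URL are placeholders.
        String sample = "<alto xmlns=\"urn:example:alto\""
                + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                + " xsi:schemaLocation=\"urn:example:alto https://example.org/alto.xsd\"/>";
        System.out.println(declaredSchemaLocations(sample)); // [https://example.org/alto.xsd]
    }
}
```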

The error message I get, with "lineNumber: 1; columnNumber: 1", is misleading. The ALTO input is processed and converted to a PAGE XML file that looks correct, so the SAXParseException occurs after the conversion. Nothing changes if I remove line 1 from the ALTO file.

stweil (Member, Author) commented Sep 15, 2023

I'll merge this pull request. The PageConverter issue will be handled separately.

stweil merged commit 8a67cba into master Sep 15, 2023
2 checks passed
stweil deleted the ci branch September 15, 2023 13:06

2 participants