Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature/#392_Unpack_gz_files_and_#391follow_redirects #405

Merged
merged 22 commits into from
Jul 28, 2023
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
71f22c7
Update docs to be in line with working code
rhazn Jun 28, 2023
1124884
Merge pull request #382 from jvalue/update-docs-381
rhazn Jun 28, 2023
3a13303
Added 'followRedirects' property to http-extractor-meta and added res…
A-M-A-X Jul 17, 2023
e97b0a6
changed formatting
A-M-A-X Jul 17, 2023
a83e6f6
initial commit. ExampleJV was only initially set up
A-M-A-X Jul 18, 2023
561c088
Merge branch 'feature/391-HTTPExtractor-should-follow-redirects' into…
A-M-A-X Jul 18, 2023
6149013
Added gzip decompress to archive-interpreter and fixed issue with htt…
A-M-A-X Jul 18, 2023
7b963ad
added to description, format and tests
A-M-A-X Jul 18, 2023
a23642a
removed unused imports
A-M-A-X Jul 18, 2023
1eb6e1b
Merge pull request #1 from A-M-A-X/feature/392-Unpack-gz-files
A-M-A-X Jul 18, 2023
b119c5e
Merge remote-tracking branch 'jvalue/jayvee/origin/dev'
A-M-A-X Jul 21, 2023
5f55fb9
formatted code
A-M-A-X Jul 21, 2023
8d0d124
format
A-M-A-X Jul 21, 2023
9fe5645
removed jv example files
A-M-A-X Jul 21, 2023
88d480a
formatted and refactored code
A-M-A-X Jul 26, 2023
35aef3f
refactored to reduce repetition in code
A-M-A-X Jul 26, 2023
b1f1b0d
fixed error in previous commit
A-M-A-X Jul 26, 2023
5194dbe
.
A-M-A-X Jul 26, 2023
be91eda
added missing type to variable
A-M-A-X Jul 26, 2023
729788b
added pr feedback - simplified createAndPutFile to createFileFromArch…
A-M-A-X Jul 28, 2023
9499a52
Removed whitespaces
A-M-A-X Jul 28, 2023
d705a2b
Merge remote-tracking branch 'DevTest/dev'
A-M-A-X Jul 28, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 56 additions & 0 deletions example/materials.jv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need new examples in the Jayvee repository right now, it would be better to move this to the SWC repo.

Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
// SPDX-FileCopyrightText: 2023 Friedrich-Alexander-Universitat Erlangen-Nurnberg
//
// SPDX-License-Identifier: AGPL-3.0-only

pipeline MaterialsDatabasePipeline {

block MaterialsDatabaseExtractor oftype HttpExtractor {
url: "https://figshare.com/ndownloader/files/31626647";
followRedirects: true;
}

block ZipArchiveInterpreter oftype ArchiveInterpreter {
archiveType: "zip";
}

block MaterialsDatabaseCSVPicker oftype FilePicker {
path: "/Databases/Combined/Combined_YieldStrength_GrainSize_Database.csv";
}

block MaterialsDatabaseTextFileInterpreter oftype TextFileInterpreter {

}

block MaterialsDatabaseCSVInterpreter oftype CSVInterpreter {
delimiter: ",";
enclosing: '"';
enclosingEscape: '"';
}

block MaterialsDatabaseTableInterpreter oftype TableInterpreter {
header: true;
columns: [
"Compound" oftype text,
"Blacklisted Compound?" oftype text,
"Yield Strength Value" oftype text,
"Yield Strength Unit" oftype text,
"Grain Size Value" oftype text,
"Grain Size Unit" oftype text,
"DOI" oftype text,
"Open Access" oftype text,
];
}

block MaterialsDatabaseLoader oftype SQLiteLoader {
table: "MaterialsDatabase";
file: "./MaterialsDatabase.sqlite";
}

MaterialsDatabaseExtractor
-> ZipArchiveInterpreter
-> MaterialsDatabaseCSVPicker
-> MaterialsDatabaseTextFileInterpreter
-> MaterialsDatabaseCSVInterpreter
-> MaterialsDatabaseTableInterpreter
-> MaterialsDatabaseLoader;
}
48 changes: 48 additions & 0 deletions example/thermoelectricMaterials.jv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need new examples in the Jayvee repository right now, it would be better to move this to the SWC repo.

Also please don't provide examples that do not work. These files are meant as documentation for users of Jayvee, not to test implementation ;).

Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
// SPDX-FileCopyrightText: 2023 Friedrich-Alexander-Universitat Erlangen-Nurnberg
//
// SPDX-License-Identifier: AGPL-3.0-only

pipeline ThermoelectricMaterialsPipeline {

block ThermoelectricMaterialsExtractor oftype HttpExtractor {
url: "https://figshare.com/ndownloader/files/28333845";
followRedirects: true;
}

block ThermoelectricMaterialsArchiveInterpreter oftype ArchiveInterpreter {
archiveType: "gz";
}

block ThermoelectricMaterialsFilePicker oftype FilePicker {
path: "/ucsb_thermoelectrics.json";
}

block ThermoelectricMaterialsTextFileInterpreter oftype TextFileInterpreter {

}

// Not working from here on forward, as json file is received and json interpreter is not yet implemented
block ThermoelectricMaterialsJSONInterpreter oftype CSVInterpreter {
enclosing: '"';
}

block ThermoelectricMaterialsTableInterpreter oftype TableInterpreter {
header: true;
columns: [

];
}

block ThermoelectricMaterialsLoader oftype SQLiteLoader {
table: "ThermoelectricMaterials";
file: "./ThermoelectricMaterials.sqlite";
}

ThermoelectricMaterialsExtractor
-> ThermoelectricMaterialsArchiveInterpreter
-> ThermoelectricMaterialsFilePicker
-> ThermoelectricMaterialsTextFileInterpreter
-> ThermoelectricMaterialsJSONInterpreter
-> ThermoelectricMaterialsTableInterpreter
-> ThermoelectricMaterialsLoader;
}
58 changes: 58 additions & 0 deletions libs/extensions/std/exec/src/archive-interpreter-executor.ts
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,8 @@
implementsStatic,
} from '@jvalue/jayvee-execution';
import { IOType, PrimitiveValuetypes } from '@jvalue/jayvee-language-server';
import * as JSZip from 'jszip';

Check warning on line 22 in libs/extensions/std/exec/src/archive-interpreter-executor.ts

View workflow job for this annotation

GitHub Actions / CI

There should be at least one empty line between import groups
import * as zlib from 'node:zlib';

Check warning on line 23 in libs/extensions/std/exec/src/archive-interpreter-executor.ts

View workflow job for this annotation

GitHub Actions / CI

`node:zlib` import should occur before import of `path`

import {
inferFileExtensionFromFileExtensionString,
Expand Down Expand Up @@ -55,12 +56,69 @@
}
return R.ok(fs.right);
}
if (archiveType === 'gz') {
const fs = await this.loadGzFileToInMemoryFileSystem(
archiveFile,
context,
);
if (R.isErr(fs)) {
return fs;
}
return R.ok(fs.right);
}

return R.err({
message: `Archive is not a zip-archive`,
diagnostic: { node: context.getCurrentNode(), property: 'name' },
});
}

private async loadGzFileToInMemoryFileSystem(

Check failure on line 76 in libs/extensions/std/exec/src/archive-interpreter-executor.ts

View workflow job for this annotation

GitHub Actions / CI

Async method 'loadGzFileToInMemoryFileSystem' has no 'await' expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should duplicate a lot of the loadZipFileToInMemoryFileSystem logic. I assume only the extraction of files would be different, putting them into our in memory filesystem etc is all repeated logic.

It would be better to implement it that way, e.g., by extracting the shared logic of putting files into our in-memory file system into one function and reduce these two functions to just extracting the relevant archives and then using the new function to put files into a file system.

archiveFile: BinaryFile,
context: ExecutionContext,
): Promise<R.Result<FileSystem>> {
context.logger.logDebug(`Loading gz file from binary content`);
try {

Check warning on line 82 in libs/extensions/std/exec/src/archive-interpreter-executor.ts

View workflow job for this annotation

GitHub Actions / CI

Delete `⏎`
const fs = new InMemoryFileSystem();
const archivedObject = zlib.gunzipSync(archiveFile.content);

Check warning on line 85 in libs/extensions/std/exec/src/archive-interpreter-executor.ts

View workflow job for this annotation

GitHub Actions / CI

Delete `······`
const extNameArchive = path.extname(archiveFile.name);
const fileName = path.basename(archiveFile.name, extNameArchive);
const extName = path.extname(fileName);

const mimeType =
inferMimeTypeFromContentTypeString(extName) ||
MimeType.APPLICATION_OCTET_STREAM;
const fileExtension =
inferFileExtensionFromFileExtensionString(extName) ||
FileExtension.NONE;
const file = new BinaryFile(
fileName,
fileExtension,
mimeType,
archivedObject,
);

Check warning on line 102 in libs/extensions/std/exec/src/archive-interpreter-executor.ts

View workflow job for this annotation

GitHub Actions / CI

Delete `······`
const addedFile = fs.putFile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very confused how this works because I can only see one putFile call and no loops or iterations. Does this actually correctly expand gz archives with multiple files in them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding gzip is used to compress individual files. To compress multiple files with gzip, they have to be concated first, e.g to a .tar file. This I would implement in a different feature. Happy to further look into this, if you disagree.

InMemoryFileSystem.getPathSeparator() + fileName,
file,
);

assert(addedFile != null);

return R.ok(fs);
}

Check warning on line 111 in libs/extensions/std/exec/src/archive-interpreter-executor.ts

View workflow job for this annotation

GitHub Actions / CI

Delete `⏎···`
catch (error: unknown) {
return R.err({
message: `Unexpected Error ${
error instanceof Error ? error.message : JSON.stringify(err)
} occured during processing`,
diagnostic: { node: context.getCurrentNode(), property: 'name' },
});
}
}

private async loadZipFileToInMemoryFileSystem(
archiveFile: BinaryFile,
context: ExecutionContext,
Expand Down
19 changes: 15 additions & 4 deletions libs/extensions/std/exec/src/http-extractor-executor.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,8 @@
// SPDX-License-Identifier: AGPL-3.0-only

import { strict as assert } from 'assert';
import * as http from 'http';
import * as https from 'https';
import * as path from 'path';

Check warning on line 6 in libs/extensions/std/exec/src/http-extractor-executor.ts

View workflow job for this annotation

GitHub Actions / CI

There should be at least one empty line between import groups
import { http, https } from 'follow-redirects';

import * as R from '@jvalue/jayvee-execution';
import {
Expand Down Expand Up @@ -110,8 +109,9 @@
} else {
httpGetFunction = http.get;
}
const followRedirects = context.getPropertyValue('followRedirects', PrimitiveValuetypes.Boolean);
return new Promise((resolve) => {
httpGetFunction(url, (response) => {
httpGetFunction(url, { followRedirects: followRedirects }, (response) => {
const responseCode = response.statusCode;

// Catch errors
Expand All @@ -126,6 +126,17 @@
);
}

if (responseCode === 302) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

301 is also a redirect code I assume.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to >= 301, as 300 is used for 'multiple choices'. The remaining 300 < response codes < 400 point to redirect location or are not used in get context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not change it to >= 301, that makes it dependent on other code to ensure this does not handle 400+. You can change it to >=301 and <400 though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done

resolve(
R.err({
message: `HTTP fetch was redirected with code ${
responseCode ?? 'undefined'
}. Redirects are either disabled or maximum number of redirects was exeeded.`,
diagnostic: { node: context.getOrFailProperty('url') },
}),
);
}

// Get chunked data and store to ArrayBuffer
let rawData = new Uint8Array(0);
response.on('data', (chunk: Buffer) => {
Expand All @@ -151,7 +162,7 @@
'url',
PrimitiveValuetypes.Text,
);
const url = new URL(urlString);
const url = new URL(response.responseUrl);
let fileName = url.pathname.split('/').pop();
if (fileName === undefined) {
fileName = url.pathname.replace('/', '-');
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
archiveType: {
type: PrimitiveValuetypes.Text,
docs: {
description: 'The archive type to be interpreted, e.g., `"zip"`.',
description: 'The archive type to be interpreted, e.g., `"zip" or "gz`.',

Check warning on line 22 in libs/extensions/std/lang/src/archive-interpreter-meta-inf.ts

View workflow job for this annotation

GitHub Actions / CI

Insert `⏎·············`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: 'The archive type to be interpreted, e.g., `"zip" or "gz`.',
description: 'The archive type to be interpreted, e.g., `"zip" or "gz".',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to 'The archive type to be interpreted, e.g., "zip" or "gz".',

},
},
},
Expand Down
13 changes: 13 additions & 0 deletions libs/extensions/std/lang/src/http-extractor-meta-inf.ts
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,19 @@
}
},
},
followRedirects: {
type: PrimitiveValuetypes.Boolean,
defaultValue: true,
docs: {
description: 'Indicates, whether to follow redirects on get requests. If `false`, Redirects are disabled. Default `true`',

Check warning on line 154 in libs/extensions/std/lang/src/http-extractor-meta-inf.ts

View workflow job for this annotation

GitHub Actions / CI

Insert `⏎·············`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redirects are not disabled, just not followed ;).

Suggested change
description: 'Indicates, whether to follow redirects on get requests. If `false`, Redirects are disabled. Default `true`',
description: 'Indicates, whether to follow redirects on get requests. If `false`, redirects are not followed. Default `true`',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested

examples: [
{
code: 'url: "tinyurl.com/4ub9spwz" \n followRedirects: true',
description: 'Specifies the URL to fetch the data from and allows redirects.',

Check warning on line 158 in libs/extensions/std/lang/src/http-extractor-meta-inf.ts

View workflow job for this annotation

GitHub Actions / CI

Insert `⏎·················`
},
],
},
},
},

// Input type:
Expand Down
18 changes: 18 additions & 0 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 7 additions & 4 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -9,21 +9,24 @@
"lint": "nx run-many --target lint --max-warnings 0",
"test": "nx run-many --target test",
"generate": "nx run language-server:generate",
"example:cars": "nx run interpreter:run -d example/cars.jv -dg peek",
"example:gtfs": "nx run interpreter:run -d example/gtfs-static-and-rt.jv -dg peek",
"example:vehicles": "nx run interpreter:run -d -e DB_HOST=localhost -e DB_PORT=5432 -e DB_USERNAME=postgres -e DB_PASSWORD=postgres -e DB_DATABASE=postgres example/electric-vehicles.jv -dg peek"
"example:cars": "nx run interpreter:run -d example/cars.jv",
"example:materials": "nx run interpreter:run -d example/materials.jv",
"example:gtfs": "nx run interpreter:run -d example/gtfs-static-and-rt.jv",
"example:vehicles": "nx run interpreter:run -d -e DB_HOST=localhost -e DB_PORT=5432 -e DB_USERNAME=postgres -e DB_PASSWORD=postgres -e DB_DATABASE=postgres example/electric-vehicles.jv"
},
"private": true,
"dependencies": {
"@docusaurus/core": "2.4.1",
"@docusaurus/preset-classic": "2.4.1",
"@docusaurus/theme-mermaid": "2.4.1",
"@mdx-js/react": "^1.6.22",
"@types/follow-redirects": "^1.14.1",
"assert": "^2.0.0",
"chalk": "^4.1.2",
"clsx": "^1.2.1",
"commander": "^8.0.0",
"fast-csv": "^4.3.6",
"follow-redirects": "^1.15.2",
"fp-ts": "^2.13.1",
"gtfs-realtime-bindings": "^1.1.1",
"jszip": "^3.10.1",
Expand Down Expand Up @@ -94,4 +97,4 @@
"got": "^11.8.5"
}
}
}
}
Loading