Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature/#392_Unpack_gz_files_and_#391follow_redirects #405

Merged
merged 22 commits into from
Jul 28, 2023
Merged

Conversation

A-M-A-X
Copy link
Contributor

@A-M-A-X A-M-A-X commented Jul 21, 2023

[FEATURE] HTTPExtractor should follow redirects #391

  • Added 'gz' file-type to Archive-Interpreter
    • Implemented with zlib
    • Added Testfile thermoelectricMaterials.jv

[FEATURE] Unpack gz files #392

@georg-schwarz
Copy link
Member

@f3l1x98 can you provide guidance on how the failing example file can be turned into a unit test?

Copy link
Contributor

@rhazn rhazn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you did not tag me in this to review but it popped up in my email. I left some spontaneous commments. For future work, it would be good to have only one PR per issue please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need new examples in the Jayvee repository right now, it would be better to move this to the SWC repo.

Also please don't provide examples that do not work. These files are meant as documentation for users of Jayvee, not to test implementation ;).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need new examples in the Jayvee repository right now, it would be better to move this to the SWC repo.

return R.err({
message: `Archive is not a zip-archive`,
diagnostic: { node: context.getCurrentNode(), property: 'name' },
});
}

private async loadGzFileToInMemoryFileSystem(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should duplicate a lot of the loadZipFileToInMemoryFileSystem logic. I assume only the extraction of files would be different, putting them into our in memory filesystem etc is all repeated logic.

It would be better to implement it that way, e.g., by extracting the shared logic of putting files into our in-memory file system into one function and reduce these two functions to just extracting the relevant archives and then using the new function to put files into a file system.

archivedObject,
);

const addedFile = fs.putFile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very confused how this works because I can only see one putFile call and no loops or iterations. Does this actually correctly expand gz archives with multiple files in them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding gzip is used to compress individual files. To compress multiple files with gzip, they have to be concated first, e.g to a .tar file. This I would implement in a different feature. Happy to further look into this, if you disagree.

@@ -19,7 +19,7 @@ export class ArchiveInterpreterMetaInformation extends BlockMetaInformation {
archiveType: {
type: PrimitiveValuetypes.Text,
docs: {
description: 'The archive type to be interpreted, e.g., `"zip"`.',
description: 'The archive type to be interpreted, e.g., `"zip" or "gz`.',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: 'The archive type to be interpreted, e.g., `"zip" or "gz`.',
description: 'The archive type to be interpreted, e.g., `"zip" or "gz".',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to 'The archive type to be interpreted, e.g., "zip" or "gz".',

type: PrimitiveValuetypes.Boolean,
defaultValue: true,
docs: {
description: 'Indicates, whether to follow redirects on get requests. If `false`, Redirects are disabled. Default `true`',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redirects are not disabled, just not followed ;).

Suggested change
description: 'Indicates, whether to follow redirects on get requests. If `false`, Redirects are disabled. Default `true`',
description: 'Indicates, whether to follow redirects on get requests. If `false`, redirects are not followed. Default `true`',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested

@@ -126,6 +126,17 @@ export class HttpExtractorExecutor extends AbstractBlockExecutor<
);
}

if (responseCode === 302) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

301 is also a redirect code I assume.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to >= 301, as 300 is used for 'multiple choices'. The remaining 300 < response codes < 400 point to redirect location or are not used in get context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not change it to >= 301, that makes it dependent on other code to ensure this does not handle 400+. You can change it to >=301 and <400 though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done

@rhazn
Copy link
Contributor

rhazn commented Jul 21, 2023

Linting is failing for this PR btw, you can run the linter locally with npm run lint.

@f3l1x98
Copy link
Contributor

f3l1x98 commented Jul 21, 2023

@f3l1x98 can you provide guidance on how the failing example file can be turned into a unit test?

If help is needed/wanted, yes.
Though we currently do not have any tests yet, where this could be tested.
I think this would be part of either the test for the CSVInterpreterBlockExecutor or End-to-End tests, but would have to look into it.

@rhazn
Copy link
Contributor

rhazn commented Jul 26, 2023

@A-M-A-X (in case you want me to review this again, you'll need to re-request the review on the top right, little icon next to my name).

@A-M-A-X A-M-A-X requested a review from rhazn July 28, 2023 07:41
Copy link
Contributor

@rhazn rhazn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small changes to the function still but otherwise looks good 👍

}
}

private createBinaryAndPutFile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two problems with this:

  • Methods that have an "and" in their name are almost always to large and can be split up in an obvious way ;). In this case, I'd split this into createBinary and putFile theoretically.
  • Changing objects that are parameters to a function as a sideeffect is often confusing and leads to errors. Here I would not change the inmemoryfilesystem as sideeffect of the function.

Combined, I'd rename the method to createFileFromArchive (or similar) and just have line 120 to 128 in it, return the new BinaryFile. Then habe the fs.putfile and assert outside of the function on it's own, I don't think they need their own function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Moved fs.putFile and assertion outside of function.
  • Renamed function to createFileFromArchive and have it return the new BinaryFile.
  • Removed now unused parameters.

@A-M-A-X A-M-A-X requested a review from rhazn July 28, 2023 11:09
@rhazn
Copy link
Contributor

rhazn commented Jul 28, 2023

Wait

@rhazn
Copy link
Contributor

rhazn commented Jul 28, 2023

Ah, nvm I am blind. It's fine, you can merge this into dev.

@A-M-A-X
Copy link
Contributor Author

A-M-A-X commented Jul 28, 2023

@rhazn I think I am missing write acess to do the merge myself, or am I blind as well?

@rhazn
Copy link
Contributor

rhazn commented Jul 28, 2023

Ah yeah might be on me. It's fine, I can merge these finally.

@rhazn rhazn merged commit a9abd0b into jvalue:dev Jul 28, 2023
2 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants