-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/magic desc mime type #21
Feature/magic desc mime type #21
Conversation
Hi, For example:
When writing a karton service, you'd have basically no way to listen for such tasks - you can't use wildcards in headers. Do you have any specific usecase for this? Maybe moving magic to payload would be a better idea? |
Sounds like moving |
Yup, |
95f2493
to
59fab15
Compare
payload.update(expected["payload"]) | ||
|
||
res = self.run_task(task) | ||
self.assertTasksEqual(res, [Task(expected["headers"], payload)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite simple and cool, but I'm not sure if hardcoding libmagic output in test cases is a good idea - it changes between versions. These tests are quite broken on my system (file v5.40). Examples:
AssertionError: 'PDF document, version 1.4, 1 pages' != 'PDF document, version 1.4'
- PDF document, version 1.4, 1 pages
? ---------
+ PDF document, version 1.4
: Incorrect value of payload.magic
AssertionError: 'ASCII text, with very long lines (32088), with no line terminators' != 'ASCII text, with very long lines, with no line terminators'
- ASCII text, with very long lines (32088), with no line terminators
? --------
+ ASCII text, with very long lines, with no line terminators
AssertionError: 'Zip archive data, at least v2.0 to extract, compression method=deflate' != 'Zip archive data, at least v2.0 to extract'
- Zip archive data, at least v2.0 to extract, compression method=deflate
? ----------------------------
+ Zip archive data, at least v2.0 to extract
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is to be expected. Something like libmagic
behaves differently, sometimes even within the same version (e.g. with local magic files, etc.). That's why in my experience we choose a frame of reference which is used for the test execution. In this case, this frame of reference is the CI/CD pipeline which runs the tests (based on Ubuntu latest in this case). Then everyone uses that frame of reference for development. This is proven to work and guarantees stable builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAK from me. Tests that rely on a specific environment are a bad idea. Apart from annoyances, such as inability to run the tests on a development machine, there are some other problems:
- Using ubuntu-latest for testing basically guarantees that the environment will be different at some point. When GitHub changes latest from 20.04 to 21.04, tests will probably break unexpectedly and require fixing.
- Our CI is not and should not be the only testing environment (the more testing we get, the better). At this point we already have a downstream - nixpkgs that runs classifier tests in their CI infrastructure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can understand your point of view. So what do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the tests require some more consideration I would suggest splitting them off to a separate PR.
We could merge the MIME / magic additions now and take some more time designing the implementation of test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"writing the tests later" == "writing the tests never". Maybe we can fix it somehow after all?
Copied from slack: Hmm, can we pin libmagic version somehow? And ship a statically compiled binary? ping @chivay (cert.pl) . The upside is that the service becomes more "stable" and works in the same way everywhere Another way to fix it would be to use fuzzy matching for |
Superseded by #23 |
This change set adds the magic description and mime type to the classifier result.
To support easier test writing the artifacts have been augmented with a .json file which contains the expected classification result. Makes for much easier test case writing.