hashlib.sha1 giving incorrect results #194

goatchurchprime · 2017-10-04T16:51:15Z

MicroPython v1.9.2-273-g56f18602 on 2017-09-25; ESP32 module with ESP32
Type "help()" for more information.
>>> import binascii, hashlib
>>> hash = hashlib.sha1(b'mango')
>>> binascii.hexlify(hash.digest()) 
b'b19f233f1f6b38edc052ea9de39f3fd0c78fefba'

But on normal Python it's:

Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58) 
Type "help", "copyright", "credits" or "license" for more information.
>>> import binascii, hashlib
>>> hash = hashlib.sha1(b'mango')
>>> binascii.hexlify(hash.digest()) 
b'934aae49f648ed870c9c421829f4cece6643cf86'

Oddly, the value on version "MicroPython v1.9.2-270-g14fb53e0 on 2017-09-11; ESP32 module with ESP32" as reported in issue #178 is correct, so there has been a regression.

The text was updated successfully, but these errors were encountered:

goatchurchprime · 2017-10-04T17:05:26Z

Verified that it's still a problem on "MicroPython v1.9.2-275-gd0e11f76 on 2017-10-04; ESP32 module with ESP32"

goatchurchprime · 2017-10-04T17:11:13Z

Doh! I can't even go back and download the last known stable version at: http://micropython.org/resources/firmware/esp32-20170911-v1.9.2-270-g14fb53e0.bin They old versions are all destroyed

nickzoic · 2017-10-04T23:02:57Z

Thanks for letting us know. I'll go back and build some older versions and see where they diverged.
But yes, I can confirm that this is a problem, I get the exact same results as you (on v1.9.2-275-gd0e11f7)

nickzoic · 2017-10-05T00:22:57Z

f0561ab changes CONFIG_MBEDTLS_HARDWARE_SHA in the process of updating to a newer esp-idf.
Reverting it and rewinding to espressif/esp-idf@4ec2abb fixes the problem.

MrSurly · 2017-10-05T00:28:02Z

Two questions: What's it using for SHA if that is unset, and why is it wrong?

dpgeorge · 2017-10-05T00:48:18Z

That's a bit worrying that the software SHA is broken (or seems to be)... as I said in that commit, when updating the IDF I used the default mbedtls settings. If you look at components/mbedtls/Kconfig in the IDF then it suggests that software SHA is better due to a hardware limitation.

@nickzoic can you check if adding #define CONFIG_MBEDTLS_HARDWARE_SHA 1 to our sdkconfig.h makes the SHA1 correct for this example?

projectgus · 2017-10-05T01:19:53Z

You need to call mbedtls_shaXX_starts(ctx, xx) after each call mbedtls_shaXX_init() to seed the internal state of the SHA context. This is an mbedTLS API requirement (init only zeroes the context structure.)

This isn't a problem for the hardware accelerated version, as we ignore the software initial context state.

(I have been bit by this API limitation before which is why I went and looked for it in moduhashlib.c when I saw this bug.)

Hardware SHA is faster in some circumstances (short lived hash contexts with minimal concurrency) and slower in others (long lived hash contexts with concurrency). This is because there's only one SHA engine per hash type, so we "read out" the hardware hash context to software if another context needs it.

The situation with multiple long lived hash contexts tends to happen in TLS sessions, which is why we've turned it off by default.

nickzoic · 2017-10-05T03:04:48Z

Hmmm, thanks for the info @projectgus, it sounds like a quick fix anyway.
@dpgeorge, if you're busy with other things I'll push a PR tonight ...

PS: If the hardware engine switches out to the software engine when concurrency occurs, how come it is slower than just using the software engine? Does the copy out of hash context take a long time?

dpgeorge · 2017-10-05T03:06:52Z

Thanks @projectgus that's very useful to know, not only for the esp32.

@nickzoic please feel free to push a fix for this.

projectgus · 2017-10-05T03:21:54Z

If the hardware engine switches out to the software engine when concurrency occurs, how come it is slower than just using the software engine? Does the copy out of hash context take a long time?

Actually, sorry, what I said before is wrong and based on a previous revision of the SHA concurrency model. Now a hardware SHA context just stays in hardware for as long as it's needed there and any other contexts run in software.

What I'm basing the summary I gave on is running some fairly high level measurements (with CPU 240MHz) where TLS sessions were slower with hardware SHA but simple benchmarks were faster.

I believe most of the overhead is copying data in/out of the accelerator registers which are in DPORT memory. Reading DPORT is currently particularly limiting in dual core mode because of a hardware bug (we have to stall the other CPU). It may be simply that having multiple hardware accelerators enabled and in use at once (ie a TLS handshake) is too limiting because of this, it's better to let both cores continue working smoothly.

In single core mode or at lower CPU speeds than 240MHz then the hardware accelerators are probably faster in all cases.

I'd like to look into this more at some point in the future... :)

nickzoic · 2017-10-05T11:17:07Z

OK, so microseconds after I pushed a fix for this I accidentally did this and discovered something a little weird:

>>> import binascii, hashlib
>>> hash = hashlib.sha1(b'mango')
>>> binascii.hexlify(hash.digest()) 
b'934aae49f648ed870c9c421829f4cece6643cf86'
>>> binascii.hexlify(hash.digest()) 
b'ee9539bd14ed9a35ee6fe86a228cc34a02805f32'

whereas in Python 3.5.3, the behaviour seems a bit less weird:

>>> import binascii, hashlib
>>> hash = hashlib.sha1(b'mango')
>>> binascii.hexlify(hash.digest()) 
b'934aae49f648ed870c9c421829f4cece6643cf86'
>>> binascii.hexlify(hash.digest()) 
b'934aae49f648ed870c9c421829f4cece6643cf86'

(same thing in sha256)

dpgeorge · 2017-10-05T11:38:03Z

Running digest() more than once is also not supported by esp8266 or unix uPy (see the tests/extmod/uhashlib_sha256.py test). So if it's not an easy fix then the behaviour can be left as-is for now.

nickzoic · 2017-10-05T11:41:36Z

(aha, it's doing the padding step as part of the _finish call, so that's getting done twice and scrambling the results. The only way around it would be to keep a copy of that vstr output buffer around and then you'd only have to call _finish once ...)

For now I just pushed it in with the smallest possible change: I hadn't realized that was known behaviour so I guess we can close this after all.

goatchurchprime · 2017-10-05T13:07:13Z

Can it always throw an exception rather than give a wrong answer?

It's an amazingly difficult thing to debug when sha() gives a wrong answer because you only consider it after eliminating all other possibilities.

I hit this one because I was writing a python implementation of websockets which didn't work and was reduced to portsniffing successful connections to find out what was going wrong between the http headers.

nickzoic · 2017-10-06T01:15:06Z

Yeah, I'd rather it worked like the CPython version and just return the same answer every time, but throwing an exception would be a lesser evil than returning wrong results.

Thanks for bringing it to our attention, we've pushed a fix and its now available for download as esp32-20171005-v1.9.2-276-ga9517c04.bin

goatchurchprime · 2017-10-10T13:15:16Z

I've checked and it's giving the right answer on the ESP32 (and even makes my websocket work).

The second calling to digest() does give a different answer, and the update() doesn't work as expected (after digest() is called). Same goes for the ESP8266, but with a different spurious answer on the second calling of digest().

https://docs.python.org/3/library/hashlib.html says of digest() "Return the digest of the data passed to the update() method so far..." which suggests multiple calls are possible

Definitely, if it is not otherwise possible, digest() should put the hash object in a state where any subsequent function call causes an exception.

I don't know of the exact policy, but after this bug should not a sanity check on the answer be included in the tests here?
https://github.com/micropython/micropython-esp32/blob/43d7a9d2c92d1a98a1f4ab8b4e1c4a1091e20df3/tests/extmod/uhashlib_sha1.py

nickzoic · 2017-10-10T23:58:14Z

Yeah, I'd like to push nicer behaviour upstream to main micropython.
Perhaps just by calling mbed_tls_sha1_clone before mbed_tls_sha1_finish.
That would allow multiple calls to digest and partial digests and bring it into line with CPython hashlib and the principle of least surprise :-)

nickzoic · 2017-10-11T02:51:23Z

I don't know of the exact policy, but after this bug should not a sanity check on the> answer be included in the tests here?

Yes. I'll put in a PR for that too ASAP.

…s often as you want. This same techique would work with extmod/moduhashlib.c easily enough

nickzoic closed this as completed in a9517c0 Oct 5, 2017

nickzoic reopened this Oct 5, 2017

nickzoic added a commit to nickzoic/micropython-esp32 that referenced this issue Oct 11, 2017

Fix micropython#194 ... hashes are partial and digest can be called a…

d414116

…s often as you want. This same techique would work with extmod/moduhashlib.c easily enough

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hashlib.sha1 giving incorrect results #194

hashlib.sha1 giving incorrect results #194

goatchurchprime commented Oct 4, 2017 •

edited

Loading

goatchurchprime commented Oct 4, 2017

goatchurchprime commented Oct 4, 2017

nickzoic commented Oct 4, 2017 •

edited

Loading

nickzoic commented Oct 5, 2017

MrSurly commented Oct 5, 2017

dpgeorge commented Oct 5, 2017

projectgus commented Oct 5, 2017

nickzoic commented Oct 5, 2017

dpgeorge commented Oct 5, 2017

projectgus commented Oct 5, 2017 •

edited

Loading

nickzoic commented Oct 5, 2017 •

edited

Loading

dpgeorge commented Oct 5, 2017

nickzoic commented Oct 5, 2017 •

edited

Loading

goatchurchprime commented Oct 5, 2017

nickzoic commented Oct 6, 2017

goatchurchprime commented Oct 10, 2017

nickzoic commented Oct 10, 2017

nickzoic commented Oct 11, 2017 via email

hashlib.sha1 giving incorrect results #194

hashlib.sha1 giving incorrect results #194

Comments

goatchurchprime commented Oct 4, 2017 • edited Loading

goatchurchprime commented Oct 4, 2017

goatchurchprime commented Oct 4, 2017

nickzoic commented Oct 4, 2017 • edited Loading

nickzoic commented Oct 5, 2017

MrSurly commented Oct 5, 2017

dpgeorge commented Oct 5, 2017

projectgus commented Oct 5, 2017

nickzoic commented Oct 5, 2017

dpgeorge commented Oct 5, 2017

projectgus commented Oct 5, 2017 • edited Loading

nickzoic commented Oct 5, 2017 • edited Loading

dpgeorge commented Oct 5, 2017

nickzoic commented Oct 5, 2017 • edited Loading

goatchurchprime commented Oct 5, 2017

nickzoic commented Oct 6, 2017

goatchurchprime commented Oct 10, 2017

nickzoic commented Oct 10, 2017

nickzoic commented Oct 11, 2017 via email

goatchurchprime commented Oct 4, 2017 •

edited

Loading

nickzoic commented Oct 4, 2017 •

edited

Loading

projectgus commented Oct 5, 2017 •

edited

Loading

nickzoic commented Oct 5, 2017 •

edited

Loading

nickzoic commented Oct 5, 2017 •

edited

Loading