Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add static hash helper methods #17590

Closed
GSPP opened this issue Jun 12, 2016 · 30 comments · Fixed by #39372
Closed

Add static hash helper methods #17590

GSPP opened this issue Jun 12, 2016 · 30 comments · Fixed by #39372
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Security
Milestone

Comments

@GSPP
Copy link

GSPP commented Jun 12, 2016

Edit by @GrabYourPitchforks on 20 Jan 2020: Formal API proposal written at https://github.com/dotnet/corefx/issues/9369#issuecomment-576445733.

Computing a hash code currently requires this code:

        byte[] hashBytes;
        using (var hashAlgo = SHA256.Create())
            hashBytes = hashAlgo.ComputeHash(bytes);

Or the more awkward HashAlgorithm.Create("SHA256").

This code is alright. It's not the end of the world. But I think it should be slimmer than that:

var hashBytes = SHA256.ComputeHash(bytes);

Benefits:

  1. This is more terse.
  2. It's an expression as opposed to a statement. That makes it more composable. It can be a subexpression for example in a LINQ query.
  3. No way to forget resource cleanup or mess up code quality otherwise.
  4. SHA256.ComputeHash can look at it's input size and dynamically pick the fastest implementation. I found the following to be optimal through testing on Windows x64: estimatedDataLength <= 512 ? new SHA1Managed() : HashAlgorithm.Create("SHA1"). Apparently, using the Windows crypto API has quite some per-hash cost.

I request that static helper method be added to the framework. This seems like an attractive case for a community contribution.


Proposal:

Using this pattern on all of the HashAlgorithm-derived types that are not KeyedHashAlgorithm-derived types (MD5, SHA1, SHA256, SHA384, SHA512):

public partial class SHA256
{
    public static byte[] Hash(byte[] source) => throw null;
    public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
    public static int Hash(ReadOnlySpan<byte> source, ReadOnlySpan<byte> destination) => throw null;
    public static bool TryHash(ReadOnlySpan<byte> source, ReadOnlySpan<byte> destination, out int bytesWritten) => throw null;
}
@joshfree
Copy link
Member

Thanks @GSPP! @bartonjs take a look and tag this as appropriate.

@bartonjs
Copy link
Member

There are both good and bad sides to this idea.

The succinctness, composability, and reduced strain on the finalizer are all good. But "mess up code quality" is hard to gauge... if you're hashing a lot then it's better to hold the hash object across calls to avoid the setup/teardown overhead.

If we were to make it a static method then we need to make it thread safe (by the general rules of the framework), which means one of

  • Writing it as a using statement, each call doing setup/teardown
  • Writing it with thread-local statics, holding each object forever
  • Writing it with a lock statement
  • Writing it using Interlocked.Exchange to swap it in and out of a static field, leaving one hasher (per algorithm) open forever.

None of those are hard, or necessarily bad, but they all have different tradeoffs; and I'm not sure we'd be able to determine which one is best for the framework; other than guessing arbitrarily.

And, of course, there's an instance v static naming problem. The good name (ComputeHash) is already taken as a public instance method on a base class. We could redefine it with new, but I'm not sure it can really be done in a way that helps more than it confuses. (Or maybe we do this all the time and I've just never noticed).

(And as for the switching between the managed and native implementations, that could be done on the .NET Framework implementation, but in .NET Core we don't have a managed implementation).

So... given that we've broken IncrementalHash out into its own class, I think this definitely makes sense to have been the original implementation model. I'm just not sure that it can really be worked in to the existing API without creating more confusion than benefit.

(Maybe a new class which has each algorithm as a method?

public static class Hash
{
    public static byte[] SHA256(byte[] data);
    // And the other overloads

    ...
}

)

@GSPP
Copy link
Author

GSPP commented Jun 13, 2016

I like the new class idea.

There could be an overload for Stream and, more questionably, an overload for string, Encoding. Hashing strings is common and often done wrong because the encoding aspect is not understood. This overload would gently inform the user that the encoding is an important choice.

I did not think of sharing the object. I like that a lot! That would increase efficiency to nearly optimal levels.

I consider it bad practice to reuse crypto objects because that's error potential in very sensitive code. That's even more reason to add the static functions. If the framework does it correctly callers can rely on it and forget about playing caching tricks.

Now the advice can be: "If you want to hash anything, just use the Hash class." That's super simple guidance with almost no potential of misuse. I like that a lot.

@morganbr
Copy link
Contributor

I've always been a fan of adding convenience methods for the very common situation of having all the data you intend to hash (or do other crypto operations with). I'd rather not add more classes since we already have classes people know to use; adding static methods should generally be fine (maybe something like Sha256.Hash(byte[])?).

As @GSPP pointed out, as the framework, we can experiment with the policies you floated and find something that generally works reasonably well. If somebody finds a situation that isn't good for them, they're free to manage their own instances. Among your suggestions, I like attempting to do some kind of sharing for general efficiency. As long as we're not holding on to a ton of hashers, I think it's fine to have them open for a while... they don't use that many resources.

@bartonjs bartonjs removed their assignment Oct 3, 2016
@karelz
Copy link
Member

karelz commented Nov 16, 2016

We need API proposal.

@GSPP
Copy link
Author

GSPP commented Aug 12, 2018

API proposal:

Let's add static methods to the individual hash classes, e.g.

var hashBytes = SHA256.ComputeHash(bytes);

I'd not add a new class Hash (or similar name) with usage like Hash.SHA256(...). Each hash type is logically isolated. User code might add more hash algorithms that would not be represented on this Hash class leading to confusion.

Libraries providing more hash algorithms can simply implement the static method pattern set forth by the framework.

We could cache hash algorithm instances. One issue here is that permanent resource usage might result from performing just a single or a few hash operations.

The cache could either be a ThreadStatic field or a global pool. I suspect that the overhead of any global pool would destroy too much performance.

Here is a sample implementation for a thread static solution:

internal static class HashAlgorithmCache<T>
{
 static T managedInstance;
 public static T GetManagedInstance(Func<T> factory) =>
  return managedInstance ?? (managedInstance = factory());

 static T unmanagedInstance;
 public static T GetUnmanagedInstance(Func<T> factory) =>
  return unmanagedInstance ?? (unmanagedInstance = factory());

 public static T GetInstance(
   int? expectedDataLength,
   Func<T> managedFactory,
   Func<T> unmanagedFactory)
 {
  bool useManaged = expectedDataLength == null || expectedDataLength.Value <= 512;
  return
   useManaged ? GetManagedInstance(managedFactory) :
   GetUnmanagedInstance(unmanagedFactory);
 }
}

class SHA256
{
 public static byte[] ComputeHash(byte[] input)
 {
   var expectedDataLength = input.Length;
   var hashAlgo = HashAlgorithmCache<SHA256>.GetInstance(
     expectedDataLength,
     () => new SHA256Managed(),
     () => HashAlgorithm.Create("SHA256"));
   return hashAlgo.ComputeHash(bytes);
 }
}

The main advantage is simplicity and performance. The main downside is potentially permanent resource usage from those cached algorithms. There are two instances per thread which can result in hundreds of objects.

An alternative here is to simply not cache anything and create a fresh instance each time.

There should be overloads for accepting Span<byte>, Stream and possibly pipelines if that is possible from a dependency standpoint.

@GrabYourPitchforks
Copy link
Member

On Win10+, the OS exposes a one-shot p/invoke API BCryptHash. You don't even need to open or cache an algorithm handle; you can pass the well-known constant BCRYPT_SHA256_ALG_HANDLE (for SHA256) in as the first parameter and let the OS take care of everything on your behalf. Pseudohandles for other hash routines are defined in bcrypt.h.

On other OSes, we could have a thread-static WeakReference<HashAlgorithm> if we're concerned about holding on to these objects for too long. That would allow the GC to reclaim them.

@GrabYourPitchforks
Copy link
Member

API proposal from the thread, more formally stated:

namespace System.Security.Cryptography
{
    public class SHA256 : HashAlgorithm
    {
        /*
         * new proposed static API
         * uses 'new' keyword since instance member HashAlgorithm.ComputeHash(byte[]) exists
         */
        public static new byte[] ComputeHash(byte[] data);

        /*
         * new proposed static API
         * computes the hash of the source data and writes it to the destination buffer.
         * returns number of bytes written; throws if destination too small.
         */
        public static int ComputeHash(ReadOnlySpan<byte> source, Span<byte> destination);
    }

    // above static methods also added to classes MD5, SHA1, SHA384, SHA512
}

@bartonjs
Copy link
Member

As ComputeHash is an instance method, making a static method with the same name seems overly confusing (removed ready-for-review)

@bartonjs
Copy link
Member

It's actually worse than confusing, it's a recompile-breaking change. Anyone with a strongly typed SHA256 that currently invokes the instance method will get a compile error:

Error	CS0176	Member 'SHA256.ComputeHash(byte[])' cannot be accessed with an instance reference; qualify it with a type name instead

@GrabYourPitchforks
Copy link
Member

@bartonjs You have an alternative proposal that puts these as static members on a new static type, which would work around the issue. Even with the issues you point out it still seems like something we'd allow to come through API review since it's an actionable, concrete proposal.

@bartonjs
Copy link
Member

The proposal for a new type named Hash? Since we've spanified things since then, I think it would be

public static class Hash
{
     public static byte[] SHA256(byte[] input) => throw null;
     public static int SHA256(ReadOnlySpan<byte> input, Span<byte> destination) => throw null;

     //repeat for MD5, SHA1, SHA384, SHA512
}

Morgan's suggestion was to use Hash as the method name for the one-shot, which would be

public partial class SHA256
{
    public static byte[] Hash(byte[] input) => throw null;
    public static int Hash(ReadOnlySpan<byte> input, ReadOnlySpan<byte> destination) => throw null;
}

The latter generalizes better (anyone who adds a hash algorithm in a NuGet package, or whatever, can follow that pattern, but they can't add static methods on our Hash accelerator type).

Having done a lot of work lately on patterns and practices, I think I prefer it as methods on the existing classes, as long as Hash (or some other non-breaking-change verb) seems discoverable enough.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@bartonjs bartonjs added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Jul 7, 2020
@bartonjs bartonjs modified the milestones: 5.0.0, Future Jul 7, 2020
@bartonjs
Copy link
Member

bartonjs commented Jul 9, 2020

Video

Approved as proposed

public partial class MD5
{
    public static byte[] Hash(byte[] source) => throw null;
    public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
    public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
    public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA1
{
    public static byte[] Hash(byte[] source) => throw null;
    public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
    public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
    public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA256
{
    public static byte[] Hash(byte[] source) => throw null;
    public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
    public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
    public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA384
{
    public static byte[] Hash(byte[] source) => throw null;
    public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
    public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
    public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA512
{
    public static byte[] Hash(byte[] source) => throw null;
    public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
    public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
    public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 9, 2020
@bartonjs bartonjs modified the milestones: 5.0.0, Future Jul 9, 2020
@vcsjones
Copy link
Member

vcsjones commented Jul 9, 2020

public static int Hash(ReadOnlySpan<byte> source, ReadOnlySpan<byte> destination) => throw null;

I assume destination should be Span<byte>?

@bartonjs
Copy link
Member

bartonjs commented Jul 9, 2020

Grumble, grumble, copy pasta. Fixed.

@vcsjones
Copy link
Member

vcsjones commented Jul 9, 2020

@bartonjs okay, one more. There is already a member on HashAlgorithm named Hash.

The Hash methods will hide that property. (CS0108). We either need to use the new keyword or come up with a new name.

@bartonjs
Copy link
Member

bartonjs commented Jul 9, 2020

Well. That's a bad job on my part.

Any better name suggestion? I don't think the name masking will work, because of the legacy interaction with ICryptoTransform.

@vcsjones
Copy link
Member

vcsjones commented Jul 9, 2020

I don't think the name masking will work

Regardless of working or not, it is also a breaking change:

public static void Main() {
    var f = new Frob();
    _ = f.Hash;
}

public class Foo {
    public byte[]? Hash { get; set; }
}

public class Frob : Foo {
    // Uncomment and `Main` will not compile anymore.
    //public new byte[] Hash(byte[] cat) => Array.Empty<byte>();
}

Any better name suggestion?

... nothing with precedent. We could, though it's somewhat redundant, add the hash name?

public byte[] HashSHA256(...);
public bool HashMD5(...); // or whatever the "right" casing is.

There's also Digest, but that hasn't been used as a verb in the framework as far as I can see, and feels like more using new terminology that might confuse developers just to avoid a name collision.

The alternative is we put the methods on a different class or new class. IncrementalHash seems wrong because it isn't... incremental... I don't think there is anything existing that would work.

I'm starting to warm up to the idea of a new class: during API review I asked 'why not on HashAlgorithm?' to which you gave a good answer about inheritance. But we still have that problem on SHA256 (and others) since they are also abstract. Consider a developer that has done:

public sealed class MyHsmBackedSHA256 : SHA256
{
    // impl
}

Calling MyHsmBackedSHA256.Hash(someBytes) will call our implementation, not theirs (unless they shadow it). This wouldn't exactly be a new problem, as the Create factory methods exhibit the same problem. Whoever implemented MyHsmBackedSHA256 would need to shadow ours, and maybe it's in an infrequently updated nuget package?

So this was a very long way of me to say ¯\_(ツ)_/¯

@vcsjones
Copy link
Member

vcsjones commented Jul 9, 2020

re-reading through the proposal, I will now suggest / endorse:

namespace System.Security.Cryptography {
    public static class Hash {
        public static byte[] HashMD5(byte[] source) => throw null;
        public static byte[] HashMD5(ReadOnlySpan<byte> source) => throw null;
        public static int HashMD5(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
        public static bool TryHashMD5(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
    
        public static byte[] HashSHA1(byte[] source) => throw null;
        public static byte[] HashSHA1(ReadOnlySpan<byte> source) => throw null;
        public static int HashSHA1(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
        public static bool TryHashSHA1(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;

        public static byte[] HashSHA256(byte[] source) => throw null;
        public static byte[] HashSHA256(ReadOnlySpan<byte> source) => throw null;
        public static int HashSHA256(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
        public static bool TryHashSHA256(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;

        public static byte[] HashSHA384(byte[] source) => throw null;
        public static byte[] HashSHA384(ReadOnlySpan<byte> source) => throw null;
        public static int HashSHA384(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
        public static bool TryHashSHA384(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;

        public static byte[] HashSHA512(byte[] source) => throw null;
        public static byte[] HashSHA512(ReadOnlySpan<byte> source) => throw null;
        public static int HashSHA512(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
        public static bool TryHashSHA512(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
    }
}

@vcsjones
Copy link
Member

vcsjones commented Jul 9, 2020

And, while I'm at it, can throw in bonus API:

namespace System.Security.Cryptography {
    public static class Hash {
        public static byte[] HashName(HashAlgorithmName hashAlgorithmName, byte[] source) => throw null;
        // needs better name. Compute? Just can't be hash unless class name changes.
        // etc
    }
}

and maybe we don't even need the MD5, SHA1, dedicated methods anymore.

Using a new class at least makes it easier to use HashAlgorithmName, and I feel it would be a shame to not support it. These one-shot APIs would probably be useful within the framework itself in a few places, but we tend to have a HashAlgName and writing if.. if.. if.. if.. if.. if... in a bunch of places feels tedious, and more places to update when / if we get more hash algorithms (SHA3, or SHA512256).

@bartonjs
Copy link
Member

A standalone Hash class, using the algorithm names as verbs, would certainly solve the problem in-framework, but doesn't create an extensible pattern, so there's nothing for someone to really follow if they wanted to "bring back" RIPEMD160, or wanted to provide historical algorithms the framework never had, like MD4.

you gave a good answer about inheritance. But we still have that problem on SHA256 (and others) since they are also abstract.

Assuming that the derived type of SHA256 is still implementing SHA256, there's room for "but I thought SHA256CryptoServiceProvider.SomeVerb(data) used CAPI...", but it's always SHA256. That's better (in my head) than "SHA256.SomeVerb(algName, data)" actually being MD5. (You could also say the same thing about RSACryptoServiceProvider, but we made that type "just (mostly) work" on non-Windows, showing sometimes "right answer" is good enough.)

"OneShot" is what I'd call this functionality for an internal method, or perhaps "StaticHash". "OneShot" doesn't feel right for public API, and while "StaticHash" is still a little on-the-nose it does avoid combining "Hash" from the instance members and type name with "Digest" as the synonym verb.

Another reasonable thing is "StatelessHash", separating it from ComputeHash which has a stateful mix with the object (Disposed or not, and poor interaction with the TransformBlock method). (Or "HashStateless", but that feels less good in my gut)

@vcsjones
Copy link
Member

Okay, if we want to stick with statics on existing algs, I am not terribly concerned with the name. Could do any of yours, or SingleHash, InstantHash, etc. or just throw the word Data in there somewhere, HashData. That might have the benefit of being generic enough that folks don’t read too much in to the name.

@GSPP
Copy link
Author

GSPP commented Jul 10, 2020

Maybe HashBuffer? That would signal that a single buffer is being processed. This is in contrast to the other more general APIs (TransformBlock, ...).

Alternatives: HashBytes, GetHash.

@bartonjs
Copy link
Member

bartonjs commented Jul 10, 2020

Video

We discussed this again, and accepted HashData.

public partial class MD5
{
    public static byte[] HashData(byte[] source) => throw null;
    public static byte[] HashData(ReadOnlySpan<byte> source) => throw null;
    public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
    public static bool TryHashData(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA1
{
    public static byte[] HashData(byte[] source) => throw null;
    public static byte[] HashData(ReadOnlySpan<byte> source) => throw null;
    public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
    public static bool TryHashData(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA256
{
    public static byte[] HashData(byte[] source) => throw null;
    public static byte[] HashData(ReadOnlySpan<byte> source) => throw null;
    public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
    public static bool TryHashData(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA384
{
    public static byte[] HashData(byte[] source) => throw null;
    public static byte[] HashData(ReadOnlySpan<byte> source) => throw null;
    public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
    public static bool TryHashData(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}
public partial class SHA512
{
    public static byte[] HashData(byte[] source) => throw null;
    public static byte[] HashData(ReadOnlySpan<byte> source) => throw null;
    public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
    public static bool TryHashData(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
}

@bartonjs bartonjs added api-needs-work API needs work before it is approved, it is NOT ready for implementation api-approved API was approved in API review, it can be implemented and removed api-approved API was approved in API review, it can be implemented api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Jul 10, 2020
@vcsjones
Copy link
Member

I think I can squeeze this one in over the weekend.

@bartonjs
Copy link
Member

Regarding the suggestion of "HashBytes": there are existing methods named (Try)HashData... admittedly, they're all protected; but there aren't any named "HashBytes", so "HashData" is more platform-consistent.

@bartonjs
Copy link
Member

I think I can squeeze this one in over the weekend.

If you want to PAL through to native one-shots, that'd be best. If you want to go quicker with [ThreadStatic], that's fine too.

@vcsjones
Copy link
Member

If you want to PAL through to native one-shots, that'd be best

That's the intention.

We don't really have a native PAL for CNG (do we?) and CNG's one-shot BCryptHash is Windows 10+, so maybe both?

@bartonjs
Copy link
Member

We don't really have a native PAL for CNG (do we?)

Well, we have the PAL (split build for the OS), we just don't carry a native shim.

@vcsjones vcsjones mentioned this issue Jul 15, 2020
2 tasks
@vcsjones
Copy link
Member

vcsjones commented Jul 15, 2020

New APIs are in. Work left to be done:

  • Cache results of EVP_sha1, EVP_sha256, etc in HashAlgorithmToEvp to avoid a P/Invoke call.
  • Optimize CNG implementation to not allocate a stateful hash object.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Security
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants