Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Atomic<T> (corefx) #17975

Open
benaadams opened this issue Aug 1, 2016 · 69 comments
Open

Proposal: Atomic<T> (corefx) #17975

benaadams opened this issue Aug 1, 2016 · 69 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Threading
Milestone

Comments

@benaadams
Copy link
Member

Atomic

Background

x64 has been able to do 128bit/16 byte Interlocked CompareExchanges for a while and its a cpu requirement to be able install Window 8.1 x64 and Windows 10 x64. Its also available on MacOS/OSX and Linux. Its availability would allow for more interesting lock-less algorithms.

Was trying to think if there was an easy fallback for https://github.com/dotnet/corefx/issues/10478 but since its struct-based there is nothing to lock.

So additionally it would be good to have a type Atomic which mirrors the C++ 11 atomic type, adds some .NET magic and can fallback to locks when the data width is not supported on the platform.

Proposed Api

Interface, struct, class, extenstions (and some reference manipulation types)

namespace System.Threading.Atomics
{
    public interface IAtomic<T>  where T : IEquatable<T>
    {
        T Read();
        T Read(MemoryOrder order);
        void Write(T value);
        void Write(T value, MemoryOrder order);

        T Exchange(T value);
        T Exchange(T value, MemoryOrder order);

        bool CompareExchangeWeak(T value, T comperand);
        bool CompareExchangeWeak(T value, T comperand, MemoryOrder order);

        bool CompareExchangeStrong(T value, T comperand);
        bool CompareExchangeStrong(T value, T comperand, MemoryOrder order);

        // MemoryOrder variants skipped for brevity

        // Unsafe transformations, bypass the atomicity
        T UnsafeTransform(AtomicTransform<T> transformation);
        T UnsafeTransform(AtomicTransformParam<T> transformation, T val);

        // Atomic transformations, Func should be pure and repeatable in case of retry

        // Pure transform
        T Transform(Func<T, T> transformation);
        T Transform<TState>(Func<T, TState, T> transformation, TState state);

        // Conditional transforms e.g. Increment but only while < N
        T Transform(Func<T, T> transformation, Func<T, bool> condition);
        T Transform<TState>(Func<T, T> transformation, Func<T, TState, bool> condition, TState state);

        // Same data transform, apply if value is what is expected
        T Transform(Func<T, T> transformation, T comperand);
        T Transform<TState>(Func<T, TState, T> transformation, TState state, T comperand);
    }

    public delegate T AtomicTransform<T>(ref T input);
    public delegate T AtomicTransformParam<T>(ref T input, T val);

    public enum MemoryOrder
    {
        Relaxed,
        Consume,
        Acquire,
        Release,
        AcquireRelease,
        SequentiallyConsistent
    }
}

Atomic struct

[Disallow(Stack,Boxing)]
public struct ValueAtomic<T> : IAtomic<T> where T : IEquatable<T>
{
    private T _data;
    // allocated if not supported lock-free natively
    private object _lock;

    [JitIntrinsic]
    public static bool IsLockFree { get; }

    public ValueAtomic(T value)
    {
        _data = value;
        _lock = IsLockFree ? null : new object();
    }

    public T Read();
    public T Read(MemoryOrder order);
    public void Write(T value);
    public void Write(T value, MemoryOrder order);
    public T Exchange(T value);
    public T Exchange(T value, MemoryOrder order);
    public bool CompareExchangeWeak(T value, T comperand);
    public bool CompareExchangeWeak(T value, T comperand, MemoryOrder order);
    public bool CompareExchangeStrong(T value, T comperand);
    public bool CompareExchangeStrong(T value, T comperand, MemoryOrder order);
    public unsafe T UnsafeTransform(AtomicTransform<T> transformation)
        => transformation(ref _data);
    public unsafe T UnsafeTransform(AtomicTransformParam<T> transformation, T val)
        => transformation(ref _data, val);
    public T Transform(Func<T, T> transformation);
    public T Transform<TState>(Func<T, TState, T> transformation, TState state);
    public T Transform(Func<T, T> transformation, Func<T, bool> condition);
    public T Transform<TState>(Func<T, T> transformation, Func<T, TState, bool> condition, TState state)
    {
        //T current = Read();
        //while (condition(current, state))
        //{
        //    T next = transformation(current);
        //    T prev = Interlocked.CompareExchange(ref _data, next, current);
        //    if (prev.Equals(current))
        //    {
        //        return next;
        //    }
        //    current = prev;
        //}
    }
    public T Transform(Func<T, T> transformation, T comperand);
    public T Transform<TState>(Func<T, TState, T> transformation, TState state, T comperand);

    public static implicit operator T(ValueAtomic<T> atom) => atom.Read();
}

Atomic class (struct wrapper)

public class Atomic<T> : IAtomic<T> where T : IEquatable<T>
{
    private ValueAtomic<T> _atom;
    public static bool IsLockFree => ValueAtomic<T>.IsLockFree;

    Atomic(T value)
    {
        _atom = new ValueAtomic<T>(value);
    }

    public T Read()
        => _atom.Read();
    public T Read(MemoryOrder order)
        => _atom.Read(order);
    public void Write(T value)
        => _atom.Write(value);
    public void Write(T value, MemoryOrder order)
        => _atom.Write(value, order);
    public T Exchange(T value)
        => _atom.Exchange(value);
    public T Exchange(T value, MemoryOrder order)
        => _atom.Exchange(value, order);
    public bool CompareExchangeWeak(T value, T comperand)
        => _atom.CompareExchangeWeak(value, comperand);
    public bool CompareExchangeWeak(T value, T comperand, MemoryOrder order)
        => _atom.CompareExchangeWeak(value, comperand, order);
    public bool CompareExchangeStrong(T value, T comperand)
        => _atom.CompareExchangeStrong(value, comperand);
    public bool CompareExchangeStrong(T value, T comperand, MemoryOrder order)
        => _atom.CompareExchangeStrong(value, comperand, order);
    public unsafe T UnsafeTransform(AtomicTransform<T> transformation)
        => _atom.UnsafeTransform(transformation);
    public unsafe T UnsafeTransform(AtomicTransformParam<T> transformation, T val)
        => _atom.UnsafeTransform(transformation, val);
    public T Transform(Func<T, T> transformation)
        => _atom.Transform(transformation);
    public T Transform<TState>(Func<T, TState, T> transformation, TState state) 
    => _atom.Transform(transformation, state);
    public T Transform(Func<T, T> transformation, Func<T, bool> condition)
        => _atom.Transform(transformation, condition);
    public T Transform<TState>(Func<T, T> transformation, Func<T, TState, bool> condition, TState state)
        => _atom.Transform(transformation, condition, state);
    public T Transform(Func<T, T> transformation, T comperand)
        => _atom.Transform(transformation, comperand);
    public T Transform<TState>(Func<T, TState, T> transformation, TState state, T comperand) 
        => _atom.Transform(transformation, state, comperand);

    public static implicit operator T(Atomic<T> atom) => atom.Read();
}

Numeric Extensions

public static class AtomicNumericExtensions
{
    // For byte, short, ushort, uint, int, long, ulong, single, double

    public static int Add(this Atomic<int> atom, int value);
    public static int Subtract(this Atomic<int> atom, int value);
    public static int Multiply(this Atomic<int> atom, int value);
    public static int Divide(this Atomic<int> atom, int value);

    // For byte, short, ushort, uint, int, long, ulong

    public static int Increment(this Atomic<int> atom);
    public static int Increment(this Atomic<int> atom, int max);

    public static int Decrement(this Atomic<int> atom);
    public static int Decrement(this Atomic<int> atom, int min);

    public static int And(this Atomic<int> atom, int value);
    public static int Or(this Atomic<int> atom, int value);
    public static int Xor(this Atomic<int> atom, int value);
    public static int Not(this Atomic<int> atom);
}

Bool Extensions

public static class AtomicBoolExtensions
{
    public static bool TestAndSet(this Atomic<bool> atom);
    public static bool Clear(this Atomic<bool> atom);
}

Atomic Flagged References

public struct FlaggedReference<TRef>
    where TRef : class
{
    TRef Reference { get; set; }
    bool Flag { get; set; }
}

public static class AtomicFlaggedReferenceExtensions
{
    public static bool TestAndSet<TRef>(this Atomic<FlaggedReference<TRef>> atom);
    public static bool TestAndSet<TRef>(
                        this Atomic<FlaggedReference<TRef>> atom,
                        TRef expectedReference);
    public static bool Clear<TRef>(this Atomic<FlaggedReference<TRef>> atom);
    public static bool Clear<TRef>(
                        this Atomic<FlaggedReference<TRef>> atom,
                        TRef expectedReference);
}

Atomic Versioned References

public struct VersionedReference<TRef>
    : IEquatable<VersionedReference<TRef>>
    where TRef : class
{
    TRef Reference { get; set; }
    long Version { get; set; }

    public bool Equals(VersionedReference<TRef> other)
        => ReferenceEquals(Reference, other.Reference) 
            && Version == other.Version;

    public static implicit operator TRef(VersionedReference<TRef> atom) => atom.Reference;
}

public static class AtomicVersionedReferenceExtensions
{
    public static VersionedReference<TRef> Increment<TRef>(
                    this Atomic<VersionedReference<TRef>> atom)
                    where TRef : class;
    public static VersionedReference<TRef> Increment<TRef>(
                    this Atomic<VersionedReference<TRef>> atom,
                    TRef expectedReference)
                    where TRef : class;
    public static VersionedReference<TRef> UpdateIncrement<TRef>(
                    this Atomic<VersionedReference<TRef>> atom,
                    VersionedReference<TRef> newRefExpectedVersion)
                    where TRef : class;
}

Atomic Tagged References

public struct TaggedReference<TRef, TTag>
    where TRef : class 
    where TTag : struct
{
    TRef Reference { get; set; }
    TTag Tag { get; set; }
}

public static class AtomicTaggedReferenceExtensions
{
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this TaggedReference<TRef, TTag> atom,
                    TaggedReference<TRef, TTag> newTaggedReference,
                    TRef expectedReference);
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this TaggedReference<TRef, TTag> atom,
                    TRef newReference,
                    TRef expectedReference);
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this TaggedReference<TRef, TTag> atom,
                    TTag newTag,
                    TRef expectedReference);
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this TaggedReference<TRef, TTag> atom,
                    TaggedReference<TRef, TTag> newTaggedReference,
                    TTag expectedTag);
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this TaggedReference<TRef, TTag> atom,
                    TRef newReference,
                    TTag expectedTag);
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this TaggedReference<TRef, TTag> atom,
                    TTag newTag,
                    TTag expectedTag);
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this TaggedReference<TRef, TTag> atom,
                    TRef newReference,
                    TaggedReference<TRef, TTag> expectedTaggedReference);
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this TaggedReference<TRef, TTag> atom,
                    TTag newTag,
                    TaggedReference<TRef, TTag> expectedTaggedReference);
    // essentially CompareExchange
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this TaggedReference<TRef, TTag> atom,
                    TaggedReference<TRef, TTag> newTaggedReference,
                    TaggedReference<TRef, TTag> expectedTaggedReference);
}

Atomic Double Reference

public struct DoubleReference<TRef> : IEquatable<DoubleReference<TRef>>
    where TRef : class
{
    TRef ReferenceLeft { get; set; }
    TRef ReferenceRight { get; set; }

    public bool Equals(DoubleReference<TRef> other)
        => ReferenceEquals(ReferenceLeft, other.ReferenceLeft) &&
            ReferenceEquals(ReferenceRight, other.ReferenceRight);
}

public static class DoubleReferenceExtensions
{
    public static DoubleReference<TRef> ExchangeLeft<TRef>(
                    this Atomic<DoubleReference<TRef>> atom,
                    TRef expectedRight) where TRef : class;
    public static DoubleReference<TRef> ExchangeRight<TRef>(
                    this Atomic<DoubleReference<TRef>> atom,
                    TRef expectedLeft) where TRef : class;
}

Transforms

The transforms give the flexibility to compose more complex atomic data structures; for example the int Add, Increment and Increment to limit can be implemented as

public static int Add<TAtomic>(this TAtomic atom, int value) 
    where TAtomic : IAtomic<int>
{
    return atom.UnsafeTransform(
        (ref int current, int inc) 
            => Interlocked.Add(ref current, inc), value);
}

public static int Increment<TAtomic>(this TAtomic atom) 
    where TAtomic : IAtomic<int>
{
    return atom.UnsafeTransform((ref int v) => Interlocked.Increment(ref v));
}

public static int Increment<TAtomic>(this TAtomic atom, int value, int max) 
    where TAtomic : IAtomic<int>
{
    return atom.Transform((current) => current + 1, (c, m) => c <= m, max);
}

Or an entirely new type that hadn't previously supported atomic updates, with new operations

public static Decimal Multiply<TAtomic>(this TAtomic atom, Decimal value)
   where TAtomic : IAtomic<Decimal>
{
    return atom.Transform((current, m) => current * m, value);
}

When the struct was 16 bytes _data would need to be 16 byte aligned to allow lock free use with LOCK CMPXCHG16b where available. Might be easier to enforce alignment than with https://github.com/dotnet/corefx/issues/10478?

VersionedReference and FlaggedReference should be 16 byte aligned in Atomic (don't need to be outside), as should TaggedReference when the struct is <= 16 bytes.

@davidfowl
Copy link
Member

@benaadams
Copy link
Member Author

benaadams commented Aug 1, 2016

Maybe a CompareExchange variant that takes a Func<T, T>?

bool Transform<TVal>(Func<TVal, TVal> transformation, T comperand) where TVal : struct, T;

Added

@benaadams benaadams changed the title Atomic<T> Atomic<T> (corefx) Aug 1, 2016
@benaadams benaadams changed the title Atomic<T> (corefx) Proposal: Atomic<T> (corefx) Aug 2, 2016
@joshfree
Copy link
Member

joshfree commented Aug 2, 2016

cc @sergiy-k

@clrjunkie
Copy link

I think in this case it’s ok skip the interface name suffix convention so things don’t read weird.

Consider changing ValueAtomic to AtomicValue.

Furthermore, the api really abstracts a memory location consider renaming to AtomicVariable
(I think atomic value generally means something that can’t be further divided)

Consider adding 'Extensions' suffix to Extension Method classes.

@benaadams
Copy link
Member Author

benaadams commented Aug 3, 2016

I think in this case it’s ok skip the interface name suffix convention so things don’t read weird. Consider changing ValueAtomic to AtomicValue.

There are 3 types, IAtomic, Atomic (class) and ValueAtomic (struct). The ValueAtomic follows the naming of ValueTuple and ValueTask for the struct based versions of Tuple and Task.

IAtomic is so generic constraints can be used for the two types in the extension methods.

Ideally I'd like ValueTask to be a heap+ref return only (array of ValueTask) struct. It would have a similar problem to SpinLock in stack space (e.g. no purpose and error though copying).

the api really abstracts a memory location

It can represent any type at any size; just will fallback to a lock when it can't be covered by a CAS or equivalent (e.g. TSX/LLSC).

Consider adding 'Extensions' suffix to Extension Method classes.

Done

@GSPP
Copy link

GSPP commented Aug 3, 2016

An alternative design would be to add the following:

  1. Add all these methods in the form of static methods.
  2. Support specifying memory alignment for structs.

That's more flexible than demanding that a certain type must be used.

@benaadams
Copy link
Member Author

benaadams commented Aug 3, 2016

Add all these methods in the form of static methods ... That's more flexible than demanding that a certain type must be used.

Doesn't allow fallbacks; if I had a 16 byte struct it may support CAS on x64, but not x86, a 64 byte struct may support an atomic update with TSX on Intel Skylake but not on earlier cpus, AMD or ARM.

As a user I just want an atomically updating data structure without worrying how its done; whether CAS, LLSC, Transactional memory etc.

As a power user I may want to know if IsLockFree and use a different strategy if not.

With the Atomic type set; I could define a 256 byte struct and have it happily be atomically updated by just enclosing it in the type.

Atomic<struct256byte> LargeStruct = new Atomic<struct256byte>();
var val = LargeStruct.Read();
val.x = 15;
val.name = "www";
LargeStruct.Write(val);

An if that became an lock-free operation at some point on a platform the framework could be updated to take advantage of it and no user code would need to change.

@clrjunkie
Copy link

The ValueAtomic follows the naming of ValueTuple and ValueTask for the struct based versions of Tuple and Task.

Why? How are these related to this api?

It can represent any type at any size; just will fallback to a lock when it can't be covered by a CAS or equivalent (e.g. TSX/LLSC).

So it abstracts a location in memory.. seems natural to call it as such (e.g AtomicVariable)

p.s

Not planning to over argue about it... food for your thought

@benaadams
Copy link
Member Author

benaadams commented Aug 3, 2016

The ValueAtomic follows the naming of ValueTuple and ValueTask for the struct based versions of Tuple and Task.

Why? How are these related to this api?

As there are two types:

Atomic<decimal> objectAtomicDecimal = new Atomic<decimal>();
ValueAtomic<decimal> valueTypeAtomicDecimal = new ValueAtomic<decimal>();

The class type (heap) to be passed as parameter; the valuetype to be embedded in class (heap) or used in arrays (heap).

@clrjunkie
Copy link

Aha... didn't pay quite attention to the Atomic class (wrapper)

@clrjunkie
Copy link

@benaadams

BTW.. I don't recall encountering code where an atomic variable was passed around as an argument for anything other then passing it to an interlocked method... even while reading through java code I only recall noticing their atomic abstraction used at the field level. Have you encountered any usage that justifies the 'class version'? Might worth posting an example here to be included in the api doc's as "sample usage". Genuinely interested.

@GSPP
Copy link

GSPP commented Aug 3, 2016

There's no need for a copy of the struct to put it on the heap. The framework has Box<T> for putting anything on the heap.

@benaadams
Copy link
Member Author

benaadams commented Aug 3, 2016

@clrjunkie Unless it could be enforced that a stack copy of the ValueAtomic couldn't be taken, I'd see Atomic as the go to type; that a user would use first due to the simpler name and it would carry less coding risk.

Say you had an atomic subtract extension for decimal which took a minimum value:

static ValueTuple<bool, decimal> Subtract<TAtomic>(this TAtomic atom,
    decimal value,
    decimal min)
    where TAtomic : IAtomic<decimal>;

Then you took a function local copy:

class Account
{
    ValueAtomic<decimal> _balance = new ValueAtomic<decimal>();

    public bool Withdraw(decimal amount, out string message)
    {
        var balance = _balance; // uh oh - different copy

        var result = balance.Subtract(amount, 0);

        var success = result.Item1;
        var currentBalance = result.Item2;

        if (success)
        {
            message = $"{amount} was withrawn from your account and you have {currentBalance} remaing";
        }
        else
        {
            message = $"You only have {currentBalance} which is insufficent to withrawn {amount}"
                        + $"as that would leave you with {currentBalance - amount}";
        }

        return success;
    }
}

You may be confused why the actual _balance had not changed. If you were using the class version the results would be as expected.

So I'd prefer the struct version; but the class version comes with less hazards - thus offer both.

even while reading through java code I only recall noticing their atomic abstraction used at the field level

Does Java have value types? Everything is mostly an object type?

@benaadams
Copy link
Member Author

benaadams commented Aug 3, 2016

The framework has Box<T> for putting anything on the heap.

Box<Atomic<decimal>> balance = new Box<Atomic<decimal>>(1000);
// ...
balance.Value.Subtract(10, 0);

Is just unhappy code... So you'd probably go:

Box<Atomic<decimal>> objectBalance = new Box<Atomic<decimal>>(1000);
// ...
var balance = objectBalance.Value;
balance.Subtract(10, 0);

And now you are in trouble for the reason mentioned in previous comment as you are operating on a different instance.

@tenor
Copy link

tenor commented Aug 3, 2016

I did something similar with InterlockedBoolean which is a great convenience structure for storing atomic boolean values.

However, it can't be passed around since it's an ordinary struct. Perhaps the compiler can issue an error/warning if an instance of the proposed value is passed around.

I also think naming these types InterlockedValue and InterlockedObject are more congruent with .NET's InterlockedXXX naming style.

@clrjunkie
Copy link

@benaadams

Actually I started to think about it the other way around: go with class, drop struct, exactly because of boxing penalty.. and I don't see the need for a huge array of atomics and the boxing conversion penalty of both IAtomic where T : IEquatable can have a negative impact overall.

And why would anyone do this:

var balance = _balance; // uh oh - different copy

and not work with the _balance directly to extract the wrapped value via the Read method?

You mentioned that the motivation for the class version was for passing it as parameter, and my understanding was that your intention was to reduce the copy size. Java currently has only reference types and my point was that I didn't see those passed around anyway so I didn't understand the motivation compared to a field allocated struct.

@benaadams
Copy link
Member Author

benaadams commented Aug 3, 2016

@clrjunkie people take different approaches to things, and its to leave the flexibility open. Depends whether you are taking an object orientated approach and adding 12 bytes plus a GC handle to every object or a data orientated approach for fast batch processing where you'd stream through arrays of data.

ref returns for example fit more with a data orientated approach. And you could ref return an ValueAtomic from the array

boxing conversion penalty

Shouldn't be any boxing? There's an extra pass though concrete call to the struct which should be inlinable.

@clrjunkie
Copy link

@benaadams

var balance = objectBalance.Value;
balance.Subtract(10, 0);

but 'balance' is the value how can it have 'Subtract' ?

@benaadams
Copy link
Member Author

but 'balance' is the value how can it have 'Subtract' ?

If Box<T> was used it would be the Atomic

@clrjunkie
Copy link

clrjunkie commented Aug 3, 2016

@benaadams

If Box was used it would be the Atomic

You mean in case the user did var balance = objectBalance, but there is no reason to do so other then by mistake and there can be many other mistakes...

Looks like another reason why having a struct version is bad.

Shouldn't be any boxing?

I meant the "box" problem in general but as soon as you starting invoking methods that call the atomic through the base interface methods, notability IEquatable the unboxing penalty will start showing.

Why do we need this?

@clrjunkie
Copy link

@benaadams

If Box<T> was used it would be the Atomic

How can the Atomic be returned and not T ?

public class Atomic<T> : IAtomic<T> where T : IEquatable<T>
private ValueAtomic<T> _atom;

Atomic(T value)
{
  _atom = new ValueAtomic<T>(value);
} 

public T Read() => _atom.Read();
public struct ValueAtomic<T> : IAtomic<T> where T : IEquatable<T>
private T _data;

public ValueAtomic(T value)
{
    _data = value;
}
public T Read();

@clrjunkie
Copy link

clrjunkie commented Aug 3, 2016

@benaadams

I think you consider the large collection (was array) scenario to be very important, no problem.. what's the use-case? Maybe it involves comparing or lookup and the unboxing effect will introduce a negative effect overall?

@clrjunkie
Copy link

@benaadams

people take different approaches to things, and its to leave the flexibility open.

Are you advocating that each class in the framework should also have a struct version to satisfy every potential need or programing approach?

@benaadams
Copy link
Member Author

benaadams commented Aug 3, 2016

If Box was used it would be the Atomic

How can the Atomic be returned and not T ?

Was in answer to @GSPP suggestion about using Box<T> instead.

The value-type Atomic should be "non-copyable". So casting it to object, to IAtomic, assigning to a local or assigning another heap variable to it should all be compile errors; as they will take a copy and not modify the original. Passing as a ref return or modifying in place should be fine.

Kind of like Span<T> is stack only; so you can't cast it to object or take a heap reference. Equally SpinWait should probably be stack only like Span<T> as more than one thing shouldn't spin on it, and SpinLock should have the same constraints the value-type Atomic as as soon as a copy is taken of it is no longer the same lock.

The IAtomic interface is for the generic constraint to apply so one set of extension methods can apply to both types and so the atomic type can be used with test IoC, DI if someone wants to create a different type that behaves functionally the same etc.

I think you consider the large collection (was array) scenario to be very important, no problem.

Going back to the Java comparison it has AtomicReferenceArray for similar.

However, with the struct type you could use a normal .NET array of ValueAtomic<object>[100] for similar effect; or project though a ReadOnlySpan<ValueAtomic<object>> for extra safety.

what's the use-case?

If you were organising data in a column store manner and say the data type was Decimal (16 bytes) then if you wanted to do a Sum:

ValueType: for loop over array, proceed in 32 byte skips, 2 per cache line (16 byte decimal + 8 byte null lock pointer + alignment)
ObjectType: for loop over array, proceed in 8 byte skips (pointer to Atomic), 8 per cache line, dereference (potentially scattered data, otherwise) proceed in 48 byte skips 1.5 per cache line (12 byte object header + 16 byte decimal + 8 byte null lock pointer + alignment)

So the value type would be faster.

And if its any array of references then using a object type atomic would mean to use the object would require a double deference. Array->Atomic->Object->Function

@benaadams
Copy link
Member Author

benaadams commented Aug 3, 2016

Are you advocating that each class in the framework should also have a struct version to satisfy every potential need or programing approach?

No; just certain building blocks, that represent extensions over simple types. If you are dealing with small types like int, decimal, reference, when adding extra features to it you want to be as light as possible else the object overhead dominates; though there are reasons for both.

Task<T>->ValueTask<T>
Tuple<T1,T2,..>->ValueTuple<T1,T2,..> (not sure about Tuple)
lock(object)->SpinLock
T[]->Span<T>/ReadOnlySpan<T>
Atomic<T>->ValueAtomic<T>
etc

However, I do think it would be excellent if escape analysis was done on function allocated local objects and they were allocated on the stack if they didn't escape rather than the heap; but that's another story...

@clrjunkie
Copy link

@benaadams

If you were organising data in a column store manner and say the data type was Decimal (16 bytes) then if you wanted to do a Sum:

But your doing 'Sum'... you would probably need to lock some range to get a consistent point in time snapshot, how does atomicity come into play here per data item??

@benaadams
Copy link
Member Author

benaadams commented Aug 3, 2016

how does atomicity come into play here per data item

Maybe poor example, call it EstimatedSum :)

Preventing torn reads; admittedly you could get the same effect by using Interlocked.CompareExchange and having every entry an object; or if you were using object pooling a 128bit Interlocked.CompareExchange which is kind of where this started as a more general form.

@ghost
Copy link

ghost commented Aug 29, 2016

@benaadams, (old news) MSVCRT as of VS2015 Update 3 has also implemented full set of C++11, 85%+ of C++14 and ~30% of C++17, 95% of C99 and 18% of C11 (all ISO standards). So std::atomic is available there as well: https://msdn.microsoft.com/en-us/library/hh567368.aspx#atomics 😉

@kouvel
Copy link
Member

kouvel commented Oct 24, 2016

@whoisj:

The Windows operating system exposes these constructs, which means it's very possible to implement them on Intel and ARM.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686360(v=vs.85).aspx
Looks like they support quite a number of "interlocked" operations, up to 128 bits.

It looks like InterlockedCompareExchange128 is only available on x64 on Windows:
https://msdn.microsoft.com/en-us/library/26td21ds.aspx
https://msdn.microsoft.com/en-us/library/bb514094.aspx

Anything involving mutating a "reference structure" (ie any thing that is or contains a reference) cannot be treated as flat memory, therefore there are severe limiting factors here.

What is the issue with mutating a struct containing a reference? Why can it not be treated as flat memory?

@kouvel
Copy link
Member

kouvel commented Oct 24, 2016

I like the proposal, but I think it would be good to provide only a value type atomic (have Atomic<T> be a value type) and let the user determine how to store and use that data correctly so that multiple threads can share it. That would simplify it a bit and eliminate the interface, which I feel mostly wraps boxing, only marginally improves the usability, and may even add some confusion.

It looks like we would need the following:

  • At JIT time, per expansion of ValueAtomic, determine if a sufficiently-sized interlocked compare-exchange is available on the platform
  • At JIT time, enforce alignment on ValueAtomic if interlocked operations will be used. If a lock will be used, alignment is not necessary.
  • For ValueAtomic types that can use interlocked operations, a CompareExchange would be replaced by the JIT with the instruction. IsLockFree could also behave as an intrinsic for these types.

@benaadams, are you aware of any real-world scenarios that would greatly benefit from this feature? For instance, are there scenarios that are blocked from a performance perspective due to the inability of being able to do interlocked operations on structs?

CC @jkotas @gkhanna79 @cmckinsey @vancem

@kouvel kouvel removed their assignment Oct 24, 2016
@benaadams
Copy link
Member Author

benaadams commented Oct 24, 2016

@kouvel there are a number of lock free algorithms that are blocked from not having it.

So for example the lock-free Concurrent collections and the threadpool queues can't be fully non-allocating and use pooled nodes as they would suffer from the ABA problem. With the 128 bit CAS then you can add a state/tag/counter to the pointer to resolve this.

A 128 bit CAS by itself https://github.com/dotnet/corefx/issues/10478 will only work on certain CPU classes; so the Atomic is better as it provides a lock fallback so can work on all platforms; though obviously less than ideal on some. For example it may be preferable to allocate and still be lock free if locking is involved - hence the static IsLockFree bool; which allows you to choose the algorithm if that is important.

@vancem
Copy link
Contributor

vancem commented Oct 24, 2016

There is a fair bit of detail here, and I have not really given it too much thought, but I have some observations / concerns.

  1. It is VERY easy to get low-lock code correct, so ideally uses of Atomic would be RARE. Thus the 'benefit' part of a cost-benefit analysis is not going to be high (since relatively few users benefit). This argues AGAINST doing anything tricky (e.g. special JIT support etc). It also argues to keep the implementation 'light' (simple/small). This argues against the IAtomic interface unless we can clearly articulate the extra value it adds.

  2. It is unclear how the alignment restrictions of a CAS128 would be handled. To guarantee alignment, we need the GC heap to be as aligned (because objects move around and they have to continue to be aligned) The more aligned the GC heap is, the more 'waste' you have, so we really don't want to align the GC heap more than we have to (e.g. pointer sized). It is unclear how to resolve this tradeoff.

  3. Given that I am arguing that simplicity is king we have to compare this against the 'trivial' solution of defining an Int128 and defining

Interlocked.InterlockedCompareExchange(ref Int128, Int128, Int128)

Note that even if we did decide to do implement Atomic, we probably want implement it by providing low level support (like the Interlocked operation above), and then implement Atomic on top of it. This keeps the runtime small. If we can't do that (that is we need magic), we need to think about it because by issue (1) above, this is probably not worth implementing much magic.

Having said all this, I am aware that most other languages/systems do implement something very much like Atomic, so you can argue that this just filling a competitive hole. However I would like this hole filled with as little magic as possible. Ideally it is a class library sitting on top of very little magic support in CoreCLR.

@benaadams
Copy link
Member Author

@vancem https://github.com/dotnet/corefx/issues/10478 is for 128 bit InterlockedCompareExchange; but then you still need alignment for the Int128s which can't be guaranteed for arbitrary structs and has no fallback for when there is either no cpu support (x86, some arm) or platform support (not implemented in the clr yet)

@vancem
Copy link
Contributor

vancem commented Oct 25, 2016

Yes, it seems that magic is needed to solve the alignment issues for CAS128, which makes its cost higher and thus its cost-benefit proposition worse. Right now we don't really have a thought-through design for CAS128, and it 'feels' like it is probably too expensive for its value (so far at least...).

If we want to proceed with this, we need to clearly articulate the cost (how we would solve the alignment issue and how expensive that would be), as well as the value (how much faster would these lock-free algorithms be from what we have today.

P.S: (slightly off topic, but is relevant in deciding how much better CAS128 algorithms are from Cas64 algorithms.

For what it is worth in garbage collected systems when you are playing with GC references (which you typically are) the ABA problem is sometimes not as problematic, because, unlike native code, a reference can't change identity (can't be deleted and that same memory morphed into a different object), as long as there is an object reference pointing at it (which you typically naturally have in the code doing the update).

For example normally implementing a linked list stack with PUSH and POP operations would suffer from the ABA problem if POP was implemented in the 'obvious' way with CAS. However that implementation DOES work in a GC environment if the list never tries to reuse link cells because the linked list element being POPed can't be recycled (and thus mutated) while the POP operation has a reference to it.

@benaadams
Copy link
Member Author

@vancem yes, I mean you can't do object pooling + CAS on x64 in .NET currently; you can only do GC + CAS on x64.

You could in theory do object pooling + CAS on x86 in .NET though I haven't investigated that particularly as its not relevant to my domain (need Vector ops too much)

@benaadams
Copy link
Member Author

@GrabYourPitchforks
Copy link
Member

What's the current status on this? If it's unlikely to be done in the near future I'm going to reopen the proposal for the extra bitwise APIs on Interlocked given that there are immediate use cases for it and we're already making changes to that type.

@kouvel
Copy link
Member

kouvel commented Jul 10, 2018

It seems unlikely to happen in the near future. Interlocked should probably have the equivalent bare operations despite this type anyway.

@MBurtsev
Copy link

A year has passed but nothing has moved )

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@stephentoub stephentoub modified the milestones: 5.0, Future Feb 25, 2020
@stephentoub stephentoub removed the untriaged New issue has not been triaged by the area owner label Feb 25, 2020
@zachsawcs
Copy link

An array backed concurrent stack such as the Stroustrup one would need the CAS128 instruction to avoid ABA. Our problem with the LinkedList based concurrent stack implementation is that we have several million long lived items, and they get popped and pushed over time but the ones lower in the stack generally don't change too much. These items don't get cleaned up until Gen2 runs and the few millions that haven't changed still need its pointers traced by the GC. Stroustrup's implementation creates a lot of short lived objects if you have contention but these will never get past ephemeral gens. If CAS128 is available, Stroustrup's algo could be a good replacement for the current ConcurrentStack implementation,

@ayende
Copy link
Contributor

ayende commented Apr 24, 2020

@zachsawcs Do you have reference for that impl?

@zachsawcs
Copy link

zachsawcs commented Apr 24, 2020

http://www.stroustrup.com/lock-free-vector.pdf

I implemented it from the paper above, but ran into all sorts of trouble with C# - Interlocked.CompareExchange was the first problem preventing me from creating a generic typed ConcurrentStack. I managed to workaround that by doing unsafe cast (Unsafe.As) with a factory for the different types.

But the biggest problem was C# just doesn't support 128-bit compare exchange, preventing me from solving the ABA problem for 64-bit data types. There's currently no way to implement such an algo in C#.

@samsosa
Copy link

samsosa commented Apr 24, 2020

Related: #31911

@timcassell
Copy link

timcassell commented Jul 27, 2022

This would help ManualResetValueTaskSourceCore where it is theoretically possible for a race condition to cause the wrong state object to be used (even though that would be user error anyway). And the extra null check could be removed.

// We need to set the continuation state before we swap in the delegate, so that
// if there's a race between this and SetResult/Exception and SetResult/Exception
// sees the _continuation as non-null, it'll be able to invoke it with the state
// stored here. However, this also means that if this is used incorrectly (e.g.
// awaited twice concurrently), _continuationState might get erroneously overwritten.
// To minimize the chances of that, we check preemptively whether _continuation
// is already set to something other than the completion sentinel.
object? oldContinuation = _continuation;
if (oldContinuation == null)
{
_continuationState = state;
oldContinuation = Interlocked.CompareExchange(ref _continuation, continuation, null);
}

[Edit] Or I guess #31911 would solve the same problem.

@bamo-volue
Copy link

bamo-volue commented Dec 14, 2022

Any update on this?
Lack of a threadsafe lockfree wrapper type (e.g. over bool), that is intuitive to work with, seems to me like a significant defect in the language, considering even c++ has had std::atomic since c++11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Threading
Projects
None yet
Development

No branches or pull requests