[cdac] Add data stream library #99442

Closed · wants to merge 5 commits

106 changes: 106 additions & 0 deletions docs/design/features/data-stream.md
@@ -0,0 +1,106 @@
## .NET runtime data stream

The .NET runtime data stream is a mechanism for the runtime to encode information about itself in a way that is accessible to diagnostic tools. This enables decoupling of the tooling (for example, the DAC) from the details of a specific version of the runtime.

Data Streams consist of three concepts.
**Review comment (Member), suggested change:** "Data Streams consist of three concepts." -> "Data Streams consist of four concepts."


1. A collection of type descriptions.

2. A collection of instances (i.e., pointers) of types described in (1).

3. A collection of value blobs.

4. A versioning scheme that permits evolution of (1), (2), and (3).

The data streams model begins with a header that captures the minimum needed data—`data_stream_context_t`. The header contains a `magic` field for confirming the memory is what we expect _and_ for indicating the endianness of the target process. The endianness of the target is important for scenarios where the reader is running on another machine. The header also contains the data stream version and the statically allocated stream count.
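
A minimal sketch of what the header could look like, based only on the fields described above (`magic`, the version, the stream count, and the `streams` field mentioned later); the field widths, names, and ordering are assumptions, not the library's actual definition.

```c
#include <stdint.h>

struct data_stream; /* Opaque; see "Streams and Blocks" below. */

/* Illustrative only -- the real data_stream_context_t is defined by the
 * data stream library in this PR and may differ in names, widths, and order. */
typedef struct data_stream_context_sketch
{
    uint64_t magic;        /* Known constant; its byte order reveals the target's endianness. */
    uint32_t version;      /* Data stream format version (currently "1"). */
    uint32_t stream_count; /* Number of statically allocated streams. */
    struct data_stream* streams; /* streams[0] holds type definitions; the rest hold instances. */
} data_stream_context_sketch_t;
```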

### Streams and Blocks (ver.1)

The stream, `data_stream_t`, is an opaque data structure to the user and is an implementation detail. The only indication of changes to it is captured in the version value contained in the header. The design described below is considered version "1".

A stream is a singly linked list of uniform sized blocks that are allocated on demand. The stream itself is a small type used to hold the head of the list of blocks, the `curr` pointer, and a pointer to the data stream header, `cxt`.

The core of the stream is the block, `data_block_t`. This data structure is a contiguous allocation with four pointer slots—`begin`, `end`, `pos`, and `prev`. The `begin`, `end`, and `prev` pointers are set on allocation and never change. The internal block allocation scheme is commonly called a "bump allocator". The `pos` pointer represents the current spot in the range between `begin` and `end`. Blocks are filled in reverse order (`end` to `begin`) to ensure reading of a stream is always performed in reverse chronological order.
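
A sketch of the block and its reverse bump allocation, assuming the field names above; the real implementation updates `pos` atomically (for example, with a compare-and-swap), which this single-threaded sketch omits.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only -- field names follow the description above. */
typedef struct data_block_sketch
{
    uint8_t* begin; /* Lowest payload address; fixed at allocation. */
    uint8_t* end;   /* One past the highest payload address; fixed at allocation. */
    uint8_t* pos;   /* Current position; moves from end toward begin as entries are added. */
    struct data_block_sketch* prev; /* Previously filled block; fixed at allocation. */
} data_block_sketch_t;

/* Reverse bump allocation: carve 'size' bytes off the unused space by moving
 * pos toward begin. Returns NULL when the block is full and the stream must
 * allocate a new block. */
static uint8_t* block_try_alloc(data_block_sketch_t* block, size_t size)
{
    if ((size_t)(block->pos - block->begin) < size)
        return NULL;
    block->pos -= size;
    return block->pos;
}
```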

Both the `pos` value on the `data_block_t` and the `curr` value on the `data_stream_t` are updated atomically and are expected to be lock-free.

Within each block an entry data structure, `stream_entry_t`, is used to quickly and safely add new entries. An entry consists of a field, `offset_next`, to hold the relative offset from the current entry to the next. This offset concept makes reading easy since once the entire block is read from the target no further memory reads are needed to walk the block.
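
A sketch of how a reader might walk a block it has already copied out of the target, using only `offset_next`; the width of the offset field and the end-of-block sentinel are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only -- the real stream_entry_t layout is an implementation
 * detail of the library. */
typedef struct stream_entry_sketch
{
    uint32_t offset_next; /* Relative offset from this entry to the next (width assumed). */
    /* Entry payload (a type definition or an instance record) follows. */
} stream_entry_sketch_t;

/* Walk a local copy of a block: no further reads from the target are needed,
 * and entries come back newest-first because the block is filled end to begin. */
static void walk_block_copy(const uint8_t* copy, size_t first_entry_offset, size_t copy_size)
{
    size_t offset = first_entry_offset;
    while (offset + sizeof(stream_entry_sketch_t) <= copy_size)
    {
        const stream_entry_sketch_t* entry = (const stream_entry_sketch_t*)(copy + offset);
        /* ... interpret the entry payload here ... */
        if (entry->offset_next == 0)
            break; /* Assumed end-of-block marker for this sketch. */
        offset += entry->offset_next;
    }
}
```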

The simplicity of the streams and blocks makes reading from another process simple.

#### Types (ver.1)

All types are recorded in the first stream in the `data_stream_context_t` type's `streams` field.

Types are expressed with minimal data to efficiently version and read from a target process. Type definitions start with an identifying tuple—`type` (numeric ID), `version` and `name`. The tuple's design facilitates creation of a map look-up on the reader side and a way to evolve the definition safely on the target side.

The layout of a type is expressed by its size, in bytes, and a collection of relevant field offsets, each described by a `field_offset_t`. The field offset count is computed by reading in two pointer-sized values and then dividing the remaining space by the size of the `field_offset_t` data structure. Both of these components are needed to satisfy the evolution and reading efficiency goals. The size allows the reader to read an entire type in one operation, and the field offsets need not be exhaustive if they provide no utility on the reader side.

An example of the current memory layout of a type entry is below.

```
| type_details_t* | # Pointer in target process
| size_t | # Total size, in bytes, of the type
| field_offset_t 1 | # First field offset
| ... |
| field_offset_t N | # Last field offset
```
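
A sketch of how a reader could recover the field-offset count from one type entry, following the "two pointer-sized values, then divide the remainder" rule above; the `field_offset_t` layout shown here is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only -- the real field_offset_t is defined by the library. */
typedef struct field_offset_sketch
{
    uint16_t field_id; /* Numeric ID of the field; encoding assumed. */
    uint16_t offset;   /* Offset of the field within the type, in bytes. */
} field_offset_sketch_t;

/* The first two pointer-sized values of a type entry are the type_details_t
 * pointer and the total type size; everything after them is field offsets. */
static size_t field_offset_count(size_t type_entry_size_in_bytes)
{
    size_t header_size = sizeof(void*) + sizeof(size_t);
    return (type_entry_size_in_bytes - header_size) / sizeof(field_offset_sketch_t);
}
```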

#### Instances (ver.1)

All streams other than the first stream in the `data_stream_context_t` type's `streams` field (which is used for types) contain instances.

Instances are defined as a numeric ID and a valid pointer in the target process. The numeric ID is expected to exist in one of the type identifier tuples defined above.

An example of the current memory layout of an instance entry is below.

```
| uint16_t | # Type numeric ID
| intptr_t | # Target process memory
```
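
For illustration, the instance entry layout above corresponds to a record like the following; padding and packing of the real entries are implementation details of the library.

```c
#include <stdint.h>

typedef struct instance_entry_sketch
{
    uint16_t type_id; /* Must match a numeric ID recorded in the type stream. */
    intptr_t address; /* Address of the instance in the target process. */
} instance_entry_sketch_t;
```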

### Target usage

Consumption of data streams should start with a mechanism for defining the type identity tuple that can be shared between the target and reader.

A `data_stream_context_t` instance should be allocated, statically or dynamically, in a manner where its address is discoverable by the reader process. The `data_stream_context_t` instance must be initialized with the static stream count and block sizes defined. There must be at least a single stream, used for the type definitions. It is expected that the type stream's block size is sufficient to hold all type definitions without an additional allocation.

Type versions or names are not used directly by the target process. The target process records these values as an indication for the reader only.

**NOTE** Registration of types should be done prior to any recording of instances. It is assumed that all sharable types are known statically.

After type registration is performed, streams can be acquired by components in the target process and typed instances inserted into the stream. Adding an instance to a stream is considered thread safe. The typing of an instance should be done via a numeric ID.
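
A sketch of the target-side sequence described above. `data_stream_context_t` and `dnds_init` appear in this PR (see `debug_stream.cpp` below); the commented registration and insertion calls are placeholders illustrating the ordering rules, not actual function names from the library.

```c
#include <stdbool.h>
#include <datastream/data_stream.h>

/* Must be discoverable by the reader, e.g. via an exported symbol. */
static data_stream_context_t g_data_streams;

bool init_data_streams(void)
{
    /* One block size per statically allocated stream; the first stream holds
     * type definitions and should be sized to avoid a second block. */
    size_t sizes[] = { 4096, 8192, 2048 };
    if (!dnds_init(&g_data_streams, 3, sizes))
        return false;

    /* Hypothetical calls, shown only to illustrate the required ordering:
     *   register all types first ...
     *     dnds_register_type(&g_data_streams, THREAD_TYPE_ID, 1, "Thread", sizeof(Thread), offsets, count);
     *   ... then components may acquire streams and insert typed instances.
     *     dnds_insert_instance(stream, THREAD_TYPE_ID, (intptr_t)thread);
     */
    return true;
}
```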

### Reader usage

The reader should first define a series of type names that it is able to interpret and consume. These type names should match the names defined by the target. These names could be used to map types in the target process with their numeric ID and version. The reader is not expected to have any hardcoded numeric type IDs as these are subject to change between target versions.

The reader is expected to be resilient in receiving an unknown version of a type and to interpret it gracefully. Two examples of graceful interpretation are describing it as `"Unknown"` or printing its memory address in the target process.

After acquiring the target process's `data_stream_context_t` address, the reader should validate it and determine the target's endianness.
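
A self-contained sketch of that validation step; the magic constant here is made up for illustration, and the real check is defined by the library.

```c
#include <stdbool.h>
#include <stdint.h>

#define DNDS_MAGIC_SKETCH 0x316D727473646E64ULL /* Hypothetical value for this sketch. */

static uint64_t swap64(uint64_t v)
{
    v = (v << 32) | (v >> 32);
    v = ((v & 0x0000FFFF0000FFFFULL) << 16) | ((v & 0xFFFF0000FFFF0000ULL) >> 16);
    v = ((v & 0x00FF00FF00FF00FFULL) << 8)  | ((v & 0xFF00FF00FF00FF00ULL) >> 8);
    return v;
}

/* Validate the magic value copied out of the target's data_stream_context_t
 * and report whether the reader must byte-swap everything else it reads. */
static bool validate_header(uint64_t magic_from_target, bool* out_needs_swap)
{
    if (magic_from_target == DNDS_MAGIC_SKETCH)
    {
        *out_needs_swap = false;
        return true;
    }
    if (swap64(magic_from_target) == DNDS_MAGIC_SKETCH)
    {
        *out_needs_swap = true;
        return true;
    }
    return false; /* Does not look like a data stream header. */
}
```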

The first time the data stream is read, the target process's types are enumerated in reverse chronological order (Last In/First Out) and this data may be used throughout the lifetime of the target process. During the enumeration of types the following can be done:

* Creation of a fast mapping from name to numeric ID.

* Creation of a look-up map to type details (e.g., field offsets).

* Validation of supported type versions.

After type enumeration is complete, instance streams can be enumerated and interpreted. The size contained within the type description allows the reader to read in the entire type and then use field offsets to poke into that memory. Reading in the entire data type helps with efficiency and acts as a versioning resilience mechanism. Adding new fields to a type, without changing the version, need not represent a breaking change.
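
A sketch of that read pattern: the instance is copied out of the target in one operation using the type's size, and fields are then extracted from the local copy by offset. A field whose offset lies beyond the copied size is simply treated as absent, which is what makes additive changes non-breaking. Names here are placeholders, not part of this PR's reader API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Extract a 4-byte field from a locally buffered instance copy. */
static bool try_read_field_u32(const uint8_t* instance_copy, size_t instance_size,
                               size_t field_offset, uint32_t* out_value)
{
    if (field_offset + sizeof(uint32_t) > instance_size)
        return false; /* Field not present in this version of the type. */
    memcpy(out_value, instance_copy + field_offset, sizeof(uint32_t));
    return true;
}
```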

### Design FAQs
**Review comment (Member):**

These streams appear to be a redesign of a NativeAOT contract that was already created several years ago. I'm not tied to the NativeAOT design but I do think we should either state why the new design elements are preferable to what already exists, or we should better align with the existing design.

At first glance this is what the comparison looks like to me:

**DotNetRuntimeDebugHeader and data_stream_context_t**
NativeAOT has DotNetRuntimeDebugHeader (RDH); this design has data_stream_context_t. The purpose and fields are similar, but with minor variations in layout that make them incompatible. data_stream_context_t has both a size and a version; RDH has a version only, which implies the size. RDH has an explicit minor version allowing for back-compatible changes; I'm guessing the data_stream_context size could also be used as a similar mechanism by increasing the size while keeping the version the same. NativeAOT doesn't define a stream as a general-purpose linked list of blocks, but rather it defines a DebugTypeEntry array and a GlobalValueEntry array that hold similar data to streams[0] and streams[2] respectively. data_stream_context uses an extra indirection for the streams array, but it appears to have a statically known size, so I'm not sure the extra indirection is needed?

**streams**
In NativeAOT the type and global value information are represented as standard C arrays of a fixed type (either DebugTypeEntry or GlobalValueEntry). There is one contiguous memory allocation for each, they are null terminated to indicate the size to the reader, and all elements are fixed size, so there are no offsets to the next stream element.

**type and field offset information**
NativeAOT uses a sequence of DebugTypeEntry structs, one per field (perhaps the struct would have been more aptly named DebugFieldEntry). Types and fields are referenced by name. There is no equivalent of the per-type ordinal or versioning information in this design.

**global value information**
NativeAOT uses GlobalValueEntry which references each global by name and associates it with an address. In this design name is replaced by an int16 type. I'm not sure if this type is intended to be a free-form ordinal or it specifically references the same type ordinals used in the type stream. If it is a reference to the type stream that suggests that multiple globals of the same type can't be distinguished?
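
For illustration only, the NativeAOT-style entries described in this comment amount to something like the following sketches (these are not the actual NativeAOT definitions):

```c
#include <stdint.h>

/* One entry per field; types and fields are referenced by name. The arrays
 * are contiguous and null terminated, so no per-entry offsets are needed. */
typedef struct
{
    const char* type_name;
    const char* field_name;
    uint32_t    field_offset;
} debug_type_entry_sketch;

/* One entry per global: a name associated with an address. */
typedef struct
{
    const char* name;
    const void* address;
} global_value_entry_sketch;
```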

**Review comment (Member):**

> NativeAOT uses GlobalValueEntry which references each global by name and associates it with an address. In this design name is replaced by an int16 type. I'm not sure if this type is intended to be a free-form ordinal or it specifically references the same type ordinals used in the type stream. If it is a reference to the type stream that suggests that multiple globals of the same type can't be distinguished?

I think type is a bad name. A better name is "role". In this design, the fields of structs and globals have a role which is the int16_t that indexes an entry in the role stream. Each role has a version. And the role together with the version imply something about the physical layout of the entity that has that role. So yes: if there are two globals in the runtime that have the same underlying physical representation, they will have two separate roles in the typestream. (Likewise, if a single struct has two fields with the same physical representation, they will have two separate roles)

**Review comment (Member):**

Thanks for explaining 👍, it sounds like that is a standard ordinal similar to what would be found in a dll global export table, which makes sense to me. I wouldn't be confused if you called it 'role', though 'ordinal' is the term I'm most familiar with for that concept.

At the broader scale IMO some aspects of the new design are better, some aspects of the old design are better, and some aspects seem different + equally effective. I'm not sure how you all feel about it but right now I'd lean towards something like:

* Keep the DotNetRuntimeDebugHeader, likely with a bump to major version 2.
* Don't use streams to encode global type/instance/value lists; instead use flat arrays.
* Replace DebugTypeEntry and GlobalValueEntry with structs that reference by ordinal to make the data more compact.
* I'm skeptical about the per-type version numbers; it feels unnecessarily fine-grained. We should probably have a discussion about how we plan to handle versioning generally.

**Review comment (Member, @jkotas, Mar 11, 2024):**

> We should probably have a discussion about how we plan to handle versioning generally.

+100

We have been using the data contract idea for native AOT for the past two years, so we can get some idea about how the contract versions. I have created a diff of the contract between .NET 7 and .NET 9 P1: https://gist.github.com/jkotas/301687d631c373cfd5fe4f9d18cc801f/revisions

The observation is that the field offsets alone change rarely. If they change, it is for the "dumping ground" data structures like Module. Changes in algorithms (which are often accompanied by changes in the set and purpose of the fields) are more common.

Our design has been oriented too much around types and offsets. I think the design needs to be oriented more around the data contracts required to achieve tasks. Here are some examples:

* CodeMap - contract that describes how to go from an IP to (start IP, offset, and internal runtime method handle)
* MethodDescToken - contract that describes how to go from an internal runtime method handle to (metadata blob, token)
* ECMAMetadata - contract that describes how to interpret a metadata blob (this one is easy - it is written down in the ECMA spec)
* GCHeap - contract that describes how to enumerate all objects on the GC heap
* GCHeapInternal - extension of the GCHeap contract that describes internal GC data structures that can be used to validate GC heap self-consistency (the !VerifyHeap command in SOS)

Each of these contracts should have a revision that describes the algorithm. The contract can be parametrized by global variable addresses, field offsets, bit masks, etc. Field offsets or bit masks are only needed for cases that are likely to change. They can be implied by the contract revision most of the time. It is a trade-off between flexibility and compactness. We will learn over time where to strike the right balance.

The set of contracts supported by the runtime instance, including the contract revision and the contract parameters, should be attached to DotNetRuntimeDebugHeader. I do not have a strong opinion about the encoding as long as it is reasonably compact and cheap to include in a minidump payload.
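
As a rough illustration of this proposal (not anything that exists in this PR or in NativeAOT today), a per-contract record attached to the debug header could look like:

```c
#include <stdint.h>

typedef struct
{
    const char* name;        /* e.g. "CodeMap", "GCHeap" */
    uint32_t    revision;    /* Revision of the algorithm the runtime implements. */
    uint32_t    param_count; /* Number of parameters (global addresses, field offsets, bit masks). */
    const uintptr_t* params; /* Parameter values, keyed by position for the given revision. */
} contract_descriptor_sketch;
```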


---

**Q1** Why are streams allowed to grow?

**A1** Consider the case where a data structure in the target process has a specific use case but the reader has either stricter or looser requirements. An example would be a thread pool used in the target process. This structure would ideally only be concerned with current threads in the target process, exited threads having been removed. However, the reader process likely has a need for knowing when a thread instance has exited to update its own internal state. A possible solution is to fully query the thread pool data structure each time. However, if instead entries for created and deleted threads are recorded in a stream, the reader only needs to know the delta as opposed to querying the thread pool each time. The logic follows for any data structure that contains objects with transient lifetimes.
**Review comment (Member, @noahfalk, Mar 9, 2024):**

I think we should flip the question around a bit. Let's assume by definition that streams are data structures that support growth. Do we expect a global list of types and a global list of instances are scenarios that need growth? Both NativeAOT and CoreCLR have similar lists, neither of them currently supports growth, and it has never been an issue. From my POV, trying to use streams for that purpose seems to be introducing extra complexity in the data contract where a flat array would suffice.

Where growth appears to enter the picture is the assumption that we should use the same list both to encode global singletons (examples) and also dynamically created objects of interest (such as all Threads). I don't believe we are getting any benefit by combining them though.

> However, if instead entries for created and deleted threads are recorded in a stream, the reader only needs to know the delta as opposed to querying the thread pool each time.

I don't think we should be maintaining a list of every deleted thread because the potential size of that list is unbounded. I'd advocate that memory usage in this feature must remain close to what it currently is regardless of app behavior. If an app wants to create and destroy threads continuously (threadpool with variable load may do just that) then that app needs to keep working.

It seems like there is an underlying assumption in the design that streams are a generally good data structure that will be used repeatedly for all sorts of different data and I don't agree with that premise.

**Review comment (Member):**

> It seems like there is an underlying assumption in the design that streams are a generally good data structure that will be used repeatedly for all sorts of different data and I don't agree with that premise.

The streams idea may be a useful concept for certain contracts (in particular contracts oriented around live-debugging). I agree with Noah that the streams idea does not seem to be applicable for the scenarios that we are going after first (#99298).

**Review comment (Member, PR author):**

> I don't think we should be maintaining a list of every deleted thread because the potential size of that list is unbounded.

We are not intending to do this. It was an example (perhaps poor) of the potential flexibility. I'd agree we currently don't have a specific initial use case for that flexibility.

**Review comment (Member):**

> does not seem to be applicable for the scenarios that we are going after first (#99298).

I think this is a slight confusion and we keep focusing on this, which I don't fully understand. Note that the data_stream library itself was written as a contract that is not concerned with the contents. It can be dynamic or static: one initializes with a series of sizes (dynamic); the other could accept a large block of memory (static, but not implemented). The decision here to abstract that away in a library was precisely to avoid conflating dynamic vs static concerns. If we had avoided the data stream abstraction, it likely would have forced us into assumptions about the static nature; this design avoids that mistake with the abstraction. @elinor-fung mentions this above as well; it is an implementation detail that has costs we may not want to accept right now, however in the current form we have more degrees of freedom with the abstraction than without.

**Review comment (Member):**

I do not understand how the streams fit into the minidump scenarios.

If we go with the streams idea, which streams are going to be saved into the minidump?

**Review comment (Member, @noahfalk, Mar 13, 2024):**

> [Elinor] I'd agree we currently don't have a specific initial use case for that flexibility.

Any objections to waiting until we do have such a use case before introducing it?

> [Aaron] I think this is a slight confusion and we keep focusing on this, which I don't fully understand... The decision here to abstract that away in a library was precisely to avoid conflating dynamic vs static concerns.

The reason I focus on it is because abstracting it behind a library API doesn't resolve all the concerns:

* One of the major outputs of this feature is a specification for a data format. That format spec is intended to allow anyone to create their own parser for this data. The existence of parsing code in one library doesn't preclude developers from needing to work directly with the underlying data.
* The choice of data structures directly impacts performance characteristics such as application size on disk, application size in memory, dump size on disk, and data access time.
* A growing stream and a static stream have different rules for when the data needs to be re-queried. This means callers will still care which one they are dealing with regardless of the code that does a point-in-time enumeration.


---

**Q2** Why are the contents of a stream immutable?

**A2** Having streams that are mutable means the reader _must_ always re-read the full stream to validate for updates. If the contents of a stream are instead immutable _and_ in reverse chronological order (LIFO), then entries for "deleted" or "invalidated" data are possible, which enables readers to consume deltas and reduce cross-process inspection.

**Review comment (Member):**

At least some of these data streams need to be included in the minidump payload. How are we going to decide which data streams need to be included in the minidump payload?

Also, we need to keep an eye on the overhead that the data streams add to the raw minidump size so that it does not get out of control.

**Review comment (Member):**

> we need to keep an eye on the overhead that the data streams add to the raw minidump size so that it does not get out of control.

Maybe we should have an explicit design goal around this, like that the data streams won't grow the minidump size by more than X (X can be either relative, like 5%, and/or absolute, like 5 kB).

---
3 changes: 3 additions & 0 deletions src/coreclr/CMakeLists.txt
@@ -135,6 +135,9 @@ if(CLR_CMAKE_TARGET_WIN32)
add_subdirectory(gc/sample)
endif()

# Data stream object library that will be linked in to the CLR
add_subdirectory(${CLR_SRC_NATIVE_DIR}/datastream ${CLR_ARTIFACTS_OBJ_DIR}/datastream/${CLR_CMAKE_TARGET_OS}.${CLR_CMAKE_TARGET_ARCH}.${CMAKE_BUILD_TYPE})

#-------------------------------------
# Include directory directives
#-------------------------------------
1 change: 1 addition & 0 deletions src/coreclr/components.cmake
@@ -8,6 +8,7 @@ add_component(iltools)
add_component(nativeaot)
add_component(spmi)
add_component(debug)
add_component(cdac)

# Define coreclr_all as the fallback component and make every component depend on this component.
# iltools and paltests should be minimal subsets, so don't add a dependency on coreclr_misc
1 change: 1 addition & 0 deletions src/coreclr/dlls/mscoree/coreclr/CMakeLists.txt
@@ -109,6 +109,7 @@ set(CORECLR_LIBRARIES
interop
coreclrminipal
gc_pal
datastream
)

if(CLR_CMAKE_TARGET_WIN32)
2 changes: 2 additions & 0 deletions src/coreclr/vm/CMakeLists.txt
@@ -310,6 +310,7 @@ set(VM_SOURCES_WKS
customattribute.cpp
custommarshalerinfo.cpp
autotrace.cpp
debug_stream.cpp
dllimport.cpp
dllimportcallback.cpp
dynamicinterfacecastable.cpp
@@ -410,6 +411,7 @@ set(VM_HEADERS_WKS
customattribute.h
custommarshalerinfo.h
autotrace.h
debug_stream.h
diagnosticserveradapter.h
dllimport.h
dllimportcallback.h
5 changes: 5 additions & 0 deletions src/coreclr/vm/ceemain.cpp
@@ -163,6 +163,7 @@
#include "jithost.h"
#include "pgo.h"
#include "pendingload.h"
#include "debug_stream.h"

#ifndef TARGET_UNIX
#include "dwreport.h"
@@ -822,6 +823,10 @@ void EEStartupHelper()
InitializeDebugger(); // throws on error
#endif // DEBUGGING_SUPPORTED

// Initialize the debug stream in the runtime.
if (!debug_stream::init())
IfFailGo(E_FAIL);

#ifdef PROFILING_SUPPORTED
// Initialize the profiling services.
hr = ProfilingAPIUtility::InitializeProfiling();
20 changes: 20 additions & 0 deletions src/coreclr/vm/debug_stream.cpp
@@ -0,0 +1,20 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

#include "debug_stream.h"
#include <minipal/utils.h>
#include <stdio.h>

namespace
{
data_stream_context_t g_data_streams;
}

bool debug_stream::init()
{
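// Block sizes for the statically allocated streams; per the design doc, the
// first stream holds the type definitions.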
size_t sizes[] = { 4096, 8192, 2048 };
if (!dnds_init(&g_data_streams, ARRAY_SIZE(sizes), sizes))
return false;

return true;
}
14 changes: 14 additions & 0 deletions src/coreclr/vm/debug_stream.h
@@ -0,0 +1,14 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

#ifndef DEBUG_STREAM_H
#define DEBUG_STREAM_H

#include <datastream/data_stream.h>

namespace debug_stream
{
bool init();
}

#endif // DEBUG_STREAM_H
19 changes: 19 additions & 0 deletions src/native/datastream/CMakeLists.txt
@@ -0,0 +1,19 @@
set(CMAKE_INCLUDE_CURRENT_DIR ON)

set(DATASTREAM_SOURCES
data_stream.c
data_stream.h
)

add_library(datastream OBJECT ${DATASTREAM_SOURCES})

add_library_clr(datastreamlib SHARED ${DATASTREAM_SOURCES})
target_compile_options(datastreamlib PRIVATE -DBUILD_SHARED_LIBRARY)

if(CLR_CMAKE_TARGET_WIN32)
install(FILES $<TARGET_OBJECTS:datastream> DESTINATION datastream COMPONENT cdac RENAME datastream.obj)
else()
install(FILES $<TARGET_OBJECTS:datastream> DESTINATION datastream COMPONENT cdac RENAME datastream.o)
endif()

install_clr(TARGETS datastreamlib DESTINATIONS . COMPONENT cdac)