Propose an "Auto-Instrumentation SIG" #87

bhs · 2019-06-03T19:59:36Z

I know that @tedsuo and @SergeyKanzhelev are interested in this. Happy to seed the group with other folks as well.

I'm trying to avoid the word "Agent" since I know it has different meanings to different people. For what it's worth, the first order of business for the SIG would be to define the terms more crisply, but I'm basically imagining software that's linked in post-compilation and provides portable OpenTelemetry-compatible instrumentation; ideally in a clean, pluggable, well-factored manner, though all in due time. :)

Also, I'm happy to find a name other than "auto-instrumentation" as long as it's clear / unambiguous.

SergeyKanzhelev · 2019-06-03T20:05:52Z

I was going to propose a javaagent SIG. Do you believe it's the same people who will be interested in auto-code-injection across languages? I was thinking that perhaps the TC can help define cross-language principles.

bhs · 2019-06-03T21:03:26Z

Meta-note: I didn't create the gitter room yet (wanted to wait for approval on the PR and naming).

As for having cross-language stuff happen in the TC: I guess it just comes down to whether it's the same group of people and should be the same set of docs/meetings. I was imagining this being slightly less "central" in that we wouldn't be meddling with APIs or OpenTel data formats in this SIG and thus it can be decoupled. Happy to have that debate, though.

bhs · 2019-06-04T16:07:00Z

Trying to move this forward...

@open-telemetry/technical-committee: I thought it worthwhile to have a cross-language SIG for the auto-instrumentation agents because there are a lot of different ways to think about them "philosophically," and we will drive ourselves a bit crazy if the different languages adopt different approaches entirely.

E.g., should these agents mainly be "installers" that take existing OpenTelemetry instrumentation and bind it to the running process? Or should they do black-box instrumentation of libraries using other techniques? Should they be configurable "from the outside" to grab function parameters/etc? How pluggable should they be, and what sort of plugin architecture makes sense given the rest of OpenTelemetry? Should they aim to make portable OpenTel API calls, or should they aim to simply emit OpenTel wire formats? Or both?

Also, agents are tricky to get right and we might want different folks participating in the meetings than the regular TC members who may or may not have the relevant expertise.

Separately, I'd love to make a decision about how we're moving forward in the next 48h or so... the more I think about this project, the more I think these auto-instrumentation agents are important for the end-to-end value prop of OpenTelemetry as a "product".

Thanks!

tigrannajaryan · 2019-06-04T17:29:51Z

these auto-instrumentation agents are important for the end-to-end value prop of OpenTelemetry as a "product".

@bhs This is a good observation. I believe agents are very important for out-of-the box product experience. Even if that does not necessarily mean auto-instrumentation of user's apps, once you deploy the agents you can expect them to start collecting host metrics which is already valuable.

Also, agents are tricky to get right and we might want different folks participating in the meetings than the regular TC members who may or may not have the relevant expertise.

I'd be happy to participate (I've been the tech lead for LogInsight Agent in the past - not a trace/metric collector, but still hopefully useful experience).

tigrannajaryan · 2019-06-04T17:46:52Z

@bhs

E.g., should these agents mainly be "installers" that take existing OpenTelemetry instrumentation and bind it to the running process?
Or should they do black-box instrumentation of libraries using other techniques?

Are you thinking about a monkey-patching technique that doesn't touch the application code, e.g. assuming the app is using a dynamic library, swapping the library with an equivalent but instrumented version? Not sure how practical this approach would be.
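For illustration only, here is a minimal Python sketch of the monkey-patching idea asked about above: a function on a library module is replaced at runtime with an instrumented wrapper, so the application source stays untouched. The names `fake_http_lib`, `fetch`, `instrument`, and `SPANS` are hypothetical stand-ins, not OpenTelemetry APIs, and a real agent would need to handle far more cases.

```python
import functools
import time
import types

# Hypothetical stand-in for a third-party library the application already uses.
fake_http_lib = types.ModuleType("fake_http_lib")


def _fetch(url):
    return f"response from {url}"


fake_http_lib.fetch = _fetch

# Hypothetical in-memory sink; a real agent would hand spans to an exporter.
SPANS = []


def instrument(module, func_name):
    """Replace module.func_name with a wrapper that records a span per call."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return original(*args, **kwargs)
        finally:
            # Recorded even if the original call raises.
            SPANS.append({
                "name": f"{module.__name__}.{func_name}",
                "duration_s": time.monotonic() - start,
            })

    setattr(module, func_name, wrapper)


# The "agent" patches the library once, before application code runs.
instrument(fake_http_lib, "fetch")

# Unmodified application code: it still just calls the library.
result = fake_http_lib.fetch("http://example.test/")
```

The application call site is unchanged, which is the property in question; whether swapping whole dynamic libraries is practical in other runtimes is a separate matter this sketch does not address.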

Should they be configurable "from the outside" to grab function parameters/etc?

For LI Agent we supported remote configuration, i.e. you would define the config on the server/backend, and the agent would connect to the server and pull the config (for security reasons this was controllable from the agent/host side and could be enabled/disabled). The downside is that it increases the security surface significantly and may not be necessary for infrastructures where bulk configuration deployment is naturally supported via other means.
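A rough Python sketch of that pull model, assuming a host-side opt-in switch: the agent contacts the backend only when the local operator has enabled remote configuration. The `Agent` class, its fields, and `fake_backend` are hypothetical and are not taken from LI Agent or OpenTelemetry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Agent:
    # Host-side settings; only the host operator can change these.
    local_config: Dict[str, str]
    remote_config_enabled: bool = False
    active_config: Dict[str, str] = field(init=False, default_factory=dict)

    def __post_init__(self):
        # Start from the local configuration.
        self.active_config = dict(self.local_config)

    def refresh(self, fetch_remote: Callable[[], Dict[str, str]]) -> bool:
        """Pull config from the backend if the host has opted in.

        Returns True when remote config was applied.
        """
        if not self.remote_config_enabled:
            return False  # the backend is never contacted
        remote = fetch_remote()
        # Remote values override local ones; other keys keep local values.
        self.active_config = {**self.local_config, **remote}
        return True


def fake_backend() -> Dict[str, str]:
    # Hypothetical stand-in for a network call to the server.
    return {"log_level": "debug"}


local = {"log_level": "info", "path": "/var/log"}
locked = Agent(local_config=local)
opted_in = Agent(local_config=local, remote_config_enabled=True)

locked_changed = locked.refresh(fake_backend)
opted_in_changed = opted_in.refresh(fake_backend)
```

Keeping the switch on the host side means a compromised backend cannot turn remote configuration on by itself, though it does not reduce the exposure of hosts that have opted in.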

How pluggable should they be, and what sort of plugin architecture makes sense given the rest of OpenTelemetry?

I was thinking about compile-time plugging. You would have a "core" that would have the base functionality and include certain receivers and exporters (e.g. OpenTel/OpenCensus formats, probably some others). If you (as an end user) wanted to add your own receiver you can easily build on top of the "core" by creating your own "agent" that simply imports that "core" agent and registers your own receiver/exporter factories before starting the core. You would then build and deploy your custom agent similarly to the standard one.

(see some related thoughts here open-telemetry/opentelemetry-collector#12 and general extensibility vision here https://github.com/open-telemetry/opentelemetry-service/blob/master/docs/VISION.md)
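As a hedged sketch of the "core plus registered factories" idea described above, approximated in Python (so the composition happens at import time rather than at compile time): a custom agent imports the core, registers its own receiver and exporter factories, and then starts the core. `Core`, `register_receiver`, `register_exporter`, and the config shape are all hypothetical names for illustration, not the collector's actual API.

```python
class Core:
    """Hypothetical "core" agent: holds factories and wires up a pipeline."""

    def __init__(self):
        self.receiver_factories = {}
        self.exporter_factories = {}
        # A built-in receiver shipped with the core.
        self.register_receiver("builtin", lambda: (lambda: ["builtin-span"]))

    def register_receiver(self, name, factory):
        self.receiver_factories[name] = factory

    def register_exporter(self, name, factory):
        self.exporter_factories[name] = factory

    def start(self, config):
        """Build the components named in config and run one collection pass."""
        receivers = [self.receiver_factories[n]() for n in config["receivers"]]
        exporters = [self.exporter_factories[n]() for n in config["exporters"]]
        batch = [item for receive in receivers for item in receive()]
        for export in exporters:
            export(batch)


# --- The end user's custom agent, built on top of the core ---

exported = []  # in-memory destination used by the custom exporter


def my_receiver_factory():
    # Returns a receiver: a callable that produces a list of items.
    return lambda: ["custom-span"]


def my_exporter_factory():
    # Returns an exporter: a callable that consumes a list of items.
    return exported.extend


def main():
    core = Core()
    # Register custom factories before starting the core.
    core.register_receiver("mine", my_receiver_factory)
    core.register_exporter("memory", my_exporter_factory)
    core.start({"receivers": ["builtin", "mine"], "exporters": ["memory"]})


main()
```

The custom agent is then built and deployed like the standard one; the only difference is the extra registration calls before `start`.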

Should they aim to make portable OpenTel API calls, or should they aim to simply emit OpenTel wire formats? Or both?

I think they should emit the wire format specified by the config file (OpenTel being the default), which is what OpenCensus agent is doing now.

tigrannajaryan · 2019-06-04T17:49:00Z

README.md

+
+"Auto-Instrumentation" refers to efforts to install OpenTelemetry instrumentation and otherwise extract OpenTelemetry-compatible data from processes without direct code modification. The Auto-Instrumentation SIG will meet weekly at a time TBD.
+
+You can also join us on [the auto-instrumentation channel](https://gitter.im/open-telemetry/auto-instrumentation) in OpenTelemetry gitter.


The channel seems to be private, doesn't load for me. Is it invite-only?

I mentioned this above. I haven't created the channel yet... I will do so if this PR is approved.

bhs · 2019-06-04T17:50:46Z

@tigrannajaryan this is exactly the sort of discussion I would like to have in the actual SIG, but not in the PR talking about whether we create the SIG. :)

The larger point I'm making is that many/most of these decisions are not language-specific, and so we should have a central place where we determine the spec and go forward from there. Otherwise we will have the same debate over and over again in the N languages, and/or we will end up with divergent models across the N languages.

The decision we're trying to make right now: should we create a single SIG to determine the spec / "ground rules" for the various auto-instrumentation efforts across the N languages? If so, we'll approve this PR and get the right people involved to formalize that spec.

tigrannajaryan · 2019-06-04T17:56:25Z

this is exactly the sort of discussion I would like to have in the actual SIG, but not in the PR talking about whether we create the SIG. :)

@bhs Makes sense.
Whether it is a separate SIG or one of the existing ones, I am happy to contribute; ping me when the discussions happen :-)

tigrannajaryan · 2019-06-04T21:51:59Z

@bhs can you please clarify what would be the relation of this new SIG with the already existing SIG for Agent/Collector that is listed here? https://github.com/open-telemetry/community#agentcollector

Is this the same SIG or a new one?

yurishkuro · 2019-06-04T21:52:29Z

I am conflicted about this. (Blackbox) auto-instrumentation feels like an area even less explored than whitebox instrumentation. That is to say, even though commercial vendors have been doing it for years, I haven't seen a lot of information published publicly about how they are doing it. It's possible they have found common cross-language patterns (another question whether they are willing to share them). But it's also possible that there are many different ways of doing that. So my concern is with the goal of this SIG to "create a specification".

If, on the other hand, the goal is to discuss these cross-language patterns and concerns, and maybe produce a white paper / recommendation, then that would be great. If a formally formed SIG helps people to do that, I am all in favor.

yurishkuro · 2019-06-04T21:54:48Z

@tigrannajaryan agent/collector are backend components, they receive but don't produce telemetry.

tigrannajaryan · 2019-06-04T21:59:22Z

agent/collector are backend components, they receive but don't produce telemetry.

@yurishkuro that's correct, but nothing prevents the agent from producing host-level telemetry (metrics). I believe it will be very useful.

If eventually we add support for logs the agent can also monitor syslog, journald and /var/log and collect/send the logs to the backend with no instrumentation needed. This way the agent becomes a producer of very valuable telemetry.

yurishkuro · 2019-06-04T22:04:58Z

This way the agent becomes a producer of very valuable telemetry.

But not the telemetry from a given application. I believe this SIG is explicitly focused on blackbox instrumentation of applications, which has unique challenges compared to simply collecting host-level telemetry from sources that already produce it. But I'll let @bhs respond.

bhs · 2019-06-04T23:07:37Z

@yurishkuro:

I believe this SIG is explicitly focused on blackbox instrumentation of applications

Not exactly, actually... the model I am personally the most excited about tends more towards something like https://github.com/opentracing-contrib/java-specialagent . I.e., it's possible to be agent-like in that there are zero source-code modifications, but still rely heavily on whitebox instrumentation where it's available. Of course it's possible to mix and match these two approaches, at least to a degree. In the SpecialAgent example above, it's really more like an automated installer of existing whitebox plugins... a sort of hybrid (hence "Special") approach.
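To make the "automated installer of existing whitebox plugins" idea concrete, here is a minimal Python sketch under stated assumptions; it is not how SpecialAgent itself works. It checks which target libraries are importable in the current process and installs only the matching plugins. The `PLUGINS` registry, the `install_available_plugins` function, and the plugin bodies are hypothetical; `json` and `collections` merely stand in for instrumented libraries.

```python
import importlib.util

# Records which plugins ran; a real plugin would wire up instrumentation.
installed = []

# Hypothetical registry: target library name -> function that installs the
# existing (whitebox) instrumentation for that library.
PLUGINS = {
    "json": lambda: installed.append("json"),
    "collections": lambda: installed.append("collections"),
    "library_that_is_not_installed": lambda: installed.append("missing"),
}


def install_available_plugins(plugins):
    """Install each plugin whose target library is importable in this process."""
    applied = []
    for library, install in plugins.items():
        if importlib.util.find_spec(library) is None:
            continue  # library absent: nothing to instrument
        install()
        applied.append(library)
    return applied


applied = install_available_plugins(PLUGINS)
```

The installer contains no instrumentation logic of its own; it only decides which existing plugins to activate, which is what makes the approach a hybrid of agent-style deployment and whitebox instrumentation.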

Anyway, these are yet more of the things we would discuss in the SIG. :) I just want to be clear that "auto-instrumentation" and "whitebox instrumentation" are not mutually exclusive.

pavolloffay · 2019-06-05T09:17:25Z

Does this need to be a separate SIG? The auto-instrumentation will differ from language to language and most likely it will be maintained by people working on a specific language.

AloisReitbauer · 2019-06-05T10:19:02Z

We would be willing to contribute to a SIG. Obviously, we have been using agents forever and think they make a lot of sense.

If the SIG is to be successful, we need language/runtime providers in there as well. Usually, a lot of functionality requires specific features from the runtime, like agent loading and code/binary loading interception hooks.

For the SIG I would propose to define what we want to work on. We are also interested in a well-defined coexistence scenario between special agents and auto-instrumentation with code-based instrumentation.

mariusoe · 2019-06-05T14:35:19Z

Hello,
my team and I are also interested in working in such a SIG and contributing our experience.

@yurishkuro

(Blackbox) auto-instrumentation feels like an area even less explored than whitebox instrumentation.

Based on OpenCensus, we're currently building a Java agent for the purpose of automatically injecting instrumentation into a blackbox system (inspectIT Ocelot). Besides this, we have been developing Java agents for some years now, so we have quite a lot of experience in this topic and are interested in contributing it.

We are also seeing some points here that we have also discussed in our team and tackled in Ocelot, like the point mentioned before of using an "agent approach" in combination with "whitebox instrumentation".

SergeyKanzhelev · 2019-06-05T17:23:17Z

I realize we want all languages to have some way of code injection. I'd suggest, however, to start with Java, just to scope it down. Later we can generalize it to the level of a cross-language discussion.

Is anybody on this thread interested in code injection and not interested in Java?

If this is fine, I think it's a good idea to kick this SIG off. @bhs any reason to close it in 48 hours as you mentioned? It is really important to start API SIGs now and there are clearly people who will participate in both. Would next week be a good time for the first meeting, or are there some pressing factors?

trask · 2019-06-05T18:08:30Z

Hi! I'd love to participate in this, have been instrumenting Java for a long time (https://github.com/glowroot/glowroot), and recently started (also) working on Microsoft's Java agent.

mtwo · 2019-06-05T21:41:07Z

I think that it makes a lot of sense to have a SIG for automatic instrumentation, starting with Java. While I'd argue that this functionality should be packaged with the existing sidecar functionality that we're porting over from OpenCensus (so that users download a single binary), the two sets of functionality should be developed in separate workstreams.

Thus we'd have a SIG for auto instrumentation (starting with Java) and a SIG for sidecars (existing OC agent and collector). Thoughts?

bhs · 2019-06-05T21:50:38Z

@SergeyKanzhelev re closing this PR, I wasn't clear on what you're suggesting... that we close this PR (without merging) and proceed to have the auto-instrumentation discussion in the context of the existing Java SIG? Or that we merge this PR, create the SIG, then start with Java? Sorry to be unclear.

@AloisReitbauer:

For the SIG I would propose to define what we want to work on.

I primarily want the SIG to write down a set of constraints/goals for "official" OpenTelemetry auto-instrumentation efforts (I'm trying to avoid the word "agent" since OpenCensus has an agent that's a completely different thing – more like a sidecar), then to prioritize and help organize the various per-language efforts.

I agree with numerous people here that we will need language+runtime expertise on a per-language basis before actually writing code. The cross-language SIG would not be involved in this conversation unless cross-language patterns emerge.

@mtwo re your comment:

Thus we'd have a SIG for auto instrumentation (starting with Java) and a SIG for sidecars (existing OC agent and collector). Thoughts?

Fine with me, sure. I don't really understand the "packaging" comment, though... they really have different purposes. I see the layering as Auto-instrumentation | API | SDK | <raw data> | sidecar | <obs system>, and so packaging the top of that stack with something at the end seems odd.

SergeyKanzhelev · 2019-06-05T21:53:55Z

@bhs I was suggesting to have a Java auto-instrumentation SIG and generalize it later into cross-language auto-instrumentation, unless we have a large group of people interested in different languages now.

I created a poll for the kick off meeting: https://doodle.com/poll/f9egdg3n2tfy24kg for the next week.

I didn't create any late evening options. Please advise if they are needed.

mtwo · 2019-06-05T22:04:59Z

I don't really understand the "packaging" comment

Never mind, I was thinking that we could distribute the auto-instrumentation functionality as a part of the OC agent / sidecar. However, this isn't feasible if it's being passed as a javaagent param to the JVM. Ignore that part of my comment :)

bhs · 2019-06-05T22:16:05Z

@SergeyKanzhelev it may be risky to start by immediately digging into Java, as there are many parties (even just on this thread) who already have some sort of Java agent which they are inevitably – and understandably – somewhat attached to... that, in turn, can lead to a scenario where participants end up creating rationalizations for their own approach rather than thinking about what really makes the most sense for OpenTelemetry as a project. My hope in starting with a cross-language spec is that we could establish what some of those more strategic goals are before digging into a bake-off of N different existing OSS Java agent projects.

Yet another approach: I can write up that high-level spec about the goals as just a plain-old document in one of the OpenTel repos, and we can debate this stuff on that PR. Once we have alignment around the goals, we could dig in to the Java stuff with a clearer sense of our agreed-upon objectives. But there would be no cross-language "agent" or "auto-instrumentation" SIG, just the spec doc.

tedsuo · 2019-06-06T14:51:39Z

I'm sure the initial meeting will be Java-centric, given the shape of the community, but I would prefer that we discuss this topic at a higher level, and start from a cross-language perspective.

I believe it's possible to factor out the issue of auto-instrumentation into individual problems, and discuss what they would mean for a project like OpenTelemetry. "Agent" is simultaneously an overly broad and overly specified term, so it would be helpful to understand our goals before launching into implementations.

More so than the APIs – where we were trying to merge two existing projects – the agent issue could use a proper design process, starting with a gathering of requirements. :)

safris · 2019-06-06T15:39:04Z

I agree @tedsuo, the higher level discussions would let us define a clear scope for "auto-instrumentation" on different platforms. The way "auto-instrumentation" is done in Java is very specific to Java, and would not necessarily translate to other platforms (even from an architectural level). I'll join the kick-off meeting next week.

bhs · 2019-06-07T17:39:17Z

Alright... so, in re this PR, I think we should leave it open until we've had the initial call that Sergey proposed. The self-scheduling link that Sergey created is here:

https://doodle.com/poll/f9egdg3n2tfy24kg

I also created a basic agenda doc that we can use for the first meeting we're scheduling above ^^. Please add or suggest edits as anyone sees fit, keeping in mind that we shouldn't (IMO) dig deeply into technical minutiae on the first call:

https://docs.google.com/document/d/1ix0WtzB5j-DRj1VQQxraoqeUuvgvfhA6Sd8mF5WLNeY/edit#heading=h.2frn4dvil09r

tylerbenson · 2019-06-07T23:15:34Z

Just to chime in here...

I think it would be valuable to have a preliminary cross-language group to establish the goals and architecture before diving in to the separate language implementations.

Some things that might make sense for the cross-language discussion and for each language group to consider:

If we write auto instrumentation directly against the OTel API, what happens if the user compiled/linked against an incompatible version?
How can we ensure our instrumentation works across incompatible versions? (eg additive - servlet 2/3, or conflicting netty 4/4.1)
How do we make it easy to test instrumentation correctness across various libraries and versions?
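One possible way to frame the version questions above, sketched in Python under stated assumptions: each instrumentation declares the library version range it supports, and the agent selects by detected version. Overlapping ranges model the additive case (servlet 2/3) and disjoint ranges model the conflicting case (netty 4/4.1). The catalogue entries, range boundaries, and names are hypothetical and are not real compatibility data.

```python
from typing import List, NamedTuple, Tuple

Version = Tuple[int, ...]


class Instrumentation(NamedTuple):
    library: str
    min_version: Version  # inclusive
    max_version: Version  # exclusive
    name: str


# Hypothetical catalogue; the ranges are illustrative only.
CATALOGUE = [
    # Additive: the 3.x instrumentation applies on top of the base one.
    Instrumentation("servlet", (2, 0), (4, 0), "servlet-base"),
    Instrumentation("servlet", (3, 0), (4, 0), "servlet-async"),
    # Conflicting: exactly one of these may apply.
    Instrumentation("netty", (4, 0), (4, 1), "netty-4.0"),
    Instrumentation("netty", (4, 1), (5, 0), "netty-4.1"),
]


def parse_version(text: str) -> Version:
    """Parse a dotted numeric version such as "4.1.30" into a tuple."""
    return tuple(int(part) for part in text.split("."))


def select(library: str, detected: str) -> List[str]:
    """Return the names of all instrumentations matching the detected version."""
    version = parse_version(detected)
    return [
        inst.name
        for inst in CATALOGUE
        if inst.library == library
        and inst.min_version <= version < inst.max_version
    ]


servlet_2 = select("servlet", "2.5")
servlet_3 = select("servlet", "3.1")
netty_40 = select("netty", "4.0.56")
netty_41 = select("netty", "4.1.30")
netty_3 = select("netty", "3.9")
```

A table of (library, version, expected selection) cases like the calls above is also one cheap way to test selection logic across many library versions, although it says nothing about whether the instrumentation itself behaves correctly against each version.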

SergeyKanzhelev · 2019-06-10T17:17:08Z

Of all the options voted on, tomorrow 1PM-2PM Pacific works for everybody.

Scheduled:

Join Microsoft Teams Meeting

rochdev · 2019-06-10T17:40:48Z

@SergeyKanzhelev Should we standardize on a video conferencing tool? I feel it makes meetings easier when a different tool is not used every time since you can focus on the meeting and not on learning the tool. Either Zoom or Hangouts would be the most commonly used in general.

SergeyKanzhelev · 2019-06-10T17:43:14Z

@rochdev I don't have access to create either Zoom or Hangout meetings =). I was planning to follow up with CNCF on using their Zoom subscription. I'm OK with Zoom.

tsloughter · 2019-06-10T18:15:10Z

Would dynamic instrumentation fall into this category? As in, on a running node, or set of nodes, define points for new spans to be created and finished. Because this is a common practice with Erlang's various tracers, even in production, for real-time investigation, I figured that being able to also say 'include these in any OpenTelemetry traces that come through these code paths as well' could be beneficial.

mtwo · 2019-06-10T21:56:34Z

I'm adding this meeting (copying Sergey's link) to the public calendar

bhs · 2019-06-13T23:53:44Z

Earlier this week, many folks on this thread had our discussion about “auto-instrumentation” / “agents” / “zero-source-code-modification instrumentation” (these are all the same thing, just with different words). Those on the call thought it would be helpful to try to document a list of requirements we could use to help make our efforts as consistent as possible across languages… eventually we’d like this to be a PR, but for now the consensus is that a google doc will be easier to iterate on.

Anyway, here’s a first stab at it: https://docs.google.com/document/d/1sovSQIGdxXtsauxUNp4qUMEIJZzObdukzPT52eyPCHM/edit#heading=h.obofcqujudb8

To be clear, this is a work-in-progress proposal/draft and I’m 100% open to feedback about any of it. Thanks in advance!

bhs · 2019-06-17T22:31:10Z

Just a final ping on this thread to see if anyone wants to weigh in on https://docs.google.com/document/d/1sovSQIGdxXtsauxUNp4qUMEIJZzObdukzPT52eyPCHM/edit#heading=h.obofcqujudb8 before I turn it into a PR... def easier to resolve comments in a google doc than a GitHub PR, so please do make any suggestions/etc sooner rather than later. Thanks.

mtwo · 2019-06-18T15:35:35Z

Missed the message from five days ago - taking a look now!

mtwo · 2019-06-18T16:33:03Z

LGTM

lizthegrey · 2019-07-11T18:23:48Z

My concern specifically is about overloading the term "automatic instrumentation" - there's a difference between "full manual creation of trace spans", "link in this library as a dependency and everything will automatically work", and "no code change needed and things will automatically work". The latter two I'd say are both kinds of "automatic" instrumentation.

Can we be clear that this SIG pertains specifically to the bytecode approach rather than encompassing all "automatic instrumentation"?

(this discussion now ongoing in both open-telemetry/oteps#5 and open-telemetry/oteps#7)

yurishkuro · 2019-07-12T17:38:21Z

The latter two I'd say are both kinds of "automatic" instrumentation.

I agree, that's why I think they belong to the same RFC. In fact, open-telemetry/oteps#5 is called "zero-touch" in quotes. I think its wording should be relaxed slightly -- comment.

bhs · 2019-07-17T04:36:53Z

Given that https://github.com/open-telemetry/rfcs/blob/master/0002-telemetry-without-manual-instrumentation.md is merged, I'm inclined to close this issue... the main reason I initially wanted a SIG was to create some cross-language requirements, and that's done.

Are there people who still want a cross-language auto-instrumentation SIG at this point? If not, I will close this by the end of the week.

bhs · 2019-07-19T05:19:14Z

Are there people who still want a cross-language auto-instrumentation SIG at this point? If not, I will close this by the end of the week.

🏏 🏏 🏏 🏏

(closing the PR)

Propose an "Auto-Instrumentation SIG"

e5f5a7e

I know that Ted and Sergey are interested in this. Happy to seed the group with other folks as well.

bhs requested a review from SergeyKanzhelev June 3, 2019 19:59

tigrannajaryan reviewed Jun 4, 2019

bhs mentioned this pull request Jul 1, 2019

RFC for "zero-touch telemetry" requirements open-telemetry/oteps#5

Merged

bhs closed this Jul 19, 2019