Include "number of rows affected" in the data captured for SQL commands #112

SergeyKleyman · 2019-07-05T15:25:57Z

Description of the issue

Some SQL commands return the number of rows affected by the command, for example .NET's SqlCommand.ExecuteNonQuery.
It might be useful for users to get this number as part of the data captured for the relevant SQL commands. This was requested, for example, at https://discuss.elastic.co/t/include-record-count-on-read-or-write/189110

Proposed solution

We should add a property (for example number_of_rows_affected) of type number to the context.db object in the intake API so that agents can send the captured number.

If you have concerns about the proposed solution, let's discuss.

What we are voting on

@elastic/apm-agent-devs

If you agree that we should add this feature, please tick "Yes", and link to the child issue for your agent
If you think that we shouldn't add this feature, please tick "No". A comment with the explanation as to why you think so would be greatly appreciated.
If you don't care whether we add this feature or not, please tick "Indifferent"
If this feature is not applicable to your agent (for example your agent doesn't capture SQL commands that have "number of rows affected" as their result), please tick "N/A"

Agent	Yes	No	Indifferent	N/A	Link to agent issue
.NET					elastic/apm-agent-dotnet#360
Go					elastic/apm-agent-go#578
Java					elastic/apm-agent-java#707
Node.js					elastic/apm-agent-nodejs#1709
Python					elastic/apm-agent-python#613
Ruby					elastic/apm-agent-ruby#574


eyalkoren · 2019-07-07T07:55:20Z

@SergeyKleyman just verifying- this is about SQL Data Manipulation Language (DML) statements (INSERT, UPDATE, DELETE), right?

SergeyKleyman · 2019-07-07T09:00:31Z

@eyalkoren Yes, you are correct.

axw · 2019-07-08T01:28:11Z

SGTM, but can we shorten the name to just rows_affected?

SergeyKleyman · 2019-07-08T04:59:58Z

@axw If you think rows_affected's meaning is clear enough we can go with that name just as well.

felixbarny · 2019-07-18T12:36:34Z

Should rows_affected be 0, null or not present in case of DDL statements?

axw · 2019-07-18T12:49:10Z

My vote would be to omit.

macnibblet · 2019-10-07T06:02:48Z

Could we include the EXPLAIN output for slow traces from #84?

eyalkoren · 2019-10-16T06:53:15Z

We are currently implementing that in the Java agent.
@elastic/apm-server can we get your take on this addition? Seems we rushed into it without getting your feedback/approval on this addition...

simitt · 2019-10-16T07:03:19Z

I don't see an issue with adding it as proposed on the Intake API. IMO the question to clarify for the server is which field to use in ES. There is an open issue to standardize SQL information in ECS. Some beats are already collecting this information for various modules, but store the information under non-standardized fields.
The information does not need to be indexed and searchable, right?

eyalkoren · 2019-10-16T07:12:43Z

@simitt Good point (as always 🙂), thanks! I wasn't aware of that.
Can we separate these concerns, so that we can go ahead and agree on the intake API and implement that?
Then you can decide what to do with this field- either wait for the ECS effort (if it is expected soon) or add wherever fits now and migrate with all the rest later.

simitt · 2019-10-16T07:20:41Z

span.context.db.rows_affected on the Intake API SGTM. I created an issue elastic/apm-server#2802 for Intake API + ES.
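To make the agreed shape concrete, here is a minimal sketch of a span payload carrying span.context.db.rows_affected, with the field omitted when no count is available (as suggested above for DDL statements). The helper name build_db_context and the sample statements are hypothetical, and the "type" and "statement" keys are assumptions about the rest of the db context; only rows_affected comes from this thread.

```python
import json


def build_db_context(statement, rows_affected=None):
    """Return a dict shaped like span.context.db (hypothetical helper).

    rows_affected is optional: it is left out entirely, rather than
    sent as 0 or null, when the agent has no count to report.
    """
    db = {"type": "sql", "statement": statement}  # assumed sibling fields
    if rows_affected is not None:
        db["rows_affected"] = rows_affected
    return db


# A DML statement for which the driver reported two affected rows.
dml_span = {
    "context": {"db": build_db_context("DELETE FROM test WHERE id > 1", rows_affected=2)}
}

# A DDL statement: no count, so the key is absent from the payload.
ddl_span = {"context": {"db": build_db_context("CREATE TABLE test (id INT)")}}

print(json.dumps(dml_span))
print(json.dumps(ddl_span))
```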

beniwohli · 2019-10-21T11:35:06Z

FYI, I started implementing this and immediately ran into issues. Some database drivers don't set the rowcount attribute on the cursor object when executing the query (which we instrument), but only after all rows have been fetched. For example with py-mssql, it looks something like this:

cursor.execute("SELECT * FROM test WHERE name LIKE 't%' ORDER BY id")
cursor.rowcount  # returns -1
cursor.fetchall()  # returns [(2, "two"), (3, "three")] from our test table
cursor.rowcount  # returns 2

Only execute is instrumented in the Python agent, so for some drivers, we'll never see the actual row count.

felixbarny · 2019-10-21T12:21:26Z

number_of_rows_affected is NOT the same as the number of returned rows in a SELECT statement, it's a different concern. number_of_rows_affected is mainly for DML (UPDATE/DELETE/INSERT).

We can certainly also discuss adding an additional field like number_of_rows_returned. The problem is, as you mentioned, you only know the number of returned rows after iterating over the whole result set. At least in Java more results are fetched from the database cursor in batches (aka fetch size) as the application iterates over the result set. So in Java, there's no way of knowing that a SELECT would have yielded 1000 results if the application only reads the first result and ignores the rest.
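The same limitation can be sketched in Python. This is only an illustration, using the standard-library sqlite3 module rather than JDBC, and CountingCursor is a hypothetical wrapper, not part of any agent: an instrumentation layer can count the rows the application actually reads, but it cannot see how many rows the query would have produced.

```python
import sqlite3


class CountingCursor:
    """Hypothetical wrapper that counts only the rows the application reads."""

    def __init__(self, cursor):
        self._cursor = cursor
        self.rows_read = 0

    def __iter__(self):
        for row in self._cursor:
            self.rows_read += 1
            yield row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER)")
conn.executemany("INSERT INTO test VALUES (?)", [(i,) for i in range(1000)])

counting = CountingCursor(conn.execute("SELECT id FROM test ORDER BY id"))

# The application reads the first row and ignores the rest.
first_row = next(iter(counting))

# The wrapper has seen one row, although the query matches 1000.
print(first_row, counting.rows_read)
conn.close()
```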

beniwohli · 2019-10-21T12:39:35Z

Ah, right. Python's DB-API2 (which all relevant database drivers implement) uses the same attribute rowcount for both: https://www.python.org/dev/peps/pep-0249/#rowcount

I'll update my implementation to only include the rows_affected value if the query is insert/update/delete.
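A minimal sketch of that approach, assuming a simple first-keyword check is enough to recognize DML. It uses the standard-library sqlite3 driver for illustration; the helper name rows_affected and the keyword tuple are hypothetical and not taken from the Python agent.

```python
import sqlite3

# Statements for which cursor.rowcount means "rows affected".
DML_KEYWORDS = ("INSERT", "UPDATE", "DELETE")


def rows_affected(cursor, sql):
    """Return cursor.rowcount for DML statements, otherwise None.

    DB-API2 uses the same rowcount attribute for affected and returned
    rows, so the statement type decides whether the value is reported.
    A negative rowcount means the driver could not determine a count.
    """
    words = sql.split(None, 1)
    first = words[0].upper() if words else ""
    if first in DML_KEYWORDS and cursor.rowcount >= 0:
        return cursor.rowcount
    return None


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
results = {}
for sql in (
    "CREATE TABLE test (id INTEGER, name TEXT)",
    "INSERT INTO test VALUES (1, 'one'), (2, 'two'), (3, 'three')",
    "UPDATE test SET name = upper(name) WHERE name LIKE 't%'",
    "SELECT * FROM test",
    "DELETE FROM test WHERE id = 1",
):
    cur.execute(sql)
    results[sql] = rows_affected(cur, sql)
conn.close()

for sql, count in results.items():
    print(count, sql)
```

A keyword check like this misses statements that start with a comment or a CTE, so a real integration would likely need a more careful classifier.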

beniwohli · 2019-10-24T13:42:49Z

Not that I don't hate naming discussions as much as anybody, but @estolfo brings up an interesting point in the Ruby implementation issue: there are other data stores for which the number of items (for lack of a better term) affected might be interesting. Using rows_affected could be confusing in such a case.

Shortening to affected might be an option, but I fear that would just confuse everybody instead of just NoSQL folks. Any other ideas?

estolfo · 2019-10-24T13:47:51Z

The MongoDB server response field that the driver processes is actually n_modified/n_deleted. We could use that for inspiration and say n_affected.

felixbarny · 2019-10-24T15:46:31Z

I would prefer not using abbreviations. If n stands for number, we should consider number_affected. But that also sounds weird somehow 🤔

mikker · 2019-10-24T17:48:41Z

affected_count

axw · 2019-10-25T03:59:07Z

Argh :|

So...

SQL drivers typically just report "rows affected"
MongoDB reports documents modified/deleted
Elasticsearch reports documents updated/deleted

Is it helpful to combine them? If we did that, we would be throwing away information (affected = modified+deleted)

How about we keep rows_affected, but also introduce docs_modified and docs_deleted? (Or "documents_" if people prefer, but I think "docs_" is clear and concise.)

Qard · 2019-10-25T07:13:22Z

Whatever we decide on, I think it'd probably have to be optional.

As for joining or separating concerns of modified/deleted, I doubt you'd often see a single query do both, so the context could probably be derived from the query itself. It's harder to do aggregations on that way though.

axw · 2019-10-25T08:06:14Z

> Whatever we decide on, I think it'd probably have to be optional.

👍 All fields we add to an existing object at this point are optional.

> As for joining or separating concerns of modified/deleted, I doubt you'd often see a single query do both, so the context could probably be derived from the query itself. It's harder to do aggregations on that way though.

Multi-document transactions aren't terribly unusual, e.g. atomically deleting one document and creating another. In the same vein, Elasticsearch's Update By Query API can be used to modify or delete multiple documents.

beniwohli · 2019-10-25T08:47:08Z

Should we (re-)focus this issue on the SQL-case (rows_affected) and open another issue for docs_modified / docs_deleted?

axw · 2019-10-25T08:56:16Z

If everyone's on board with my suggestion, +1 to punting docs_* to another issue.

Qard · 2019-10-25T20:31:02Z

True. This would get very unclear on a transaction. 🤔

axw · 2019-10-28T01:13:28Z

Created #161. If anyone objects, please speak up. Thanks @estolfo and @beniwohli for identifying and flagging the issue!

simitt · 2019-12-20T09:19:12Z

There were some naming suggestions here like number_of_rows_affected, rows_affected, affected, number_affected, etc.

I think the last suggestion was rows_affected - is that what we settled on?

eyalkoren · 2019-12-22T04:33:54Z

Yes, it's rows_affected.

simitt · 2020-01-06T09:14:07Z

Implemented in the APM Server in elastic/apm-server#3095.

graphaelli · 2020-05-06T21:41:03Z

Closing this out as intake is complete and all agents have either implemented it or have an issue open.

SergeyKleyman added the enhancement, apm-agents and poll labels Jul 5, 2019

SergeyKleyman mentioned this issue Jul 5, 2019: Include "number of rows affected" in the data captured for SQL commands elastic/apm-agent-dotnet#360 (Open)

eyalkoren mentioned this issue Jul 7, 2019: Include "number of rows affected" in the data captured for SQL DML commands elastic/apm-agent-java#707 (Closed)

axw mentioned this issue Jul 8, 2019: module/apmsql: record number of rows affected elastic/apm-agent-go#578 (Closed)

eyalkoren mentioned this issue Oct 16, 2019: Report affected rows for DML statements elastic/apm-agent-java#883 (Merged, 6 tasks)

beniwohli mentioned this issue Oct 21, 2019: Capture rows_affected in DB-API2 integrations elastic/apm-agent-python#613 (Closed)

mikker mentioned this issue Oct 21, 2019: Include number of rows_affected if available elastic/apm-agent-ruby#574 (Closed)

axw mentioned this issue Oct 28, 2019: Capture "number of documents modified/deleted" for operations on document stores #161 (Open)

simitt mentioned this issue Dec 31, 2019: intake: Add rows_affected to span db information elastic/apm-server#3095 (Merged)

graphaelli mentioned this issue Apr 3, 2020: Capture rows_affected in db instrumentation elastic/apm-agent-nodejs#1709 (Open)

SylvainJuge mentioned this issue Apr 20, 2020: Remove calls to Statement.getUpdateCount elastic/apm-agent-java#1147 (Merged)

graphaelli closed this as completed May 6, 2020
