Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include "number of rows affected" in the data captured for SQL commands #112

Closed
SergeyKleyman opened this issue Jul 5, 2019 · 29 comments
Closed
Labels
apm-agents enhancement New feature or request poll

Comments

@SergeyKleyman
Copy link
Contributor

SergeyKleyman commented Jul 5, 2019

Description of the issue

Some SQL commands return the number of rows affected by the command. For example
.NET's SqlCommand.ExecuteNonQuery.
It might be useful for users to get this number as part of the data captured for the relevant SQL commands. For example requested at https://discuss.elastic.co/t/include-record-count-on-read-or-write/189110

Proposed solution

We should add a property (for example number_of_rows_affected) of type number to context.db object in intake API for agents to send captured number.

If you have concerns about the proposed solution, let's discuss.

What we are voting on

@elastic/apm-agent-devs

  • If you agree that we should add this feature, please tick "Yes", and link to the child issue for your agent
  • If you think that we shouldn't add this feature, please tick "No". A comment with the explanation as to why you think so would be greatly appreciated.
  • If you don't care whether we add this feature or not, please tick "Indifferent"
  • If this feature is not applicable to your agent (for example your agent doesn't capture SQL commands that have "number of rows affected" as their result), please tick "N/A"
Agent Yes No Indifferent N/A Link to agent issue
.NET
elastic/apm-agent-dotnet#360
Go
elastic/apm-agent-go#578
Java
elastic/apm-agent-java/issues/707
Node.js
elastic/apm-agent-nodejs#1709
Python
elastic/apm-agent-python#613
Ruby
elastic/apm-agent-ruby#574
@eyalkoren
Copy link
Contributor

@SergeyKleyman just verifying- this is about SQL Data Manipulation Language (DML) statements (INSERT, UPDATE, DELETE), right?

@SergeyKleyman
Copy link
Contributor Author

@eyalkoren Yes, you are correct.

@axw
Copy link
Member

axw commented Jul 8, 2019

SGTM, but can we shorten the name to just rows_affected?

@SergeyKleyman
Copy link
Contributor Author

@axw If you think rows_affected's meaning is clear enough we can go with that name just as well.

@felixbarny
Copy link
Member

Should rows_affected be 0, null or not present in case of DDL statements?

@axw
Copy link
Member

axw commented Jul 18, 2019

My vote would be to omit.

@macnibblet
Copy link

Could we include the explain of slow traces from #84 ?

@eyalkoren
Copy link
Contributor

We are currently implementing that in the Java agent.
@elastic/apm-server can we get your take on this addition? Seems we rushed into it without getting your feedback/approval on this addition...

@simitt
Copy link
Contributor

simitt commented Oct 16, 2019

I don't see an issue with adding it as proposed on the Intake API. IMO the question to clarify for the server is which field to use in ES. There is an open issue to standardize SQL information in ECS. Some beats are already collecting this information for various modules, but store the information under non-standardized fields.
The information does not need to be indexed and searchable right?

@eyalkoren
Copy link
Contributor

@simitt Good point (as always 🙂), thanks! I wasn't aware of that.
Can we separate these concerns, so that we can go ahead and agree on the intake API and implement that?
Then you can decide what to do with this field- either wait for the ECS effort (if it is expected soon) or add wherever fits now and migrate with all the rest later.

@simitt
Copy link
Contributor

simitt commented Oct 16, 2019

span.context.db.rows_affected on the Intake API SGTM. I created an issue elastic/apm-server#2802 for Intake API + ES.

@beniwohli
Copy link
Contributor

FYI, I started implementing this and immediately ran into issues. Some database drivers don't set the rowcount attribute on the cursor object when executing the query (which we instrument), but only after all rows have been fetched. For example with py-mssql, it looks something like this:

cursor.execute("SELECT * FROM test WHERE name LIKE 't%' ORDER BY id")
cursor.rowcount  # returns -1
cursor.fetchall()  # returns [(2, "two"), (3, "three")] from our test table
cursor.rowcount  # returns 2

Only execute is instrumented in the Python agent, so for some drivers, we'll never see the actual row count.

@felixbarny
Copy link
Member

number_of_rows_affected is NOT the same as the number of returned rows in a SELECT statement, it's a different concern. number_of_rows_affected is mainly for DML (UPDATE/DELETE/INSERT).

We can certainly also discuss adding an additional field like number_of_rows_returned. The problem is, as you mentioned, you only know the number of returned rows after iterating over the whole result set. At least in Java more results are fetched from the database cursor in batches (aka fetch size) as the application iterates over the result set. So in Java, there's no way of knowing that a SELECT would have yielded 1000 results if the application only reads the first result and ignores the rest.

@beniwohli
Copy link
Contributor

Ah, right. Python's DB-API2 (which all relevant database drivers implement) uses the same attribute rowcount for both: https://www.python.org/dev/peps/pep-0249/#rowcount

I'll update my implementation to only include the rows_affected value if the query is insert/update/delete.

@beniwohli
Copy link
Contributor

Not that I hate naming discussions as much as anybody, but @estolfo brings up an interesting point in the Ruby implementation issue: there are other data stores for which the number of items (for the lack of a better term) affected might be interesting. Using rows_affected could be confusing in such a case.

Shortening to affected might be an option, but I fear that would just confuse everybody instead of just NoSQL folks. Any other ideas?

@estolfo
Copy link
Contributor

estolfo commented Oct 24, 2019

The MongoDB server response field that the driver processes is actually n_modified/n_deleted. We could use that for inspiration and say n_affected.

@felixbarny
Copy link
Member

I would prefer not using abbreviations. If n stands for number, we should consider number_affected. But that also sounds weird somehow 🤔

@mikker
Copy link
Contributor

mikker commented Oct 24, 2019

affected_count

@axw
Copy link
Member

axw commented Oct 25, 2019

Argh :|

So...

  • SQL drivers typically just report "rows affected"
  • MongoDB reports documents modified/deleted
  • Elasticsearch reports documents updated/deleted

Is it helpful to combine them? If we did that, we would be throwing away information (affected = modified+deleted)

How about we keep rows_affected, but also introduce docs_modified and docs_deleted? (Or "documents_" if people prefer, but I think "docs_" is clear and concise.)

@Qard
Copy link

Qard commented Oct 25, 2019

Whatever we decide on, I think it'd probably have to be optional.

As for joining or separating concerns of modified/deleted, I doubt you'd often see a single query do both, so the context could probably be derived from the query itself. It's harder to do aggregations on that way though.

@axw
Copy link
Member

axw commented Oct 25, 2019

Whatever we decide on, I think it'd probably have to be optional.

👍 All fields we add to an existing object at this point are optional.

As for joining or separating concerns of modified/deleted, I doubt you'd often see a single query do both, so the context could probably be derived from the query itself. It's harder to do aggregations on that way though.

Multi-document transactions aren't terribly unusual, e.g. atomically deleting one document and creating another. In the same vein, Elasticsearch's Update By Query API can be used to modify or delete multiple documents.

@beniwohli
Copy link
Contributor

Should we (re-)focus this issue on the SQL-case (rows_affected) and open another issue for docs_modified / docs_deleted?

@axw
Copy link
Member

axw commented Oct 25, 2019

If everyone's on board with my suggestion, +1 to punting docs_* to another issue.

@Qard
Copy link

Qard commented Oct 25, 2019

True. This would get very unclear on a transaction. 🤔

@axw
Copy link
Member

axw commented Oct 28, 2019

Created #161. If anyone objects, please speak up. Thanks @estolfo and @beniwohli for identifying and flagging the issue!

@simitt
Copy link
Contributor

simitt commented Dec 20, 2019

There were some naming suggestions here like number_of_rows_affected, rows_affected, affected, number_affected, etc.

I think the last suggestion was rows_affected - is that what we settled on?

@eyalkoren
Copy link
Contributor

Yes, it's rows_affected.

@simitt
Copy link
Contributor

simitt commented Jan 6, 2020

Implemented in the APM Server in elastic/apm-server#3095.

@graphaelli
Copy link
Member

closing this out as intake is complete and all agents have either implemented or have an issue open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apm-agents enhancement New feature or request poll
Projects
None yet
Development

No branches or pull requests