not seeing all rows when using Presto against YugaByte using Cassandra connector #312

kmuthukk · 2018-05-30T21:46:27Z

Steps to repro:

Launch a vanilla YugaByte pseudo-distributed cluster.

./bin/yb-ctl destroy
./bin/yb-ctl create

Create a YCQL (Cassandra) table and insert some rows using cqlsh. Sample script:

CREATE KEYSPACE IF NOT EXISTS app;
USE app;

drop table if exists msg;
create table msg (userid int, msgid int, msgtext text, PRIMARY KEY ((userid), msgid));

insert into msg (userid, msgid, msgtext) values (1, 1, 'a');
insert into msg (userid, msgid, msgtext) values (1, 2, 'b');
insert into msg (userid, msgid, msgtext) values (1, 3, 'c');
insert into msg (userid, msgid, msgtext) values (1, 4, 'd');

insert into msg (userid, msgid, msgtext) values (2, 1, 'a');
insert into msg (userid, msgid, msgtext) values (2, 2, 'b');
insert into msg (userid, msgid, msgtext) values (2, 3, 'c');
insert into msg (userid, msgid, msgtext) values (2, 4, 'd');

insert into msg (userid, msgid, msgtext) values (3, 1, 'a');
insert into msg (userid, msgid, msgtext) values (3, 2, 'b');
insert into msg (userid, msgid, msgtext) values (3, 3, 'c');
insert into msg (userid, msgid, msgtext) values (3, 4, 'd');

insert into msg (userid, msgid, msgtext) values (4, 1, 'a');
insert into msg (userid, msgid, msgtext) values (4, 2, 'b');
insert into msg (userid, msgid, msgtext) values (4, 3, 'c');
insert into msg (userid, msgid, msgtext) values (4, 4, 'd');

Verify rows exist from cqlsh:

cqlsh:app> select * from msg;

 userid | msgid | msgtext
--------+-------+---------
      1 |     1 |       a
      1 |     2 |       b
      1 |     3 |       c
      1 |     4 |       d
      4 |     1 |       a
      4 |     2 |       b
      4 |     3 |       c
      4 |     4 |       d
      2 |     1 |       a
      2 |     2 |       b
      2 |     3 |       c
      2 |     4 |       d
      3 |     1 |       a
      3 |     2 |       b
      3 |     3 |       c
      3 |     4 |       d

(16 rows)

Configure and launch the Presto server against this cluster:

% ~/presto-server-0.201/bin/launcher run

Next, test from the Presto CLI:

$ ./bin/presto --server localhost:8080 --catalog cassandra --schema app
WARNING: History file is not readable/writable: /Users/kannan/.presto_history. History will not be available during this session.
presto:app> describe msg;
 Column  |  Type   | Extra | Comment
---------+---------+-------+---------
 userid  | integer |       |
 msgid   | integer |       |
 msgtext | varchar |       |
(3 rows)

Query 20180530_203548_00002_sscxf, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [3 rows, 174B] [16 rows/s, 983B/s]

presto:app> select * from msg;
 userid | msgid | msgtext
--------+-------+---------
      1 |     1 | a
      1 |     2 | b
      1 |     3 | c
      1 |     4 | d
      4 |     1 | a
      4 |     2 | b
      4 |     3 | c
      4 |     4 | d
      2 |     1 | a
      2 |     2 | b
      2 |     3 | c
      2 |     4 | d
(12 rows)

Query 20180530_203550_00003_sscxf, FINISHED, 1 node
Splits: 20 total, 20 done (100.00%)
0:00 [12 rows, 12B] [84 rows/s, 84B/s]

Notice that Presto returns only 12 of the 16 expected rows.
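To pin down which rows are missing, the two result sets can be compared directly. The following is a minimal sketch: the key tuples are transcribed by hand from the cqlsh and Presto outputs above, and the helper name `missing_rows` is made up for illustration.

```python
def missing_rows(expected, actual):
    """Return the (userid, msgid) keys present in expected but absent from actual."""
    return sorted(set(expected) - set(actual))


# Keys as shown in the cqlsh output: partitions 1, 4, 2, 3 with four rows each.
cqlsh_rows = [(u, m) for u in (1, 4, 2, 3) for m in (1, 2, 3, 4)]
# Keys as shown in the Presto output: partitions 1, 4, 2 only.
presto_rows = [(u, m) for u in (1, 4, 2) for m in (1, 2, 3, 4)]

missing = missing_rows(cqlsh_rows, presto_rows)
missing_partitions = sorted({userid for userid, _ in missing})

print(missing)
print(missing_partitions)
```

The difference is not scattered across partitions: every missing row belongs to the same partition key (userid 3), which points at a whole token range being skipped rather than at individual rows being dropped.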

There seems to be an inconsistency in how the splits (for the tablets in the table) are discovered and queried via the token() builtin, and we might be missing one of the splits.

m-iancu · 2018-05-31T00:39:14Z

The issue is an incompatibility with Cassandra in the handling of token range queries that use the minimum token value (-9223372036854775808) as an upper bound.
In Cassandra, this condition is treated as a special case and includes all token hashes, effectively interpreting the min value as a max value for this case.

Concretely, when Presto splits the token range into partitions it generates queries of the form:

SELECT ... WHERE ... AND token(..) > start_token AND token(..) <= end_token

For the last of the generated splits, the end_token value is not the "max token value" (9223372036854775807) but the "min token value" (-9223372036854775808), as the token space is a ring which wraps around.
That is why we return no results for that partition (explaining the missing rows), while Cassandra returns everything.
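The effect of the two interpretations can be illustrated with a small model of the token ring. This is only a sketch: `make_splits` is a hypothetical equal-width splitter, not Presto's actual split logic, and `matches` merely models the `token(..) > start_token AND token(..) <= end_token` predicate, either literally or with the min token treated as "no upper bound".

```python
MIN_TOKEN = -2**63      # -9223372036854775808
MAX_TOKEN = 2**63 - 1   #  9223372036854775807


def make_splits(n):
    """Cut the token ring into n (start, end] ranges; the last one wraps to MIN_TOKEN.

    Hypothetical equal-width splitter, for illustration only.
    """
    width = 2**64 // n
    starts = [MIN_TOKEN + i * width for i in range(n)]
    ends = starts[1:] + [MIN_TOKEN]
    return list(zip(starts, ends))


def matches(token, start, end, min_as_unbounded):
    """Evaluate token > start AND token <= end for one split."""
    if min_as_unbounded and end == MIN_TOKEN:
        # Cassandra-compatible reading: MIN_TOKEN as an upper bound
        # means there is no upper bound.
        return token > start
    return start < token <= end


def uncovered(tokens, splits, min_as_unbounded):
    """Return the tokens that no split query would select."""
    return [t for t in tokens
            if not any(matches(t, s, e, min_as_unbounded) for s, e in splits)]


splits = make_splits(4)
sample_tokens = [-2**62 - 5, -7, 42, 2**62 + 1, MAX_TOKEN]

lost_strict = uncovered(sample_tokens, splits, min_as_unbounded=False)
lost_compatible = uncovered(sample_tokens, splits, min_as_unbounded=True)

print(lost_strict)
print(lost_compatible)
```

Under the literal comparison, every token that falls in the last split is lost, because nothing is both greater than the split's start and less than or equal to the min token. With the special case applied, the same splits cover all sample tokens.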

Summary: In Cassandra using min token value (INT64_MIN) as an upper bound is specially interpreted to include all token hashes (rather than none). This behavior is used, for instance, by Presto when splitting the entire token range into partitions.
Test Plan: jenkins, ql-query-test.cc
Reviewers: pritam.damania, robert
Reviewed By: robert
Subscribers: yql
Differential Revision: https://phabricator.dev.yugabyte.com/D4911

kmuthukk · 2018-05-31T04:02:17Z

Nice work @m-iancu in helping track this down and rolling out a fix so quickly.

m-iancu · 2018-05-31T06:26:49Z

Closed by b6248d5.

kmuthukk assigned m-iancu May 30, 2018

kmuthukk added the kind/bug label May 30, 2018

m-iancu closed this as completed May 31, 2018