Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Bad behavior in PL/pgSQL block statement with exception section on 'unique_violation' (code 23505) #12184

Closed
bllewell opened this issue Apr 15, 2022 · 4 comments
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@bllewell
Copy link
Contributor

bllewell commented Apr 15, 2022

Jira Link: DB-544
See also Bizarre behavior on attempt to handle 23505 error in PL/pgSQL

and: PL/pgSQL block fails to catch "unique_violation" in YSQL. Vanilla PG works as expected.

Description

Background

It can be argued that the reason that user-defined functions and procedures exist is to implement the functionality API that a client app needs from the database so that everything about tables and the SQL statements that operate on them is hidden from these clients. Following this approach brings huge security benefits. This hiding paradigm has implications for error handling.

Consider a procedure whose purpose is to insert a new single row where one, or more, of the columns have a unique constraint. It returns a success code.

  • The attempt might succeed cleanly. The return code is 'Success'.

  • The attempt might violate a constraint. This must be considered to be an expected (but regrettable) error outcome. The feedback to the client must say what went wrong in end-user terms like "This twitter handle is taken. Try again with a different candidate."

  • The attempt might cause an error that the programmer hasn't yet catered for—i.e. it might hit a bug. For example, a text value might exceed the declared capacity of the target column. There's always a possiblity of bugs. But the exception report must not leak to the client because it has all sorts of names that can give a hacker clues for designing attacks. But diagnosis must be possible. The paradigm therefore requires that an "others" handler must collect all the information needed for diagnosis and save it to an error log table. This will have an auto-populating surrogate PK called, for example, ticket. The feedbacl to the client here must simply say "There was an unexpected error. Please report this and mention ticket number NNNNN."

PostgreSQL supports this paradigm. See the code demo error-logging-demo.sql in the attached .zip. But the demo fails in YB-YB-2.13.0.1.

Basic positive test

First, try this in both PG and YB:

drop function if exists f(text) cascade;
drop table if exists t cascade;
create table t(k serial primary key, v varchar(5) not null);
create unique index t_v_unq on t(v);
insert into t(v) values ('dog'), ('cat'), ('frog');

create function f(new_v in text)
  returns table(z text)
  security definer
  language plpgsql
as $body$
begin
  z := 'Inserting "'||new_v||'"'; return next;
  begin
    insert into t(v) values (new_v);
    z := 'No exception'; return next;
  end;
end;
$body$;

\t on
select z from f('ant');
select z from f('dog');
select z from f('elephant');
select * from t order by k;
\t off

The outcome is the same in both PG and YB.

  • Inserting 'ant' reports « Inserting "ant" » and « No exception » and finishes silently.

  • Inserting 'dog' produces the error « duplicate key value violates unique constraint "t_v_unq" »

  • Inserting 'elephant' produces the error « value too long for type character varying(5) »

  • 'select * from t' shows that only 'ant' has been inserted—as expected.

Minimal code to provoke the bug

Add an exception section to the code above to format useful feedback:

drop function if exists f(text) cascade;
drop table if exists t cascade;
create table t(k serial primary key, v varchar(5) not null);
create unique index t_v_unq on t(v);
insert into t(v) values ('dog'), ('cat'), ('frog');

create function f(new_v in text)
  returns table(z text)
  security definer
  language plpgsql
as $body$
begin
  z := 'Inserting "'||new_v||'"'; return next;
  begin
    insert into t(v) values (new_v);
    z := 'No exception'; return next;
  exception
    when unique_violation then
      z := 'unique_violation'; return next;
    when others then
      z := 'others'; return next;
  end;
end;
$body$;

\t on
select z from f('ant');
select z from f('dog');
select z from f('elephant');
select * from t order by k;
\t off

This is the output in PG—as expected:

 Inserting "ant"
 No exception

 Inserting "dog"
 unique_violation

 Inserting "elephant"
 others

 1 | dog
 2 | cat
 3 | frog
 4 | ant

And this is the buggy output in YB:

 Inserting "ant"
 No exception

 Inserting "dog"
 No exception
 unique_violation

 Inserting "elephant"
 others

 1 | dog
 2 | cat
 3 | frog
 4 | ant

The "others" case behaves correctly. But, in the "unique_violation" case, execution blunders on even after the exception has happened. Only then does control pass to the exception handler.

Second order buggy behavior

Now try this. It models collecting more information to give user-friendly feedback when an expected (but regrettable) error happens and to log diagnostic info when an unexpected error happens.

drop function if exists f(text) cascade;
drop table if exists t cascade;
create table t(k serial primary key, v varchar(5) not null);
create unique index t_v_unq on t(v);
insert into t(v) values ('dog'), ('cat'), ('frog');

create function f(new_v in text)
  returns table(z text)
  security definer
  language plpgsql
as $body$
declare
  sqlstate_    text not null := '';
  constraint_  text not null := '';
  msg          text not null := '';
begin
  z := 'Inserting "'||new_v||'"'; return next;
  begin
    insert into t(v) values (new_v);
    z := 'No exception'; return next;
  exception
    when others then
      get stacked diagnostics
        sqlstate_ = returned_sqlstate,
        constraint_ = constraint_name,
        msg = message_text;
      if sqlstate_ = '23505' then
        z := 'unique_violation'; return next;
        z := 'constraint: '||constraint_; return next;
        z := 'message:    '||msg; return next;
      else
        z := 'others'; return next;
        z := 'sqlstate:   '||sqlstate_; return next;
        z := 'message:    '||msg; return next;
    end if;
  end;
end;
$body$;

\t on
select z from f('ant');
select z from f('dog');
select z from f('elephant');
\t off

This is the output in PG—as expected:

 Inserting "ant"
 No exception

 Inserting "dog"
 unique_violation
 constraint: t_v_unq
 message:    duplicate key value violates unique constraint "t_v_unq"

 Inserting "elephant"
 others
 sqlstate:   22001
 message:    value too long for type character varying(5)

And this is the buggy output in YB:

 Inserting "ant"
 No exception

 Inserting "dog"
 No exception
 unique_violation
 constraint: 
 message:    duplicate key value violates unique constraint "t_v_unq"

 Inserting "elephant"
 others
 sqlstate:   22001
 message:    value too long for type character varying(5)

We see the same "blundering on" error. And we see, further, that the name of the constraint that was violated (from get stacked diagnostics) is NULL.

This explains why error-logging-demo.sql doesn't work in YB as it should. Diff the attached error-logging-demo-yb.txt and error-logging-demo-yb.txt to see the problem.

@bllewell bllewell added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Apr 15, 2022
@bllewell
Copy link
Contributor Author

issue-12184.zip

@pkj415
Copy link
Contributor

pkj415 commented May 3, 2022

The issue occurs due to buffering of operations.

Explanation of buffering and an optimization -

YSQL buffers operations (writes specifically) unless it hits some condition that forces it to flush the buffer. One such condition is the completion of a txn, or a exception handling block in functions.

NOTE: writes for a statement (like INSERT/UPDATE) need not always be flushed before the statement returns in YSQL. The writes can be flushed with the next statement if it is part of the same txn. This is okay because, even before flushing, we would know the number of inserts/updates to be done and return that number to the user client (like INSERT x). If an error occurs in the rpc, it will anyway be caught in some later flush before the txn commits.

In a nutshell, YSQL makes quick progress on DMLs in a txn by returning the anticipated number of tuples updated in case of a success without even executing the writes and just storing them in a buffer. If there is a later read, or write on the same key, YSQL will flush buffered writes before making progress. This is okay because any errors that occur will be caught by the commit step since we flush the buffer on commit.

The optimization was introduced in https://phabricator.dev.yugabyte.com/D7782 with this line - RETURN_NOT_OK(dml_write->Exec(rows_affected_count != nullptr /* force_non_bufferable */));

How it affects the test case -

select z from f('dog'); will trigger the function which buffers the write rpc and moves on to execute z := 'No exception'; return next; . The rpc to INSERT is flushed after this when the exception block ends.

The solution might be simple - flush always before statements like z := 'No exception', but before we fixate onto it, we need to check what other cases need to be handled. Since, finding and reasoning all cases is hard because the possibilities are huge, the safe and correct fix is to flush always when a statement completes. But, the safe and correct fix will hurt performance - we don't be buffering even in cases where we can (this will hurt TPCC. Also, if I remember correctly, this simple optimization helped us get good gains in TPCC)

@bllewell
Copy link
Contributor Author

bllewell commented May 4, 2022

More compelling testcase

Here's a more compelling testcase. First, some background: an account of the semantics for a PL/pgSQL block statement with an exception section, at any level of nesting:

  • A savepoint is always (implicitly) created on entering the executable section.

  • If an exception occurs, then the point of execution moves immediately to the exception section and the search a matching handler begins.

  • If an exception occurs, then there’s an automatic (implicit) rollback to that savepoint. (The programmer is unaware of its name.) Of course, only transactional changes are rolled back. These include not only changes to table content but also, for example, the effects of the set statement to assign a value to a session parameter (like set timezone = 'UTC'). Notice that this rollback-to-savepoint occurs whether or not a matching handler is found in the exception section.

  • Notably, non-transactional changes, like to the value of a PL/pgSQL variable or to a row that, in a table function, has been buffered for output when the function completes, are not rolled back to the savepoint.

  • Any exceptions that occur during the declaration section or the exception section bubble up to the immediately enclosing block statement—or to the client if this happens in the outermost block. This model holds, by extension, for arbitrarily deep block-in-block nestings.

Now, the testcase. Notice that function f() has two modes that provoke, respectively either the string_data_right_truncation exception or the unique_violation exception.

\o PG-false.txt
\t on
select rpad(version(), 30);
\t off

drop table if exists t cascade;
create table t(k serial primary key, v varchar(3) not null unique);

drop function if exists f(boolean);
create function f(bug in boolean)
  returns table(z text)
  language plpgsql
as $body$
declare
  n int  not null := 0;
begin
  z := ''; return next;
  begin
    for i in 1..3 loop
      n := n + 1;
      z := n::text;
      return next;
      insert into t(v) values(z);
    end loop;
  exception
    when others then null;
  end;

  z := ''; return next;
  begin
    for i in 1..3 loop
      n := n + 1;
      z := case
             when      bug  and (i = 2) then '1'
             when (not bug) and (i = 2) then 'abcd'
             else                             n::text
           end;
      return next;
      insert into t(v) values(z);
    end loop;
  exception
    when string_data_right_truncation then
      z := 'string_data_right_truncation handled';
      return next;
    when unique_violation then
      z := 'unique_violation handled';
      return next;
  end;

  z := ''; return next;
  begin
    for i in 1..3 loop
      n := n + 1;
      z := n::text;
      return next;
      insert into t(v) values(z);
    end loop;
  exception
    when others then null;
  end;
end;
$body$;

\t on
select z from f(bug=>false);
\t off
select v from t order by k;
\o

Notice that the script starts with the \o metacommand. Make sure that you edit the filename before starting the script to reflect the environment (PG or YB) and the mode choice (true or false) for f(), thus

PG-false.txt
PG-true.txt
YB-false.txt
YB-true.txt

The formal is called bug because when it's false and provokes the string_data_right_truncation exception, the script behaves correctly in YB (i.e. YB-false.txt is identical to PG-false.txt). And when bug is true and provokes the unique_violation exception, the script shows a YB bug (i.e. YB-true.txt differs from PG-false.txt).

The bug is that, even though the point of execution does eventually move to the exception section, it does this only after blundering on to the next iteration of the loop and through all the non-transactional statements until the insert is next reached.

The attached issue-12184.zip contains the .sql script and the four spool files.

@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels May 25, 2022
pkj415 added a commit to pkj415/yugabyte-db that referenced this issue May 31, 2022
…ng statements with non-transactional side-effects

Summary:
YSQL buffers tserver operations (writes specifically) unless it hits some
condition that forces it to flush the buffer and wait for the response. Some
conditions that force waiting for a buffer's response are - completion of a txn,
completion of an exception handling block in plpgsql, a read operation, or a
write to a key which already has a buffered write.

With buffering, execution can move on to later statements unless a flush and
response wait is required based on the conditions similar to those mentioned
above. For example - with autocommit mode off, writes for a statement (like
INSERT/UPDATE) are not flushed until required. Instead, they are buffered.
This is okay because, even before flushing, YSQL knows the number of
inserts/updates to be done and returns that number to the user client (as
"INSERT x"). If an error occurs in the rpc, it will anyway be caught in some
later flush before the txn commits (this was an optimzation introdcued in D7782).

Allowing execution to move on to later statements without waiting for the actual
work to be done on the tablet servers helps improve performance by buffering and
reduce latency. But, the downside is that any database violations associated with
the buffered operations might be observed after execution has moved to later
statements.

The downside is not an issue in many cases -- such as the optimization mentioned
above to not wait for writes of a DML in a transaction to complete. But, this can
cause issues with exception blocks as seen in gh issue yugabyte#12184. Incorrect behaviour
is observed when an exception, that occurs due to some statement's rpcs, is seen
after some non-reversible side-effect(s) have also been executed. Non-reversible
side-effects are those which don't modify the database and hence can't be undone
when an exception is caught, or when a txn is rolled back.

E.g:
create table test (k int primary key, v int);
insert into test values (1, 1);
create function ...
returns table(z text)
begin
z := "before insert"
insert into test values (1, 1); -- operation is buffered, will result in duplicate key violation
return next; -- statement sends "before insert" to user. This is a non-transactional side effect of which can't be undone
exception
<some handler logic>
end;

The fix is: buffered operations should be flushed and waited for before
executing any statements with non-reversible side effects in plpgsql.

Test Plan: ./yb_build.sh --java-test org.yb.pgsql.TestPgRegressBufferingInPlpgsql#testPgRegressBufferingInPlpgsql

Reviewers: dmitry, mtakahara

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D16809
@bllewell
Copy link
Contributor Author

bllewell commented Jul 26, 2022

Yet another testcase

issue-12184-xtra.zip

This testcase provides extra information thus:

The YB-vs-PG difference in behavior that manifests with the '23505' error ("unique_violation") does not manifest with other errors...

...at least to the extent that the tests for the four other errors listed below show. The testcase uses just the generic "others" handler and it uses "get stacked diagnostics" to get all the available information when an error is handled, thus:

exception when others then
  get stacked diagnostics
      v_sqlstate    = returned_sqlstate,
      v_schema      = schema_name,
      v_table       = table_name,
      v_column      = column_name,
      v_datatype    = pg_datatype_name,
      v_constraint  = constraint_name,
      v_message     = message_text,
      v_detail      = pg_exception_detail,
      v_hint        = pg_exception_hint,
      v_context     = pg_exception_context;

Some of the facts are meaningless for some errors. So, according to the error, some are set to the empty string. (Notice that the PG designers decided to use empty string rather than NULL for this. The targets are all declared NOT NULL to confirm that this rule is never broken.)

For the "23505' error ("unique_violation"), some of the non-empty facts that are seen in PG are left as empty strings in YB, thus:

  • "schema_name"
  • "table_name"
  • "constraint_name"
  • "pg_exception_detail"

(My earlier testcases for this issue didn't examine every value that "get stacked diagnostics" returns.)

When the facts that you observe in an "others" handler are recorded in, for example, a table for after-the-fact problem diagnosis, the fact that four key facts are missing hinders the analysis.

For these other tested errors:

  • 23502"not_null_violation"
  • '23503'"foreign_key_violation"
  • '22001'"string_data_right_truncation"
  • '23514'"check_violation"

The report of the "get stacked diagnostics" outcome is identical in YB and PG.

The buggy "blundering on" in the executable section after the statement that causes the '23505' error is NOT seen in this new testcase

This new testcase has "return next" statements, in the block statement's executable section, after the "insert" statement that causes the error. Their effect is not seen when an error occurs—also when it's '23505'—in accordance with the proper behavior.

The testcase itself

set client_min_messages = warning;

drop function if exists  insert_row(text)  cascade;
drop table    if exists  masters           cascade;
drop table    if exists  details           cascade;

\t on
select rpad(version(), 30) as version;

---------------------------------------------------------------------------

create table masters(
  mk text
    constraint masters_pk primary key);

create table details(
  mk text,
  dk int,
  v varchar(5)
    not null
    constraint details_v_chk check(length(v) > 1),

  constraint details_pk primary key(mk, dk),

  constraint details_fk foreign key(mk)
    references masters(mk)
    match full
    on delete cascade
    on update restrict,

  constraint details_mk_v_unq unique(mk, v));

---------------------------------------------------------------------------

create function insert_row(vals in text)
  returns table(z text)
  language plpgsql
as $body$
declare
  v_sqlstate    text not null := '';
  v_schema      text not null := '';
  v_table       text not null := '';
  v_column      text not null := '';
  v_datatype    text not null := '';
  v_constraint  text not null := '';
  v_message     text not null := '';
  v_detail      text not null := '';
  v_hint        text not null := '';
  v_context     text not null := '';

  stmt constant text not null :=
    format('insert into details(mk, dk, v) values%s', vals);
begin
  z := rpad( format('--- %s ', stmt) , 100, '-'); return next;
  execute stmt;
  z := '';   return next;
  z := 'OK'; return next;

exception when others then
  get stacked diagnostics
      v_sqlstate    = returned_sqlstate,
      v_schema      = schema_name,
      v_table       = table_name,
      v_column      = column_name,
      v_datatype    = pg_datatype_name,
      v_constraint  = constraint_name,
      v_message     = message_text,
      v_detail      = pg_exception_detail,
      v_hint        = pg_exception_hint,
      v_context     = pg_exception_context;

                               z := '';            return next;
                               z := v_sqlstate;    return next;
  if v_schema      != '' then  z := v_schema;      return next; end if;
  if v_table       != '' then  z := v_table;       return next; end if;
  if v_column      != '' then  z := v_column;      return next; end if;
  if v_datatype    != '' then  z := v_datatype;    return next; end if;
  if v_constraint  != '' then  z := v_constraint;  return next; end if;
  if v_message     != '' then  z := v_message;     return next; end if;
  if v_detail      != '' then  z := v_detail;      return next; end if;
  if v_hint        != '' then  z := v_hint;        return next; end if;
  if v_context     != '' then  z := v_context;     return next; end if;
end;
$body$;

---------------------------------------------------------------------------
delete from details;
delete from masters;
insert into masters(mk) values ('apple'), ('orange'); 

-- OK.
select z from insert_row($$('apple', 1, 'dog')$$);
select z from insert_row($$('apple', 2, 'cat')$$);

-- These each cause exception.
select z from insert_row($$('apple',  null, 'ant')$$);
select z from insert_row($$('apple', 3, null )$$);

select z from insert_row($$('pear', 1, 'dog')$$);
select z from insert_row($$('apple', 4, 'parrot')$$);

select z from insert_row($$('apple', 1, 'ant')$$);
select z from insert_row($$('apple', 5, 'dog')$$);

select rpad('-', 100, '-');
\t off

Results

The testcase was run using these three envs and correspondingly-named output files (using \o) were written:

  • YB-2.15.0.1 on MacOS Big Sur → yb-mac.txt
  • PG 11.9 on Ubuntu → pg-ubuntu.txt
  • PG 14.4 on MacOS Big Sur → pg-mac.txt

They are included in the attached issue-12184-xtra.zip.

Use your favorite GUI differ to compare them pairwise.

  • pg-mac.txt differs from pg-ubuntu.txt in one in essential detail: the quality of "message_text" has been improved from (for example) « null value in column "v" violates not-null constraint » to « null value in column "v" of relation "details" violates not-null constraint ».
  • It's therefore best to compare yb-mac.txt with pg-ubuntu.txt to avoid noise caused by the inessential PG 11 to PG 14 improvement.

Here's how pg-ubuntu.txt and yb-mac.txt differ:

pg-ubuntu.txt

--- insert into details(mk, dk, v) values('apple', 1, 'ant') ---------------------------------------
 
23505
u1
details
details_pk
duplicate key value violates unique constraint "details_pk"
Key (mk, dk)=(apple, 1) already exists.
SQL statement "insert into details(mk, dk, v) values('apple', 1, 'ant')"                            +
PL/pgSQL function insert_row(text) line 17 at EXECUTE

--- insert into details(mk, dk, v) values('apple', 5, 'dog') ---------------------------------------
 
23505
u1
details
details_mk_v_unq
duplicate key value violates unique constraint "details_mk_v_unq"
Key (mk, v)=(apple, dog) already exists.
SQL statement "insert into details(mk, dk, v) values('apple', 5, 'dog')"                            +
PL/pgSQL function insert_row(text) line 17 at EXECUTE

----------------------------------------------------------------------------------------------------

yb-mac.txt

--- insert into details(mk, dk, v) values('apple', 1, 'ant') ---------------------------------------
 
23505
duplicate key value violates unique constraint "details_pk"
SQL statement "insert into details(mk, dk, v) values('apple', 1, 'ant')"                            +
PL/pgSQL function insert_row(text) line 17 at EXECUTE

--- insert into details(mk, dk, v) values('apple', 5, 'dog') ---------------------------------------
 
23505
duplicate key value violates unique constraint "details_mk_v_unq"
SQL statement "insert into details(mk, dk, v) values('apple', 5, 'dog')"                            +
PL/pgSQL function insert_row(text) line 17 at EXECUTE

----------------------------------------------------------------------------------------------------

@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Jul 27, 2022
@pkj415 pkj415 closed this as completed Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

4 participants