Arrays support #241

Open
bpintea opened this issue Apr 23, 2020 · 0 comments

Relates to elastic/elasticsearch#33204.

The array type is indeed standardised, though not [widely] supported. While JDBC has some support for arrays, ODBC has lagged far behind the evolving SQL standard and still lacks it; the yet-to-be-released version 4.0 is expected to introduce it (if that release ever happens).

And since it's not part of the ODBC standard, even if a source defines a specific data type, no client will know how to recognise it as such.

There are generally three approaches out there:

  • offer a client-proprietary connector (sometimes through a simple expansion of the ODBC API);
  • expose virtual join tables for each non-scalar column; i.e. arrays or multisets (to cover nesting);
  • flatten the non-scalar types and expose them as a textual type.

(Dedicated functions/syntax that flatten or index the array are offered as well, but these aren't useful for generic BI and query builders.)

Since any field in ES can potentially hold an array, the third option above would turn every table column into a textual type. Furthermore, whether a field contains multiple values (i.e. holds arrays or not) can only be discovered by inspecting the entire result set of a query, which makes the second option above impractical to implement.
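To illustrate the discovery problem, here is a minimal sketch in Python (not driver code). The function name `columns_with_arrays` and the sample rows are hypothetical; rows are modelled as plain dicts, which is an assumption about shape only, not the actual SQL API response format.

```python
from typing import Any, Iterable


def columns_with_arrays(rows: Iterable[dict[str, Any]]) -> set[str]:
    """Return the names of columns that hold a list in at least one row."""
    found: set[str] = set()
    for row in rows:  # every row has to be visited; no early exit is safe
        for name, value in row.items():
            if isinstance(value, list):
                found.add(name)
    return found


# Hypothetical result set: "tags" looks scalar until the last row.
rows = [
    {"host": "a", "tags": "prod"},
    {"host": "b", "tags": "dev"},
    {"host": "c", "tags": ["prod", "eu"]},
]

print(columns_with_arrays(rows[:2]))  # sampling the first rows misses the array
print(columns_with_arrays(rows))      # only the full scan finds it
```

Any column metadata reported before the full result has been read could therefore turn out to be wrong.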

Assuming the SQL API will always pass the arrays on in its responses, some possible driver implementation alternatives would be to offer user-controllable options to:

  • map certain SQL types to string (WVARCHAR);
  • map certain column names to string; both previous cases would concatenate arrays into a string representation;
  • have the driver pick a value by index for certain columns, or null if out of range; this assumes the user is interested in certain well known positions only (like in the logging use case).

"Certain columns" here means columns selected by a regex on their name or by ordinal position, potentially combined with an index pattern.
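The name-based options above could be sketched roughly as follows. This is an illustration in Python, not the driver's implementation: `ArrayRule`, `apply_rules`, the action names and the JSON string representation are all hypothetical choices, and type-based or ordinal-based selection is left out for brevity.

```python
import json
import re
from dataclasses import dataclass
from typing import Any


@dataclass
class ArrayRule:
    """Hypothetical per-connection rule for handling array values."""
    pattern: str    # regex matched against the full column name
    action: str     # "string": concatenate; "index": pick one element
    index: int = 0  # only used when action == "index"


def apply_rules(column: str, value: Any, rules: list[ArrayRule]) -> Any:
    """Convert an array value according to the first rule matching the column."""
    if not isinstance(value, list):
        return value  # simplification: scalars are passed through untouched
    for rule in rules:
        if re.fullmatch(rule.pattern, column):
            if rule.action == "string":
                # Assumed representation: JSON text of the array.
                return json.dumps(value)
            if rule.action == "index":
                in_range = 0 <= rule.index < len(value)
                return value[rule.index] if in_range else None
    return value  # no rule matched; the array is returned unchanged


# Hypothetical configuration for one connection.
rules = [
    ArrayRule(pattern=r"tags", action="string"),
    ArrayRule(pattern=r"log\..*", action="index", index=1),
]

print(apply_rules("tags", ["a", "b"], rules))
print(apply_rules("log.level", ["x", "y"], rules))
print(apply_rules("log.level", ["x"], rules))
```

A real driver would additionally have to report the mapped column as a string type in its metadata, which this sketch does not model.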

One advantage of having this client-end flexibility is that each connection could be configured to behave differently, depending on the targeted index.

Support for arrays as parameters can be considered for completeness, but its current applicability with target applications is very limited.
