[BUG] binary attribute in Lucene full-text fields no longer works #5431

Open
scheidelerl opened this issue Aug 29, 2024 · 9 comments
Labels
investigate issues being looked at

Comments

@scheidelerl

scheidelerl commented Aug 29, 2024

Describe the bug
Using the binary attribute on fields in a Lucene full-text index no longer works.

Expected behavior
Using the binary attribute should work.

To Reproduce
Try to index a field as binary.

With binary:

xquery version "3.1";

module namespace t="http://exist-db.org/xquery/test";
(:  LIBRARIES  :)
declare namespace test="http://exist-db.org/xquery/xqsuite";
(:  NAMESPACES  :)
declare namespace array="http://www.w3.org/2005/xpath-functions/array";
declare namespace exist="http://exist.sourceforge.net/NS/exist";
declare namespace ft="http://exist-db.org/xquery/lucene";
declare namespace map="http://www.w3.org/2005/xpath-functions/map";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
declare namespace xmldb="http://exist-db.org/xquery/xmldb";

(:  VARIABLES  :)
declare variable $t:XML :=
<div>
    <test>Adm. 1,10</test>
    <test>Bdm. 1,11</test>
    <test>Cdm. 1,12</test>
    <test>Edm. 1,1</test>
    <test>Fdm. 1,2</test>
    <test>Gdm. 1,3</test>
    <test>Zdm. 1,4</test>
    <test>Wdm. 1,5</test>
    <test>Odm. 1,6</test>
    <test>Ydm. 1,7</test>
    <test>Cdm. 1,8</test>
    <test>Vdm. 1,9</test>
    <test>Pdm. 1,13</test>
    <test>Edm. 1,14</test>
</div>;

declare variable $t:xconf :=
<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Full-text indexing with Lucene -->
    <lucene>
      <!-- Elements upon which to build an index. -->
      <text qname="test">
        <field name="sortable" expression="./string()" type="xs:string" binary="yes"/>
      </text>
    </lucene>
  </index>
</collection>;


(:  FUNCTIONS  :)
declare
    %test:setUp
function t:setup() {
    let $testCol    := xmldb:create-collection("/db", "test")
    let $indexCol   := xmldb:create-collection("/db/system/config/db", "test")
    return (
        xmldb:store("/db/test", "test.xml", $t:XML),
        xmldb:store("/db/system/config/db/test", "collection.xconf", $t:xconf),
        xmldb:reindex("/db/test")
      )
};


declare
    %test:tearDown
function t:tearDown() {
    xmldb:remove("/db/test"),
    xmldb:remove("/db/system/config/db/test")
};


declare
    %test:name('Sorted result.')
    %test:assertExists
    %test:assertXPath('count(doc("/db/test/test.xml")//test) eq count($result)')
    %test:assertError("err:XPTY0004")
function t:sorted-result() as xs:string* {
    let $options := map {
        'fields': ('sortable')
    }
    let $index := doc("/db/test/test.xml")/div[ft:query(., (), $options)]
    return 
    (

        let $values := ft:binary-field($index, "sortable","xs:string")
            where count($values) gt 0
        for $field in $values
            order by $field ascending
        return 
        (
            $field
        )
    )
};

Without binary:

xquery version "3.1";

module namespace t="http://exist-db.org/xquery/test";
(:  LIBRARIES  :)
declare namespace test="http://exist-db.org/xquery/xqsuite";
(:  NAMESPACES  :)
declare namespace array="http://www.w3.org/2005/xpath-functions/array";
declare namespace exist="http://exist.sourceforge.net/NS/exist";
declare namespace ft="http://exist-db.org/xquery/lucene";
declare namespace map="http://www.w3.org/2005/xpath-functions/map";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
declare namespace xmldb="http://exist-db.org/xquery/xmldb";

(:  VARIABLES  :)
declare variable $t:XML :=
<div>
    <test>Adm. 1,10</test>
    <test>Bdm. 1,11</test>
    <test>Cdm. 1,12</test>
    <test>Edm. 1,1</test>
    <test>Fdm. 1,2</test>
    <test>Gdm. 1,3</test>
    <test>Zdm. 1,4</test>
    <test>Wdm. 1,5</test>
    <test>Odm. 1,6</test>
    <test>Ydm. 1,7</test>
    <test>Cdm. 1,8</test>
    <test>Vdm. 1,9</test>
    <test>Pdm. 1,13</test>
    <test>Edm. 1,14</test>
</div>;

declare variable $t:xconf :=
<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Full-text indexing with Lucene -->
    <lucene>
      <!-- Elements upon which to build an index. -->
      <text qname="div">
        <field name="sortable" expression="./test/string()"/>
      </text>
    </lucene>
  </index>
</collection>;


(:  FUNCTIONS  :)
declare
    %test:setUp
function t:setup() {
    let $testCol    := xmldb:create-collection("/db", "test")
    let $indexCol   := xmldb:create-collection("/db/system/config/db", "test")
    return (
        xmldb:store("/db/test", "test.xml", $t:XML),
        xmldb:store("/db/system/config/db/test", "collection.xconf", $t:xconf),
        xmldb:reindex("/db/test")
      )
};


declare
    %test:tearDown
function t:tearDown() {
    xmldb:remove("/db/test"),
    xmldb:remove("/db/system/config/db/test")
};


declare
    %test:name('Sorted result.')
    %test:assertExists
    %test:assertXPath('count(doc("/db/test/test.xml")//test) eq count($result)')
    %test:assertError("err:XPTY0004")
function t:sorted-result() as xs:string* {
    let $options := map {
        'fields': ('sortable')
    }
    let $index := doc("/db/test/test.xml")/div[ft:query(., (), $options)]
    return 
    (

        let $values := ft:field($index, "sortable","xs:string")
            where count($values) gt 0
        for $field in $values
            order by $field ascending
        return 
        (
            $field
        )
    )
};

Context (please always complete the following information)

  • Build: eXist-6.2.0
  • Java: 1.8.0_422
  • OS: Ubuntu 22.04.4 LTS - Linux 6.8.0-40-generic amd64

Additional context

  • How is eXist-db installed? JAR installer
@line-o line-o added the investigate issues being looked at label Aug 29, 2024
@line-o
Member

line-o commented Aug 29, 2024

@scheidelerl Thank you for this complete issue report. I would like to know with which version of eXist-db the above test suite passes.

@line-o
Member

line-o commented Aug 29, 2024

To read the values of binary fields, a new function, ft:binary-field, was added, and I cannot see you using it. Maybe that is the issue?
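A minimal sketch of reading a binary field with ft:binary-field, assuming the index is defined on the <test> elements as in the report's "with binary" collection.xconf. The function's signature (quoted in the cardinality errors later in this thread) takes exactly one node, so the sketch calls it once per hit instead of passing the whole result sequence. The options map simply mirrors the one used in the report.

```xquery
xquery version "3.1";

declare namespace ft = "http://exist-db.org/xquery/lucene";

(: Assumption: /db/test/test.xml is indexed with the "with binary" collection.xconf,
   i.e. a binary field "sortable" is defined on each <test> element. :)
let $options := map { "fields": "sortable" }
(: query the indexed <test> elements, not their <div> parent :)
for $hit in doc("/db/test/test.xml")//test[ft:query(., (), $options)]
(: ft:binary-field accepts exactly one node, so it is called once per hit :)
let $value := ft:binary-field($hit, "sortable", "xs:string")
order by $value ascending
return $value
```

Iterating over the hits also avoids the "Expected cardinality: exactly one" error, which occurs whenever the first argument is empty or holds more than one node.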

@line-o
Member

line-o commented Aug 29, 2024

See also https://exist-db.org/exist/apps/doc/lucene#retrieve-fields ("Retrieving Field Content").

@scheidelerl
Author

Hey,
thank you for the reply.
The eXist-db version is the current build, 6.2.0, installed with the JAR installer.
If I use ft:binary-field($index, "sortable", "xs:string") instead, which I think is the intended way of using it, it doesn't work either.
In eXide, the binary attribute shows this linter error: [cvc-complex-type.3.2.2: Attribute 'binary' is not allowed to appear in element 'field']
The eXist log file does not say anything about it.
If I use the field without binary, no problem occurs.
If I try to apply the collection.xconf with eXide, this error occurs: Failed to apply configuration: DocValuesField "sortable" appears more than once in this document (only one value is allowed per field)

@scheidelerl
Author

scheidelerl commented Aug 29, 2024

If I use only doc("/db/test/test.xml")/div[ft:query(., ())] with the binary attribute on the field element, the result is empty, and ft:field($node as node(), $field as xs:string) and ft:binary-field($node as node(), $field as xs:string, $type as xs:string) throw these errors:

  • err:XPTY0004 It is a type error if, during the static analysis phase, an expression is found to have a static type that is not appropriate for the context in which the expression occurs, or during the dynamic evaluation phase, the dynamic type of a value does not match a required type as specified by the matching rules in 2.5.4 SequenceType Matching. checking function parameter 1 in call ft:field($index, "sortable"): XPTY0004: The actual cardinality for parameter 1 does not match the cardinality declared in the function's signature: ft:field($node as node(), $field as xs:string) item()*. Expected cardinality: exactly one, got 0.
  • err:XPTY0004 It is a type error if, during the static analysis phase, an expression is found to have a static type that is not appropriate for the context in which the expression occurs, or during the dynamic evaluation phase, the dynamic type of a value does not match a required type as specified by the matching rules in 2.5.4 SequenceType Matching. checking function parameter 1 in call ft:binary-field($index, "sortable", "xs:string"): XPTY0004: The actual cardinality for parameter 1 does not match the cardinality declared in the function's signature: ft:binary-field($node as node(), $field as xs:string, $type as xs:string) item()*. Expected cardinality: exactly one, got 0.

I know what these errors mean; they are self-explanatory.
But they mean that the full-text index is not built because an error occurs, which is not reported in the log or anywhere else, and which is ultimately related to the attribute, because everything works without it.

@line-o
Member

line-o commented Aug 29, 2024

@scheidelerl you need to have some hits in order to sort them using the binary field values. I suspect that your call to ft:query returns an empty sequence. Can you check that?
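One way to check this is to count the hits at both levels. The sketch below assumes the "with binary" collection.xconf from the report is active, so the Lucene index is defined on <test> rather than on <div>; under that assumption the reporter's observation that the <div> query returns an empty result would show up as a zero count.

```xquery
xquery version "3.1";

declare namespace ft = "http://exist-db.org/xquery/lucene";

(: Assumption: the "with binary" collection.xconf from the report is active,
   so the Lucene index is defined on <test>, not on <div>. :)
let $doc := doc("/db/test/test.xml")
return (
    (: nothing is indexed at <div> level, so no hits are expected here :)
    "div hits: " || count($doc/div[ft:query(., ())]),
    (: one hit per indexed <test> element is expected here :)
    "test hits: " || count($doc//test[ft:query(., ())])
)
```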

@line-o
Member

line-o commented Aug 29, 2024

In eXide, the binary attribute shows this linter error: [cvc-complex-type.3.2.2: Attribute 'binary' is not allowed to appear in element 'field']

It may very well be that the schema was not updated to include the binary attribute.

@scheidelerl
Author

I updated the test above and added a second one, so there is now one with and one without binary.
It works when I index the element directly, because a binary field seems to require a single value.
The hint was the error shown when applying the index in eXide.

This brings me to the following questions:

  1. Why is this restriction not the same for normal fields, so that the behaviour carries over when I realize that I don't need to query certain values?
  2. Why is there no reference to this in the documentation? And please don't tell me that it is sufficiently explained because the default type is specified as xs:string.
  3. Why does the log not show this as an error when I apply the index?
  4. Why can't I declare type="xs:string*" to prevent this error?
  5. Why does this work in 5.4.0?

!!!! → 6. What do I have to do if I only want to query at the parent level and have several values in one field, but still want faster access?

@line-o
Member

line-o commented Aug 30, 2024

Why does this work in 5.4.0?

As far as I know, binary fields were added in version 6.2.0, so this cannot have worked in version 5.4.0.
