[FEATURE] neural query explain not showing details for nested field #698

Open
yuye-aws opened this issue Apr 19, 2024 · 3 comments
Labels: Features (Introduces a new unit of functionality that satisfies a requirement)

@yuye-aws (Member) commented Apr 19, 2024

What is the bug?

Searching for a nested field works well. However, we cannot obtain a detailed explanation for the search results with GET {indexname}/_search?explain=true.

How can one reproduce the bug?

First, create an index with a nested embedding field (a mapping sketch follows the sample document below). A sample document may look like:

{
    "text": "A Hybrid EP and SQP for Dynamic Economic Dispatch with Nonsmooth Fuel Cost Function Dynamic economic dispatch (DED) is one of the main functions of power generation operation and control. It determines the optimal settings of generator units with predicted load demand over a certain period of time. The objective is to operate an electric power system most economically while the system is operating within its security limits. This paper proposes a new hybrid methodology for solving DED. The proposed method is developed in such a way that a simple evolutionary programming (EP) is applied as a based level search, which can give a good direction to the optimal global region, and a local search sequential quadratic programming (SQP) is used as a fine tuning to determine the optimal solution at the final. Ten units test system with nonsmooth fuel cost function is used to illustrate the effectiveness of the proposed method compared with those obtained from EP and SQP alone.",
    "text_chunk_embedding": [
      {
        "knn": [...]
      },
      {
        "knn": [...]
      }
    ],
    "text_chunk": [
      "[CLS] a hybrid ep and sqp for dynamic economic dispatch with nonsmooth fuel cost function dynamic economic dispatch ( ded ) is one of the main functions of power generation operation and control. it determines the optimal settings of generator units with predicted load demand over a certain period of time. the objective is to operate an electric power system most economically while the system is operating within its security limits. this paper proposes a new hybrid methodology for solving ded. the proposed method is developed in such a way that a simple evolutionary programming ( ep ) is applied as a based level search, which can give a good direction to the optimal global region, and",
      "a local search sequential quadratic programming ( sqp ) is used as a fine tuning to determine the optimal solution at the final. ten units test system with nonsmooth fuel cost function is used to illustrate the effectiveness of the proposed method compared with those obtained from ep and sqp alone. [SEP]"
    ]
}
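
For reference, a minimal mapping sketch for such an index could look like the following. The index.knn setting and the dimension value of 768 are assumptions for illustration; adjust them to match the embedding model in use.

PUT {indexname}
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "text": { "type": "text" },
      "text_chunk": { "type": "text" },
      "text_chunk_embedding": {
        "type": "nested",
        "properties": {
          "knn": {
            "type": "knn_vector",
            "dimension": 768
          }
        }
      }
    }
  }
}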

Then, use the explain query to search the document:

GET {indexname}/_search?explain=true
{
  "size": 1,
  "_source": {
    "excludes": "text_chunk_embedding"
  },
  "query": {
    "nested": {
      "score_mode": "avg",
      "path": "text_chunk_embedding",
      "query": {
        "neural": {
          "text_chunk_embedding.knn": {
            "model_id": "PDx55Y4BxByNDM4P0mdQ",
            "query_text": "Global-Locally Self-Attentive Dialogue State Tracker"
          }
        }
      }
    }
  }
}

Currently, the explanation for the search results is:

"_explanation": {
  "value": 0.021672908,
  "description": "Score based on 3 child docs in range from 6364 to 6366, using score mode Avg",
  "details": [
    {
      "value": 0.021672908,
      "description": "sum of:",
      "details": [
        {
          "value": 1,
          "description": "No Explanation",
          "details": []
        },
        {
          "value": 0,
          "description": "match on required clause, product of:",
          "details": [
            {
              "value": 0,
              "description": "# clause",
              "details": []
            },
            {
              "value": 1,
              "description": "_nested_path:text_chunk_embedding",
              "details": []
            }
          ]
        }
      ]
    }
  ]
}

What is the expected behavior?

The explain query should at least show the score for each nested document, as the BM25 query does; a purely hypothetical sketch of such an output is shown below.
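
For illustration only, a desirable output might break the parent score down per child document, roughly along the lines of the sketch below. The values and descriptions here are hypothetical and only indicate the expected level of detail.

"_explanation": {
  "value": 0.021672908,
  "description": "Score based on 3 child docs in range from 6364 to 6366, using score mode Avg",
  "details": [
    {
      "value": 0.025,
      "description": "knn score of child doc 6364 on field text_chunk_embedding.knn (hypothetical)",
      "details": []
    },
    {
      "value": 0.018,
      "description": "knn score of child doc 6365 on field text_chunk_embedding.knn (hypothetical)",
      "details": []
    }
  ]
}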

What is your host/environment?

Operating system, version.

Do you have any screenshots?

If applicable, add screenshots to help explain your problem.

Do you have any additional context?

Add any other context about the problem.

yuye-aws added the bug (Something isn't working) and untriaged labels on Apr 19, 2024
@yuye-aws (Member, Author) commented Apr 19, 2024

If I search with a BM25 query:

GET {indexname}/_search?explain=true
{
  "size": 1,
  "query": {
    "match": {
      "text_chunk": "Global-Locally Self-Attentive Dialogue State Tracker"
    }
  }
}

The explanation is very detailed, for example:

{
    "value": 18.182425,
    "description": "sum of:",
    "details": [
      {
        "value": 4.982006,
        "description": "weight(text_chunk:self in 20446) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 4.982006,
            "description": "score(freq=2.0), computed as boost * idf * tf from:",
            "details": [
              {
                "value": 2.2,
                "description": "boost",
                "details": []
              },
              {
                "value": 3.1877272,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                  {
                    "value": 351,
                    "description": "n, number of documents containing term",
                    "details": []
                  },
                  {
                    "value": 8517,
                    "description": "N, total number of documents with field",
                    "details": []
                  }
                ]
              },
              {
                "value": 0.7103958,
                "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details": [
                  {
                    "value": 2,
                    "description": "freq, occurrences of term within document",
                    "details": []
                  },
                  {
                    "value": 1.2,
                    "description": "k1, term saturation parameter",
                    "details": []
                  },
                  {
                    "value": 0.75,
                    "description": "b, length normalization parameter",
                    "details": []
                  },
                  {
                    "value": 104,
                    "description": "dl, length of field (approximate)",
                    "details": []
                  },
                  {
                    "value": 181.63051,
                    "description": "avgdl, average length of field",
                    "details": []
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value": 10.799234,
        "description": "weight(text_chunk:attentive in 20446) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 10.799234,
            "description": "score(freq=2.0), computed as boost * idf * tf from:",
            "details": [
              {
                "value": 2.2,
                "description": "boost",
                "details": []
              },
              {
                "value": 6.9098706,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                  {
                    "value": 8,
                    "description": "n, number of documents containing term",
                    "details": []
                  },
                  {
                    "value": 8517,
                    "description": "N, total number of documents with field",
                    "details": []
                  }
                ]
              },
              {
                "value": 0.7103958,
                "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details": [
                  {
                    "value": 2,
                    "description": "freq, occurrences of term within document",
                    "details": []
                  },
                  {
                    "value": 1.2,
                    "description": "k1, term saturation parameter",
                    "details": []
                  },
                  {
                    "value": 0.75,
                    "description": "b, length normalization parameter",
                    "details": []
                  },
                  {
                    "value": 104,
                    "description": "dl, length of field (approximate)",
                    "details": []
                  },
                  {
                    "value": 181.63051,
                    "description": "avgdl, average length of field",
                    "details": []
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value": 2.401184,
        "description": "weight(text_chunk:state in 20446) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 2.401184,
            "description": "score(freq=1.0), computed as boost * idf * tf from:",
            "details": [
              {
                "value": 2.2,
                "description": "boost",
                "details": []
              },
              {
                "value": 1.9813391,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                  {
                    "value": 1174,
                    "description": "n, number of documents containing term",
                    "details": []
                  },
                  {
                    "value": 8517,
                    "description": "N, total number of documents with field",
                    "details": []
                  }
                ]
              },
              {
                "value": 0.55086344,
                "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details": [
                  {
                    "value": 1,
                    "description": "freq, occurrences of term within document",
                    "details": []
                  },
                  {
                    "value": 1.2,
                    "description": "k1, term saturation parameter",
                    "details": []
                  },
                  {
                    "value": 0.75,
                    "description": "b, length normalization parameter",
                    "details": []
                  },
                  {
                    "value": 104,
                    "description": "dl, length of field (approximate)",
                    "details": []
                  },
                  {
                    "value": 181.63051,
                    "description": "avgdl, average length of field",
                    "details": []
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
}

@martin-gaievski (Member) commented:
@yuye-aws neural search will not have a detailed response for explain because it uses the knn query under the hood, and the knn query doesn't support explain. Here is the corresponding GH issue for this matter: opensearch-project/k-NN#875
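
For context, the neural clause is rewritten at search time into a knn query against text_chunk_embedding.knn, roughly like the sketch below (the vector values are omitted and the k value is an arbitrary placeholder); because that knn query produces no explanation details, the neural query cannot either.

GET {indexname}/_search?explain=true
{
  "query": {
    "nested": {
      "path": "text_chunk_embedding",
      "query": {
        "knn": {
          "text_chunk_embedding.knn": {
            "vector": [...],
            "k": 10
          }
        }
      }
    }
  }
}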

@yuye-aws (Member, Author) commented:
> @yuye-aws neural search will not have a detailed response for explain because it uses the knn query under the hood, and the knn query doesn't support explain. Here is the corresponding GH issue for this matter: opensearch-project/k-NN#875

Sorry for taking so long to respond. It seems quite likely that this issue will automatically get resolved after opensearch-project/k-NN#875. Just out of curiosity, do we have an ongoing plan to resolve the k-NN issue?

naveentatikonda added the Features (Introduces a new unit of functionality that satisfies a requirement) label and removed the bug (Something isn't working) label on Sep 18, 2024
naveentatikonda changed the title from [BUG] neural query explain not showing details for nested field to [FEATURE] neural query explain not showing details for nested field on Sep 18, 2024
Status: Backlog · No branches or pull requests · 3 participants