API: Add response filtering with filter_path parameter
This change adds a new "filter_path" parameter that can be used to filter and reduce the responses returned by the REST API of elasticsearch.

For example, returning only the shards that failed to be optimized:
```
curl -XPOST 'localhost:9200/beer/_optimize?filter_path=_shards.failed'
{"_shards":{"failed":0}}%
```

It supports multiple filters (separated by a comma):
```
curl -XGET 'localhost:9200/_mapping?pretty&filter_path=*.mappings.*.properties.name,*.mappings.*.properties.title'
```
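
As a client-side illustration of the comma-separated form, here is a minimal sketch that joins a list of filters into a single `filter_path` query parameter. The `build_url` helper is hypothetical (it is not part of this commit or of any Elasticsearch client), and it sends `pretty=true` rather than the bare `pretty` flag used in the curl example, on the assumption that the two are equivalent.

```python
from urllib.parse import urlencode


def build_url(base, filters, **params):
    """Join filters with commas and append them as the filter_path parameter."""
    query = dict(params, filter_path=",".join(filters))
    # Keep '*' and ',' readable; urlencode would otherwise percent-encode them.
    return base + "?" + urlencode(query, safe="*,")


url = build_url(
    "http://localhost:9200/_mapping",
    ["*.mappings.*.properties.name", "*.mappings.*.properties.title"],
    pretty="true",
)
print(url)
```

When passing such a URL to a shell command, quote it so that the `*` characters are not expanded by the shell.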

It also supports the YAML response format. Here it returns only the `_id` field of a newly indexed document:
```
curl -XPOST 'localhost:9200/library/book?filter_path=_id' -d '---hello:\n  world: 1\n'
---
_id: "AU0j64-b-stVfkvus5-A"
```

It also supports wildcards. Here it returns only the host name of every node in the cluster:
```
curl -XGET 'http://localhost:9200/_nodes/stats?filter_path=nodes.*.host*'
{"nodes":{"lvJHed8uQQu4brS-SXKsNA":{"host":"portable"}}}
```

And "**" can be used to include sub fields without knowing the exact path. Here it returns only the Lucene version of every segment:
```
curl 'http://localhost:9200/_segments?pretty&filter_path=indices.**.version'
{
  "indices" : {
    "beer" : {
      "shards" : {
        "0" : [ {
          "segments" : {
            "_0" : {
              "version" : "5.2.0"
            },
            "_1" : {
              "version" : "5.2.0"
            }
          }
        } ]
      }
    }
  }
}
```
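
To make the matching rules concrete, here is a minimal Python sketch of the filtering semantics described above, applied to an already parsed response. It is not the Java implementation from this commit. It assumes that `*` matches any run of characters within one field name, that `**` spans one or more levels, that arrays are transparent to the path, and that objects left empty by the filter are dropped. These assumptions are read off the examples and tests in this commit, and all function names are hypothetical.

```python
import re

_DROP = object()  # sentinel: nothing under this value matched any filter


def _segment_matches(pattern, name):
    """Match one path segment; '*' stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def _apply(value, filters):
    if any(len(f) == 0 for f in filters):
        return value  # a filter is fully consumed: keep the whole subtree
    if isinstance(value, list):
        # Arrays are transparent: the same filters apply to each element.
        kept = [r for r in (_apply(v, filters) for v in value) if r is not _DROP]
        return kept if kept else _DROP
    if not isinstance(value, dict):
        return _DROP  # a scalar reached while path segments are still pending
    out = {}
    for key, child in value.items():
        pending = []
        for f in filters:
            head, rest = f[0], f[1:]
            if head == "**":
                pending.append(f)     # '**' keeps absorbing deeper levels
                pending.append(rest)  # or stops after absorbing this key
            elif _segment_matches(head, key):
                pending.append(rest)
        if pending:
            result = _apply(child, pending)
            if result is not _DROP:
                out[key] = result
    return out if out else _DROP


def filter_response(response, filter_path):
    """Apply a comma-separated list of dot-notation filters to a response."""
    filters = [tuple(p.split(".")) for p in filter_path.split(",")]
    result = _apply(response, filters)
    return {} if result is _DROP else result


segments = {"indices": {"beer": {"shards": {"0": [
    {"segments": {"_0": {"version": "5.2.0", "num_docs": 3}}}
]}}}}
print(filter_response(segments, "indices.**.version"))
```

With the sample data above, only the `version` leaf survives and `num_docs` is dropped, which mirrors the shape of the `_segments` response shown earlier.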

Note that elasticsearch sometimes returns the raw value of a field directly, like the _source field. If you want to filter _source fields, you should consider combining the already existing _source parameter (see the Get API for more details) with the filter_path parameter, like this:

```
curl -XGET 'localhost:9200/_search?pretty&filter_path=hits.hits._source&_source=title'
{
  "hits" : {
    "hits" : [ {
      "_source":{"title":"Book #2"}
    }, {
      "_source":{"title":"Book #1"}
    }, {
      "_source":{"title":"Book #3"}
    } ]
  }
}
```
tlrx committed May 26, 2015
1 parent 8052ca3 commit fce8d1f
Showing 31 changed files with 1,986 additions and 66 deletions.
107 changes: 107 additions & 0 deletions docs/reference/api-conventions.asciidoc
@@ -81,6 +81,113 @@ being consumed by a monitoring tool, rather than intended for human
consumption. The default for the `human` flag is
`false`.

[float]
=== Response Filtering

All REST APIs accept a `filter_path` parameter that can be used to reduce
the response returned by elasticsearch. This parameter takes a comma
separated list of filters expressed with the dot notation:

[source,sh]
--------------------------------------------------
curl -XGET 'localhost:9200/_search?pretty&filter_path=took,hits.hits._id,hits.hits._score'
{
"took" : 3,
"hits" : {
"hits" : [
{
"_id" : "3640",
"_score" : 1.0
},
{
"_id" : "3642",
"_score" : 1.0
}
]
}
}
--------------------------------------------------

It also supports the `*` wildcard character to match any field or part
of a field's name:

[source,sh]
--------------------------------------------------
curl -XGET 'localhost:9200/_nodes/stats?filter_path=nodes.*.ho*'
{
"nodes" : {
"lvJHed8uQQu4brS-SXKsNA" : {
"host" : "portable"
}
}
}
--------------------------------------------------

And the `**` wildcard can be used to include fields without knowing the
exact path of the field. For example, we can return the Lucene version
of every segment with this request:

[source,sh]
--------------------------------------------------
curl 'localhost:9200/_segments?pretty&filter_path=indices.**.version'
{
"indices" : {
"movies" : {
"shards" : {
"0" : [ {
"segments" : {
"_0" : {
"version" : "5.2.0"
}
}
} ],
"2" : [ {
"segments" : {
"_0" : {
"version" : "5.2.0"
}
}
} ]
}
},
"books" : {
"shards" : {
"0" : [ {
"segments" : {
"_0" : {
"version" : "5.2.0"
}
}
} ]
}
}
}
}
--------------------------------------------------

Note that elasticsearch sometimes returns the raw value of a field directly,
like the `_source` field. If you want to filter `_source` fields, you should
consider combining the already existing `_source` parameter (see
<<get-source-filtering,Get API>> for more details) with the `filter_path`
parameter like this:

[source,sh]
--------------------------------------------------
curl -XGET 'localhost:9200/_search?pretty&filter_path=hits.hits._source&_source=title'
{
"hits" : {
"hits" : [ {
"_source":{"title":"Book #2"}
}, {
"_source":{"title":"Book #1"}
}, {
"_source":{"title":"Book #3"}
} ]
}
}
--------------------------------------------------


[float]
=== Flat Settings

4 changes: 4 additions & 0 deletions rest-api-spec/api/nodes.stats.json
@@ -56,6 +56,10 @@
"options" : ["node", "indices", "shards"],
"default" : "node"
},
"filter_path": {
"type" : "list",
"description" : "A comma-separated list of fields to include in the returned response"
},
"types" : {
"type" : "list",
"description" : "A comma-separated list of document types for the `indexing` index metric"
4 changes: 4 additions & 0 deletions rest-api-spec/api/search.json
@@ -72,6 +72,10 @@
"type" : "boolean",
"description" : "Specify whether query terms should be lowercased"
},
"filter_path": {
"type" : "list",
"description" : "A comma-separated list of fields to include in the returned response"
},
"preference": {
"type" : "string",
"description" : "Specify the node or shard the operation should be performed on (default: random)"
154 changes: 154 additions & 0 deletions rest-api-spec/test/nodes.stats/20_response_filtering.yaml
@@ -0,0 +1,154 @@
---
"Nodes Stats with response filtering":
  - do:
      cluster.state: {}

  # Get master node id
  - set: { master_node: master }

  # Nodes Stats with no filtering
  - do:
      nodes.stats: {}

  - is_true: cluster_name
  - is_true: nodes
  - is_true: nodes.$master.name
  - is_true: nodes.$master.indices
  - is_true: nodes.$master.indices.docs
  - gte: { nodes.$master.indices.docs.count: 0 }
  - is_true: nodes.$master.indices.segments
  - gte: { nodes.$master.indices.segments.count: 0 }
  - is_true: nodes.$master.jvm
  - is_true: nodes.$master.jvm.threads
  - gte: { nodes.$master.jvm.threads.count: 0 }
  - is_true: nodes.$master.jvm.buffer_pools.direct
  - gte: { nodes.$master.jvm.buffer_pools.direct.count: 0 }
  - gte: { nodes.$master.jvm.buffer_pools.direct.used_in_bytes: 0 }

  # Nodes Stats with only "cluster_name" field
  - do:
      nodes.stats:
        filter_path: cluster_name

  - is_true: cluster_name
  - is_false: nodes
  - is_false: nodes.$master.name
  - is_false: nodes.$master.indices
  - is_false: nodes.$master.jvm

  # Nodes Stats with "nodes" field and sub-fields
  - do:
      nodes.stats:
        filter_path: nodes.*

  - is_false: cluster_name
  - is_true: nodes
  - is_true: nodes.$master.name
  - is_true: nodes.$master.indices
  - is_true: nodes.$master.indices.docs
  - gte: { nodes.$master.indices.docs.count: 0 }
  - is_true: nodes.$master.indices.segments
  - gte: { nodes.$master.indices.segments.count: 0 }
  - is_true: nodes.$master.jvm
  - is_true: nodes.$master.jvm.threads
  - gte: { nodes.$master.jvm.threads.count: 0 }
  - is_true: nodes.$master.jvm.buffer_pools.direct
  - gte: { nodes.$master.jvm.buffer_pools.direct.count: 0 }
  - gte: { nodes.$master.jvm.buffer_pools.direct.used_in_bytes: 0 }

  # Nodes Stats with "nodes.*.indices" field and sub-fields
  - do:
      nodes.stats:
        filter_path: nodes.*.indices

  - is_false: cluster_name
  - is_true: nodes
  - is_false: nodes.$master.name
  - is_true: nodes.$master.indices
  - is_true: nodes.$master.indices.docs
  - gte: { nodes.$master.indices.docs.count: 0 }
  - is_true: nodes.$master.indices.segments
  - gte: { nodes.$master.indices.segments.count: 0 }
  - is_false: nodes.$master.jvm

  # Nodes Stats with "nodes.*.name" and "nodes.*.indices.docs.count" fields
  - do:
      nodes.stats:
        filter_path: [ "nodes.*.name", "nodes.*.indices.docs.count" ]

  - is_false: cluster_name
  - is_true: nodes
  - is_true: nodes.$master.name
  - is_true: nodes.$master.indices
  - is_true: nodes.$master.indices.docs
  - gte: { nodes.$master.indices.docs.count: 0 }
  - is_false: nodes.$master.indices.segments
  - is_false: nodes.$master.jvm

  # Nodes Stats with all "count" fields
  - do:
      nodes.stats:
        filter_path: "nodes.**.count"

  - is_false: cluster_name
  - is_true: nodes
  - is_false: nodes.$master.name
  - is_true: nodes.$master.indices
  - is_true: nodes.$master.indices.docs
  - gte: { nodes.$master.indices.docs.count: 0 }
  - is_true: nodes.$master.indices.segments
  - gte: { nodes.$master.indices.segments.count: 0 }
  - is_true: nodes.$master.jvm
  - is_true: nodes.$master.jvm.threads
  - gte: { nodes.$master.jvm.threads.count: 0 }
  - is_true: nodes.$master.jvm.buffer_pools.direct
  - gte: { nodes.$master.jvm.buffer_pools.direct.count: 0 }
  - is_false: nodes.$master.jvm.buffer_pools.direct.used_in_bytes

  # Nodes Stats with all "count" fields in sub-fields of "jvm" field
  - do:
      nodes.stats:
        filter_path: "nodes.**.jvm.**.count"

  - is_false: cluster_name
  - is_true: nodes
  - is_false: nodes.$master.name
  - is_false: nodes.$master.indices
  - is_false: nodes.$master.indices.docs.count
  - is_false: nodes.$master.indices.segments.count
  - is_true: nodes.$master.jvm
  - is_true: nodes.$master.jvm.threads
  - gte: { nodes.$master.jvm.threads.count: 0 }
  - is_true: nodes.$master.jvm.buffer_pools.direct
  - gte: { nodes.$master.jvm.buffer_pools.direct.count: 0 }
  - is_false: nodes.$master.jvm.buffer_pools.direct.used_in_bytes

  # Nodes Stats with "nodes.*.fs.data" fields
  - do:
      nodes.stats:
        filter_path: "nodes.*.fs.data"

  - is_false: cluster_name
  - is_true: nodes
  - is_false: nodes.$master.name
  - is_false: nodes.$master.indices
  - is_false: nodes.$master.jvm
  - is_true: nodes.$master.fs.data
  - is_true: nodes.$master.fs.data.0.path
  - is_true: nodes.$master.fs.data.0.type
  - is_true: nodes.$master.fs.data.0.total_in_bytes

  # Nodes Stats with "nodes.*.fs.data.t*" fields
  - do:
      nodes.stats:
        filter_path: "nodes.*.fs.data.t*"

  - is_false: cluster_name
  - is_true: nodes
  - is_false: nodes.$master.name
  - is_false: nodes.$master.indices
  - is_false: nodes.$master.jvm
  - is_true: nodes.$master.fs.data
  - is_false: nodes.$master.fs.data.0.path
  - is_true: nodes.$master.fs.data.0.type
  - is_true: nodes.$master.fs.data.0.total_in_bytes
87 changes: 87 additions & 0 deletions rest-api-spec/test/search/70_response_filtering.yaml
@@ -0,0 +1,87 @@
---
"Search with response filtering":
  - do:
      indices.create:
        index: test
  - do:
      index:
        index: test
        type: test
        id: 1
        body: { foo: bar }

  - do:
      index:
        index: test
        type: test
        id: 2
        body: { foo: bar }

  - do:
      indices.refresh:
        index: [test]

  - do:
      search:
        index: test
        filter_path: "*"
        body: "{ query: { match_all: {} } }"

  - is_true: took
  - is_true: _shards.total
  - is_true: hits.total
  - is_true: hits.hits.0._index
  - is_true: hits.hits.0._type
  - is_true: hits.hits.0._id
  - is_true: hits.hits.1._index
  - is_true: hits.hits.1._type
  - is_true: hits.hits.1._id

  - do:
      search:
        index: test
        filter_path: "took"
        body: "{ query: { match_all: {} } }"

  - is_true: took
  - is_false: _shards.total
  - is_false: hits.total
  - is_false: hits.hits.0._index
  - is_false: hits.hits.0._type
  - is_false: hits.hits.0._id
  - is_false: hits.hits.1._index
  - is_false: hits.hits.1._type
  - is_false: hits.hits.1._id

  - do:
      search:
        index: test
        filter_path: "_shards.*"
        body: "{ query: { match_all: {} } }"

  - is_false: took
  - is_true: _shards.total
  - is_false: hits.total
  - is_false: hits.hits.0._index
  - is_false: hits.hits.0._type
  - is_false: hits.hits.0._id
  - is_false: hits.hits.1._index
  - is_false: hits.hits.1._type
  - is_false: hits.hits.1._id

  - do:
      search:
        index: test
        filter_path: [ "hits.**._i*", "**.total" ]
        body: "{ query: { match_all: {} } }"

  - is_false: took
  - is_true: _shards.total
  - is_true: hits.total
  - is_true: hits.hits.0._index
  - is_false: hits.hits.0._type
  - is_true: hits.hits.0._id
  - is_true: hits.hits.1._index
  - is_false: hits.hits.1._type
  - is_true: hits.hits.1._id
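
For readers unfamiliar with the REST test format, the `is_true` and `is_false` assertions above can be read roughly as the following Python sketch. It is an approximation, not the actual test runner: it assumes that dotted paths index into arrays with numeric segments (as in `hits.hits.0._id`), that `is_true` means "present and not null, false, 0, the empty string or the string false", and it does not handle stashed variables such as `$master`. The function names are hypothetical.

```python
_MISSING = object()  # sentinel for a path that does not exist in the response


def lookup(response, path):
    """Walk a dotted path; numeric segments index into lists."""
    current = response
    for segment in path.split("."):
        if isinstance(current, list):
            if not segment.isdigit() or int(segment) >= len(current):
                return _MISSING
            current = current[int(segment)]
        elif isinstance(current, dict) and segment in current:
            current = current[segment]
        else:
            return _MISSING
    return current


def is_true(response, path):
    """The path exists and holds a value that is not considered false."""
    value = lookup(response, path)
    return value is not _MISSING and value not in (None, False, 0, "", "false")


def is_false(response, path):
    return not is_true(response, path)


# A response shaped like the result of the last search in the test above.
filtered = {
    "_shards": {"total": 5},
    "hits": {"total": 2, "hits": [
        {"_index": "test", "_id": "1"},
        {"_index": "test", "_id": "2"},
    ]},
}
print(is_true(filtered, "hits.hits.0._id"), is_false(filtered, "hits.hits.0._type"))
```

Under these assumptions, a field removed by `filter_path` and a field that was never present are indistinguishable to `is_false`, which is why the tests pair each filter with both positive and negative checks.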

5 changes: 5 additions & 0 deletions src/main/java/org/elasticsearch/common/xcontent/XContent.java
@@ -40,6 +40,11 @@ public interface XContent {
*/
XContentGenerator createGenerator(OutputStream os) throws IOException;

/**
* Creates a new generator using the provided output stream and some filters.
*/
XContentGenerator createGenerator(OutputStream os, String[] filters) throws IOException;

/**
* Creates a new generator using the provided writer.
*/