If an ingest node changes the target index, the original index is still created. #36545

Closed
clement-tourriere opened this issue Dec 12, 2018 · 2 comments · Fixed by #39607
Labels: >bug, :Data Management/Ingest Node, v6.5.0

Comments

@clement-tourriere

Elasticsearch version: >= 6.5

Plugins installed: None

JVM version: 11.0.1

The date index name processor creates the index targeted by the request in addition to the date-based index.
This behavior was introduced by the PR that creates the index before executing the pipeline: #32786

Steps to reproduce:

On a clean Elasticsearch instance (6.5 or later), follow the example from the date index name processor documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/date-index-name-processor.html

# Create the pipeline
curl -X PUT "localhost:9200/_ingest/pipeline/monthlyindex" -H 'Content-Type: application/json' -d'
{
  "description": "monthly date-time index naming",
  "processors" : [
    {
      "date_index_name" : {
        "field" : "date1",
        "index_name_prefix" : "myindex-",
        "date_rounding" : "M"
      }
    }
  ]
}
'

# Index a document

curl -X PUT "localhost:9200/myindex/_doc/1?pipeline=monthlyindex" -H 'Content-Type: application/json' -d'
{
  "date1" : "2016-04-25T12:02:01.789Z"
}
'


# Both myindex and myindex-2016-04-01 are created
curl -XGET "http://localhost:9200/_cat/indices?v"
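
For readers unsure where the name myindex-2016-04-01 comes from, here is a minimal Python sketch of the naming rule used in the example above: the timestamp is rounded down to the start of the month and appended to the prefix. The function name `monthly_index_name` is hypothetical, and the output format is assumed to be year-month-day, which matches the index name observed in this report. This illustrates the rule only and is not the processor's implementation.

```python
from datetime import datetime, timezone


def monthly_index_name(prefix: str, iso_timestamp: str) -> str:
    """Round a UTC timestamp down to the first of its month and append it to the prefix."""
    # Parse timestamps shaped like 2016-04-25T12:02:01.789Z (assumed input format).
    ts = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Equivalent of "date_rounding": "M" in the pipeline above.
    rounded = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return prefix + rounded.strftime("%Y-%m-%d")


index_name = monthly_index_name("myindex-", "2016-04-25T12:02:01.789Z")
print(index_name)  # myindex-2016-04-01
```

Only this date-based index is expected; the plain myindex index is the unwanted extra.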
@tlrx added the >bug, :Data Management/Ingest Node, and v6.5.0 labels Dec 13, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@jakelandis
Contributor

Discussed this issue in person today.

The issue isn't specific to the Date Index Name processor; it affects any processor that can change the value of _index. Below is an alternative reproduction:

DELETE foo,bar
GET _cat/indices?v

PUT _ingest/pipeline/test1
{
  "processors": [
    {
      "set": {
        "field": "_index",
        "value": "foo"
      }
    }
  ]
}

PUT bar/_doc/1?pipeline=test1
{
  "a" : "b"
}

GET foo/_doc/1
GET bar/_doc/1
GET _cat/indices?v

^ Note that both foo and bar indices are created.
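
The effect of the ordering can be illustrated with a toy Python model. This is not Elasticsearch code: the function names and the set of "created indices" are hypothetical stand-ins, assuming only that writing to a missing index auto-creates it.

```python
from typing import Callable, Dict, Set

Doc = Dict[str, str]
Pipeline = Callable[[Doc], Doc]


def set_index_to_foo(doc: Doc) -> Doc:
    """Toy stand-in for the `set` processor in pipeline test1."""
    return {**doc, "_index": "foo"}


def create_then_run(indices: Set[str], target: str, doc: Doc, pipeline: Pipeline) -> None:
    """Order since 6.5: create the requested index first, then run the pipeline."""
    indices.add(target)  # the original target is created up front
    routed = pipeline({**doc, "_index": target})
    indices.add(routed["_index"])  # the rewritten target is created as well


def run_then_create(indices: Set[str], target: str, doc: Doc, pipeline: Pipeline) -> None:
    """Order before 6.5: run the pipeline first, then create only the final target."""
    routed = pipeline({**doc, "_index": target})
    indices.add(routed["_index"])


created_since_65: Set[str] = set()
create_then_run(created_since_65, "bar", {"a": "b"}, set_index_to_foo)
print(sorted(created_since_65))  # ['bar', 'foo']

created_before_65: Set[str] = set()
run_then_create(created_before_65, "bar", {"a": "b"}, set_index_to_foo)
print(sorted(created_before_65))  # ['foo']
```

In the first ordering the empty bar index is left behind, which matches the reproduction above.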

#32786 is correctly identified as the change that introduced this issue. That change was made to ensure that a default pipeline defined via an index template is applied to the very first document, as described in #32758.

The fix needs to ensure that it does not reintroduce #32758. A likely fix:
if no pipeline is explicitly requested AND the request's target index does not exist,
then revert to the original order (pipeline execution before index creation),
and explicitly look up the index template that matches the index request and read the default pipeline (if one exists) from it.
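
The proposed resolution order can be sketched in Python as follows. This is a rough model under stated assumptions, not the actual change: `resolve_pipeline` and its arguments are hypothetical, templates are plain dicts with `index_patterns`, `order`, and `settings` keys, wildcard matching is approximated with `fnmatchcase`, and "best match" is assumed to mean the matching template with the highest `order`.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

DEFAULT_PIPELINE = "index.default_pipeline"


def resolve_pipeline(
    requested: Optional[str],
    target: str,
    existing_index_settings: Dict[str, Dict[str, str]],
    templates: List[dict],
) -> Optional[str]:
    """Pick the pipeline to run BEFORE any index is created."""
    # 1. An explicitly requested pipeline always wins.
    if requested is not None:
        return requested
    # 2. If the target index already exists, read its own settings.
    if target in existing_index_settings:
        return existing_index_settings[target].get(DEFAULT_PIPELINE)
    # 3. Otherwise consult the templates that would apply on creation
    #    (assumption: highest `order` is the best match).
    matching = [
        t for t in templates
        if any(fnmatchcase(target, pattern) for pattern in t["index_patterns"])
    ]
    for template in sorted(matching, key=lambda t: t.get("order", 0), reverse=True):
        pipeline = template.get("settings", {}).get(DEFAULT_PIPELINE)
        if pipeline is not None:
            return pipeline
    return None


templates = [
    {"index_patterns": ["logs-*"], "order": 0, "settings": {DEFAULT_PIPELINE: "base"}},
    {"index_patterns": ["logs-app-*"], "order": 1, "settings": {DEFAULT_PIPELINE: "app"}},
]
print(resolve_pipeline(None, "logs-app-1", {}, templates))  # app
```

Because the template lookup happens without creating the index, the first document can still pick up a default pipeline (#32758) while a pipeline that rewrites _index no longer leaves the original target behind.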

@jakelandis changed the title from "Date index name processor creates an index from index name prefix" to "If an ingest node changes the target index, the original index is still created." Mar 6, 2019
jakelandis added a commit that referenced this issue Mar 6, 2019
Prior to this commit (and after 6.5.0), if an ingest node changes
the _index in a pipeline, the original target index would be created.
For daily indexes this could create an extra, empty index per day.

This commit changes the TransportBulkAction to execute the ingest node
pipeline before attempting to create the index. This ensures that the 
only index created is the original or one set by the ingest node pipeline. 
This was the execution order prior to 6.5.0 (#32786). 

The execution order was changed in 6.5 to better support default pipelines. 
Specifically the execution order was changed to be able to read the settings
from the index meta data. This commit also includes a change in logic such 
that if the target index does not exist when the ingest node pipeline runs, it
will now pull the default pipeline (if one exists) from the settings of the
best-matching index template.

Relates #32786
Relates #32758 
Closes #36545
jakelandis added a commit to jakelandis/elasticsearch that referenced this issue Mar 7, 2019
jakelandis added a commit to jakelandis/elasticsearch that referenced this issue Mar 7, 2019
jakelandis added a commit to jakelandis/elasticsearch that referenced this issue Mar 7, 2019
jakelandis added a commit that referenced this issue Mar 7, 2019
jakelandis added a commit that referenced this issue Mar 7, 2019
jakelandis added a commit that referenced this issue Mar 8, 2019
4 participants