Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[backport] Add geo_line aggregation (#41612) #65442

Merged
merged 1 commit into from
Nov 24, 2020
Merged

Conversation

talevy
Copy link
Contributor

@talevy talevy commented Nov 24, 2020

A metric aggregation that aggregates a set of points as
a GeoJSON LineString ordered by some sort parameter.

A geo_line aggregation request would specify a geo_point field, as well
as a sort field. geo_point represents the values used in the LineString,
while the sort values will be used as the total ordering of the points.

the sort field would support any numeric field, including date.

{
	"query": {
		"bool": {
			"must": [
				{ "term": { "person": "004" } },
				{ "term": { "trajectory": "20090131002206.plt" } }
			]
		}
	},
	"aggs": {
		"make_line": {
			"geo_line": {
				"point": {"field": "location"},
				"sort": { "field": "timestamp" },
                                "include_sort": true,
                                "sort_order": "desc",
                                "size": 15
			}
		}
	}
}
{
    "took": 21,
    "timed_out": false,
    "_shards": {...},
    "hits": {...},
    "aggregations": {
        "make_line": {
            "type": "LineString",
            "coordinates": [
                [
                    121.52926194481552,
                    38.92878997139633
                ],
                [
                    121.52922699227929,
                    38.92876998055726
                ],
             ]
        }
    }
}

Due to the cardinality of points, an initial max of 10k points
will be used. This should support many use-cases.

One solution to overcome this limitation is to keep a PriorityQueue of
points, and simplifying the line once it hits this max. If simplifying
makes sense, it may be a nice option, in general. The ability to use a parameter
to specify how aggressive one wants to simplify. This parameter could be
the number of points. Example algorithm one could use with a PriorityQueue:
https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m
is the number of points returned. And would also require heapifying triangles
sorted by their areas, which would be O(log(m)) operations. Since sorting is done,
anyways, simplifying would still be a O(n log(m)) operation, where n is the total number
of points to filter........... something to explore

backport of #41612

A metric aggregation that aggregates a set of points as
a GeoJSON LineString ordered by some sort parameter.

A `geo_line` aggregation request would specify a `geo_point` field, as well
as a `sort` field. `geo_point` represents the values used in the LineString,
while the `sort` values will be used as the total ordering of the points.

the `sort` field would support any numeric field, including date.

```
{
	"query": {
		"bool": {
			"must": [
				{ "term": { "person": "004" } },
				{ "term": { "trajectory": "20090131002206.plt" } }
			]
		}
	},
	"aggs": {
		"make_line": {
			"geo_line": {
				"point": {"field": "location"},
				"sort": { "field": "timestamp" },
                                "include_sort": true,
                                "sort_order": "desc",
                                "size": 15
			}
		}
	}
}
```

```
{
    "took": 21,
    "timed_out": false,
    "_shards": {...},
    "hits": {...},
    "aggregations": {
        "make_line": {
            "type": "LineString",
            "coordinates": [
                [
                    121.52926194481552,
                    38.92878997139633
                ],
                [
                    121.52922699227929,
                    38.92876998055726
                ],
             ]
        }
    }
}
```

Due to the cardinality of points, an initial max of 10k points
will be used. This should support many use-cases.

One solution to overcome this limitation is to keep a PriorityQueue of
points, and simplifying the line once it hits this max. If simplifying
makes sense, it may be a nice option, in general. The ability to use a parameter
to specify how aggressive one wants to simplify. This parameter could be
the number of points. Example algorithm one could use with a PriorityQueue:
https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m
is the number of points returned. And would also require heapifying triangles
sorted by their areas, which would be O(log(m)) operations. Since sorting is done,
anyways, simplifying would still be a O(n log(m)) operation, where n is the total number
of points to filter........... something to explore

closes elastic#41649
@talevy talevy added :Analytics/Geo Indexing, search aggregations of geo points and shapes backport Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Nov 24, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@talevy talevy changed the title Add geo_line aggregation (#41612) [backport] Add geo_line aggregation (#41612) Nov 24, 2020
@talevy talevy merged commit cd7d1c9 into elastic:7.x Nov 24, 2020
@talevy talevy deleted the 7x-geoline branch November 24, 2020 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes backport Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants