Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide "aggregation" functions in Ditto search #1525

Open
thjaeckle opened this issue Oct 28, 2022 · 4 comments
Open

Provide "aggregation" functions in Ditto search #1525

thjaeckle opened this issue Oct 28, 2022 · 4 comments
Labels
outlook something which could be done in the future

Comments

@thjaeckle
Copy link
Member

Utilising the MongoDB search index, Ditto could support aggregation queries via its search API, e.g. in order to find out the mean temperature value of hundreds/thousands of selected things.

Our used RQL syntax also provides "aggregations":

aggregate(<property|function>,...) - Aggregates the array, grouping by objects that are distinct for the provided properties, and then reduces the remaining other property values using the provided functions

..

sum(<property?>) - Finds the sum of every value in the array or if the property argument is provided, returns the sum of the value of property for every object in the array
mean(<property?>) - Finds the mean of every value in the array or if the property argument is provided, returns the mean of the value of property for every object in the array
max(<property?>) - Finds the maximum of every value in the array or if the property argument is provided, returns the maximum of the value of property for every object in the array
min(<property?>) - Finds the minimum of every value in the array or if the property argument is provided, returns the minimum of the value of property for every object in the array
recurse(<property?>) - Recursively searches, looking in children of the object as objects in arrays in the given property value

If any adopter of Ditto already had a need for such kind of aggregations, please provide feedback and your use-cases for such a feature.

@thjaeckle thjaeckle added the outlook something which could be done in the future label Oct 28, 2022
@thfries
Copy link
Contributor

thfries commented Apr 2, 2023

Hi @thjaeckle,
I recently thought about this feature. We have a clear need to have aggregations of important thing states to manage a large fleet of devices. To have an overview of groups of devices we need to aggregate the connectivity or health or update state of the devices within a group efficiently. Current solution is to replicate the important states in a relational db and update the replica based on a Ditto Kafka connection. Aggregation is then done efficiently in the RDB (home turf for that). But not good to having need for two databases.

So if there is an easy way to use aggregation capabilities of the MongoDB this would be beneficial.

  • aggregations should be possible based on the values of an attribute (e.g. "groupId")
  • policies of the things should be considered for aggregation (only aggregate value that the subject is allowed to see)

@thjaeckle
Copy link
Member Author

thjaeckle commented Apr 2, 2023

@thfries very interesting, thx for the input

aggregations should be possible based on the values of an attribute (e.g. "groupId")

You mean that eg the "groupId" would be used as "group by" in an aggregation?
But having another specified field?

From your comment I assume that you would need a "distinct count" aggregation?
I can currently not imagine how to declare such a aggregation with the RQL syntax.

For other queries and interpreting the sparse RQL documentation (and the lack of a specified count aggregation), I would assume that e.g. a "count" aggregation could look like:

GET /api/2/search/things?filter=aggregate(attributes/groupId,count())

With a response being e.g.:

{
  "items": [
    {
      "my-group-1": 42
    },
    {
      "my-group-2": 7
    }
  ], 
... 
}

Another example to find out the mean temperature of a room with devices providing temperature readings:

GET /api/2/search/things?filter=aggregate(attributes/roomId,mean(features/temperature/properties/currentReading))

With a response being e.g.:

{
  "items": [
    {
      "kitchen": 21.32
    },
    {
      "living": 22.03
    }
  ],
... 
}

@thfries
Copy link
Contributor

thfries commented Apr 3, 2023

Thanks for your examples. Sorry for not being precise here. You're right, the use case is not only about aggregation but also about grouping. The full example in SQL would be:

select room, connectivity, count(*)
from thing
group by room, connectivity

with a response being:

room connectivity count(*)
kitchen online 3
living offline 1
living online 4

So the interesting part is to get a statistic overview of your fleet.
May be using a SQL db for that purpose is a valid solution. But I know relational databases much better than RQL or MongoDB...

@thjaeckle
Copy link
Member Author

thjaeckle commented Apr 3, 2023

I try to find a syntax which would support your query..

GET /api/2/search/things?filter=aggregate(attributes/roomId,features/connectivity/properties/state,count())

With a response being e.g.:

{
  "items": [
    {
      "attributes/roomId": "kitchen", 
      "features/connectivity/properties/state": "online", 
      "count()": 3
    },
    {
      "attributes/roomId": "living", 
      "features/connectivity/properties/state": "offline", 
      "count()": 1
    },
    {
      "attributes/roomId": "living", 
      "features/connectivity/properties/state": "online", 
      "count()": 4
    }
  ],
... 
}

Thinkable would also be sorting aggregations:

GET /api/2/search/things?filter=aggregate(attributes/roomId,features/connectivity/properties/state,count())&sort(-count())

Translated to MongoDB this query could be (I did not verify yet):

db.search.aggregate([
    {$group : {_id:{"attributes/roomId":"$t.attributes.roomId", "features/connectivity/properties/state":"$t.features.connectivity.properties.state"}, "count()":{$count:{}}}},
    {$sort : {value:-1}}
])

When I find the time, I will try those aggregation queries in MongoDB. 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
outlook something which could be done in the future
Projects
None yet
Development

No branches or pull requests

2 participants