flowmix design doc #54

Open
zqhxuyuan opened this issue Jul 12, 2016 · 1 comment

Comments

zqhxuyuan commented Jul 12, 2016

I am writing a Chinese-language document about flowmix's design (not finished yet): http://zqhxuyuan.github.io/2015/07/26/2015-09-11-Flowmix-CEP/
Hopefully it can help someone.

@cjnolet After digging into the flowmix source code, I also have a question:

AggregatorWindow is composed of an Aggregator and a Window.
The Aggregator is responsible for storing the aggregate variables, while the Window stores the original events.
Normally a PartitionOp comes before the AggregatorOp to do a group-by operation,
and partitioning ensures that each partition corresponds to exactly one Window.
Suppose a Window stores at most 1000 events, there are 1000 partitions, and one event takes 1 KB.
Then the windows in AggregateBolt take up to 1000 partitions * 1000 KB = 1 GB of memory.
I assume that is why the Aggregator keeps its own temporary variables, which are enough to produce the aggregate result.
My question is: if the Aggregator's temporary variables are good enough, why do we need to keep the events in the Window?
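
The estimate above can be checked with a few lines of arithmetic. This is a minimal Python sketch, not flowmix code: the function name `window_memory_bytes` is hypothetical, and the inputs are just the example numbers from this comment (1000 partitions, 1000 events per window, roughly 1 KB per event).

```python
def window_memory_bytes(partitions: int, events_per_window: int, event_size_bytes: int) -> int:
    """Worst-case bytes held by all windows, assuming one full window per partition."""
    return partitions * events_per_window * event_size_bytes


# Example numbers from the question; 1 KB is taken as 1024 bytes here.
total = window_memory_bytes(partitions=1000, events_per_window=1000, event_size_bytes=1024)
print(f"{total / 1024**3:.2f} GiB")  # prints 0.95 GiB
```

This is the total across all partitions and ignores any per-object overhead, so the real heap usage would likely be higher.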

cjnolet (Member) commented Jul 13, 2016

Hey,

The design doc looks great so far! I haven't looked at it in extreme detail, but I like what I saw on a quick browse.

In reference to your question, if each event is 1kb in size and it is partitioned and grouped, the 1GB should actually be spread across the cluster. The trick with CEP here is to do the memory management to make sure heaps aren't blown (and I suppose if garbage collection becomes a concern, a local back end like Redis could help get it off the heap).

The reasoning behind storing the events and passing them along with the aggregate function is that events can expire out of a window, and at that point they need to be expired from the aggregate function as well. Generally, it would be good practice to keep only the attributes of each event that are needed for the aggregation rather than storing every event in full.
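
To illustrate the point about expiry, here is a minimal Python sketch of a count-based sliding window that keeps a running sum. It is not flowmix's implementation; the class name `SlidingSumWindow`, the `field` parameter, and the dict-shaped events are assumptions made for the example. The window stores only the one attribute the aggregate needs, and when an event expires its value is subtracted from the running total.

```python
from collections import deque


class SlidingSumWindow:
    """Keeps the last `max_events` values of one field plus their running sum."""

    def __init__(self, max_events: int, field: str):
        self.max_events = max_events
        self.field = field
        self.values = deque()  # only the projected attribute, not the whole event
        self.total = 0

    def add(self, event: dict) -> None:
        value = event[self.field]  # keep just the attribute needed for the aggregate
        self.values.append(value)
        self.total += value
        # Expire the oldest values and remove their contribution from the aggregate.
        while len(self.values) > self.max_events:
            self.total -= self.values.popleft()

    def average(self):
        return self.total / len(self.values) if self.values else None


window = SlidingSumWindow(max_events=3, field="latency")
for latency in [10, 20, 30, 40]:
    window.add({"latency": latency, "host": "a"})

print(window.total)      # 90: the first event (10) has expired
print(window.average())  # 30.0
```

Without the stored values there would be nothing to subtract when the oldest event leaves the window. For aggregates such as min or max, a single running variable is not enough after an expiry, so the remaining values are needed to recompute the result.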

If you have ideas on ways to minimize the footprint, pull requests would certainly be welcome here.

Thanks again.
