Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding rate limiting for command engine #10211

Merged
merged 85 commits into from
Mar 20, 2024

Conversation

stevenwinship
Copy link
Contributor

@stevenwinship stevenwinship commented Jan 5, 2024

What this PR does / why we need it: Adds rate limiting to command APIs in order to prevent over taxing of the system

Which issue(s) this PR closes: #9356

Special notes for your reviewer: Creates a redid service in docker but may need install/deploy work for production. Someone with more experience in this please chime in.

Suggestions on how to test this: Test with both superuser and non-superuser as well as guest account.

set up guest (tier 0) as 1 call per hour and authenticated user (tier 1) as 3 calls per hour.
curl http://localhost:8080/api/admin/settings/:RateLimitingDefaultCapacityTiers -X PUT -d '1,3'

PLSQL
INSERT INTO setting (name,content) VALUES (':RateLimitingDefaultCapacityTiers','1,3');
INSERT INTO setting (name,content) VALUES (':RateLimitingCapacityByTierAndAction','{"rateLimits":[{"tier": 0, "limitPerHour": 10, "actions": ["GetLatestPublishedDatasetVersionCommand", "GetPrivateUrlCommand", "GetDatasetCommand", "GetLatestAccessibleDatasetVersionCommand"]}, {"tier": 0, "limitPerHour": 1, "actions": ["CreateGuestbookResponseCommand", "UpdateDatasetVersionCommand", "DestroyDatasetCommand", "DeleteDataFileCommand", "FinalizeDatasetPublicationCommand", "PublishDatasetCommand"]}, {"tier": 1, "limitPerHour": 2, "actions": ["CreateGuestbookResponseCommand", "GetLatestPublishedDatasetVersionCommand", "GetPrivateUrlCommand", "GetDatasetCommand", "GetLatestAccessibleDatasetVersionCommand", "UpdateDatasetVersionCommand", "DestroyDatasetCommand", "DeleteDataFileCommand", "FinalizeDatasetPublicationCommand", "PublishDatasetCommand"]}]}');

make more than 3 calls to a command API to see the exception (create dataset and delete dataset). Wait 30 minutes to see that the call now works.

after testing delete :RateLimitingDefaultCapacityTiers and : RateLimitingCapacityByTierAndAction from setting to turn off/ reset rate limiting to no limit
Note: settings are only read at startup so a restart will be needed to test turning off rate limiting

Does this PR introduce a user interface change? If mockups are available, please link/include them here: Not now

Is there a release notes update needed for this change?: in 9356-rate-limiting.md

Additional documentation:

@stevenwinship stevenwinship added Size: 3 A percentage of a sprint. 2.1 hours. D: FixRateLimitingBehaviors Address this collection of issues that impact rate limiting labels Jan 5, 2024
@stevenwinship stevenwinship added this to the 6.2 milestone Jan 5, 2024
@stevenwinship stevenwinship marked this pull request as draft January 5, 2024 16:29
@stevenwinship stevenwinship self-assigned this Jan 5, 2024
@coveralls
Copy link

coveralls commented Jan 5, 2024

Coverage Status

coverage: 20.661% (+0.09%) from 20.57%
when pulling a9b2514 on 9356-rate-limiting-command-engine
into 4f46d15 on develop.

This comment has been minimized.

1 similar comment

This comment has been minimized.

@stevenwinship stevenwinship marked this pull request as ready for review January 5, 2024 21:27
@stevenwinship stevenwinship changed the title adding rate limiting for commmand engine adding rate limiting for command engine Jan 5, 2024
@stevenwinship stevenwinship removed their assignment Jan 5, 2024
@landreev landreev self-assigned this Jan 5, 2024
@landreev
Copy link
Contributor

landreev commented Jan 9, 2024

I've been educating myself about Redis; I've never used it before. Had a few questions, but was able to figure things out on my own so far. But I will be asking more questions/adding comments today.

@pdurbin pdurbin self-assigned this Jan 9, 2024
Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only looked at docs so far but I love the granularity, that each command can have it's own limits.

doc/release-notes/9356-rate-limiting.md Outdated Show resolved Hide resolved
doc/release-notes/9356-rate-limiting.md Show resolved Hide resolved

This comment has been minimized.

1 similar comment

This comment has been minimized.

@pdurbin
Copy link
Member

pdurbin commented Jan 9, 2024

I jumped on the branch and ran this to limit to 3 calls per hour:

curl http://localhost:8080/api/admin/settings/:RateLimitingDefaultCapacityTiers -X PUT -d '1,3'

Then I ran mvn test -Dtest=DatasetsIT figuring that it'll more than three collections, datasets etc.

I got what is perhaps an expected error (though it repeats itself):

{
    "status": "ERROR",
    "message": "Error creating dataverse: Error creating dataverse."
}

However, I'm seeing 500 errors: Expected status code <201> but was <500>. I imagine we shouldn't be throwing 500 errors.

I'm also seeing this in server.log: dev_dataverse> Caused by: jakarta.ejb.TransactionRolledbackLocalException: Exception thrown from bean: java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because "index" is null

Anyway, this was a quick test.

This comment has been minimized.

2 similar comments

This comment has been minimized.

This comment has been minimized.

@stevenwinship stevenwinship force-pushed the 9356-rate-limiting-command-engine branch from 5d98165 to 6b71621 Compare January 10, 2024 20:26

This comment has been minimized.

1 similar comment

This comment has been minimized.

stevenwinship and others added 5 commits March 18, 2024 13:52
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>
* style(cache): switch from Gson to JSON-B via JSR-367

Avoiding usage of GSON will eventually allow us to reduce dependencies. Standards for the win!

* style(cache): address SonarLint suggestions for code improvements

- Remove unnecessary StringBuffers
- Switch to better readable else-if construction to determine capacity
- Add missing generics
- Remove stale import
@stevenwinship stevenwinship force-pushed the 9356-rate-limiting-command-engine branch from 36588b7 to a1ab6f9 Compare March 18, 2024 17:53

This comment has been minimized.

@landreev
Copy link
Contributor

@stevenwinship @poikilotherm What was decided for a multi-server case, like ours? Do we want to invest into figuring out how to share the same hazelcast cache between independent Payara instances - or are we going to be content with each node using its own?

clarifying that "... restart is required ..."
@poikilotherm
Copy link
Contributor

@landreev Payara autoclusters via Hazelcast if the instances can reach each other.

This comment has been minimized.

@landreev
Copy link
Contributor

landreev commented Mar 20, 2024

Payara autoclusters via Hazelcast if the instances can reach each other.

@poikilotherm But, "clusters" appears to be the key word here - from spending some time looking into this on Monday, my takeaway was that it would involve changing our config, from independent instances to a payara cluster. And that's kind of a big change, no?
Unless I'm missing some straightforward simpler alternative.

Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to figure out, and experiment with a multi-instance setup sharing a cache eventually. I don't think this needs to be done as part of this PR (I started looking into it, but concluded it should be a separate effort) so I don't think we need anything in the guide on this case either at this point.

I have a very practical and simple request though: I want to be able to use this functionality to throttle access to the main, workhorse pages - dataset.xhtml and dataverse.xhtml. So I'd like to have 2 dedicated commands - something like CheckRateLimitForDatasetPage and CheckRateLImitForCollectionPage commands - that don't do anything, just to be executed as the first thing in the init() methods of the respective pages. If the command throws a rate limit exception there, I don't even want the page to do anything special/create any new error pages necessarily - return permissionsWrapper.notAuthorized(); would be perfectly acceptable.

Aside from this, I believe we are ready to merge this, finally. Good-looking code.

This comment has been minimized.

@landreev
Copy link
Contributor

For the record, this is the exception that's thrown on consecutive attempts to deploy (without restarting Payara):
Caused by: javax.cache.CacheException: A cache named 'rateLimitCache' already exists

This comment has been minimized.

1 similar comment

This comment has been minimized.

@landreev
Copy link
Contributor

@stevenwinship

For the record, this is the exception that's thrown on consecutive attempts to deploy (without restarting Payara): Caused by: javax.cache.CacheException: A cache named 'rateLimitCache' already exists

Does it just need a
manager.destroyCache(RATE_LIMIT_CACHE);
line in CacheFactoryBean, before trying to create it?

Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9356-rate-limiting-command-engine
ghcr.io/gdcc/configbaker:9356-rate-limiting-command-engine

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@stevenwinship stevenwinship force-pushed the 9356-rate-limiting-command-engine branch from 6a4552a to a9b2514 Compare March 20, 2024 19:55
@landreev landreev merged commit cb3bd0e into develop Mar 20, 2024
19 of 20 checks passed
Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9356-rate-limiting-command-engine
ghcr.io/gdcc/configbaker:9356-rate-limiting-command-engine

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@landreev landreev removed their assignment Mar 20, 2024
@poikilotherm poikilotherm deleted the 9356-rate-limiting-command-engine branch March 21, 2024 08:37
@landreev landreev restored the 9356-rate-limiting-command-engine branch March 21, 2024 13:05
@landreev
Copy link
Contributor

This is not looking good, in that I'm not seeing any traces of my commits that I made yesterday (fixing the 500/internal server errors from the Dataset and Dataverse pages, and renaming the commands). @stevenwinship did you revert my changes by any chance? - I'm looking at:
Screen Shot 2024-03-21 at 9 12 14 AM

Not the end of the world, we may just need to make a followup pr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
D: FixRateLimitingBehaviors Address this collection of issues that impact rate limiting Size: 10 A percentage of a sprint. 7 hours.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add execution rate metering to the command engine
7 participants