
Reduce memory overhead of tracking #3833

Merged

Conversation

@realyze
Contributor

realyze commented Feb 18, 2024

TL;DR: Preallocating a size + 100 item array when tracking dependencies is overly generous and can lead to more frequent garbage collections, causing jank. This PR replaces that by preallocating the same amount of space the dependencies used in the last run.

Example

Let's consider this scenario where we read 1M computed values in an autorun.

// mobx is assumed to be available as a global (e.g. the production UMD build from a CDN).
const obs = mobx.observable.box(1);
const arr = Array.from({length: 1_000_000}, () => mobx.computed(() => obs.get()));

mobx.autorun(() => {
  arr.forEach((val) => val.get());
});

function main() {
  // Recompute the computeds observed by the autorun.
  mobx.runInAction(() => obs.set(obs.get() + 1));
}

window.__main = () => main();

A million may seem like a lot, but if we read e.g. 2000 computed values in every frame of an interaction and the interaction takes 5 seconds, we get 60 fps * 5 s * 2000 reads per frame and we're at 600K already. So: not super unrealistic, actually.

I've done some testing on our Canva codebase and the overhead of preallocations during some interactions (e.g. panning) is ~98% (i.e., we only use 2% of the preallocated space; the rest gets GC'd).

Impact

Memory profiling with the mobx production build from a CDN, I get ~430MB allocated when running __main() (i.e., we have huge swathes of memory that'll need to be GC'd). With this change, this drops to ~36MB.

(Screenshots: devtools memory allocation profiles before and after the change.)

Devtools profiler also (unsurprisingly) reports shorter time spent in GC for the build with this change (37ms vs 57ms); overall the __main() function runs ~30% faster with this change (compared to current main).

Of course, this is a lab scenario, so the numbers in real-world cases would presumably differ. However, I think it's quite reasonable to expect that in the P99 case the number of dependencies between derivation runs doesn't increase dramatically.

Code change checklist

  • Added/updated unit tests
  • Updated /docs. For new functionality, at least API.md should be updated
  • Verified that there is no significant performance drop (yarn mobx test:performance)


changeset-bot bot commented Feb 18, 2024

🦋 Changeset detected

Latest commit: fcbcf17

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:

Name | Type
mobx | Patch


@realyze realyze marked this pull request as draft February 18, 2024 10:14
@realyze
Contributor Author

realyze commented Feb 18, 2024

Ah sorry, it's been a while since I last contributed to OSS; I forgot that creating a PR on my fork would create the PR on the main repo. 😆 Let me get some memory profiling numbers first (in the coming days!) and update the description.

@realyze
Contributor Author

realyze commented Feb 22, 2024

@urugator 👋 WDYT? I'm not sure who to ask for review here but you've always been awesomely responsive so I thought of you. 😅 I'd be keen to get this in since I'm hopeful it could help remove some GC-related jank during intensive interactions.

@urugator
Collaborator

I am wondering about 2 things:

  1. What the right heuristic should be.
    Would it make sense to have some minimum or a different calculation? 20% on low numbers does almost nothing.
    Would it be worth making it configurable (globally/per derivation)?
  2. Is preallocating room for variation actually needed/useful? I would imagine engines already handle this. I just skimmed through it, but e.g.: https://itnext.io/v8-deep-dives-understanding-array-internals-5b17d7a28ecc

This allows V8 to achieve constant amortized time for push() and similar operations that grow the array. The following formula is used to determine the new size in situations when the internal array is not enough:
new_capacity = (old_capacity + 50%) + 16
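
For illustration, that rule grows capacities like this (a quick sketch of the quoted formula, not V8's actual code):

// Quoted growth rule: new_capacity = old_capacity + old_capacity / 2 + 16
let capacity = 0;
for (let i = 0; i < 6; i++) {
    capacity = capacity + Math.floor(capacity / 2) + 16;
    console.log(capacity); // 16, 40, 76, 130, 211, 332
}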

I will try to notify @mweststrate to provide feedback/blessing.

@realyze
Contributor Author

realyze commented Feb 23, 2024

Great feedback, thanks, @urugator !

Would it make sense to have some minimum or a different calculation? 20% on low numbers does almost nothing.
Would it be worth making it configurable (globally/per derivation)?

Good question. We use Math.ceil, so we'd always preallocate at least one extra slot... which may or may not be enough, but it's in line with the thinking that the deps count will likely not change much between runs (i.e., that more likely than not it would be enough).
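
To spell out how the heuristic behaves at small dep counts (just an illustration):

// Math.ceil(n * 1.2) always leaves at least one spare slot for n >= 1.
[1, 2, 5, 10, 100].forEach((n) => console.log(n, "->", Math.ceil(n * 1.2)));
// 1 -> 2, 2 -> 3, 5 -> 6, 10 -> 12, 100 -> 120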

Is preallocating room for variation actually needed/useful? I would imagine engines already handle this. I just skimmed through it, but e.g.: itnext.io/v8-deep-dives-understanding-array-internals-5b17d7a28ecc

Excellent point. 👍 And I agree. I think the initial-run preallocation is the one that might make sense (because it can prevent the runtime from repeatedly reallocating the array). But then again, it's unclear to me whether, in the average case, what we save by avoiding the array reallocations isn't lost to the time we spend in GC. 🤷

Let's see if @mweststrate can shed more light on what the thinking was when the preallocation logic was added (which was... 8 years ago, so I hope his memory is better than mine 😆).

@mweststrate
Member

Sorry for the late follow-up, I've been a bit swamped lately. Did you also run the performance suite in the repo, @realyze? But I think you are right, this seems overly aggressive. I think deps are either typically pretty stable or, if not, hugely varying (e.g. a conditional filter over a large map or so). I suspect you could even reduce it to merely derivation.observing_.length and leave the rest to the engine.

@realyze
Contributor Author

realyze commented Feb 23, 2024

Thanks for the reply, @mweststrate ! I ran yarn mobx test:performance (I assume that's the perf suite) and it all passed.

@mweststrate
Member

mweststrate commented Feb 23, 2024 via email

derivation.newObserving_ = new Array(
    derivation.observing_.length
        ? Math.ceil(derivation.observing_.length * 1.2) // 20% extra room for variation in deps
        : 100 // Establishing initial dependencies, preallocate constant space.
)
@urugator
Collaborator

urugator commented Feb 23, 2024

Is derivation.observing_.length a good indicator of "initial"? Ideally we would like to avoid preallocating anything for non-initial runs if there are no observable deps, e.g.:

const Cmp = observer(() => <div>no-observable-deps</div>)

Maybe there is an opportunity to bail out even earlier in this case, dunno.

@realyze
Contributor Author

Hmmm, good question! I think we wouldn't re-run trackDerivedFunction in case there are no dependencies, right? (Because nothing "inside" would trigger a rerun and external calls would get the cached value.)

@urugator
Collaborator

urugator commented Mar 4, 2024

No, deps can still change based on something not observable, like prop/state.
The main point is that zero deps isn't a special case - we assume the same thing: that the number of deps doesn't change significantly between runs. Therefore we preallocate an array of the same size - in this case, an array of zero length. Probably not much else we can do. But we need to somehow differentiate between the initial run (preallocate constant space) and non-initial runs (preallocate zero). Probably we can just change the condition to: derivation.runId_ !== 0
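
For concreteness, that change might look roughly like this (a sketch only, assuming runId_ stays 0 until the first tracked run):

derivation.newObserving_ = new Array(
    derivation.runId_ === 0
        ? 100 // initial run: establishing dependencies, preallocate constant space
        : Math.ceil(derivation.observing_.length * 1.2) // later runs: size from the last run (zero stays zero)
)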

@realyze
Contributor Author

realyze commented Mar 4, 2024

OK, here are my results based on running the proxy perf suite 10 times and recording the time reported by the shell's time call: https://docs.google.com/spreadsheets/d/1b_TE_3xjiLhfxCKwc9UGOz_g-LUaG3cPNBpV3lzes9E/edit?usp=sharing

Both user and sys times are faster with this PR compared to current mobx master. The user one has slightly higher variance, so it's probably about the same. But the sys one (which should reflect the time spent on memory allocations) is significantly lower for this PR: ~9% less time spent in sys (with slightly lower variance compared to the original).

So at least within the context of the perf suite, this change seems to make mobx faster.

@realyze realyze requested a review from urugator March 5, 2024 04:37
@mweststrate
Member

This looks great, thanks for the detailed measurement. LGTM!

@urugator
Collaborator

urugator commented Mar 6, 2024

I don't want to delay this PR any further, just a thought I want to leave here:
In the case of zero deps, we could set newObserving_ to null/undefined and create the array lazily when needed. This would avoid the need to allocate and then GC the array, which memory-wise isn't free even at zero length. But it could also negatively impact all other cases, due to the extra check for undefined when registering a dependency.
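
Roughly like this (a hypothetical sketch, not part of this PR; the function name is illustrative, the fields mirror mobx's internal newObserving_/unboundDepsCount_):

// Lazy allocation in the dependency-registration hot path (sketch).
function registerDependency(derivation, observable) {
    if (derivation.newObserving_ === null) {
        derivation.newObserving_ = [] // allocate only once the first dep appears
    }
    derivation.newObserving_[derivation.unboundDepsCount_++] = observable
}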

@realyze
Contributor Author

realyze commented Mar 6, 2024

In case of zero deps, we could set newObserving_ to null/undefined and create the array lazily when needed

I agree. On the other hand, glancing at the code, there are some non-null assertions in the codebase (derivation.newObserving_!), so I'm not sure what that could possibly break. 😅 So I'd rather take this win (where I'm confident mobx won't start throwing NPEs at us) and leave possible future optimizations to the future - with the obvious risk that that might never happen, but as I tried to argue here, I believe optimizing the non-initial case will yield the biggest benefits anyway.

@urugator
Collaborator

urugator commented Mar 7, 2024

One last thing, can you add a changeset please? (yarn changeset)
Sorry, I didn't notice until I was about to merge...

@realyze
Contributor Author

realyze commented Mar 7, 2024

Ah neat, done via fcbcf17!

@urugator urugator merged commit 6111b09 into mobxjs:main Mar 7, 2024
1 check passed
@urugator
Collaborator

urugator commented Mar 7, 2024

Thank you!

@github-actions github-actions bot mentioned this pull request Mar 7, 2024
@realyze
Contributor Author

realyze commented Mar 12, 2024

@mweststrate / @urugator I'm not sure how the mobx npm release process works but could we please release 6.12.1 soon? 🙏

@urugator
Collaborator

how the mobx npm release process works

Sadly, it tends not to work:
https://github.com/mobxjs/mobx/actions/runs/8213813479/job/22465634950#step:6:65

@mweststrate
Member

mweststrate commented Mar 13, 2024 via email

@realyze
Contributor Author

realyze commented Mar 18, 2024

@mweststrate gentle ping please 😆 Anything I could do to help?

@mweststrate
Member

ok think I found the culprit
