
Use file allocation size where it's beneficial #51116

Closed
adamsitnik opened this issue Apr 12, 2021 · 7 comments

@adamsitnik
Member

  1. Identify places where we know the file size up front. Examples: File.WriteAll*, File.Copy, etc.
  2. Ensure that the given source code has benchmark coverage in the performance repo (contribute new benchmarks where missing).
  3. Change the source code to use the new FileStream ctor that allows specifying allocationSize, and pass the exact value (or an estimate; the OS is smart enough to shrink the file on close if the allocation was too large).
  4. Compare before vs. after. If there is a visible performance or reliability gain, contribute the change to dotnet/runtime.

Since #51111 has not been merged yet, please use that branch for measurements.

@adamsitnik adamsitnik added area-System.IO tenet-performance Performance related issue labels Apr 12, 2021
@adamsitnik adamsitnik added this to the 6.0.0 milestone Apr 12, 2021
@ghost

ghost commented Apr 12, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.


Author: adamsitnik
Assignees: Jozkee
Labels: area-System.IO, tenet-performance
Milestone: 6.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 12, 2021
@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Apr 12, 2021
@jozkee
Member

jozkee commented Apr 13, 2021

For places where we receive text instead of bytes (e.g. File.WriteAllText), should we try to determine the size in bytes up-front?

@sakno
Contributor

sakno commented May 3, 2021

Do we need to support this for memory-mapped files? Dup of this comment.

@dmex

dmex commented May 11, 2021

@jozkee

Can you please try benchmarking #51298 again with the preallocation changes, but adding FILE_FLAG_SEQUENTIAL_SCAN to kernel32.createfile and FILE_SEQUENTIAL_ONLY to ntdll.createfile?

For *Async methods, please try using FILE_FLAG_RANDOM_ACCESS with kernel32.createfile or FILE_RANDOM_ACCESS with ntdll.createfile.

WriteAllBytes is a sequential operation, but dotnet doesn't use the correct flags in these cases. The cache manager is likely trying to prefetch data from the file, potentially degrading performance when preallocation was used. That prefetching is not what you want for sequential functions like WriteAllBytes.

See also:
https://devblogs.microsoft.com/oldnewthing/20120120-00/?p=8493

@GSPP

GSPP commented May 19, 2021

Why should async IO use RANDOM_ACCESS specifically? Just curious for the reasoning.

@dmex

dmex commented May 20, 2021

@GSPP

Why should async IO use RANDOM_ACCESS specifically? Just curious for the reasoning.

Just to compare the benchmark results with the other benchmarks.

@jozkee reported a "perf regression of 10%" when using preallocation, which was unexpected. I'm thinking some of the changes related to 40359 and 51111 might be missing a call somewhere after the handle was opened, potentially trashing/polluting the cache manager when running the benchmarks. By comparing the results of different flags, we could probably determine the prediction model the cache manager was using at the time.

@adamsitnik adamsitnik modified the milestones: 6.0.0, 7.0.0 Jul 22, 2021
@adamsitnik
Member Author

I am closing the issue, as it's pointless given the regression that was recently introduced and backported to 6.0 (#59705).

@ghost ghost locked as resolved and limited conversation to collaborators Nov 11, 2021