
Memory tuning #2619

Closed
wants to merge 14 commits into from

Conversation

michaelcfanning
Member

@michaelcfanning michaelcfanning commented Feb 16, 2023

This change makes tactical adjustments to the current threading model to maximize CPU utilization and limit memory consumption, while preserving the deterministic output that some current client tools require.

@kelewis has pointed out that building analysis around a limited set of flattened records, designed for efficient results streaming, would consume far less memory. This seems like the right approach to me; I am only uncertain how we would retrofit determinism for those who need it. We could create a post-processing sorting API, or design a new binary format for serialized records, optimized for reconstruction into sorted SARIF. Looking forward to the conversation.

As it turns out, our runaway memory usage was due to a simple bug: the context-creation channel wasn't throttled on write. Oops.

Tactical changes include:

  • Throttled context creation to, at most, 25k objects. I will make this configurable in the short term.
  • Converted failure level and result kind from list instances to immutable sets, and we now prefer constant values for the defaults. These allocations were unnecessary and cluttered profiling results, so they were good to clear.
  • Added a simple measure of working set consumed at analysis time.
  • Eliminated HashUtilities.FileSystem, an unnecessary duplication of the FileSystem.Instance test knob. Taking this change required adding an IFileSystem instance to the relevant API (a non-breaking change).
  • Lowered the individual 'file skipped due to size' message to a note. Generally, we have a couple of issues with diagnostics reporting: these values can't be individually controlled via the failure levels. They likely should be separated from them entirely, as trace values. Trace messages arguably should always be emitted, when configured. Would love to discuss this further with the team.
  • Updated the 'one or more files skipped' warning to include a count of files skipped. Also moved this warning to the end of analysis (when file enumeration is complete), rather than embedding it in the middle of analysis results.
  • Aggressively flushed SARIF JSON to disk, so that a concentration of results doesn't increase memory consumption. This writing occurs on a single thread, and the incremental loss of IO efficiency is unlikely to be observable, as it's amortized alongside the multithreaded scanning.
  • Stopped taking an exclusive lock on the SARIF log. This allows users to open an in-progress log, or to use a tool like tail.exe to watch output as it's generated.
  • Disabled the caching/replay of file results by hash. This functionality isn't of general value, and it definitely shouldn't be tied to the configuration that emits file hashes for files referenced in SARIF. The capability will come back as a new config knob. In the meantime, we've dropped the hashing channel, which helps lower memory usage.
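The flush-and-share behavior described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual writer; the class and member names are hypothetical, but FileShare.Read is the mechanism that lets another process read an in-progress log.

```csharp
using System.IO;
using System.Text;

// Hypothetical sketch: open the SARIF log without an exclusive lock
// (FileShare.Read lets another process, e.g. tail.exe, open the file
// while it is being written) and flush after each result so buffered
// output doesn't accumulate in memory.
class SharedLogWriter : System.IDisposable
{
    private readonly StreamWriter _writer;

    public SharedLogWriter(string path)
    {
        var stream = new FileStream(
            path,
            FileMode.Create,
            FileAccess.Write,
            FileShare.Read); // readers can observe the in-progress log

        _writer = new StreamWriter(stream, Encoding.UTF8);
    }

    public void WriteResult(string serializedResult)
    {
        _writer.WriteLine(serializedResult);
        _writer.Flush(); // aggressive flush: push results to disk immediately
    }

    public void Dispose() => _writer.Dispose();
}
```

Because the writing happens on a single dedicated thread, the per-result flush cost is amortized against the multithreaded scan work happening concurrently.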

// reference to each scan target is put into a channel for hashing,
// if hashing is enabled.
Task<bool> hashFilesAndPutInAnalysisQueue = Task.Run(() => HashFilesAndPutInAnalysisQueueAsnc());
// 3: A dedicated set of threads pull scan targets and analyze them.

@KalleOlaviNiemitalo KalleOlaviNiemitalo Feb 16, 2023


Skipped number 2 #Resolved

Comment on lines 248 to 255
Task.WhenAll(hashWorkers)
.ContinueWith(_ => {
_loggerCache?.Clear();
Console.WriteLine("Done with hashing.");
readyToScanChannel.Writer.Complete();
})
.Wait();


If some of the tasks in hashWorkers fail, does this ignore the error?


Overall, if you're going to wait anyway, then I feel it would be clearer to use Task.WaitAll and then do _loggerCache?.Clear() etc. afterwards. The error handling would then be different, though.
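The reviewer's concern can be illustrated with a small standalone sketch (worker contents are invented for demonstration): a `ContinueWith` continuation with default options runs even when the antecedent tasks fault, and waiting on the continuation does not surface the workers' exceptions, whereas `Task.WaitAll` on the workers propagates them.

```csharp
using System;
using System.Threading.Tasks;

class ErrorHandlingSketch
{
    static void Main()
    {
        // Hypothetical stand-ins for the PR's hashWorkers tasks.
        Task[] hashWorkers =
        {
            Task.Run(() => { /* hashing work that succeeds */ }),
            Task.Run(() => throw new InvalidOperationException("hash failed")),
        };

        try
        {
            // Waiting on the workers directly propagates any failure
            // as an AggregateException...
            Task.WaitAll(hashWorkers);
        }
        catch (AggregateException ae)
        {
            Console.WriteLine($"Worker failed: {ae.InnerException?.Message}");
        }

        // ...and the cleanup then runs unconditionally afterwards,
        // mirroring the reviewer's suggestion:
        // _loggerCache?.Clear();
        // readyToScanChannel.Writer.Complete();
    }
}
```

With the original `Task.WhenAll(...).ContinueWith(_ => ...).Wait()` shape, the continuation itself completes successfully, so `Wait()` throws nothing and the worker failure is silently dropped.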

@suvamM
Collaborator

suvamM commented Feb 16, 2023

@michaelcfanning A PR description would be super helpful.

@@ -10,7 +10,7 @@ public enum DefaultTraces
{
None,
ScanTime = 0x01,
ScanExecution = 0x2,
Member Author


ScanExecution

'ScanExecution' previously unused.

@@ -30,7 +30,6 @@ public abstract class AnalyzeCommandBase<TContext, TOptions> : PluginDriverComma
private Run _run;
private Tool _tool;
internal TContext _rootContext;
private CacheByFileHashLogger _cacheByFileHashLogger;
Member Author


_cacheByFileHashLogger

I've clobbered caching by file hash in general with this change. It will not come back for the single-threaded analysis, as this engine is getting dropped entirely.


return true;
}
private readonly ConcurrentDictionary<string, CachingLogger> _loggerCache = new ConcurrentDictionary<string, CachingLogger>();

Check failure

Code scanning / CodeQL

Container contents are never accessed

The contents of this container are never accessed.
Comment on lines +27 to +28
baseLoggerTestConcrete = new BaseLoggerTestConcrete(new[] { FailureLevel.Error }.ToImmutableHashSet(),
new List<ResultKind> { ResultKind.Informational, ResultKind.Fail }.ToImmutableHashSet());

Check warning

Code scanning / CodeQL

Useless assignment to local variable

This assignment to [baseLoggerTestConcrete](1) is useless, since its value is never read.
Comment on lines +30 to +31
baseLoggerTestConcrete = new BaseLoggerTestConcrete(new[] { FailureLevel.Note }.ToImmutableHashSet(),
BaseLogger.Fail);

Check warning

Code scanning / CodeQL

Useless assignment to local variable

This assignment to [baseLoggerTestConcrete](1) is useless, since its value is never read.
Comment on lines +33 to +34
baseLoggerTestConcrete = new BaseLoggerTestConcrete(new[] { FailureLevel.None }.ToImmutableHashSet(),
new List<ResultKind> { ResultKind.Informational }.ToImmutableHashSet());

Check warning

Code scanning / CodeQL

Useless assignment to local variable

This assignment to [baseLoggerTestConcrete](1) is useless, since its value is never read.
Traces = new string[] { };
Kind = new List<ResultKind> { ResultKind.Fail };
Level = new List<FailureLevel> { FailureLevel.Warning, FailureLevel.Error };
Trace = new string[] { };

Check warning

Code scanning / CodeQL

Virtual call in constructor or destructor

Avoid virtual calls in a constructor or destructor.
public virtual IEnumerable<string> Trace { get; set; }

private DefaultTraces? defaultTraces;
public DefaultTraces Traces
Member Author


Traces

This pattern may be worth pursuing more broadly, i.e., keeping some class surface area for interacting directly with the command-line library, distinct from reconstructed data more suitable for direct consumption by the engine.

I keep wondering whether options could be merged entirely with the context object...

readyToHashChannel = Channel.CreateBounded<int>(channelOptions);

channelOptions = new BoundedChannelOptions(2000)
var channelOptions = new BoundedChannelOptions(25000)
Member Author


25000

We allow the scan phase to build up to 25k targets: candidates awaiting analysis, or completed files that haven't yet been drained by the logging thread.

This is all designed to maximize CPU utilization during the scan phase.
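The throttling fix can be sketched with System.Threading.Channels: a bounded channel with `FullMode = Wait` suspends the producer on `WriteAsync` once the capacity is reached, capping how many context objects are alive at once. The capacity matches the PR's 25k; the channel and loop names below are illustrative, not the actual driver code.

```csharp
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded channel: the producer awaits instead of allocating unboundedly.
var channelOptions = new BoundedChannelOptions(25_000)
{
    SingleWriter = true,
    FullMode = BoundedChannelFullMode.Wait
};

Channel<int> readyToScanChannel = Channel.CreateBounded<int>(channelOptions);

// Producer: enumeration of scan targets is now throttled on write.
_ = Task.Run(async () =>
{
    for (int target = 0; target < 100_000; target++)
    {
        // Suspends here whenever 25k items are already in flight.
        await readyToScanChannel.Writer.WriteAsync(target);
    }

    readyToScanChannel.Writer.Complete();
});

// Consumer side: dedicated workers drain the channel and analyze targets.
await foreach (int target in readyToScanChannel.Reader.ReadAllAsync())
{
    // AnalyzeTarget(target);
}
```

This is the behavior the unthrottled channel was missing: without the bounded write, context creation could race arbitrarily far ahead of analysis and logging.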

Comment on lines +306 to +317
catch (Exception e)
{
    if (context != null)
    {
        RuntimeErrors |= context.RuntimeErrors;
        context?.Dispose();
    }

    context = default;
    Errors.LogUnhandledEngineException(rootContext, e);
    RuntimeErrors |= rootContext.RuntimeErrors;
    ThrowExitApplicationException(context, ExitReason.ExceptionWritingToLogFile, e);
}

Check notice

Code scanning / CodeQL

Generic catch clause

Generic catch clause.
@@ -276,62 +278,68 @@
{
int currentIndex = 0;

ChannelReader<int> reader = _resultsWritingChannel.Reader;
while (_fileContexts?.Count > 0 == false)

Check notice

Code scanning / CodeQL

Unnecessarily complex Boolean expression

The expression 'A == false' can be simplified to '!A'.
@michaelcfanning michaelcfanning mentioned this pull request Feb 24, 2023

3 participants