Add more default internal audio format. #3008
Merged
Rationale
Issue #2998 pointed out large memory usage when buffering data.
After doing some experiments, it appears that the most important factor for memory usage with audio content is our internal data representation, namely 64 bit float. See below for a discussion on memory usage.
While 64 bit floats are quite useful for internal computations, they take up 4 times more space than 16 bit integers, which are considered acceptable for CD quality.
This matters not only for large buffers but also for all the smaller intermediary buffers used throughout our processing, e.g. the harbor input buffer, the file decoder buffer, etc.
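To make the size ratio concrete, here is a minimal back-of-the-envelope sketch. The 44.1 kHz stereo defaults and the helper name `raw_pcm_bytes` are assumptions made for illustration; it only counts raw sample payload and ignores any runtime overhead.

```python
# Bytes per sample for each internal representation discussed in this PR.
# "pcm" is the native 64-bit float format.
BYTES_PER_SAMPLE = {"pcm": 8, "pcm_f32": 4, "pcm_s16": 2}


def raw_pcm_bytes(seconds, sample_rate=44100, channels=2, fmt="pcm"):
    """Raw payload size in bytes, ignoring container or runtime overhead.

    sample_rate and channels are illustrative defaults (an assumption).
    """
    return int(seconds * sample_rate) * channels * BYTES_PER_SAMPLE[fmt]


if __name__ == "__main__":
    # Print the raw size of one minute of audio in each format.
    for fmt in BYTES_PER_SAMPLE:
        size = raw_pcm_bytes(60, fmt=fmt)
        print(f"{fmt:8s} {size / 1e6:6.1f} MB per minute")
```

Whatever the duration, the `pcm` figure is four times the `pcm_s16` figure and twice the `pcm_f32` one, which is the ratio the rationale relies on.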
Changes
This PR introduces ways to mitigate the issue by adding new internal formats: `pcm_s16` and `pcm_f32`. These two formats should be usable the same way as the native `pcm` format.
At this moment, only the FFmpeg decoder and encoder support them. Sources can also be converted back and forth (see below).
These formats can be selected using the ffmpeg encoder:
Type annotations also work:
Of course, operators working with audio data still need native float. To that end, conversion operators are introduced at track level (`track.{encode,decode}.audio.{pcm_s16, pcm_f32}`) and at source level (`audio.{encode,decode}.{pcm_s16, pcm_f32}`).
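As an illustration of what such a conversion involves, here is a minimal sketch of converting between floats and signed 16-bit samples. It is not the PR's implementation: the function names and the scale factor of 32767 are assumptions (other conventions scale by 32768), and samples are assumed to be nominally in the range [-1.0, 1.0].

```python
from array import array

S16_MAX = 32767  # largest signed 16-bit value (assumed scale factor)


def float_to_s16(samples):
    """Clip floats to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    out = array("h")  # "h" is a signed short, 2 bytes on common platforms
    for x in samples:
        clipped = max(-1.0, min(1.0, x))
        out.append(int(round(clipped * S16_MAX)))
    return out


def s16_to_float(samples):
    """Scale signed 16-bit integers back to 64-bit floats."""
    return array("d", (s / S16_MAX for s in samples))
```

The round trip is lossy in general, since each sample is quantized to one of 65536 levels. That is the trade-off for the smaller footprint, and it is why operators that process audio still work on native floats.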
Conventions and implementation specifics
All APIs to specify audio pcm types now have an explicit `pcm_kind` argument, and encoders/decoders must explicitly specify which pcm implementation they are working with.
On the typing side of things, the convention is now to have `(kind, format)`, with format being `Content.Audio.format` for all pcm implementations. This makes it possible to share and unify audio params across pcm types. Typically:
Discussion on memory usage
The following script was used as the baseline to test memory usage under different conditions:
Here's the default:
That's about 150 MB of buffered data, roughly twice the expected raw size of ~70 MB. It's not clear yet what's causing it; most likely some added overhead from OCaml boxing.
Next, I tested with the new `pcm_s16` format:
Here, we see an initial drop from a garbage collection cycle. Overall, usage stays below ~20 MB, which is close to the expected raw data size of ~17 MB.
The underlying implementation for this data type is a bigarray, which stores its data outside of the OCaml heap. Most likely, this adds much less overhead!
It's worth noting, though, that the garbage collector can be pretty lazy about collecting bigarrays. This setting can help but will increase CPU usage as a trade-off: