Support file-like object in info #1108

mthrok · 2020-12-19T03:33:18Z

This PR adds file-like object support to the info function of the sox_io and soundfile backends.

See also #1115

cpuhrsch · 2020-12-21T19:49:12Z

In case you don't like file-like object, file-object is actually defined to be a synonym in the Python glossary. Same with stream. https://docs.python.org/3/glossary.html#term-file-object

cpuhrsch · 2020-12-21T19:51:59Z

I'm a fan. Using pybind11 in eager mode and torchbind for deployment only really is opening a lot of doors. What's the error behavior when someone is using a pybind11 feature that torchbind does not support? For now we've been using this dual registration approach for performance reasons since pybind11 has lower eagermode overhead.

vincentqb · 2020-12-21T19:52:27Z

I'm assuming pybind is used since torchbind does not support bytes object, is that correct?
Are you planning to expose this to torchaudio.load? There is no test there.
Is this ready for review? It is currently marked as draft.

vincentqb · 2020-12-21T19:55:42Z

I'm a fan. Using pybind11 in eager mode and torchbind for deployment only really is opening a lot of doors. What's the error behavior when someone is using a pybind11 feature that torchbind does not support? For now we've been using this dual registration approach for performance reasons since pybind11 has lower eagermode overhead.

Ah, so pybind is picked for performance reasons then? I know we observe performance differences in some contexts, but did we measure a significant difference here? There's a note in the description about being ok with low performance for part of the PR.

torchaudio/csrc/sox_utils.cpp

cpuhrsch · 2020-12-21T20:03:12Z

I'm a fan. Using pybind11 in eager mode and torchbind for deployment only really is opening a lot of doors. What's the error behavior when someone is using a pybind11 feature that torchbind does not support? For now we've been using this dual registration approach for performance reasons since pybind11 has lower eagermode overhead.

Ah, so pybind is picked for performance reasons then? I know we observe performance differences in some contexts, but did we measure a significant difference here? There's a note in the description about being ok with low performance for part of the PR.

It's been picked for performance reasons in the context of torchtext where the dual pybind11/torchbind approach originated.

Using it for the purposes of getting eagermode-only features is new.

mthrok · 2020-12-21T20:05:09Z

@cpuhrsch

In case you don't like file-like object, file-object is actually defined to be a synonym in the Python glossary. Same with stream. https://docs.python.org/3/glossary.html#term-file-object

Well, actually, I prefer the notion of file-"like", because in my experience it is more often not a file but something else, such as data streamed over the network. (Still, I guess one can claim it is a file object from a low-level Linux point of view.)

I'm a fan. Using pybind11 in eager mode and torchbind for deployment only really is opening a lot of doors. What's the error behavior when someone is using a pybind11 feature that torchbind does not support? For now we've been using this dual registration approach for performance reasons since pybind11 has lower eagermode overhead.

So the behavior is,

If the function is being scripted, it directly tries to call the function bound via TorchScript. In this case, passing anything other than a str object fails, because of the TorchScript limitation.
If the function is not being scripted, it inspects the input argument and, based on its type, uses either the TorchScript binding (for str) or the PyBind11 version (for bytes). In this case, if the input is a file-like object, it calls read() to fetch the data as bytes and then passes them to the PyBind11 version.
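
The dispatch described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `_info_via_torchbind` and `_info_via_pybind` are hypothetical stand-ins for the two bindings, not torchaudio's actual functions, and the `scripting` flag stands in for whatever check the real code performs (presumably `torch.jit.is_scripting()`).

```python
import io
from typing import BinaryIO, Union


def _info_via_torchbind(path: str) -> str:
    # Hypothetical stand-in for the TorchScript-bound function (str only).
    return f"torchbind:{path}"


def _info_via_pybind(data: bytes) -> str:
    # Hypothetical stand-in for the pybind11-bound function (bytes only).
    return f"pybind11:{len(data)} bytes"


def info(src: Union[str, bytes, BinaryIO], scripting: bool = False) -> str:
    if scripting:
        # Scripted path: only the TorchScript binding is reachable,
        # so anything other than str is rejected.
        if not isinstance(src, str):
            raise RuntimeError(f"expected str, found {type(src).__name__}")
        return _info_via_torchbind(src)
    # Eager path: pick the binding based on the argument type.
    if isinstance(src, str):
        return _info_via_torchbind(src)
    if isinstance(src, (bytes, bytearray)):
        return _info_via_pybind(bytes(src))
    if hasattr(src, "read"):
        # File-like object: read everything, then use the bytes path.
        return _info_via_pybind(src.read())
    raise TypeError(f"unsupported input type: {type(src).__name__}")
```

Note that in this sketch the file-like branch reads the whole stream into memory before handing it over, which matches the description above but is not optimized for large inputs.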

mthrok · 2020-12-21T20:07:26Z

Using it for the purposes of getting eagermode-only features is new.

And it's a long-awaited feature.

#800 #754

mthrok · 2020-12-21T20:10:37Z

@vincentqb

I'm assuming pybind is used since torchbind does not support bytes object, is that correct?

yes

Are you planning to expose this to torchaudio.load? There is no test there.

It is tested. Check out the tests added.

vincentqb · 2020-12-21T20:47:48Z

Are you planning to expose this to torchaudio.load? There is no test there.

It is tested. Check out the tests added.

I meant: there are tests for the load functions within both the sox_io and soundfile backends, but not one calling the torchaudio.load function directly. Did I miss it? If we're confident this covers what we need and will cover what a new backend needs in the future, then what is already in the pull request would be enough.

mthrok · 2020-12-21T21:14:59Z

Are you planning to expose this to torchaudio.load? There is no test there.

It is tested. Check out the tests added.

I meant: there are tests for the load functions within both the sox_io and soundfile backends, but not one calling the torchaudio.load function directly. Did I miss it? If we're confident this covers what we need and will cover what a new backend needs in the future, then what is already in the pull request would be enough.

Check out how the backend tests are written. torchaudio.load is not the right way to test.

cpuhrsch · 2020-12-22T00:18:44Z

So the behavior is,

if the function is being scripted, then it directly tries to call function bound via TorchScript. In this case, passing anything other than string object is a failure, because of TS limitation.

if the function is not being scripted, it inspects the input argument, then based on the type, it uses either TS binding (for str) or PyBind11 version (bytes). And in this case, if the input type is file-like object, then perform read() to fetch the data as bytes then pass it to PyBind11 version.

Makes sense. Either as a follow-up or using this PR as a vehicle, we could look into how to improve the error message, if necessary, when someone tries to convert a model that calls these functions with bytes into TorchScript.

mthrok · 2020-12-22T20:14:01Z

So the behavior is,

if the function is being scripted, then it directly tries to call function bound via TorchScript. In this case, passing anything other than string object is a failure, because of TS limitation.

if the function is not being scripted, it inspects the input argument, then based on the type, it uses either TS binding (for str) or PyBind11 version (bytes). And in this case, if the input type is file-like object, then perform read() to fetch the data as bytes then pass it to PyBind11 version.

Makes sense. Either as a follow-up or using this PR as a vehicle, we could look into how to improve the error message, if necessary, when someone tries to convert a model that calls these functions with bytes into TorchScript.

Well, once the function is scripted, we cannot do much about it, because the TorchScript runtime won't run the function if the input type is invalid. However, the error message seems clear when the failure is caused by a wrong type.

foo.py

import torch
import torchaudio

path = 'test/torchaudio_unittest/assets/sinewave.wav'


# eager execution - works
with open(path, 'rb') as file_:
    torchaudio.load(file_)


load = torch.jit.script(torchaudio.load)

# jit execution with str - works
load(path)

# jit execution with file-like obj - does not work.
with open(path, 'rb') as file_:
    load(file_)

Traceback (most recent call last):
  File "foo.py", line 10, in <module>
    load(file_)
RuntimeError: load() Expected a value of type 'str' for argument 'filepath' but instead found type 'BufferedReader'.
Position: 0
Value: <_io.BufferedReader name='test/torchaudio_unittest/assets/sinewave.wav'>
Declaration: load(str filepath, int frame_offset=0, int num_frames=-1, bool normalize=True, bool channels_first=True, str? format=None) -> ((Tensor, int))
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)

mthrok · 2020-12-22T20:39:07Z

@cpuhrsch Updated the docstring to mention the difference between scripted and eager mode.

igorgad · 2021-01-04T02:47:27Z

Hi @mthrok, I wonder if the parameters num_frames and offset could be used with network streams and if it would download only the requested frames?
Thanks

mthrok · 2021-01-04T19:08:41Z

Hi @mthrok, I wonder if the parameters num_frames and offset could be used with network streams and if it would download only the requested frames?
Thanks

Hi @igorgad

Thanks for the question, the answer is (unfortunately) no. The goal of this PR is to extend the support to Python's file object, which is the protocol requests happen to implement, thus it works for data transferred over the network. There is no specific optimization or operation carried out for online data transfer.

Then the question is, can we (relatively easily) extend the support for such efficient network data transfer? I think the answer is no, because of the limitations libsox imposes. In my understanding, libsox works on a FILE pointer at the C level (which is abstracted away from the user-facing API of libsox). You can use a URL with the sox command, but that just calls wget in a subprocess and pipes the data to the stdin of the process the sox command is running in. libsox does not use any network utility, and plugging such a library into libsox is not trivial.

Maybe ffmpeg has that kind of capability, but currently torchaudio does not bind ffmpeg, though it's on my wish list (but we do not have an action plan for it).

If you want that feature, maybe you can file an issue with a feature request and describe your workflow (the total length of the audio file and the protocol you are using to fetch the data)? I cannot guarantee that we can fulfill your request, but it will certainly help us understand your needs.

igorgad · 2021-01-04T19:52:10Z

Yes. Indeed I think that plugging a network utility into libsox isn't the best option. I investigated the use of FFmpeg to load files from GCS buckets, but it doesn't work unless you set a public URL.

I'm currently investigating options to scale the training of our models on a managed cloud infrastructure, and one of the options is the Ai-Platform running custom containers. The problem is that we cannot mount NFS shares with our data on the containers due to the lack of a privileged run mode. The dataset is also too big to copy from a remote path at the beginning of each experiment.

This way, and inspired by this implementation, would you consider a cloud_io backend (probably with fsspec) using miniaudio a good solution to access files directly and efficiently from buckets? Of course, it requires some tinkering to translate audio_frames into file offsets but I think that's relatively easy for wav files.

Thanks

mthrok · 2021-01-04T23:44:45Z

Yes. Indeed I think that plugging a network utility into libsox isn't the best option. I investigated the use of FFmpeg to load files from GCS buckets, but it doesn't work unless you set a public URL.

I'm currently investigating options to scale the training of our models on a managed cloud infrastructure, and one of the options is the Ai-Platform running custom containers. The problem is that we cannot mount NFS shares with our data on the containers due to the lack of a privileged run mode. The dataset is also too big to copy from a remote path at the beginning of each experiment.

This way, and inspired by this implementation, would you consider a cloud_io backend (probably with fsspec) using miniaudio a good solution to access files directly and efficiently from buckets? Of course, it requires some tinkering to translate audio_frames into file offsets but I think that's relatively easy for wav files.

Thanks

@igorgad

Your question made me realize that I had not tested frame slicing. While adding a test, I looked into the read behavior, and it turned out that libsox is smart enough to stop reading once it has read enough samples for the num_frames and frame_offset options. I think it still needs to read everything up to the beginning of the requested frames.

Regarding the cloud_io backend, I once had a conversation with @cpuhrsch about adding specific I/O functions for different data sources such as databases or S3, but we do not have a concrete plan for adding them yet. We are considering getting rid of the notion of backend and moving to format-based automatic backend selection. From there we can add I/O functions that are specific to a format or data source.

Giving this some thought, I think the process can be broken down into the following steps:

1. Query a header and detect the format
2. Compute the byte position using the format information from 1.
3. Fetch the corresponding part of the data
4. Perform decoding

Here are the questions that came to mind:

How can we perform efficient data transfer?
Looking at the AWS S3 documentation, they accept a range header, so it is possible to download only the relevant part, but I guess it has to initiate a new connection. I do not have experience with other cloud storage, but if the cloud vendor accepts a range header or some equivalent, this should be possible.
How can we compute the byte range?
I think there are two potential questions to be answered:
1. Can you always get a header?
  I am not an expert in audio formats, but I think there are formats that do not have a global header. In this case, it is hard to understand the whole file from its beginning.
2. Can we always get the precise byte range?
  I think there are formats that support variable bit rate. In this case, it will be difficult to know the byte range from the header.

As you mention, if one knows that the format is WAV, then solving 2 will be easy, but solving it for the general case will be difficult. Also, with a cloud provider, one has to think of a way to pass access tokens to the object that is making the network access. So I think, for now, having your own custom implementation is the easiest solution to your problem.
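
For the WAV case, the byte-range computation could look roughly like the sketch below. It assumes uncompressed PCM WAV whose 'fmt ' and 'data' chunk headers both fall within the bytes already fetched; `wav_byte_range` is a hypothetical helper written for illustration, not part of torchaudio or libsox.

```python
import io
import struct
import wave


def wav_byte_range(header: bytes, frame_offset: int, num_frames: int):
    """Return (start, end) byte positions, end exclusive, of the requested frames.

    Assumes PCM WAV; `header` must contain the 'fmt ' and 'data' chunk headers.
    A negative num_frames means "until the end of the data chunk".
    """
    if header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE header")
    pos = 12
    block_align = None
    while pos + 8 <= len(header):
        chunk_id, chunk_size = struct.unpack_from("<4sI", header, pos)
        body = pos + 8
        if chunk_id == b"fmt ":
            # block_align (bytes per frame) sits 12 bytes into the fmt chunk.
            block_align = struct.unpack_from("<H", header, body + 12)[0]
        elif chunk_id == b"data":
            if block_align is None:
                raise ValueError("'fmt ' chunk must precede 'data'")
            total = chunk_size // block_align
            first = min(frame_offset, total)
            last = total if num_frames < 0 else min(first + num_frames, total)
            return body + first * block_align, body + last * block_align
        # Chunks are padded to an even number of bytes.
        pos = body + chunk_size + (chunk_size & 1)
    raise ValueError("'data' chunk not found in the supplied header bytes")
```

With an HTTP range request, the returned pair would map to `Range: bytes=start-(end-1)`, since the range header uses an inclusive end. Formats with variable bit rate or without a global header would not fit this approach, as noted above.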

datumbox

LGTM! I left only a non-blocking comment.

datumbox · 2021-01-27T16:30:40Z

torchaudio/csrc/sox/io.cpp

+  // See:
+  // https://xiph.org/vorbis/doc/Vorbis_I_spec.html
+  auto capacity = 4096;
+  std::string buffer(capacity, '\0');


Same as the other PR. Perhaps there is a better buffer type than std::string here?

#1181 (comment)

facebook-github-bot added the CLA Signed label Dec 19, 2020

mthrok changed the title ~~Add bytes and file-like object support to sox_io.load~~ Support bytes and file-like object in sox_io.load Dec 19, 2020

mthrok force-pushed the file-like-obj branch 3 times, most recently from 6937235 to 37a2654 Compare December 20, 2020 21:15

mthrok changed the title ~~Support bytes and file-like object in sox_io.load~~ Support bytes and file-like object in load Dec 20, 2020

mthrok changed the title ~~Support bytes and file-like object in load~~ [BC-Breaking] Support bytes and file-like object in load Dec 20, 2020

mthrok mentioned this pull request Dec 21, 2020

Support in-memory encoding/decoding #1115

Closed

6 tasks

mthrok force-pushed the file-like-obj branch from 5809424 to 22398a3 Compare December 21, 2020 15:00

mthrok changed the title ~~[BC-Breaking] Support bytes and file-like object in load~~ Support bytes and file-like object in load Dec 21, 2020

mthrok mentioned this pull request Dec 21, 2020

Support bytes and file-like object in info #1114

Closed

mthrok force-pushed the file-like-obj branch 2 times, most recently from d15a004 to 55ae6f4 Compare December 21, 2020 19:12

vincentqb reviewed Dec 21, 2020

torchaudio/csrc/sox_utils.cpp (outdated)

mthrok mentioned this pull request Dec 21, 2020

Add support for file-like object #754

Closed

mthrok force-pushed the file-like-obj branch 2 times, most recently from 6e369c3 to b2793c7 Compare December 22, 2020 20:01

mthrok force-pushed the file-like-obj branch 4 times, most recently from 28c98d9 to 22aadef Compare January 1, 2021 04:11

mthrok mentioned this pull request Jan 1, 2021

RFC: Applying codecs as data augmentation #1146

Closed

mthrok force-pushed the file-like-obj branch from 13023be to 7510c6e Compare January 4, 2021 23:00

mthrok mentioned this pull request Jan 5, 2021

Sharing My Projects 2021 H1 #1154

Closed

mthrok force-pushed the file-like-obj branch 2 times, most recently from c3e51cf to 19dd571 Compare January 6, 2021 17:43

mthrok mentioned this pull request Jan 6, 2021

Support file-like object in load function #1158

Merged

mthrok force-pushed the file-like-obj branch from 19dd571 to 427f891 Compare January 7, 2021 21:34

mthrok changed the title ~~Support file-like object in load/info/sox_effects~~ Support file-like object in info/sox_effects Jan 7, 2021

mthrok force-pushed the file-like-obj branch 2 times, most recently from 6367777 to 1abe038 Compare January 8, 2021 16:49

mthrok changed the title ~~Support file-like object in info/sox_effects~~ Support file-like object in info Jan 8, 2021

mthrok mentioned this pull request Jan 8, 2021

Add smoke test for sox_io fileobj #1165

Merged

mthrok force-pushed the file-like-obj branch 2 times, most recently from c40edf3 to 3e1b33c Compare January 15, 2021 23:34

mthrok added this to the v0.8 milestone Jan 25, 2021

mthrok force-pushed the file-like-obj branch 2 times, most recently from 956c84b to f57d1a3 Compare January 26, 2021 21:13

Support file-like object in info

f8ad641

mthrok force-pushed the file-like-obj branch from f57d1a3 to f8ad641 Compare January 26, 2021 21:37

datumbox approved these changes Jan 27, 2021

mthrok merged commit 41c76a1 into pytorch:master Jan 27, 2021

mthrok deleted the file-like-obj branch January 27, 2021 17:20
