Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support file-like object in info #1108

Merged
merged 1 commit into from
Jan 27, 2021
Merged

Conversation

mthrok
Copy link
Collaborator

@mthrok mthrok commented Dec 19, 2020

This PR adds support for file-like object to info function of sox_io backend and soundfile backends.

See also #1115

@mthrok mthrok changed the title Add bytes and file-like object support to sox_io.load Support bytes and file-like object in sox_io.load Dec 19, 2020
@mthrok mthrok force-pushed the file-like-obj branch 3 times, most recently from 6937235 to 37a2654 Compare December 20, 2020 21:15
@mthrok mthrok changed the title Support bytes and file-like object in sox_io.load Support bytes and file-like object in load Dec 20, 2020
@mthrok mthrok changed the title Support bytes and file-like object in load [BC-Breaking] Support bytes and file-like object in load Dec 20, 2020
@mthrok mthrok mentioned this pull request Dec 21, 2020
6 tasks
@mthrok mthrok changed the title [BC-Breaking] Support bytes and file-like object in load Support bytes and file-like object in load Dec 21, 2020
@mthrok mthrok force-pushed the file-like-obj branch 2 times, most recently from d15a004 to 55ae6f4 Compare December 21, 2020 19:12
@cpuhrsch
Copy link
Contributor

In case you don't like file-like object, file-object is actually defined to be a synonym in the python glosassry. Same with stream. https://docs.python.org/3/glossary.html#term-file-object

@cpuhrsch
Copy link
Contributor

I'm a fan. Using pybind11 in eager mode and torchbind for deployment only really is opening a lot of doors. What's the error behavior when someone is using a pybind11 feature that torchbind does not support? For now we've been using this dual registration approach for performance reasons since pybind11 has lower eagermode overhead.

@vincentqb
Copy link
Contributor

  • I'm assuming pybind is used since torchbind does not support bytes object, is that correct?
  • Are you planning to expose this to torchaudio.load? There is no test there.
  • Is this ready for review? It is currently marked as draft.

@vincentqb
Copy link
Contributor

vincentqb commented Dec 21, 2020

I'm a fan. Using pybind11 in eager mode and torchbind for deployment only really is opening a lot of doors. What's the error behavior when someone is using a pybind11 feature that torchbind does not support? For now we've been using this dual registration approach for performance reasons since pybind11 has lower eagermode overhead.

Ah, so pybind is picked for performance reason then? I know we observe performance difference in some context, but did we measure a significant difference here? There's a note in the description about being ok with low performance in the description for part of the PR.

@cpuhrsch
Copy link
Contributor

I'm a fan. Using pybind11 in eager mode and torchbind for deployment only really is opening a lot of doors. What's the error behavior when someone is using a pybind11 feature that torchbind does not support? For now we've been using this dual registration approach for performance reasons since pybind11 has lower eagermode overhead.

Ah, so pybind is picked for performance reason then? I know we observe performance difference in some context, but did we measure a significant difference here? There's a note in the description about being ok with low performance in the description for part of the PR.

It's been picked for performance reasons in the context of torchtext where the dual pybind11/torchbind approach originated.

Using it for the purposes of getting eagermode-only features is new.

@mthrok
Copy link
Collaborator Author

mthrok commented Dec 21, 2020

@cpuhrsch

In case you don't like file-like object, file-object is actually defined to be a synonym in the python glosassry. Same with stream. https://docs.python.org/3/glossary.html#term-file-object

well, actually, I prefer the notion of file-"like". Because in my personal experience, more often it's not a file but a something else like data streamed over the network. (still, I guess one can claim it's a file object on raw level Linux point of view)

I'm a fan. Using pybind11 in eager mode and torchbind for deployment only really is opening a lot of doors. What's the error behavior when someone is using a pybind11 feature that torchbind does not support? For now we've been using this dual registration approach for performance reasons since pybind11 has lower eagermode overhead.

So the behavior is,

  • if the function is being scripted, then it directly tries to call function bound via TorchScript. In this case, passing anything other than string object is a failure, because of TS limitation.
  • if the function is not being scripted, it inspects the input argument, then based on the type, it uses either TS binding (for str) or PyBind11 version (bytes). And in this case, if the input type is file-like object, then perform read() to fetch the data as bytes then pass it to PyBind11 version.

@mthrok
Copy link
Collaborator Author

mthrok commented Dec 21, 2020

Using it for the purposes of getting eagermode-only features is new.

And it's a long-waited feature

#800 #754

@mthrok
Copy link
Collaborator Author

mthrok commented Dec 21, 2020

@vincentqb

  • I'm assuming pybind is used since torchbind does not support bytes object, is that correct?

yes

  • Are you planning to expose this to torchaudio.load? There is no test there.

It is tested. Checkout the tests added.

@vincentqb
Copy link
Contributor

vincentqb commented Dec 21, 2020

  • Are you planning to expose this to torchaudio.load? There is no test there.

It is tested. Checkout the tests added.

I meant: there are tests for the load functions within both sox_io and soundfile backends, but not one calling directly the torchaudio.load function. Did I miss it? If we're confident this covers what we need and will cover what is needed of a new backend in the future, then what is there already in the pull request would be enough.

@mthrok
Copy link
Collaborator Author

mthrok commented Dec 21, 2020

  • Are you planning to expose this to torchaudio.load? There is no test there.

It is tested. Checkout the tests added.

I meant: there are tests for the load functions within both sox_io and soundfile backends, but not one calling directly the torchaudio.load function. Did I miss it? If we're confident this covers what we need and will cover what is needed of a new backend in the future, then what is there already in the pull request would be enough.

Check out how the backend tests are written. torchaudio.load is not the right way to test.

@cpuhrsch
Copy link
Contributor

So the behavior is,

  • if the function is being scripted, then it directly tries to call function bound via TorchScript. In this case, passing anything other than string object is a failure, because of TS limitation.
  • if the function is not being scripted, it inspects the input argument, then based on the type, it uses either TS binding (for str) or PyBind11 version (bytes). And in this case, if the input type is file-like object, then perform read() to fetch the data as bytes then pass it to PyBind11 version.

Makes sense. We could either as a follow-up or using this as a vehicle look into how we can improve the error message, if necessary, when someone tries to convert a model into TorchScript that calls these functions using bytes.

@mthrok mthrok force-pushed the file-like-obj branch 2 times, most recently from 6e369c3 to b2793c7 Compare December 22, 2020 20:01
@mthrok
Copy link
Collaborator Author

mthrok commented Dec 22, 2020

So the behavior is,

  • if the function is being scripted, then it directly tries to call function bound via TorchScript. In this case, passing anything other than string object is a failure, because of TS limitation.
  • if the function is not being scripted, it inspects the input argument, then based on the type, it uses either TS binding (for str) or PyBind11 version (bytes). And in this case, if the input type is file-like object, then perform read() to fetch the data as bytes then pass it to PyBind11 version.

Makes sense. We could either as a follow-up or using this as a vehicle look into how we can improve the error message, if necessary, when someone tries to convert a model into TorchScript that calls these functions using bytes.

Well, once the function is scripted, we cannot do much about it, because TorchScript runtime won't run the function if the input type is invalid. However it seems that the error message is clear if the reason for the failure is wrong type.

foo.py
import torch
import torchaudio

path = 'test/torchaudio_unittest/assets/sinewave.wav'


# eager execution - works
with open(path, 'rb') as file_:
    torchaudio.load(file_)


load = torch.jit.script(torchaudio.load)

# jit execution with str - works
load(path)

# jit execution with file-like obj - does not work.
with open(path, 'rb') as file_:
    load(file_)
Traceback (most recent call last):
  File "foo.py", line 10, in <module>
    load(file_)
RuntimeError: load() Expected a value of type 'str' for argument 'filepath' but instead found type 'BufferedReader'.
Position: 0
Value: <_io.BufferedReader name='test/torchaudio_unittest/assets/sinewave.wav'>
Declaration: load(str filepath, int frame_offset=0, int num_frames=-1, bool normalize=True, bool channels_first=True, str? format=None) -> ((Tensor, int))
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)

@mthrok
Copy link
Collaborator Author

mthrok commented Dec 22, 2020

@cpuhrsch Update the docstring to mention the difference between scripted/eager mode.

@igorgad
Copy link

igorgad commented Jan 4, 2021

Hi @mthrok, I wonder if the parameters num_frames and offset could be used with network streams and if it would download only the requested frames?
Thanks

@mthrok
Copy link
Collaborator Author

mthrok commented Jan 4, 2021

Hi @mthrok, I wonder if the parameters num_frames and offset could be used with network streams and if it would download only the requested frames?
Thanks

Hi @igorgad

Thanks for the question, the answer is (unfortunately) no. The goal of this PR is to extend the support to Python's file object, which is the protocol requests happen to implement, thus it works for data transferred over the network. There is no specific optimization or operation carried out for online data transfer.

Then the question is, can we (relatively easily) extend the support for such efficient network data transfer? I think the answer is no, because of the limitation libsox poses. In my understanding, libsox works on FILE object pointer in C level. (which is abstracted away from user API of libsox) You can use some URL with sox command but that's just calling wget in subprocess and piping the data to stdin of the process sox command is running. There is no network utility libsox utilizes, and plugging such library into libsox is not trivial.

Maybe ffmpeg has that kind of capability, but currently torchaudio does not bind ffmpeg, though it's in my wish list. (but we do not have an action plan for it)

If you want that feature, maybe you can file an issue with feature request and provide your workflow? (total length of the audio file and the protocol you are using to fetch the data) Though I cannot guarantee that we can full fill your request, but it will certainly help us understand your demand.

@igorgad
Copy link

igorgad commented Jan 4, 2021

Yes. Indeed I think that plugging a network utility into libsox isn't the best option. I investigated the use of FFmpeg to load files from GCS buckets, but it doesn't work unless you set a public URL.

I'm currently investigating options to scale the training of our models on a managed cloud infrastructure, and one of the options is the Ai-Platform running custom containers. The problem is that we cannot mount NFS shares with our data on the containers due to the lack of the privileged run mode. The dataset is also big to copy from a remote path at the beginning of each experiment.

This way, and inspired by this implementation, would you consider a cloud_io backend (probably with fsspec) using miniaudio a good solution to access files directly and efficiently from buckets? Of course, it requires some tinkering to translate audio_frames into file offsets but I think that's relatively easy for wav files.

Thanks

@mthrok
Copy link
Collaborator Author

mthrok commented Jan 4, 2021

Yes. Indeed I think that plugging a network utility into libsox isn't the best option. I investigated the use of FFmpeg to load files from GCS buckets, but it doesn't work unless you set a public URL.

I'm currently investigating options to scale the training of our models on a managed cloud infrastructure, and one of the options is the Ai-Platform running custom containers. The problem is that we cannot mount NFS shares with our data on the containers due to the lack of the privileged run mode. The dataset is also big to copy from a remote path at the beginning of each experiment.

This way, and inspired by this implementation, would you consider a cloud_io backend (probably with fsspec) using miniaudio a good solution to access files directly and efficiently from buckets? Of course, it requires some tinkering to translate audio_frames into file offsets but I think that's relatively easy for wav files.

Thanks

@igorgad

Your question made me realized that I did not test on those frame slicing. While I was adding test, I looked into the read behavior, and turned out that libsox is smart enough to stop reading once it read enough samples for num_frames and frame_offset options. I think it still needs to read until the beginning of the frames.

Regarding cloud_io backend, I once had a conversation with @cpuhrsch about adding specific I/O functions for different data source like database, S3 but we do not have a concrete plan for adding it yet. We are considering getting rid of the notion of backend and moving to format-based automatic backend selection. From there we can add I/O functions that are specific to format or data source.

Giving some thoughts on this, I think the process can be broken down into the following steps;

  1. Query a header and detect the format
  2. Compute the byte position using the format information from 1.
  3. Fetch the corresponding part of the data
  4. Perform decoding

Then here are the questions that came up to me;

  1. How can we perform efficient data transfer?
    Looking at AWS S3 documentation, they accept range header so it is possible to download only the related part, but I guess it has to initiate a new connection. I do not have experience in other cloud storage but if the cloud vendor accepts range header or some equivalent, this should be possible.
  2. How can we compute bytes range?
    I think there are two potential questions to be answer
    1. Can you always get a header?
      I am not an expert in audio formats, but I think there are formats that do not have a global header, in this case, it's hard to get the understanding of the whole file from the beginning part of the file.
    2. Can we always get the precise byte range?
      I think there are formats that support variable bit rate. In this case, it will be difficult to know the byte range from header.

As you mention, if one knows that the format is WAV, then solving 2 will be easy, but solving it for general case will be difficult. Also with cloud provider, one has to think of a way to pass access tokens to the object that is making the network access. So I think, for now, to solve your problem, having your own custom implementation is the easiest solution.

@mthrok mthrok force-pushed the file-like-obj branch 2 times, most recently from c3e51cf to 19dd571 Compare January 6, 2021 17:43
@mthrok mthrok changed the title Support file-like object in load/info/sox_effects Support file-like object in info/sox_effects Jan 7, 2021
@mthrok mthrok force-pushed the file-like-obj branch 2 times, most recently from 6367777 to 1abe038 Compare January 8, 2021 16:49
@mthrok mthrok changed the title Support file-like object in info/sox_effects Support file-like object in info Jan 8, 2021
@mthrok mthrok force-pushed the file-like-obj branch 2 times, most recently from c40edf3 to 3e1b33c Compare January 15, 2021 23:34
@mthrok mthrok added this to the v0.8 milestone Jan 25, 2021
@mthrok mthrok force-pushed the file-like-obj branch 2 times, most recently from 956c84b to f57d1a3 Compare January 26, 2021 21:13
Copy link
Contributor

@datumbox datumbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I left only a non-blocking comment.

// See:
// https://xiph.org/vorbis/doc/Vorbis_I_spec.html
auto capacity = 4096;
std::string buffer(capacity, '\0');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as other PR. Perhaps there is a better type buffer than std:string here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mthrok mthrok merged commit 41c76a1 into pytorch:master Jan 27, 2021
@mthrok mthrok deleted the file-like-obj branch January 27, 2021 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants