# STL2

Daniel Stoller¹, Sebastian Ewert², Simon Dixon¹

¹Queen Mary University of London

²Spotify London

Contact: d.stoller (AT) qmul.ac.uk

## Additional Info

- is_blind: no
- additional_training_data: no

## Supplemental Material

## Method

Task: Multi-instrument separation. For the same model applied to singing voice separation, see the STL1 submission.

We use the Wave-U-Net, an adaptation of the U-Net architecture to the one-dimensional time domain, to perform end-to-end audio source separation. Through a series of downsampling and upsampling blocks, each combining convolutions with a down- or upsampling step, features are computed at multiple scales/levels of abstraction and time resolution and then combined to make a prediction. We train on 75 songs from the MUSDB training set and use the remaining 25 training-set songs for validation with early stopping. The training loss is the MSE on the raw audio source outputs.
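To make the block structure concrete, below is a minimal sketch of a Wave-U-Net-style model in PyTorch. The depth, channel widths, kernel sizes, resampling method (decimation on the way down, linear interpolation on the way up), and tanh output layer are illustrative assumptions, not the exact configuration of this submission; `WaveUNetSketch` and its parameters are hypothetical names.

```python
# Minimal Wave-U-Net-style sketch (illustrative assumptions, not the
# exact submission configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DownBlock(nn.Module):
    """1-D convolution, then decimation by keeping every 2nd sample."""

    def __init__(self, in_ch, out_ch, kernel_size=15):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        features = F.leaky_relu(self.conv(x))
        return features, features[:, :, ::2]  # (skip connection, downsampled)


class UpBlock(nn.Module):
    """Upsample to the skip's length, concatenate it, then convolve."""

    def __init__(self, in_ch, skip_ch, out_ch, kernel_size=5):
        super().__init__()
        self.conv = nn.Conv1d(in_ch + skip_ch, out_ch, kernel_size, padding=kernel_size // 2)

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[-1], mode="linear", align_corners=False)
        x = torch.cat([x, skip], dim=1)  # combine coarse and fine-scale features
        return F.leaky_relu(self.conv(x))


class WaveUNetSketch(nn.Module):
    def __init__(self, num_sources=4, depth=4, base_ch=24):
        super().__init__()
        chs = [base_ch * (i + 1) for i in range(depth + 1)]
        self.inp = nn.Conv1d(1, chs[0], 15, padding=7)
        self.down = nn.ModuleList(DownBlock(chs[i], chs[i + 1]) for i in range(depth))
        self.bottleneck = nn.Conv1d(chs[-1], chs[-1], 15, padding=7)
        self.up = nn.ModuleList(
            UpBlock(chs[i + 1], chs[i + 1], chs[i]) for i in reversed(range(depth))
        )
        self.out = nn.Conv1d(chs[0], num_sources, 1)  # one waveform per source

    def forward(self, x):  # x: (batch, 1, samples) raw mixture audio
        x = F.leaky_relu(self.inp(x))
        skips = []
        for block in self.down:  # features at progressively coarser time resolution
            skip, x = block(x)
            skips.append(skip)
        x = F.leaky_relu(self.bottleneck(x))
        for block, skip in zip(self.up, reversed(skips)):
            x = block(x, skip)
        return torch.tanh(self.out(x))  # (batch, num_sources, samples)


# MSE training loss on the raw source outputs, as described above
# (random tensors stand in for MUSDB excerpts):
model = WaveUNetSketch(num_sources=4)
mix = torch.randn(2, 1, 16384)
targets = torch.randn(2, 4, 16384)  # e.g. bass, drums, other, vocals
loss = F.mse_loss(model(mix), targets)
```

The point the sketch illustrates is that each downsampling block passes its high-time-resolution features directly to the matching upsampling block via a skip connection, so the final prediction combines coarse context with sample-level detail in the time domain.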

A paper with more details, experiments, and analysis is currently under review elsewhere.
