AudioCLIP Assets & Snapshots

@AndreyGuzhov AndreyGuzhov released this 29 Jun 11:58
· 6 commits to master since this release

The text-embedding vocabulary and PyTorch state_dicts containing the weights of the AudioCLIP model trained on AudioSet:

  • bpe_simple_vocab_16e6.txt.gz – CLIP's vocabulary (origin)
  • CLIP.pt – vanilla CLIP (text Transformer & ResNet-50 image-head, origin)
  • ESRNXFBSP.pt – ESResNeXt trained on AudioSet (standalone)
  • AudioCLIP trained on AudioSet (+ video frames)
    • AudioCLIP-Full-Training.pt – training of all three heads (text, image and audio)
    • AudioCLIP-Partial-Training.pt – training of the audio-head only
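
The files above are plain PyTorch state_dicts and a gzip-compressed text file, so they can be inspected before being wired into a model. The following is a minimal sketch, not the repository's own loading code: the helper names (`load_state_dict`, `count_top_level_prefixes`, `read_vocab_lines`) are hypothetical, and it assumes the `.pt` files deserialize to a flat mapping from parameter names to tensors.

```python
import gzip
from collections import Counter
from typing import Iterable, List, Mapping


def load_state_dict(path: str) -> Mapping[str, object]:
    """Load a checkpoint such as AudioCLIP-Full-Training.pt onto the CPU.

    Assumption: the file holds a flat state_dict (parameter name -> tensor).
    """
    import torch  # imported lazily so the helpers below work without PyTorch

    return torch.load(path, map_location="cpu")


def count_top_level_prefixes(keys: Iterable[str]) -> Counter:
    """Count parameter names by the part before the first dot.

    Useful for seeing which sub-modules a checkpoint contains.
    """
    return Counter(key.split(".", 1)[0] for key in keys)


def read_vocab_lines(path: str) -> List[str]:
    """Read a gzip-compressed UTF-8 text file (e.g. the BPE vocabulary) as lines."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read().split("\n")
```

For example, `count_top_level_prefixes(load_state_dict("AudioCLIP-Full-Training.pt").keys())` would list the top-level parameter groups in the checkpoint; the actual group names depend on the model definition and are not stated in these release notes.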