How to make the search request only return videos with subtitle? #30794

Closed
3 tasks done
keshawnhsieh opened this issue Mar 30, 2022 · 8 comments

@keshawnhsieh

Checklist

  • I'm asking a question
  • I've looked through the README and FAQ for similar questions
  • I've searched the bugtracker for similar questions including closed ones

Question

My goal is to download videos with human-uploaded subtitles. My current pipeline is to run a search query first, then parse each info.json to pick out the videos with human-made subtitles, and download those.
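The filtering step described above might look roughly like the sketch below. It assumes the metadata files were written with --write-info-json and that uploaded tracks appear under the `subtitles` key while auto-generated ones appear under `automatic_captions`; the helper names `has_human_subtitles` and `select_subtitled` are made up for illustration.

```python
import json
from pathlib import Path


def has_human_subtitles(info):
    """Return True if the metadata lists at least one uploaded subtitle track."""
    # Assumption: uploaded tracks live under "subtitles"; auto-generated ones
    # are listed separately under "automatic_captions" and are ignored here.
    return bool(info.get("subtitles"))


def select_subtitled(directory):
    """Yield the IDs of videos in `directory` whose info.json has uploaded subtitles."""
    for path in sorted(Path(directory).glob("*.info.json")):
        info = json.loads(path.read_text(encoding="utf-8"))
        if has_human_subtitles(info):
            yield info.get("id")
```

This only filters after the metadata has been fetched, so it does not reduce the number of search results that have to be examined.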

But, as we all know, videos with human-made subtitles are far fewer than videos without them, and each search request only returns ~500 results. How can I make better use of these limited results, that is, have the search return only videos with subtitles rather than all videos, with or without subtitles?

Does youtube-dl have any parameter that I can set on each search request to achieve this?

@dirkf
Contributor

dirkf commented Mar 30, 2022

A mechanism within yt-dl would have to filter based on the extracted metadata (like --match-filter ...), so it wouldn't help with your problem.

The v3 YT search API has this parameter:

videoCaption:string

The videoCaption parameter indicates whether the API should filter video search results based on whether they have captions. If you specify a value for this parameter, you must also set the type parameter's value to video.
Allowed values are:

any
closedCaption
none

The yt-dl extractor uses https://www.youtube.com/youtubei/v1, which doesn't have this option.
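For comparison, a request to the v3 search API using the videoCaption parameter could be built as in this sketch. It only constructs the URL and sends nothing; the function name `caption_search_url` and the placeholder API key are illustrative assumptions, and a real key would be needed to call the API.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def caption_search_url(query, api_key, max_results=50):
    """Build a v3 search URL restricted to videos that have captions."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",                  # required when videoCaption is set
        "videoCaption": "closedCaption",  # one of: any, closedCaption, none
        "maxResults": max_results,
        "key": api_key,                   # placeholder; supply your own key
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = caption_search_url("python", "YOUR_API_KEY")
```

This is separate from what youtube-dl does internally; it is only meant to show where the parameter goes.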

@pukkandan
Contributor

With #27749, you can pass search URLs like https://www.youtube.com/results?search_query=python&sp=EgIoAQ%253D%253D. You can find the correct sp param by applying the filters you want on the website
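To check what an sp value copied from the website contains, it can be percent-decoded and base64-decoded, as in this minimal sketch. The helper name `decode_sp` is made up here, and reading the resulting bytes as a ProtoBuf message is an interpretation rather than anything documented.

```python
import base64
from urllib.parse import parse_qs, unquote, urlsplit


def decode_sp(url):
    """Return the raw bytes behind the sp query parameter of a search URL."""
    # parse_qs removes one level of percent-encoding (%253D -> %3D) ...
    sp = parse_qs(urlsplit(url).query)["sp"][0]
    # ... and unquote removes the second one (%3D -> =).
    sp = unquote(sp)
    # Restore base64 padding in case the value was copied without it.
    sp += "=" * (-len(sp) % 4)
    return base64.urlsafe_b64decode(sp)


raw = decode_sp("https://www.youtube.com/results?search_query=python&sp=EgIoAQ%253D%253D")
print(raw.hex())  # 12022801
```

Read as ProtoBuf wire format, those four bytes are a length-delimited field 2 wrapping a varint field 5 with value 1.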

@dirkf
Contributor

dirkf commented Mar 31, 2022

That's nice. I must find that YouTube 101 course ...

Apparently just sp=EgQQASgB for videos with subtitles/CC. You can't use this with the ytsearch... pseudo-schemes, and you need the git master (or the PR).
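A small helper for producing such a search URL for any query might look like this. The name `subtitled_search_url` is hypothetical, and the default sp value is simply the one quoted above.

```python
from urllib.parse import urlencode


def subtitled_search_url(query, sp="EgQQASgB"):
    """Build a YouTube results URL with the subtitles/CC filter applied."""
    # urlencode takes care of escaping spaces and other special characters.
    return "https://www.youtube.com/results?" + urlencode(
        {"search_query": query, "sp": sp}
    )


print(subtitled_search_url("python"))
# https://www.youtube.com/results?search_query=python&sp=EgQQASgB
```

The resulting URL is what would be passed to youtube-dl in place of a ytsearch... pseudo-URL.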

Unfortunately, with https://www.youtube.com/results?q=ditempat&sp=EgQQASgB we still appear to run out of result pages (31 pages, 538 results). I guessed that the subtitles/CC filter wasn't effective because videos get subtitled automatically, and those are the ones that OP wanted to exclude, but apparently that isn't the case.

This presentation (PDF) was useful for terminology.

@pukkandan
Contributor

I'm pretty sure the "subtitles/CC" filter excludes automatic captions

@dirkf
Contributor

dirkf commented Mar 31, 2022

Looks right. It's just that ditempat is a very productive query. With ditemaptx, the subtitle filter picks 7 from 227 unfiltered results.

@dirkf
Contributor

dirkf commented Mar 31, 2022

> This is not robust.

So far the YT extractor uses just a few canned search criteria like this. That's robust as long as people don't do a JavaScript on the ProtoBuf encoding spec. When someone identifies a need to create ProtoBuf criteria dynamically, it might be necessary to write or borrow a marshaller.
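A very small marshaller of that kind could be sketched as follows. It covers only varint fields nested in one length-delimited field, which is enough to reproduce the two sp values quoted in this thread. The meaning of the field numbers (2 for "type is video", 5 for "subtitles/CC") is an assumption inferred from those two values, not taken from any published schema, and the function names are made up.

```python
import base64


def encode_varint(value):
    """Encode a non-negative integer as a ProtoBuf varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_field(number, value):
    """Encode a field with wire type 0 (varint)."""
    return encode_varint(number << 3 | 0) + encode_varint(value)


def bytes_field(number, payload):
    """Encode a field with wire type 2 (length-delimited)."""
    return encode_varint(number << 3 | 2) + encode_varint(len(payload)) + payload


def build_sp(filters):
    """Build an sp value from (field number, value) pairs.

    Assumption: the filters sit inside a length-delimited field 2.
    """
    inner = b"".join(varint_field(number, value) for number, value in filters)
    return base64.urlsafe_b64encode(bytes_field(2, inner)).decode("ascii")


print(build_sp([(2, 1), (5, 1)]))  # EgQQASgB
print(build_sp([(5, 1)]))          # EgIoAQ==
```

Any trailing = padding still has to be percent-encoded when the value is placed in a URL.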

> The yt-dl extractor uses https://www.youtube.com/youtubei/v1, which doesn't have this option.
>
> Thats not true:
>
> So read that

... doesn't have this option according to the misleading documentation that I found.

Thanks.

@dirkf
Contributor

dirkf commented Mar 31, 2022

Randomly changing the specification in incompatible ways, I mean.

@dirkf
Contributor

dirkf commented Apr 11, 2022

#30794 (comment) seems to be the answer OP wanted. PR branch or git master needed, though.
