Reintroducing Hardsubs #1

Open
familyfriendlymikey opened this issue May 21, 2022 · 11 comments

Comments

@familyfriendlymikey
Owner

In response to a github user who messaged me elsewhere about reintroducing the hardsubs feature:

I think I may have a great idea regarding this, sort of similar to what you mentioned. I'm assuming you're familiar with the fact that the script generates a "cut list" text file, which you can feed to the included Python script make_cuts to make the cuts. It was already somewhat of a pain to jump into a terminal, navigate to the correct directory, and run make_cuts. But what if we were to forgo the cut list and make_cuts files entirely and instead have mpv-cut generate a Python file, such as LIST_CHANNEL_NAME_Source_Video_File_Name.mkv.py, which would act as both the cut list (for backup purposes) and the make_cuts script:

filename = "LIST_CHANNEL_NAME_Source_Video_File_Name.mkv"
timestamps = [
	(105.67, 108),
	(156, 187),
	(256, 270.12432)
]

hardsubs = True

import ffmpeg
...

To be clear, this would mean that the "hardsubs" option wouldn't actually generate a video file with hardsubs. Instead, by default mpv-cut would generate a cut list that is now a self-contained Python file placed in the same directory as the source video, which you would simply double-click to execute once you're done watching your video. Enabling the "hardsubs" option would generate a script where hardsubs = True.
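A rough sketch of what writing that self-contained header could look like. It is in Python purely for illustration (mpv-cut itself would emit this text from Lua), and the function name `render_cut_script` is hypothetical. Using `repr` for every value means filenames containing quotes or backslashes still produce valid Python.

```python
def render_cut_script(filename: str, timestamps: list, hardsubs: bool) -> str:
    """Return the data header of a self-contained cut-list script (hypothetical helper)."""
    lines = [f"filename = {filename!r}", "timestamps = ["]
    for start, end in timestamps:
        # repr keeps floats exact and ints unchanged, e.g. (105.67, 108)
        lines.append(f"\t({start!r}, {end!r}),")
    lines.append("]")
    lines.append("")
    lines.append(f"hardsubs = {hardsubs!r}")
    # The ffmpeg-invoking code would be appended after this header.
    return "\n".join(lines) + "\n"


print(render_cut_script("LIST_CHANNEL_NAME_Source_Video_File_Name.mkv", [(105.67, 108), (156, 187)], True))
```

Because the header is plain Python, running or `exec`-ing it yields the same `filename`, `timestamps`, and `hardsubs` values that were recorded in mpv.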

This has several pros:

  • We're no longer limited to mpv's API, so we can use some Python ffmpeg wrapper to deal with all the ffmpeg/ffprobe crap.
  • The cut list backup files are now self-contained. As long as the user has the source video file in the same directory, all they have to do is double click, no more fiddling with a make_cuts executable.
  • If the ffmpeg command fails or produces an undesirable output for whatever reason (which will undoubtedly happen, no matter how much we tune the ffmpeg command), it'll be much easier to modify the resulting script for a niche use case to fix it, a fix which will then persist with the cut list itself!
  • If there's an uncaught exception in the mpv script, IIRC the script stops running. Even if we catch all exceptions, displaying errors with the OSD is inelegant, whereas a Python script displays the exception very clearly and does not interrupt further use of mpv.
  • If the user wants to do something programmatic with the cut list, that would now be really easy because it's already in Python rather than an arbitrary text format, and there would already be a foundation of ffmpeg-related code to build on. Taking this further, we could format the generated Python script like so:
files = [
	{
		"filename": "LIST_CHANNEL_NAME_Source_Video_File_Name.mkv",
		"timestamps": [
			(105.67, 108),
			(156, 187),
			(256, 270.12432)
		]
	}
]

hardsubs = True

import ffmpeg
...

and instead of dealing with just one file, we loop through the files list. This way, the user can easily combine multiple cut lists by pasting additional file objects into the files list (I have some ideas to make this even more convenient, but maybe later).
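A minimal sketch of looping over the `files` list, plus one possible convenience for combining cut lists. Both function names are hypothetical, and merging entries that share a filename is an assumption about what "combining" should mean, not something settled in this thread.

```python
from typing import Iterator


def iter_cuts(files: list) -> Iterator[tuple]:
    """Yield (filename, start, end) for every timestamp pair in every file entry."""
    for entry in files:
        for start, end in entry["timestamps"]:
            yield entry["filename"], start, end


def combine(*cut_lists: list) -> list:
    """Concatenate several `files` lists, merging entries that share a filename."""
    merged = {}  # dicts keep insertion order, so the first-seen file stays first
    for files in cut_lists:
        for entry in files:
            merged.setdefault(entry["filename"], []).extend(entry["timestamps"])
    return [{"filename": name, "timestamps": ts} for name, ts in merged.items()]
```

The main loop of the generated script would then just be `for filename, start, end in iter_cuts(files): ...`.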

Furthermore, if the user wants to change the behavior of the generated script or supply some flags to ffmpeg without having to edit the script, we could allow easy passing of args by doing something like:

files = [
	...
]

import sys

if len(sys.argv) > 1:
	# sys.argv[0] is the script itself, so user-supplied args start at index 1
	args = sys.argv[1:]
else:
	# this else block would contain the options the user chose
	# inside of mpv, for example having hardsubs toggled on
	args = ["hardsubs"]

import ffmpeg
...

This way, we could potentially add even more functionality, such as a concat option or a framerate option (not inside of mpv-cut, but as commandline args to the generated cut list script).
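One way to sketch the argument handling with the standard library's `argparse`, keeping the same rule as the snippet above: if any command-line arguments are given they replace the mpv-chosen defaults entirely. The flag names `--hardsubs`, `--concat`, and `--framerate` are hypothetical.

```python
import argparse
from typing import Sequence

# Options chosen inside mpv, written into the generated script (hypothetical flags).
DEFAULT_ARGS = ("--hardsubs",)


def parse_options(argv: Sequence[str], defaults: Sequence[str] = DEFAULT_ARGS) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run the cuts in this cut list.")
    parser.add_argument("--hardsubs", action="store_true", help="burn subtitles into the video")
    parser.add_argument("--concat", action="store_true", help="join all cuts into one file")
    parser.add_argument("--framerate", type=float, help="output frame rate")
    cli_args = list(argv[1:])  # argv[0] is the script path
    return parser.parse_args(cli_args if cli_args else list(defaults))


if __name__ == "__main__":
    import sys
    print(parse_options(sys.argv))
```

A side benefit over a bare list of strings is that `--help` and typo detection come for free.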

The drawbacks, which IMO are completely negligible, are:

  • The user now has to perform one extra action (double-clicking the .py file after they're done watching the video in mpv).
  • The user has to install Python, pip, and the ffmpeg wrapper globally.
  • The cut list files will take up more space because they contain Python code.
  • If the user chooses the global option, the cut list and cut will be placed in the global dir, so double-clicking the cut list Python file won't work. If someone does want that functionality, we could store the absolute path to the source file, check whether it exists, and if not, fall back to the current directory.
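A minimal sketch of that fallback. It interprets "the current directory" as the directory containing the cut-list script, since how double-clicking sets the working directory varies by platform; that choice and the name `resolve_source` are assumptions.

```python
import os


def resolve_source(stored_path: str, script_path: str) -> str:
    """Prefer the absolute path recorded at cut time; fall back to the script's directory."""
    if os.path.isfile(stored_path):
        return stored_path
    script_dir = os.path.dirname(os.path.abspath(script_path))
    fallback = os.path.join(script_dir, os.path.basename(stored_path))
    if os.path.isfile(fallback):
        return fallback
    raise FileNotFoundError(f"source video not found at {stored_path!r} or {fallback!r}")


# In the generated script this would be called as resolve_source(stored_path, __file__).
```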

Let me know if you disagree with anything. I know for some people one extra step can be a dealbreaker, but compared to the reduced complexity in mpv/increased modularity and flexibility this provides, I think this seems like a great workflow. There are some technicalities to deal with but this shouldn't be too hard. I'm open to discussion though.

If you do like this idea, I'd be down to implement it if you figure out the Python ffmpeg wrapper queries for hardsubs, assuming they exist.

@familyfriendlymikey
Owner Author

Not sure if this is considered hacky, but this appears to work well. Can test by placing test_video.mkv in the same dir.

import os, subprocess

def make_cuts(cut_list):

    print()

    for index, cut in enumerate(cut_list):

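        # each cut line looks like "# <inpath> : <action> : <start> <duration>"
        inpath, action, timestamps = cut[2:].strip().split(" : ")
        inpath_noext, ext = os.path.splitext(inpath)
        start_time, duration = timestamps.split()
        # round to milliseconds so float error (e.g. 12.692000000000002) stays out of the filename
        end_time = f"{float(start_time) + float(duration):.3f}".rstrip("0").rstrip(".")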
        outpath = f"ENCODE_{inpath_noext}_FROM_{start_time}_TO_{end_time}{ext}"

        cmd = [
            "ffmpeg",
            "-nostdin", "-y", "-loglevel", "error",
            "-ss", start_time,
            "-t", duration,
            "-i", inpath,
            "-pix_fmt", "yuv420p",
            "-crf", "16",
            "-preset", "superfast",
            outpath
        ]

        print(f"\x1b[32m({index + 1}/{len(cut_list)})\x1b[0m {inpath} \x1b[32m->\x1b[0m")
        print(f"{outpath}\n")
        print(f"\x1b[34m{' '.join(cmd)}\x1b[0m\n")

        subprocess.run(cmd)

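with open(__file__, "r") as this_file:
    lines = this_file.readlines()

# everything after the marker is the cut list; skip blank or stray lines
cut_list_index = lines.index("# BEGIN CUT LIST\n") + 1
cut_list = [line for line in lines[cut_list_index:] if line.startswith("# ")]
make_cuts(cut_list)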

# BEGIN CUT LIST
# test_video.mkv : encode : 10.438 2.254
# test_video.mkv : encode : 20.843 5.438

If we put the cut list at the bottom like this, the script barely has to parse itself, and Lua can simply append lines to the file if it already exists rather than risk reading and rewriting an existing file.
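The append-only idea, sketched in Python for illustration (on the mpv side the same thing would be Lua's `io.open(path, "a")`). The helper names and the `header` argument are hypothetical. Opening with mode `"x"` creates the file only if it doesn't exist yet, so an existing script is never rewritten, only appended to.

```python
import os

MARKER = "# BEGIN CUT LIST\n"


def append_cut(script_path, header, inpath, action, start, duration):
    """Create the cut-list script once, then append one cut line per call."""
    try:
        with open(script_path, "x", encoding="utf-8") as f:  # fails if the file exists
            f.write(header.rstrip("\n") + "\n\n" + MARKER)
    except FileExistsError:
        pass
    with open(script_path, "a", encoding="utf-8") as f:
        f.write(f"# {inpath} : {action} : {start} {duration}\n")


def read_cuts(script_path):
    """Return the cut lines that follow the marker."""
    with open(script_path, encoding="utf-8") as f:
        lines = f.readlines()
    return [line for line in lines[lines.index(MARKER) + 1:] if line.startswith("# ")]
```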

@lukasschwab

Preface: I don't have much use for the cut list feature. I see how it could be useful, but I'm almost always scrubbing for and clipping a single range. Not having an extra step to do that––not having to leave mpv to reencode a range––is what makes mpv-cut useful to me.

You know what you want from cut lists, so I'll gladly defer to you!

But what if we were to forego the cut list and make_cuts files entirely, and instead have mpv-cut generate a Python file, such as LIST_CHANNEL_NAME_Source_Video_File_Name.mkv.py, which would act as both the cut list for backup purposes and as the make_cuts script...

I see two risks with encapsulated cut lists:

  1. Dependencies/environments might change in a way that breaks already-generated scripts.
  2. Data stored in a Python script is harder to access programmatically, which makes it harder to compose mpv-cut with other scripts.

It seems more natural to me to beef up make_cuts and use it more consistently, rather than trying to keep it in sync with the ffmpeg logic in cut.lua.

Suppose make_cuts reads tabular or JSONL data from stdin:

{ "file": "test_video.mkv", "start": 10.438, "end": 12.692 }
{ "file": "test_video.mkv", "start": 20.843, "end": 26.281 }

(JSON is nice because it's easy to extend with named options in the future, without breaking backwards-compatibility with old make_cuts versions and old cut lists).
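A minimal sketch of the reading side of make_cuts under this proposal. The `Clip` class, the optional `action` key, and `read_clips` are hypothetical. Unknown keys are simply ignored and missing optional keys fall back to defaults, which is what keeps old cut lists and newer make_cuts versions compatible. It also rejects a record whose end does not come after its start, which catches the easy mix-up between an end time and a duration.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Clip:
    file: str
    start: float
    end: float
    action: str = "encode"  # hypothetical optional key; older cut lists just omit it


def read_clips(lines: Iterable[str]) -> Iterator[Clip]:
    """Parse JSONL clip definitions, e.g. read_clips(sys.stdin)."""
    for line_no, line in enumerate(lines, 1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record["end"] <= record["start"]:
            raise ValueError(f"line {line_no}: end must be after start")
        yield Clip(
            file=record["file"],
            start=float(record["start"]),
            end=float(record["end"]),
            action=record.get("action", "encode"),
        )
```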

Then cut.lua could invoke make_cuts instead of calling ffmpeg, passing it a single clip definition:

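-- mpv's "run" command does not go through a shell, so a literal "|" would just be
-- passed to echo as an argument; running it via sh -c makes the pipe work (POSIX only)
mp.commandv(
	"run",
	"sh", "-c",
	[[echo '{ "file": "test_video.mkv", "start": 20.843, "end": 26.281 }' | make_cuts]]
)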

Then make_cuts can become the flexible/modular part, with options to tailor its behavior (a concat option, a framerate option, etc.), and cut.lua would pass its config variables to make_cuts as flags.
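On the piping question: another option is to skip the shell entirely and write the JSON straight to the child's stdin. A Python sketch of that pattern (`send_clip` is a hypothetical name, and `make_cuts` is whatever executable ends up reading stdin). On the mpv side, newer releases document a `stdin_data` field on the `subprocess` command for the same purpose; check the manual for your mpv version.

```python
import json
import subprocess
import sys


def send_clip(command: list, clip: dict) -> subprocess.CompletedProcess:
    """Write one JSON line to the child's stdin; no shell and no "|" involved."""
    return subprocess.run(
        command,
        input=json.dumps(clip) + "\n",
        text=True,
        capture_output=True,
        check=True,  # raise if make_cuts exits non-zero
    )


if __name__ == "__main__":
    clip = {"file": "test_video.mkv", "start": 20.843, "end": 26.281}
    # Stand-in child process that just echoes the filename it received.
    child = [sys.executable, "-c", "import sys, json; print(json.loads(sys.stdin.read())['file'])"]
    print(send_clip(child, clip).stdout.strip())
```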

Of course, I'm totally content to customize these scripts for my own use case and just report whatever I learn about using ffprobe to switch between the ffmpeg subs-encoding options. No pressure!

Cheers

@lukasschwab

I immediately ran into an issue hardcoding SRT subs from ffmpeg-python: kkroening/ffmpeg-python#663

@familyfriendlymikey
Owner Author

Great points, thanks for sharing your thoughts. I'm open to doing that.

Regarding ffmpeg-python, I figured using a wrapper would be more straightforward/robust than parsing the result of ffprobe from scratch, but if not, using child_process or whatever is cool too. I'm not sure if you already had things working for both srt and image subs on your fork, so I hope the ffmpeg-python stuff wasn't too time consuming!
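For reference, parsing ffprobe directly is not much work if its JSON writer is used. A sketch that lists subtitle streams and classifies the first one as text-based or image-based. The codec sets below cover common ffmpeg codec names but are not exhaustive, and the helper names are hypothetical.

```python
import json
import subprocess

TEXT_CODECS = {"subrip", "ass", "ssa", "webvtt", "mov_text"}
IMAGE_CODECS = {"hdmv_pgs_subtitle", "dvd_subtitle", "dvb_subtitle"}


def probe_subtitle_streams(path: str) -> list:
    """List subtitle streams (index and codec name) via ffprobe's JSON output."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "s",
         "-show_entries", "stream=index,codec_name", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out).get("streams", [])


def subtitle_kind(streams: list) -> str:
    """Classify the first subtitle stream as 'text', 'image', 'none', or 'unknown'."""
    if not streams:
        return "none"
    codec = streams[0].get("codec_name", "")
    if codec in TEXT_CODECS:
        return "text"
    if codec in IMAGE_CODECS:
        return "image"
    return "unknown"
```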

Also, this may no longer be relevant, but regarding getting stdout from a subprocess in mpv:

local r = mp.command_native({
    name = "subprocess",
    playback_only = false,
    capture_stdout = true,
    args = {"cat", "/proc/cpuinfo"},
})
if r.status == 0 then
    print("result: " .. r.stdout)
end

I'll see about modifying cut.lua.

@akippnn

akippnn commented Feb 3, 2023

I'm assuming Node will be abandoned in favour of Python?

@familyfriendlymikey
Owner Author

Nope, I implemented the changes in Python but the VM startup seemed pretty slow to me so I rewrote it in Imba targeting Node.

@akippnn

akippnn commented Feb 4, 2023

Oh right. Since you haven't closed this yet, I assume hardcoding subs is still in the scope of the project?

@familyfriendlymikey
Owner Author

I assume hardcoding subs is still on the scope of the project?

Absolutely. It's just that ffmpeg is a bit of a beast imo, so finding one command to handle every use case is pretty hard. That's why I ended up making the mpv extension as simple as possible (iirc lukas here had some nice ideas that inspired me), and just had it pipe JSON to an arbitrary external script where people could support their own use cases without having to learn lua or mpv's api. I'm now realizing though that most people won't customize the make_cuts script.

Maybe a better solution is to just allow people to supply their own actions in the config.lua file. So users could potentially do something like:

ACTIONS = {
	COPY = "ffmpeg -ss {start_time} -t {duration} -i {infile} -c copy COPY_{channel}_FROM_{start_time}_TO_{end_time}_{outfile}",
	HARDSUBS_IMAGE = ...,
	HARDSUBS_SRT = ...,
}

Then users can support their own use cases as long as they have some ffmpeg knowledge, and I might be able to just get rid of the Node dependency altogether.
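A sketch of how such a template could be expanded into an argument list, in Python for illustration (the Lua side would do the equivalent with `string.gsub`). `render_action` is a hypothetical name. Splitting the template into arguments before filling in placeholders keeps a filename with spaces as a single argument, and nothing is passed through a shell. One caveat of `str.format` is that literal braces in an ffmpeg filter would have to be doubled (`{{`, `}}`).

```python
import shlex


def render_action(template: str, **fields: object) -> list:
    """Split the template into arguments first, then fill placeholders per argument."""
    return [token.format(**fields) for token in shlex.split(template)]


if __name__ == "__main__":
    copy_template = ("ffmpeg -ss {start_time} -t {duration} -i {infile} -c copy "
                     "COPY_{channel}_FROM_{start_time}_TO_{end_time}_{outfile}")
    print(render_action(copy_template, start_time="10.438", duration="2.254",
                        end_time="12.692", infile="my video.mkv",
                        channel="chan", outfile="my video.mkv"))
```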

@akippnn

akippnn commented Feb 5, 2023

That sounds about right. I don't think there's one command to handle every use case. I think in this case it's probably better to get information like the subtitle format from ffmpeg -i and the subtitle track currently being viewed.

Though I assume you can also get the subtitle format from mpv itself, so there's no need to use ffmpeg -i.
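A sketch of picking the currently viewed subtitle track from mpv's `track-list` property (fetched with `mp.get_property_native("track-list")` in Lua, or mpv's JSON IPC `get_property` command). Each entry carries fields such as `type`, `selected`, `codec`, `external`, and `external-filename`. The helper names are hypothetical, and if a secondary subtitle is active, more than one track may be marked selected; this just takes the first.

```python
from typing import Optional


def selected_sub_track(track_list: list) -> Optional[dict]:
    """Return the first subtitle track mpv reports as selected, if any."""
    for track in track_list:
        if track.get("type") == "sub" and track.get("selected"):
            return track
    return None


def subtitle_input(track: dict, video_path: str) -> str:
    """File ffmpeg should read the subtitles from: the external file, or the video itself."""
    if track.get("external"):
        return track["external-filename"]
    return video_path
```

The track's `codec` field would then tell the cut action whether it is dealing with text or image subs, without a separate `ffmpeg -i` call.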

@familyfriendlymikey
Owner Author

Alright I completely removed the Node dependency and made it very easy to configure custom actions. Example config.lua:

local utils = require "mp.utils" -- provides utils.join_path used below

ACTIONS.ENCODE = function(d)
	local args = {
		"ffmpeg",
		"-nostdin", "-y",
		"-loglevel", "error",
		"-i", d.inpath,
		"-ss", d.start_time,
		"-t", d.duration,
		"-pix_fmt", "yuv420p",
		"-crf", "16",
		"-preset", "superfast",
		utils.join_path(d.indir, "ENCODE_" .. d.channel .. "_" .. d.infile_noext .. "_FROM_" .. d.start_time_hms .. "_TO_" .. d.end_time_hms .. d.ext)
	}
	mp.command_native_async({
		name = "subprocess",
		args = args,
		playback_only = false,
	}, function() print("Done") end)
end

I'm still interested in having the default encode action account for subs but I'm not really stoked to research ffmpeg stuff for a use case I don't have at this moment, so it'll have to wait unless someone finds a command that works for srt, image, and no subs (or any other cases if there are any).
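Not a single command, but a sketch of one dispatch that covers the three cases, written in Python for illustration (the real action would live in config.lua). Assumptions, none of them tested against every container: text subs use ffmpeg's `subtitles` filter with `si` picking the stream, and seek after `-i` so the burned-in subs keep their original timing; image subs are overlaid with `overlay` and can use fast input seeking; the filter-path escaping that `subtitles` needs for names containing `:` or `'` is skipped. The helper names are hypothetical.

```python
def subtitle_args(kind: str, infile: str, sub_index: int = 0) -> list:
    """Extra ffmpeg output arguments for burning in subtitle stream `sub_index`."""
    if kind == "none":
        return []
    if kind == "text":
        # subtitles filter reads the subs from the file itself; -sn drops the soft copy
        return ["-vf", f"subtitles={infile}:si={sub_index}", "-sn"]
    if kind == "image":
        # overlay the bitmap subtitle stream onto the video, keep audio if present
        return ["-filter_complex", f"[0:v][0:s:{sub_index}]overlay[v]",
                "-map", "[v]", "-map", "0:a?"]
    raise ValueError(f"unsupported subtitle kind: {kind!r}")


def encode_command(infile, outfile, start, duration, kind, sub_index=0):
    seek = ["-ss", start, "-t", duration]
    cmd = ["ffmpeg", "-nostdin", "-y", "-loglevel", "error"]
    if kind == "text":
        cmd += ["-i", infile, *seek]  # output seeking: slower, keeps subtitle timing aligned
    else:
        cmd += [*seek, "-i", infile]  # input seeking: fast
    cmd += subtitle_args(kind, infile, sub_index)
    return cmd + ["-pix_fmt", "yuv420p", "-crf", "16", "-preset", "superfast", outfile]
```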

@akippnn

akippnn commented Feb 6, 2023

I'll see what I can do at the weekend


3 participants