V2 of refactor LiveText to use diff-match-patch #11639

codeofdusk · 2020-09-19T19:09:54Z

Link to issue number:

Supersedes #11500. Closes #3200. Closes #11498.

Summary of the issue:

LiveText objects use difflib to calculate text changes:

Difflib is slow, making 10,000-line objects completely unusable and slowing down or locking up NVDA for other large objects.
Comparisons are made at the line level, so edits in the middle of lines cause characters to the right of the caret to be read out.

Description of how this pull request fixes the issue:

This PR changes diffing functions to operate at the string rather than line level and adds diff-match-patch (DMP) as an optional diffing algorithm for LiveText objects. It is anticipated that DMP will become the default in a future NVDA release pending positive user testing.

Unlike #11500, this PR does not import or dynamically link to DMP due to licensing issues. Instead, a Python application is run in another process that calls the DMP extension, and communicates over standard IO.

Testing performed:

Tested this approach, difflib, and a dynamically linked DMP Python extension on very large (millions of characters) blocks of text. Observed that while DMP was able to return diffs in about 10 seconds (even with insignificant IPC overhead), Difflib ran for several minutes (I aborted the diff before it could complete).
Tested deleting and inserting characters in the middle of a line both with difflib and DMP. With difflib the characters to the right of the caret were read out (as stated in better support for editing characters in the middle of lines in command prompt/terminal #3200). With DMP only the changed characters were read.

Known issues with pull request:

None.

Change log entry:

== Bug Fixes ==

In terminal programs on Windows 10 version 1607 and later, when inserting or deleting characters in the middle of a line, the characters to the right of the caret are no longer read out. (better support for editing characters in the middle of lines in command prompt/terminal #3200)
- This experimental fix must be manually enabled in NVDA's advanced settings panel by changing the diff algorithm to Diff Match Patch.

== Changes For Developers ==

LiveText._getTextLines has been removed. Instead, override _getText which returns a string of all text in the object. (V2 of refactor LiveText to use diff-match-patch #11639)
LiveText objects can now calculate diffs by character. To forcibly modify this for a particular object, override the diffAlgo property (see the docstring for details). (V2 of refactor LiveText to use diff-match-patch #11639)

codeofdusk · 2020-09-19T19:15:46Z

Cc @camlorn, @feerrenrut, @LeonarddeR, @michaelDCurran, @tspivey.

feerrenrut · 2020-09-21T14:31:51Z

Thanks for the update @codeofdusk.

I don't think it is a good idea to try to rush this, it would be nice to get it into 2020.4, but we have to be realistic about the amount of work left to do. If it is ready in time we will include it.

We privately discussed creating the separate application in Python, did you explore this? If so, what stopped this approach?
- It's great that you have been able to verify this approach using the C++ application.
- Looking at the source quickly, there will be some work to turn this into something ready for production, using python here will likely save you a lot of time.
For the sake of simplicity in this initial approach I'd suggest using standard input / output instead of named pipes, especially since there seems to be open questions about them. See https://stackoverflow.com/a/8475367 for more information. I can't speak to the performance differences with certainty, though I strongly suspect they performance costs will be very similar. I wouldn't spend time looking at it now, but here are some alternative IPC approaches to consider for future reference https://docs.microsoft.com/en-us/windows/win32/ipc/interprocess-communications

LeonarddeR · 2020-09-21T17:11:00Z

Is this winapi module you used also part of the c++ code?
I agree with @feerrenrut that it isn't very helpful to introduce named pipes as a communication layer to NVDA that is very different from other ways of inter process communication like stdin/stdout and rpc. I'd prefer one of those.
If there is c++ code working with named pipes, it shouldn't be that difficult to make it work with RPC, I assume.

feerrenrut · 2020-09-21T17:50:47Z

Sorry, I was a little unclear from the linked answer on where the issues with named pipes are? They don't seem to be discussed at all?

The open questions are in the current diff for this PR, in particular synchronization for their creation. The link was showing you how to use standard input from python.

Using Python for what amounts to I/O seems like a lot of extra overhead (I'd just be calling a DMP Python extension written in C++).

The benefits of doing this (that I can immediately identify):

is that you don't have to get the memory management correct and robust in C++, currently there are problems.
you don't have to do the work to build the executable, a python script is enough.
you can match the IPC mechanism on the DMP side with the NVDA side, you don't have to learn two ways of doing things.

I think there is likely to be minimal overhead for this. Most of the work will be done by the native parts of python anyway.

Could you please provide some additional comments on this? Where is work necessary, and how would Python save me time?

If you still wish me to review that code then I'll set some time aside to write things up properly, in the meantime you could look at some modern C++ examples specifically on memory management.

feerrenrut · 2020-09-21T20:47:12Z

Might I want to consider looking into RPC? (it seems like it could be easier, but might be "too intimate" for licensing...)

Perhaps, eventually. Please start by getting this delivered using standard IO.

Is there an equivalent of Valgrind on Windows? How do you suggest I debug memory leaks/access problems in general?

This is not necessary, it's a very simple program, but it is not following modern approaches, this makes it error prone. Judging by the minimalism of the code, I suspect it doesn't handle failures well. To learn more, the following is a good resource: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines

Please try to write this in python instead.

feerrenrut · 2020-09-22T16:15:55Z

a Python implementation of NVDA_dmp, that depends on the DMP C++ extension.

Just to be clear, this is using JoshData/diff_match_patch-python right?

However, calls to sys.stdout.buffer.write don't seem to work. How should I write binary data to stdout?

What happens? How do these fail? There are some other suggestions on how to do this on stackoverflow, but it would be good to understand why this fails.
Though note, from https://stackoverflow.com/a/8475367 which states that communicate should be used:

Use communicate() rather than .stdin.write, .stdout.read or .stderr.read
to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.

Also, how do I call the included Python interpreter as a subprocess? sys.executable from an installed copy of NVDA just points to nvda.exe

Yes, we'll have to add a way to run an external script to NVDA. Eventually it should be a separate executable, but for now you could add an option to NVDA to execute a script with execfile('file.py'). eg you call nvda.exe --exec file.py This will allow you to test the approach.

Finally, an alternative would be to create a python extension using the LGPL-2.1 licensed opencor/libxdiff or similar libraries (eg xdiff from libgit). This still uses the same Meyers diff algorithm used by diff match patch. The advantage is it could be run in the same process.

lukaszgo1 · 2020-09-22T22:59:26Z

Yes, we'll have to add a way to run an external script to NVDA. Eventually

Can't we just compile the Python script to exe similar to how nvda_eoaProxy and nvda_slave is build?

it should be a separate executable, but for now you could add an option to NVDA to execute a script with execfile('file.py'). eg you call nvda.exe --exec file.py This will allow you to test the approach.

I guess it would be simpler to add such option to nvda_slave.

feerrenrut · 2020-09-23T14:55:23Z

communicate blocks until the process returns

Ok, that is something to deal with. That said the warning to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process should be noted.

Looking at the sample you posted:

have you confirmed this does what you expect if it is not binary encoded?
can you print some messages to show what stage nvda_dmp is at?
It would be good to confirm that nvda_dmp is doing what it looks like it should be doing.
Do you know for sure that the shell / idle wherever you are running this code, will actually show binary data piped to stdout?
You might instead capture it and print the string?

… requires that NVDA be run from source!

AppVeyorBot · 2020-09-24T20:10:28Z

System test failure
Build (for testing PR)

See test results for failed build of commit 1e65d27cef

codeofdusk · 2020-09-24T20:14:11Z

@feerrenrut I've updated the PR and description.

This also adds the DMP Python extension to the repo, but I'm unsure if it should be in a different location.

AppVeyorBot · 2020-09-26T01:07:41Z

Build (for testing PR)

See test results for failed build of commit 80aaa05acc

codeofdusk · 2020-09-26T01:46:17Z

The build is failing, but I can't see why. What's going on here? Cc @feerrenrut, @LeonarddeR, @michaelDCurran.

michaelDCurran · 2020-09-26T05:55:43Z

In setup.py, you have explicitly imported some things from versionInfo, no doubt to make linting happy. However, versionInfo is already specifically imported further down, underneath where gettext is imported and installed, providing ```_()```. You need to move your versionInfo import down past the gettext stuff. Of course flake8 may complain about imports not at the top of the file, but you'll have to silence that with a flake8 comment.

…d to more accurately describe behaviour.

feerrenrut

This looks ok. I think it will need significant testing, so I'm not comfortable to include it at this late stage in the 2020.4 release. I'll merge it for release with 2021.1 as soon as we branch off for the beta.

…21.1.

feerrenrut

Thanks for your work on this @codeofdusk

codeofdusk · 2021-01-01T14:52:31Z

There seems to be a deadlock somewhere in this code, making autoread stop working sometimes in some consoles. I've had a tough time finding it. Hopefully someone will report the issue with a clear set of steps to reproduce.

zstanecic · 2021-01-01T16:58:25Z

It’s very hard give steps to reproduce, as it happens randomly, for example in the wsl consoles. 1. Windows subsistem for linux

feerrenrut · 2021-01-04T01:12:56Z

@codeofdusk see if you can get a stack trace / log for it. That will give a lot of clues about what is going on.

feerrenrut · 2021-01-04T02:56:19Z

It looks like the 'in nvda process' code doesn't flush stdin / stdout. This stack overflow answer suggests both processes need to flush.

More research is required to determine if it necessary to flush for each "step" in the communication.

codeofdusk · 2021-01-04T10:55:22Z

I've found a set of steps to reproduce on my particular system. Adding flush calls at various points in the communication process does not appear to change behaviour in any way.

codeofdusk · 2021-01-11T23:44:18Z

#11998 includes a possible patch for the described diff hang issue.

Refactor LiveText to use diff-match-patch via IPC with a C++ process.

8b1ff14

codeofdusk mentioned this pull request Sep 19, 2020

UI Automation in Windows Console: make default in FORMATTED consoles (Windows 11 Sun Valley 2) #10964

Merged

Explain magic numbers.

bddf453

This comment has been minimized.

Sign in to view

codeofdusk mentioned this pull request Sep 23, 2020

NVDA hangs up in terminal, when a large piece of text is loaded #6291

Closed

This comment has been minimized.

Sign in to view

switch to a Python implementation of nvda_dmp. This commit definitely…

59cfbb5

… requires that NVDA be run from source!

Compile nvda_dmp as an exe.

816ea00

This also adds the DMP Python extension to the repo, but I'm unsure if it should be in a different location.

codeofdusk added 2 commits September 26, 2020 07:57

Move import.

74ac3b3

Uniformize ends of inserted chunks.

7bc466e

codeofdusk mentioned this pull request Sep 26, 2020

Allow NVDA to still function if a component changes the current directory #11650

Merged

s/available/supported in config, for consistency with rest of code an…

1e4b3f4

…d to more accurately describe behaviour.

Review action.

47b0c13

feerrenrut previously approved these changes Dec 1, 2020

View reviewed changes

feerrenrut added this to the 2021.1 milestone Dec 1, 2020

This comment has been minimized.

Sign in to view

Remove _getTextLines entirely, as this PR has been retargetted for 20…

b4f65fe

…21.1.

codeofdusk dismissed feerrenrut’s stale review via b4f65fe December 8, 2020 05:02

Merge branch 'master' into livetext-dmp-cpp

dbd8ad8

codeofdusk requested a review from feerrenrut December 9, 2020 08:46

codeofdusk added 2 commits December 19, 2020 08:42

Merge branch 'master' into livetext-dmp-cpp

16db1a9

Update user guide (s/2020.4/2021.1) as PR was held back.

7a8d579

feerrenrut added the deprecated/2021.1 Label used to track deprecations due for removal in the 2021.1 release label Dec 31, 2020

feerrenrut previously approved these changes Dec 31, 2020

View reviewed changes

Merge remote-tracking branch 'origin/master' into livetext-dmp-cpp

d890ac9

feerrenrut added a commit to codeofdusk/nvda that referenced this pull request Dec 31, 2020

update changes file for PR nvaccess#11639

55a7d69

feerrenrut dismissed their stale review via 55a7d69 December 31, 2020 08:59

update changes file for PR nvaccess#11639

78e4012

feerrenrut force-pushed the livetext-dmp-cpp branch from 55a7d69 to 78e4012 Compare December 31, 2020 09:02

feerrenrut approved these changes Dec 31, 2020

View reviewed changes

feerrenrut merged commit 9889285 into nvaccess:master Dec 31, 2020

codeofdusk deleted the livetext-dmp-cpp branch December 31, 2020 10:56

codeofdusk mentioned this pull request Dec 31, 2020

Fix error in user guide due to DMP retargetting #11981

Merged

codeofdusk mentioned this pull request Jan 11, 2021

Update nvda_dmp to possibly address an autoread hang #11998

Merged

CyrilleB79 mentioned this pull request Feb 4, 2021

Fixes somes context help issues in advanced settings panel. #12034

Merged

seanbudd mentioned this pull request Mar 5, 2021

Deprecations for 2021.1 #12123

Closed

18 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

V2 of refactor LiveText to use diff-match-patch #11639

V2 of refactor LiveText to use diff-match-patch #11639

codeofdusk commented Sep 19, 2020 •

edited

Loading

codeofdusk commented Sep 19, 2020

feerrenrut commented Sep 21, 2020

This comment has been minimized.

LeonarddeR commented Sep 21, 2020

This comment has been minimized.

feerrenrut commented Sep 21, 2020

This comment has been minimized.

feerrenrut commented Sep 21, 2020

This comment has been minimized.

feerrenrut commented Sep 22, 2020

lukaszgo1 commented Sep 22, 2020

This comment has been minimized.

feerrenrut commented Sep 23, 2020

This comment has been minimized.

This comment has been minimized.

AppVeyorBot commented Sep 24, 2020

codeofdusk commented Sep 24, 2020

AppVeyorBot commented Sep 26, 2020

codeofdusk commented Sep 26, 2020

michaelDCurran commented Sep 26, 2020 via email

feerrenrut left a comment •

edited

Loading

This comment has been minimized.

feerrenrut left a comment

codeofdusk commented Jan 1, 2021

zstanecic commented Jan 1, 2021 via email •

edited by feerrenrut

Loading

feerrenrut commented Jan 4, 2021

feerrenrut commented Jan 4, 2021

codeofdusk commented Jan 4, 2021

codeofdusk commented Jan 11, 2021

V2 of refactor LiveText to use diff-match-patch #11639

V2 of refactor LiveText to use diff-match-patch #11639

Conversation

codeofdusk commented Sep 19, 2020 • edited Loading

Link to issue number:

Summary of the issue:

Description of how this pull request fixes the issue:

Testing performed:

Known issues with pull request:

Change log entry:

codeofdusk commented Sep 19, 2020

feerrenrut commented Sep 21, 2020

This comment has been minimized.

LeonarddeR commented Sep 21, 2020

This comment has been minimized.

feerrenrut commented Sep 21, 2020

This comment has been minimized.

feerrenrut commented Sep 21, 2020

This comment has been minimized.

feerrenrut commented Sep 22, 2020

lukaszgo1 commented Sep 22, 2020

This comment has been minimized.

feerrenrut commented Sep 23, 2020

This comment has been minimized.

This comment has been minimized.

AppVeyorBot commented Sep 24, 2020

codeofdusk commented Sep 24, 2020

AppVeyorBot commented Sep 26, 2020

codeofdusk commented Sep 26, 2020

michaelDCurran commented Sep 26, 2020 via email

feerrenrut left a comment • edited Loading

Choose a reason for hiding this comment

This comment has been minimized.

feerrenrut left a comment

Choose a reason for hiding this comment

codeofdusk commented Jan 1, 2021

zstanecic commented Jan 1, 2021 via email • edited by feerrenrut Loading

feerrenrut commented Jan 4, 2021

feerrenrut commented Jan 4, 2021

codeofdusk commented Jan 4, 2021

codeofdusk commented Jan 11, 2021

codeofdusk commented Sep 19, 2020 •

edited

Loading

feerrenrut left a comment •

edited

Loading

zstanecic commented Jan 1, 2021 via email •

edited by feerrenrut

Loading