Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paste Handler: convert lists inside of table cells to pseudo lists WIP #55651

Draft
wants to merge 4 commits into
base: trunk
Choose a base branch
from

Conversation

mpkelly
Copy link
Contributor

@mpkelly mpkelly commented Oct 26, 2023

Fixes #45774

Related #46775

Details to follow - WIP.

What?

Why?

How?

Testing Instructions

Testing Instructions for Keyboard

Screenshots or screencast

bullet = `${ index + 1 }. `;
}
if ( isNested ) {
bullet = ` ${ bullet }`;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellatrix I made some progress on this. Now that the transformation happens during an earlier stage, the indentation whitespace I am adding gets stripped away. Any ideas on how to preserve it so nested lists look correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this:

Screenshot 2566-10-26 at 17 09 11

Gets rendered as this:

Screenshot 2566-10-26 at 17 09 31

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whitespace is being stripped out by htmlFormattingRemover in the same file. What are the implications of changing the code so it doesn't do this for td?

Or maybe the transformation I am adding can add an attribute data-preserve-whitespace="true" which htmlFormattingRemover can check? (it can also remove the attribute after reading it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to use an attribute like data-preserve-whitespace or just a flag like node.preserveWhiteSpace but these don't work. Any attributes need to be on the schema(s), or they get removed. And adding new properties to nodes like node.preserveWhiteSpace does not work because these properties get removed during other transformations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other thing I just tried was to record the XPath of the elements where you want to preserve white space, e.g. body/b/div/table/tbody/tr[3]/td and then check these paths in htmlFormattingRemover and return early. However, this doesn't work because in htmlFormattingRemover, the same path now looks like body/table/tbody/tr[3]/td due to other transform functions removing nodes. This is harder than I expected!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what are they using to represent the tables? Are those just unicode bullets (text) or is it actual list markup? Maybe test in a couple of more places?

  • Apple Notes (should be identical to Pages)
  • Anything else you use?
  • Could you create a table with a list in HTML and paste it in Google Docs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want (1) these "fake" lists to be too good that it becomes confusing to understand what you're dealing with and (2) people to thing you can indent with spaces, because you cannot. Try a soft line break in a paragraph followed by spaces and preview the result. On the front-end the spaces will be collapsed because that's how HTML works. In the editor all rich text instances have white-space: pre-wrap turned on so even invisible spacing is visible. It's not ideal... either we should remove that CSS in the editor (but it's weird to press space multiple times with nothing happening), or we need to alternate between inserting spaces and non breaking spaces.

Copy link
Contributor Author

@mpkelly mpkelly Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want (1) these "fake" lists to be too good that it becomes confusing to understand what you're dealing with and (2) people to thing you can indent with spaces, because you cannot

@ellatrix, this feels like a bug to me. As it's a WYSIWYG editor, it feels like they should match. I tried creating the same "fake" list in the editor and it's fine but on the frontend it is collapsed.

Screenshot 2566-11-09 at 15 11 56 Screenshot 2566-11-09 at 15 13 13

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we need to alternate between inserting spaces and non breaking spaces.

Can confirm this works. Can we go with that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed my latest code.

@@ -189,6 +190,7 @@ export function pasteHandler( {
}

const filters = [
nestedListedConverter,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellatrix in #46775, you mentioned updating filterInlineHTML, but I found that this is not always called. I registered a transformer function here - is this ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to add it both here and filterInlineHTML.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the transform function there too. Still need to test it though.

…ting does not get removed during raw-handling transforms

if ( isList( node ) ) {
const nodes = transformList( node );
if ( nodes.length ) {
Copy link
Contributor Author

@mpkelly mpkelly Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the comment above, specialCommentConverter, which mentions using a special block node, core/nextpage, to avoid further transformations.

As a POC, I used the same node and found that whitespace is now preserved. Can we use something like this as a basis for a solution to keeping whitespace, @ellatrix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using this solution produces the final result we're after:

Screenshot 2566-11-06 at 12 02 15

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this work? Why are non breaking spaces otherwise stripped? What's responsible for that?

Copy link
Contributor Author

@mpkelly mpkelly Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this change isn't necessary:

const wrapper = doc.createElement( 'wp-block' );
wrapper.dataset.block = 'core/nextpage';

The way I added space characters (\u00A0) in this last commit seems to be enough by itself. Previously, I created a span with just a \u00A0, which didn't survive. In this commit, I append the space and bullet (\u00A0\u00A0-) together, and it works.

So I think I have a solution. All that remains is to write some tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, my earlier attempt used document.createTextNode( '\u00A0' ). I just tried that now, and it actually works too. 🤔

return (
Array.from( { length } )
// eslint-disable-next-line no-unused-vars
.map( ( i ) => `\u00A0 \u00A0` )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why three spaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One is hard to spot, and two didn't feel like enough. This produces something closer to the source list.

@@ -191,6 +193,7 @@ export function pasteHandler( {
const filters = [
googleDocsUIDRemover,
msListConverter,
nestedListedConverter,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this shouldn't be done for all lists. Ideally, this is added to phrasing content cleanup so that it runs when pasting inline HTML, but also when pasting blocks where a table or figure captions contains lists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be part of removeInvalidHTML? Before removing an invalid list, convert it to plain text. We already have some fixing logic there that inserts line breaks when it encounters a block level element (see bottom of cleanNodeList).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than this, code looks good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started reworking it so it's compatible with cleanNodeList . I can maybe get a PR delivered tomorrow on the train; otherwise, it will be after the meet-up. I will include some integration tests.

Copy link
Member

@ellatrix ellatrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some integration tests to the existing raw handling integration tests? The list in a table from Google Docs would be good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Raw Handling: unexpected results when pasting table with list from Google Docs
3 participants