Webclips extraction confused by tabs #565

Open
Tokolino opened this issue Jan 3, 2024 · 21 comments
Labels
enhancement New feature or request

Comments

@Tokolino

Tokolino commented Jan 3, 2024

I have a webclipped note with tabs at the beginning of some lines. The tabs cause those lines to be interpreted as a code block, so linked images in them are not recognised as images.

The attached note shows this behavior (in the lower part).
Debug.zip

@akosbalasko
Owner

Well, I'm afraid that by default leading tabs mean code blocks in Obsidian, and images within a code block are not supported by Obsidian at all. So I think this is a feature request, but what would the desired behavior be? What should it look like?

Screenshot 2024-01-04 at 10 34 19

akosbalasko added the enhancement (New feature or request) label Jan 4, 2024
@Tokolino
Author

Tokolino commented Jan 4, 2024

I think the problem is that Yarle writes tabs at the beginning of the line at all. They may be present in the HTML source code, but in Obsidian they take on a different (wrong) meaning. So the correct behavior would be to remove or escape the tabs (if possible).
It is not only a problem with image extraction: because the text is treated as a code block, Markdown formatting is not rendered but shown literally.
The only reasonable solution is that tabs should appear at the beginning of a line only when the text is intended to be a code block.

@akosbalasko
Owner

Aham. And what about a general "skip tabs recognition as codeblocks (by removing them)" toggle in the configuration panel?
Regarding your last sentence, the question is how Yarle should recognize that "the text is intended to be a code block". Any ideas are welcome.

@Tokolino
Author

Tokolino commented Jan 4, 2024

The only things I know of (and I am not an HTML expert) are the tags <pre> and <code>. Concerning notes other than webclips:
Since leading tabs in notes are not an indication of code blocks, one idea is to replace them with a certain number of spaces when they are at the beginning of a line. Or add an additional space at the beginning of the line to prevent interpretation as a code block.
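The tab-to-spaces idea above could be sketched roughly as follows. This is a minimal illustration, not Yarle's code: `replace_leading_tabs`, `spaces_per_tab` and `max_indent` are hypothetical names, and the cap of three spaces assumes the CommonMark rule that an indented code block needs at least four columns of indentation.

```python
import re

# Hypothetical helper, not part of Yarle.
# Matches a run of tabs at the very start of any line.
_LEADING_TABS = re.compile(r"^\t+", flags=re.MULTILINE)


def replace_leading_tabs(markdown, spaces_per_tab=2, max_indent=3):
    """Replace leading tabs with spaces, capped below the code-block threshold.

    Assumption: four or more columns of indentation start an indented code
    block (CommonMark), so the result is capped at `max_indent` spaces.
    """
    def _swap(match):
        width = min(len(match.group(0)) * spaces_per_tab, max_indent)
        return " " * width

    return _LEADING_TABS.sub(_swap, markdown)
```

Tabs inside a line are left alone. Note that the other variant (one extra space in front of the tab) would probably not help under CommonMark, because a tab still advances to the next four-column tab stop, so a space followed by a tab still counts as four columns of indentation.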

@akosbalasko
Owner

Yes, that's true, but unfortunately Evernote's ENEX content is not clean, standard HTML. I just tested it in the latest version, and the code block is stored as follows:

<div style="--en-codeblock:true;  ...

But as far as I remember, this kind of new format was introduced in v10, and it was stored differently in v7. As I don't have that version, could you please create a simple code block note in v7 and send me the enex file exported from v7?

Thanks a lot!

@Tokolino
Author

Tokolino commented Jan 4, 2024

No problem.
Debug.zip

@akosbalasko
Owner

Thank you! Okay, the old one has almost the same div + style, but with a different attribute: -en-codeblock:true.
So the solution would be to convert code blocks only if one of these settings is found in the note, and to trim tabs from the beginning of the lines to prevent them from being recognized as code blocks in Markdown.
Sound good?
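The detection described above (accept either spelling of the marker in a `style` attribute) could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, and tolerating whitespace around the colon is a guess rather than something observed in the exports.

```python
import re

# Matches both spellings mentioned in the thread:
#   "--en-codeblock:true"  (newer export)
#   "-en-codeblock:true"   (v7 export)
# The lookbehind avoids matching in the middle of a longer property name.
_EN_CODEBLOCK = re.compile(
    r"(?<![\w-])--?en-codeblock\s*:\s*true\b", re.IGNORECASE
)


def is_evernote_codeblock(style):
    """Return True if an inline style string marks an Evernote code block."""
    return bool(_EN_CODEBLOCK.search(style or ""))
```

With a check like this, only elements that carry the marker would be converted to code blocks, and everything else could have its leading tabs trimmed.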

@Tokolino
Author

Tokolino commented Jan 4, 2024

What happens when you webclip this html page?

<html>
  <body>
    This is not a code block.
	<pre>
	  This is a code block.
	</pre>
    This is not a codeblock
  </body>
 </html>

@akosbalasko
Owner

I made it here: https://amethyst-juliana-94.tiiny.site/
Then I clipped and exported it in all of the meaningful variations: Fullpage, Article, SimplifiedArticle, Selection. Here are the results:
In the case of Fullpage and Article, it keeps the pre tags but slightly restructures the whole page, like this:

<div style="min-height: 93px; font-size: 16px; display: block; min-width: 100%; position: relative;"> <div><div><span>
    This is not a code block.
	</span><pre>	  This is a code block.
	</pre>
    This is not a codeblock
  
 </div></div></div>

In SimplifiedArticle, it replaces pre with a div that carries the attribute <div style="--en-codeblock:true; ...

<div style="--en-codeblock:true; --en-lineWrapping:false;box-sizing: border-box; padding: 8px; font-family: Monaco, Menlo, Consolas, &quot;Courier New&quot;, monospace; font-size: 12px; color: rgb(51, 51, 51); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; background-color: rgb(251, 250, 248); border: 1px solid rgba(0, 0, 0, 0.14902); background-position: initial initial; background-repeat: initial initial;"><div>      This is a code block.</div><div>    </div></div><div>     This is not a codeblock     </div>

And in Multiple Selection, it replaces pre with a div that carries the attribute <div style="--en-codeblock:true; ..., and replaces some other elements with span:

<div style="--en-codeblock:true; --en-lineWrapping:false;box-sizing: border-box; padding: 8px; font-family: Monaco, Menlo, Consolas, &quot;Courier New&quot;, monospace; font-size: 12px; color: rgb(51, 51, 51); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; background-color: rgb(251, 250, 248); border: 1px solid rgba(0, 0, 0, 0.14902); background-position: initial initial; background-repeat: initial initial;"><div>      This is a code block.</div><div>    </div></div><div><span style="font-size: 16px;">     This is not a codeblock     </span></div>

The variations differ in layout because of the inline styles added to the divs, see the screenshots:
Screenshot 2024-01-04 at 15 34 34
Screenshot 2024-01-04 at 15 34 38
Screenshot 2024-01-04 at 15 34 42
Screenshot 2024-01-04 at 15 34 47

@akosbalasko
Owner

So, long story short: Yarle has to be prepared for all of the possibilities: handle the attribute, and the "pre" as well. + a toggle to trim the tabs from the beginning of the lines.
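To illustrate handling both the attribute and the "pre" tag, here is a rough sketch built on Python's standard `html.parser`. It is not Yarle's implementation: `CodeBlockFinder` and `find_code_blocks` are hypothetical names, the input is assumed to have well-nested `pre`/`div` elements, and treating each inner `div` as one line of code is an assumption based on the exports shown above.

```python
import re
from html.parser import HTMLParser

_EN_CODEBLOCK = re.compile(r"--?en-codeblock\s*:\s*true", re.IGNORECASE)


class CodeBlockFinder(HTMLParser):
    """Collects the text of <pre> elements and of Evernote code-block <div>s."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []   # finished code block texts
        self._stack = []   # one bool per open pre/div: is it a code block?
        self._depth = 0    # number of enclosing code-block elements
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("pre", "div"):
            return
        style = dict(attrs).get("style") or ""
        is_code = tag == "pre" or bool(_EN_CODEBLOCK.search(style))
        self._stack.append(is_code)
        if is_code:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag not in ("pre", "div") or not self._stack:
            return
        is_code = self._stack.pop()
        if is_code:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._buffer))
                self._buffer = []
        elif self._depth > 0:
            # A plain <div> inside a code block is treated as one line.
            self._buffer.append("\n")

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def find_code_blocks(html):
    """Return the text of every code block found in an HTML fragment."""
    finder = CodeBlockFinder()
    finder.feed(html)
    finder.close()
    return finder.blocks
```

Anything outside the collected blocks is ordinary text, which is where a tab-trimming toggle could safely apply.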

@Tokolino
Author

Tokolino commented Jan 4, 2024

You should also check what happens if a plain text note just contains tabs for formatting reasons.

@akosbalasko
Owner

eh... okay, I tested the code block handling and everything looks fine (I added some extra tests).
Then I switched to the tab issue, but I realized that in Obsidian the images work and are shown even when they are in an indented line (tab at the beginning).
Then I checked your enex file, and I think the problem is not the tabs at all, but the fact that the images are GIFs, which are not shown in Obsidian at all. I'm not sure whether that is a bug there or not, but I think it is not related to the conversion.

@Tokolino
Author

Tokolino commented Jan 4, 2024

I think you are not quite correct. I checked again with the note I provided in the original post. This is how a part of it looks in reading mode:
image
This is how it looks in edit mode:
image
And this is how it looks in edit mode with source view:
image
The image which is linked in this section has no back references:
image
(the image seems to be broken, but this is not the issue here)

Now I manually removed the leading tabs in this section, and this changed the situation. How it looks in edit mode with source view:
image
In edit mode:
image
And in reading mode:
image
And the image has its backlink:
image

All this change was only due to the removal of the leading tabs.

@Tokolino
Author

Tokolino commented Jan 4, 2024

But I can reproduce the behaviour that the link is displayed (and linked) in a simple test note which has leading tabs. But, as shown above, in the other note the tabs clearly have an effect on the linking. But currently I have no idea what the reason is...

What I also notice: When I use tabs in simple text notes, then the text is not displayed in monospace, meaning it is not considered as code block. And of course, if a section is not considered as code, then images are linked and pictures are displayed.

@akosbalasko
Owner

Maybe editing the line that contains a link triggers a reload of the references, mentions, backlinks, etc.

@Tokolino
Author

Tokolino commented Jan 4, 2024

Nope. When I just change text in this line then nothing changes. When I remove the tab, then the backlink appears. When I enter the tab again, then the backlink is gone again. You can try yourself with the note I provided. My case is around line 209.

@akosbalasko
Owner

Well, yeah, this shows exactly that Obsidian gives tabs (i.e. indentation) higher precedence and interprets them differently from normal text, and after one or more leading tabs it loses the backlinks.
So I still think that the problem is on Obsidian's side, not Yarle's, and I hardly see anything to fix on Yarle's side.
I no longer want to implement an option that removes leading indentation, because it can easily lead to data loss (I mean loss of indentation where it has real meaning).
I don't want to implement any workarounds either, since the issue is that Obsidian interprets the produced Markdown differently.
The only way I can see to offer a controlled solution is a config option that removes the indentation IF **the line contains a link** AND the note is a webclip.
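The narrowly scoped option proposed above could be sketched like this. All names are hypothetical (this is not an existing Yarle setting), the caller is assumed to already know whether the note is a webclip, and the link pattern only covers inline Markdown links/images and Obsidian-style wikilinks.

```python
import re

# Inline Markdown links/images: [text](target), ![alt](target)
# Obsidian-style wikilinks/embeds: [[target]], ![[target]]
_LINK = re.compile(r"!?\[[^\]]*\]\([^)]*\)|!?\[\[[^\]]+\]\]")
_LEADING_INDENT = re.compile(r"^[ \t]+")


def strip_indent_on_link_lines(markdown, is_webclip, enabled=True):
    """Remove leading indentation only from lines that contain a link.

    Does nothing unless the (hypothetical) option is enabled and the note
    is a webclip, so indentation elsewhere is preserved.
    """
    if not (enabled and is_webclip):
        return markdown
    return "\n".join(
        _LEADING_INDENT.sub("", line) if _LINK.search(line) else line
        for line in markdown.split("\n")
    )
```

For example, `strip_indent_on_link_lines("\t![[img.png]]\n\tplain", is_webclip=True)` unindents only the first line and leaves the second untouched, which limits the risk of losing meaningful indentation.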

@Tokolino
Author

Tokolino commented Jan 5, 2024

My current impression is:
If there is a non-indented line of text directly above the indented line, then the indentation does not create a code block. If there is no such line, then it creates a code block.

I think the solution with the additional webclip option is best...

@Tokolino
Author

Tokolino commented Jan 6, 2024

It's also mentioned in the help vault (source code view):
image

From my experience: If the line above the line with the tabs is also a code block, then the line is formatted as code block. So if I remove the blank lines between the two blocks then it is still seen as a code block, because the block above is a code block, too. If I just write normal text above (without indentation) then it is just formatted as indented text.
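The behavior described above matches the CommonMark rule that an indented code block cannot interrupt a paragraph. A simplified predictor of which lines end up as indented code might look like this. It is only a sketch: `indented_code_lines` is a hypothetical name, lists, blockquotes, fences and headings are ignored, and Obsidian's renderer may deviate from CommonMark.

```python
def indented_code_lines(markdown):
    """Return the indices of lines that would be parsed as indented code.

    Simplified CommonMark model: a line indented by four or more columns
    (tabs expand to four-column tab stops) is code, unless it directly
    continues a paragraph.
    """
    code_lines = []
    in_paragraph = False
    for index, line in enumerate(markdown.split("\n")):
        if not line.strip():
            # A blank line ends any open paragraph.
            in_paragraph = False
            continue
        if line.expandtabs(4).startswith("    ") and not in_paragraph:
            code_lines.append(index)
        else:
            # Plain text, or an indented line absorbed into the paragraph.
            in_paragraph = True
    return code_lines
```

So `"text\n\t![[a.png]]"` yields no code lines, while the same input with a blank line between the text and the indented image line marks the image line as code.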

@akosbalasko
Owner

@Tokolino
Author

Tokolino commented Jan 7, 2024

Thank you! But I think it is not working in all cases. It does in a lot of cases, but not all of them... In the attached enex-file check for the image before "Je m’appelle Alex" in the note "Voyage en Guyane _ Mes Conseils et Mon Itinéraire Idéal de 15 jours"

The attachment should be here:

No problem. Debug.zip
