Some StarDict dictionaries have their HTML tags exposed when converted. #253

Closed
foldfree opened this issue Aug 27, 2022 · 11 comments

@foldfree

foldfree commented Aug 27, 2022

--version Plato 0.9.30
--device Kobo Libra H2O

Following these instructions, I added StarDict dictionaries from ebook-reader-dict to the /dictionaries/ folder.
This used to work fine a couple of months ago, but since I updated the dictionaries today, I have a formatting issue where HTML tags for font styles are visible (see the following image):
[Image: photo_2022-08-27_16-06-16]

Not sure if I did something wrong or if BoboTiG changed the formatting of their dictionaries.

@foldfree changed the title from "some stardict dictionaries have style html tags exposed when converted" to "Some StarDict dictionaries have their HTML tags exposed when converted." on Aug 27, 2022
@baskerville
Owner

This particular definition breaks the HTML detection routine: the first non-blank character is \ and not <.

@foldfree
Author

Sadly, all the definitions are formatted the same way, in the English dictionary as well. I'm guessing the other languages provided by the repo are too. Would it be possible to fix it?

@baskerville
Owner

You can try the fix by amending convert-dictionary.sh, or you can wait for a new version to be released. In either case, you'll have to let the StarDict dictionary be reconverted.

@BoboTiG

BoboTiG commented Feb 3, 2023

Out of curiosity, should we change something on our side in https://github.com/BoboTiG/ebook-reader-dict to prevent using hacks or reconverting dicts?

@foldfree
Author

foldfree commented Feb 3, 2023

> Out of curiosity, should we change something on our side in https://github.com/BoboTiG/ebook-reader-dict to prevent using hacks or reconverting dicts?

I am not a dev, but I am guessing that replacing backslashes (\) with the HTML code &#92; could be a solution?

@baskerville
Owner

baskerville commented Feb 3, 2023

> Out of curiosity, should we change something on our side in https://github.com/BoboTiG/ebook-reader-dict to prevent using hacks or reconverting dicts?

If the first non-blank character in the definition is <, then the definition is treated as HTML by Plato's Dictionary application; otherwise it is treated as plain text. Since some of the definitions start with raw pronunciation strings (for example \ˈsɪɡ.mə ˈæl.dʒɪ.bɹə\), I've had to wrap those strings in a tag.
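
As a rough illustration of the rule described in this comment, here is a minimal shell sketch. It is not Plato's actual code: the function name looks_like_html is hypothetical, and the only thing it models is the check on the first non-blank character.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from Plato's source.
# Returns 0 when the first non-blank character of "$1" is '<', 1 otherwise.
looks_like_html() {
    # "${1%%[![:space:]]*}" is the run of leading whitespace;
    # removing it as a prefix leaves the definition left-trimmed.
    trimmed=${1#"${1%%[![:space:]]*}"}
    case "$trimmed" in
        '<'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Sample definitions (made up for illustration): the second one starts
# with a raw pronunciation string, so it is classified as plain text.
for def in '<p>sigma</p>' '  <i>noun</i>' '\abc\ <i>noun</i>'; do
    if looks_like_html "$def"; then
        printf 'HTML: %s\n' "$def"
    else
        printf 'text: %s\n' "$def"
    fi
done
```

Under this rule, a definition that begins with a pronunciation string is classified as text even if the rest of it is HTML, which is consistent with the exposed tags reported in this issue.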

@MoTem

MoTem commented Feb 24, 2023

Since the issue has yet to be resolved by either Plato or BoboTiG, can someone guide me on what exactly to edit in convert-dictionary.sh?

@baskerville
Owner

> Since the issue has yet to be resolved by either Plato or BoboTiG, can someone guide me on what exactly to edit in convert-dictionary.sh?

On Plato's side, the issue was resolved on August 27, 2022 by 67bd7fb.

@occivink

occivink commented May 14, 2023

I've imported these dictionaries (version 3.0.0 from 2023-05-01) into Plato 0.9.35 and the issue persists.
This occurs with all the dictionaries I've tried (English, French and German). There doesn't seem to be anything of interest in Plato's log.
[Image: screenshot]

@occivink

Unfortunately, the fix worked for English, but not for all dictionaries.
I had not realized that it relied on the pronunciation delimiters, which appear to be \abc\ for French, /abc/ for English, and [abc] for German. I had a brief look at Wiktionary and it looks like that should cover most languages.
I'm not sure whether the delimiters should be normalized in the export, the pronunciation wrapped in a paragraph (@BoboTiG), or the workaround extended. For now I've done the latter by changing the sed call to sed 's|^\([\[/].*\)|<p>\1</p>|', but that looks like a fragile fix to me.
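
To try the extended workaround in isolation, the sed expression quoted in this comment can be run on a few sample lines. This is only a sketch: the wrap_pronunciation function name and the sample definitions are hypothetical, the delimiters use the same \abc\, /abc/ and [abc] placeholders as the comment, and nothing here reflects the actual contents of convert-dictionary.sh.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed expression quoted in the comment.
# Any input line starting with \, [ or / is wrapped whole in <p>...</p>;
# all other lines pass through unchanged.
wrap_pronunciation() {
    sed 's|^\([\[/].*\)|<p>\1</p>|'
}

# Made-up sample definitions, one per delimiter style, plus a line that
# already starts with a tag and should be left alone.
printf '%s\n' \
    '\abc\ nom' \
    '/abc/ noun' \
    '[abc] Substantiv' \
    '<p>already HTML</p>' | wrap_pronunciation
```

Because the expression matches every line that begins with one of these three characters, it may also wrap lines that are not pronunciations, which is one way the fix could turn out to be fragile.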

@baskerville
Owner

I just noticed another bug in the English dictionary: all the definitions end with a closing HTML tag but have no opening tag.
