unicode bug in version 1.0.5-SNAPSHOT #588

Closed
vipcxj opened this issue Oct 27, 2020 · 3 comments

Comments

@vipcxj
Contributor

vipcxj commented Oct 27, 2020

    String effectiveString = TextRenderer.getEffectivePrintableString(s);

This line causes some Unicode characters to become invisible, such as '' (U+F053). It is not printable in general fonts, but it is printable when using Wingdings 2.
Here is the test case. Please change the version of openhtmltopdf in the pom from 1.0.4 to 1.0.5-SNAPSHOT.

vipcxj changed the title from "1.0.5-SNAPSHOT" to "unicode bug in version 1.0.5-SNAPSHOT" on Oct 27, 2020
@danfickle
Owner

Hi @vipcxj,

I was not able to reproduce. Specifically, I got the # replacement character on both versions. As well as downloading and running your test case, I conducted a couple of other tests:

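    import java.awt.Font;
    import java.awt.FontFormatException;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

    public class WingdingsTest {
        public static void main(String... args) throws FontFormatException, IOException {
            // &#9746; is U+2612 BALLOT BOX WITH X
            String html = "<span style=\"font-family:'Wingdings 2',serif;font-size:60px\">&#9746;</span>";
            File fontFile = new File("C:\\Users\\dan\\Desktop\\fonts\\Wingdings2.ttf");
            try (FileOutputStream os = new FileOutputStream("C:\\Users\\dan\\Desktop\\desk.pdf")) {
                PdfRendererBuilder builder = new PdfRendererBuilder();
                builder.useFastMode();
                builder.withHtmlContent(html, null);
                builder.toStream(os);
                builder.useFont(fontFile, "Wingdings 2");
                builder.run();
            }
            // Independently ask AWT whether the font file has a glyph for U+2612
            Font fnt = Font.createFont(Font.TRUETYPE_FONT, fontFile);
            System.out.println(fnt.canDisplay('\u2612'));
        }
    }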

The result of that test, on both versions, was also the # character and false, suggesting that the font does not contain the desired character.

The method you mention passes each character through this check:

	/**
	 * Checks if a code point is printable. If false, it can be safely discarded at the 
	 * rendering stage, else it should be replaced with the replacement character,
	 * if a suitable glyph can not be found.
	 * @param codePoint
	 * @return whether codePoint is printable
	 */
	public static boolean isCodePointPrintable(int codePoint) {
		if (Character.isISOControl(codePoint))
			return false;
		
		int category = Character.getType(codePoint);
		
		return !(category == Character.CONTROL ||
				 category == Character.FORMAT ||
				 category == Character.UNASSIGNED ||
				 category == Character.PRIVATE_USE ||
				 category == Character.SURROGATE);
	}

This method claims that '\u2612' is indeed printable, so it shouldn't be filtered out.
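To double-check that outside the renderer, here is a minimal sketch that copies the category test from isCodePointPrintable above into a standalone class and applies it to U+2612. The class name PrintableCheck is hypothetical; only the method body comes from the code quoted above.

```java
public class PrintableCheck {
    // Same logic as isCodePointPrintable quoted above, copied so this file compiles on its own.
    static boolean isCodePointPrintable(int codePoint) {
        if (Character.isISOControl(codePoint)) {
            return false;
        }
        int category = Character.getType(codePoint);
        return !(category == Character.CONTROL
                || category == Character.FORMAT
                || category == Character.UNASSIGNED
                || category == Character.PRIVATE_USE
                || category == Character.SURROGATE);
    }

    public static void main(String[] args) {
        int ballotBoxWithX = 0x2612; // &#9746;
        // U+2612 is an "other symbol" (So), which is not one of the rejected categories.
        System.out.println(Character.getType(ballotBoxWithX) == Character.OTHER_SYMBOL); // true
        System.out.println(isCodePointPrintable(ballotBoxWithX)); // true
    }
}
```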

@danfickle
Owner

EDIT: The character you pasted is U+F053 (&#61523;), which is indeed in the Private Use Area (U+E000 to U+F8FF). If Wingdings 2 uses this character, we do have a problem.
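For reference, a minimal check of those numbers: &#61523; is U+F053, its general category is PRIVATE_USE, and it falls in the BMP Private Use Area block. PrivateUseCheck is a hypothetical class name. Since PRIVATE_USE is one of the categories isCodePointPrintable rejects, a filter built on that check drops this character whichever font is requested.

```java
public class PrivateUseCheck {
    public static void main(String[] args) {
        int pasted = 0xF053; // the character from the report, &#61523;
        System.out.println(pasted == 61523); // true
        // General category Co (private use): rejected by isCodePointPrintable.
        System.out.println(Character.getType(pasted) == Character.PRIVATE_USE); // true
        // Block lookup: U+E000 to U+F8FF is the BMP Private Use Area.
        System.out.println(Character.UnicodeBlock.of(pasted)
                == Character.UnicodeBlock.PRIVATE_USE_AREA); // true
    }
}
```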

@danfickle
Owner

EDIT 2: Yes, Wingdings 2 does use this character. POO! We'll have to rethink our approach to filtering characters, probably switching to one that filters out only a few problematic characters, such as soft hyphens.
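The alternative described here might look roughly like the following sketch: rather than rejecting whole Unicode categories, strip only an explicit denylist of code points. DenylistFilter, stripProblematic, and the denylist contents are all hypothetical (only the soft hyphen, U+00AD, is named in this thread); this is not the project's actual fix.

```java
import java.util.Collections;
import java.util.Set;

public class DenylistFilter {
    // Hypothetical denylist: SOFT HYPHEN (U+00AD) is the example named in the thread;
    // any further entries would be a project decision.
    private static final Set<Integer> DROPPED = Collections.singleton(0x00AD);

    // Removes only denylisted code points and keeps everything else,
    // including Private Use Area characters such as U+F053.
    static String stripProblematic(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> !DROPPED.contains(cp))
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "soft\u00ADhyphen \uF053";
        System.out.println(stripProblematic(input).equals("softhyphen \uF053")); // true
    }
}
```

With the Private Use Area no longer filtered as a category, U+F053 reaches glyph lookup, and the existing rule from the javadoc above still applies: fall back to the replacement character only if no suitable glyph is found.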

Thanks for finding this before release!
