Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Check for Invalid Characters in Sentences #1591

Merged
merged 36 commits into from
Aug 2, 2019

Conversation

samm82
Copy link
Collaborator

@samm82 samm82 commented Jun 19, 2019

When an invalid character is encountered in a Sentence (wrapped as S String), an error is thrown:

chipmunkdocs.EXE: Invalid character: '%' in string "in this case. TEST% A rectangle represents the"

I added the invalid characters list as per these websites: HTML and LaTeX, although I left out the apostrophe and underscore characters since they were both used in Drasil. Should they be checked for as well? Any other thoughts? @JacquesCarette @Mornix

Closes #1534

@samm82
Copy link
Collaborator Author

samm82 commented Jun 21, 2019

EDIT: I looked at drasil-gen, and there isn't really a spot for checkValidStr there. I'm trying to add it to Tex.Print and HTML.Print when I noticed this:

In the LaTeX Print.hs file, we have a function escapeChars that escapes underscores:

escapeChars :: Char -> String
escapeChars '_' = "\\_"
escapeChars c = [c]

Which (if any) HTML and LaTeX characters should be escaped and which should result in errors? @JacquesCarette These are the lists of special characters in both languages.
HTML: " ' & < >
LaTeX: # $ % & ~ _ ^ \ { }

Also, before you mentioned using Either String P.Spec to avoid hard errors, and I'm not exactly sure what you meant by that. I (kind of) understand how Either works, but I'm not sure what design you had in mind.

@Mornix
Copy link
Collaborator

Mornix commented Jun 21, 2019

For HTML: " and ' are only "special characters" within the context of tags. I think for HTML & should be escaped (since & usually denotes the start of an escape sequence) while <> should be hard errors.

For LaTeX: I would say & should be escaped, and the rest should be hard errors.

@JacquesCarette
Copy link
Owner

@samm82 I definitely didn't mean to implement checkValidStr in drasil-gen; rather, that might be where it should be called. If not, then we kind of need to understand the delegation chain a bit more, to understand why not.

I think @Mornix 's refinement of what should be escaped / errors is good.

I mean that a checking function should return type Either String P.Spec, with the interpretation that a Left means an error happened, and a Right that everything's fine.

@samm82
Copy link
Collaborator Author

samm82 commented Jun 24, 2019

As far as I can tell, calling the check function in gen would make the most sense to be called in this function:

-- | Renders the documents
writeDoc :: PrintingInformation -> Format -> Filename -> Document -> Doc
writeDoc s TeX _ doc = genTeX doc s
writeDoc s HTML fn doc = genHTML s fn doc
writeDoc _ _ _ _ = error "we can only write TeX/HTML (for now)"

by going over doc, although I'm not sure the best way to do that.

@samm82
Copy link
Collaborator Author

samm82 commented Jun 24, 2019

Also, changing the check function to give an Either String P.Spec causes an import cycle, as printers depends on utils. @JacquesCarette

@JacquesCarette
Copy link
Owner

So writeDoc is probably 1 level too high - it should be done inside genTeX and genHTML, in part because they have different checks they need to do.

There are two check functions to write:

  1. one that operates on P.Spec and is a traversal
  2. one that operators on String
    The first should be in drasil-printers, the second in drasil-utils. The second should have type Either String String.

@samm82
Copy link
Collaborator Author

samm82 commented Jun 24, 2019

What should we do about underscores already in Drasil (mainly in labels, like Check-Physical_Constraints)? I'd say just replace them with hyphens.

@samm82
Copy link
Collaborator Author

samm82 commented Jun 24, 2019

Also, I'm not sure how to make/where to put the first function. I tried moving the structure from Import with the new Either to Tex.Print, like so,

spec :: Spec -> D
spec a@(s :+: t) = s' <> t'
  where
    ctx = const $ needs a
    s' = switch ctx $ spec s
    t' = switch ctx $ spec t
spec (E ex) = toMath $ pure $ text $ pExpr ex
spec (S s)  = either error (pure . text . (concatMap escapeChars)) $ checkValidStr s invalid
  where invalid = ['&', '#', '$', '%', '&', '~', '^', '\\', '{', '}'] 
spec (Sy s) = pUnit s
spec (Sp s) = pure $ text $ unPL $ L.special s
spec HARDNL = pure $ text "\\newline"
spec (Ref Internal r sn) = snref r $ spec sn
spec (Ref Cite2    r EmptyS) = cite (pure $ text r)
spec (Ref Cite2    r i)      = citeInfo (pure $ text r) (spec i)
spec (Ref External r sn) = externalref r $ spec sn
spec EmptyS              = empty
spec (Quote q)           = quote $ spec q

and I got a

glassbr.EXE: Invalid character: '\' in string "\nocite{*}
\bibstyle{ieeetr}
\printbibliography[heading=none]"
CallStack (from HasCallStack):
error, called at .\Language\Drasil\TeX\Print.hs:260:22 in drasil-printers-0.1.9-44KNry45UqsHgFD5XQ794w:Language.Drasil.TeX.Print
make: *** [glassbr_gen] Error 1

which means that this is a level too deep.

I personally think that it makes sense to have this function in Import, since if we're going to throw an error for an invalid character in the LaTeX, why not just check for it once? Although I guess it would make sense if we only wanted to generate the HTML - then the LaTeX characters wouldn't matter.

@JacquesCarette
Copy link
Owner

Yes, replacing underscores with hyphens is a good idea.

My guess is someone is prematurely rendering because there should be no '' in an S`. That "\nocite" shouldn't be inside a String. Whoever put it there is the guilty party. So this is catching a bug! Someone's stashing some literal LaTeX in a String. We'll need to find that and fix it.

@samm82
Copy link
Collaborator Author

samm82 commented Jun 24, 2019

It's defined a little later on in the file:

bibLines :: String
bibLines =
"\\nocite{*}\n" ++
"\\bibstyle{" ++ bibStyleT ++ "}\n" ++
"\\printbibliography[heading=none]"

Also, at some point, wouldn't we have to have a "\" inside a string to have it defined so it can be rendered? Since this use of "\" is internal to printers, I think it should be OK to just move the check to a higher level?

@JacquesCarette
Copy link
Owner

The "" will all be in Doc, not String. And those are fine.

bibLines should be of type Doc at the very least. But even better would be to have it generated properly, using our infrastructure. Shouldn't be that hard.

@samm82
Copy link
Collaborator Author

samm82 commented Jun 24, 2019

I have no idea how I'd go about this. Going through the rabbit hole of function calls from makeBib (which calls bibLines) leads to CiteField, with data as follows:

data CiteField = Address      Spec
               | Author       People
               | BookTitle    Spec -- Used for 'InCollection' references only.
               | Chapter      Int
               | Edition      Int
               | Editor       People
               | HowPublished HP
               | Institution  Spec
               | Journal      Spec
               | Month        Month
               | Note         Spec
               | Number       Int
               | Organization Spec
               | Pages        [Int] -- Range of pages (ex1. 1-32; ex2. 7,31,52-55)
               | Publisher    Spec
               | School       Spec
               | Series       Spec
               | Title        Spec
               | Type         Spec -- BibTeX "type" field
               | Volume       Int
               | Year         Int

And the Specs are concatenated together, and then passed to spec. It doesn't look like Doc has a concatenation, so I don't know how I'd translate bibLines to Doc and be able to add the native Spec to it.

@JacquesCarette
Copy link
Owner

Ok, I can help, but I may not get a proper time window to do it until late today / tomorrow.

@samm82
Copy link
Collaborator Author

samm82 commented Jun 25, 2019

I would prioritize #1601, as it is holding up a couple of other issues unless you think this change is more important. @JacquesCarette

@samm82
Copy link
Collaborator Author

samm82 commented Jul 12, 2019

I'm going to close this PR for now until you have a chance to look at the branch @JacquesCarette since it isn't ready to merge.

@samm82 samm82 closed this Jul 12, 2019
@samm82 samm82 reopened this Jul 26, 2019
Copy link
Owner

@JacquesCarette JacquesCarette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also some very nice changes to the latex printer in here. But a couple of issues to fix.

code/drasil-data/Data/Drasil/Citations.hs Show resolved Hide resolved
@JacquesCarette JacquesCarette merged commit 3b56d28 into master Aug 2, 2019
@JacquesCarette JacquesCarette deleted the checkInvalidChars branch August 2, 2019 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check for Invalid Characters in Strings for Printing
4 participants