Badly formated XML for docx output #60

Open
SG-phimeca opened this issue Jan 29, 2021 · 4 comments

@SG-phimeca

Eqnos produces badly formatted XML when used for docx output.
I compile a file demo.md containing only

$$ y = mx + b $$ {#eq:line}

or alternatively

$$ y = mx + b $${#eq:line}

with the following command

pandoc demo.md -o demo.docx --filter pandoc-eqnos

The output file cannot be opened on Microsoft Windows (in a Windows VirtualBox) or in LibreOffice.

I am running Ubuntu 16.04.

No error is reported on compilation.
Running pandoc with the -v flag outputs

pandoc 2.11.3.2
Compiled with pandoc-types 1.22, texmath 0.12.1, skylighting 0.10.2,
citeproc 0.3.0.3, ipynb 0.1.0.1

I use version 2.5.0 of pandoc-eqnos, installed with anaconda.
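
Since pandoc reports no error, one way to confirm the problem without opening Word or LibreOffice is to parse the XML inside the generated archive. The following is a minimal sketch using only the Python standard library. The function names `docx_xml_error` and `fake_docx` are made up for illustration, and the check only tests XML well-formedness of word/document.xml, not whether Word would accept the file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def docx_xml_error(docx):
    """Return None if word/document.xml is well-formed, else the parser message.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        xml_bytes = archive.read("word/document.xml")
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return str(err)
    return None


def fake_docx(document_xml):
    """Build a minimal in-memory archive for demonstration (not a valid docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # A deliberately broken document: <b> is never closed.
    print(docx_xml_error(fake_docx("<a><b></a>")))
```

For a real file, `docx_xml_error("demo.docx")` should return the parser's message (including line and column) if the XML is malformed, and None otherwise.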

@nialov

nialov commented Mar 17, 2021

Same issue, the {#eq:eq_label} labeling will stop Word from opening the compiled docx.

➜ pandoc -v
pandoc 2.11.4
Compiled with pandoc-types 1.22, texmath 0.12.1, skylighting 0.10.2,
citeproc 0.3.0.5, ipynb 0.1.0.1
➜ pandoc-eqnos --version
pandoc-eqnos 2.5.0

@pfeffer90

Hi, I have the same issue. The error by LibreOffice (v7.0) is

[image: LibreOffice error dialog]

Previous googling got me to #16, so it might be a related problem. Indeed, when I looked at the generated document.xml file, there seem to be issues with the matching of the <w:p> tags around the <w:bookmarkStart w:id="0" w:name="eq:eq1" />. When I removed the two problematic tags, Firefox displayed the document.xml file correctly, but rezipping it into a docx still led to a corrupted file.

Here is the document.xml

<?xml version="1.0" encoding="UTF-8"?><w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"><w:body><w:p><w:pPr><w:pStyle w:val="FirstParagraph" /></w:pPr><w:r><w:t xml:space="preserve">An equation</w:t></w:r><w:r><w:t xml:space="preserve"> </w:t></w:r><w:bookmarkStart w:id="0" w:name="eq:eq1"/><w:r><w:t></w:p><w:p><w:pPr><w:pStyle w:val="BodyText" /></w:pPr><m:oMathPara><m:oMathParaPr><m:jc m:val="center" /></m:oMathParaPr><m:oMath><m:r><m:t>x</m:t></m:r><m:r><m:t>  </m:t></m:r><m:d><m:dPr><m:begChr m:val="(" /><m:endChr m:val=")" /><m:grow /></m:dPr><m:e><m:r><m:t>1</m:t></m:r></m:e></m:d></m:oMath></m:oMathPara></w:p><w:p><w:pPr><w:pStyle w:val="FirstParagraph" /></w:pPr></w:t></w:r><w:bookmarkEnd w:id="0"/></w:p><w:sectPr /></w:body></w:document>
pandoc -v
pandoc 2.13
Compiled with pandoc-types 1.22, texmath 0.12.2, skylighting 0.10.5,
citeproc 0.3.0.9, ipynb 0.1.0.1
pandoc-eqnos --version
pandoc-eqnos 2.5.0
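
To pin down which tags are mismatched in a document.xml like the one above, a small tag-stack scan can help. This is only a rough sketch: the function name `first_mismatch` is hypothetical, and the regular expression is a simplification that ignores comments, CDATA, and `>` inside attribute values, so it is not a real XML parser.

```python
import re

# Matches opening, closing, and self-closing tags such as <w:p>, </w:p>, <w:grow />.
# Declarations like <?xml ...?> are skipped because the name must follow "<" or "</".
TAG = re.compile(r"<(/?)([\w:]+)[^<>]*?(/?)>")


def first_mismatch(xml_text):
    """Return (found, expected) for the first mismatched closing tag.

    Returns (None, name) if the text ends while `name` is still open,
    and None if every tag is balanced.
    """
    stack = []
    for match in TAG.finditer(xml_text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # e.g. <w:bookmarkStart ... /> opens nothing
        if not closing:
            stack.append(name)
        elif not stack or stack[-1] != name:
            return name, (stack[-1] if stack else None)
        else:
            stack.pop()
    if stack:
        return None, stack[-1]
    return None


if __name__ == "__main__":
    # Fragment taken from the first paragraph of the document.xml above.
    fragment = '<w:p><w:bookmarkStart w:id="0" w:name="eq:eq1"/><w:r><w:t></w:p>'
    print(first_mismatch(fragment))
```

On that fragment the scan reports a closing w:p where a closing w:t was expected, which matches the observation that the <w:r><w:t> opened after the bookmark is only closed in a later paragraph.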

@johnallison0

Same issue for me. I believe the issue was introduced by a change in pandoc 2.11.3. I have reverted to pandoc 2.11.2 and Word no longer complains. All pandoc releases after 2.11.2 result in the badly formatted XML output.

@BRainynight

I got an "Xml parsing error" when converting Markdown to docx. After I removed <w:r><w:t> from the variable bookmarkstart and </w:t></w:r> from bookmarkend (pandoc_eqnos.py L215), my Markdown file converted to docx successfully.

I found this solution by comparing with pandoc-fignos; the code in the two projects differs slightly:

This is in fignos:

        bookmarkstart = \
          RawBlock('openxml',
                   '<w:bookmarkStart w:id="0" w:name="%s"/>'
                   %attrs.id)
        bookmarkend = \
          RawBlock('openxml', '<w:bookmarkEnd w:id="0"/>')

But this is in eqnos:

        bookmarkstart = \
          RawInline('openxml',
                    '<w:bookmarkStart w:id="0" w:name="%s"/><w:r><w:t>'
                    %attrs.id)
        bookmarkend = \
          RawInline('openxml',
                    '</w:t></w:r><w:bookmarkEnd w:id="0"/>')
        ret = [bookmarkstart, AttrMath(*value), bookmarkend]

I'm not really sure what else removing them affects. It seems the <w:r><w:t> wrappers around the bookmark break the pairing of <w:p> and </w:p>?
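
A quick way to test that suspicion is to place both pairs of fragments in the layout seen in the document.xml quoted earlier in this thread, where the math ends up in its own paragraph between the two raw fragments, and check whether the result still parses. This is a sketch under assumptions: `render` is a hypothetical stand-in for what the docx writer does, the math is replaced by a plain text run, and the check covers XML well-formedness only, not whether Word accepts the bookmark.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

label = "eq:line"

# Fragments as quoted from pandoc-eqnos: the start opens <w:r><w:t>,
# and only the end fragment closes them.
eqnos_start = '<w:bookmarkStart w:id="0" w:name="%s"/><w:r><w:t>' % label
eqnos_end = '</w:t></w:r><w:bookmarkEnd w:id="0"/>'

# Fragments as quoted from pandoc-fignos: each one is balanced on its own.
fignos_start = '<w:bookmarkStart w:id="0" w:name="%s"/>' % label
fignos_end = '<w:bookmarkEnd w:id="0"/>'


def render(start, end):
    """Hypothetical layout: start, math, and end each land in their own <w:p>."""
    return "".join([
        "<w:p>", start, "</w:p>",
        "<w:p><w:r><w:t>math</w:t></w:r></w:p>",
        "<w:p>", end, "</w:p>",
    ])


def is_well_formed(body):
    """Wrap the body in a root element that declares the w: prefix and parse it."""
    wrapped = '<root xmlns:w="%s">%s</root>' % (W_NS, body)
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("eqnos-style fragments:", is_well_formed(render(eqnos_start, eqnos_end)))
    print("fignos-style fragments:", is_well_formed(render(fignos_start, fignos_end)))
```

Under this layout the eqnos-style pair cannot parse, because the <w:t> opened in the first paragraph is still open when that paragraph closes, while the fignos-style pair stays balanced. That is consistent with the workaround of dropping the <w:r><w:t> and </w:t></w:r> wrappers.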
