the internal hyperlinks are broken after the merge #3868

Closed
pemmadi opened this issue Sep 17, 2024 · 7 comments
Labels
example required, not a bug (not a bug / user error / unable to reproduce), Waiting for information

Comments

@pemmadi

pemmadi commented Sep 17, 2024

Description of the bug

I am trying to merge multiple PDFs into a single PDF using PyMuPDF. The merge works, but the internal hyperlinks are broken afterwards.

How to reproduce the bug

import fitz  # PyMuPDF

def merge_pdfs(pdf_list, output):
    merged_pdf = fitz.open()
    for pdf in pdf_list:
        with fitz.open(pdf) as mfile:
            merged_pdf.insert_pdf(mfile)
    merged_pdf.save(output)

pdf_files = ['file1.pdf', 'file2.pdf']
merge_pdfs(pdf_files, 'merged_output.pdf')

PyMuPDF version

1.24.0

Operating system

Windows

Python version

3.8

@JorjMcKie
Collaborator

It is mandatory to provide reproducing data when submitting a bug.

@pemmadi
Author

pemmadi commented Sep 17, 2024

@JorjMcKie - file1.pdf and file2.pdf both have internal links that work as expected, but after the merge they no longer work in merged_output.pdf.

JorjMcKie added the "not a bug" (not a bug / user error / unable to reproduce) label on Sep 17, 2024
@JorjMcKie
Collaborator

Please read the documentation here. You will see that "named" internal links are not supported / ignored.
As you do not want to provide an example file 😒, you need to check yourself whether file2.pdf has named internal links.
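
One way to run that check is to count the link kinds on each page. The following is a minimal sketch, not something posted in the thread: the helper names `summarize_link_kinds` and `named_links_per_page` are hypothetical, and it assumes a PyMuPDF release where `import pymupdf` works (on older releases, `import fitz as pymupdf` may be needed).

```python
from collections import Counter


def summarize_link_kinds(links):
    """Count links by their "kind" value.

    `links` is a list of dicts shaped like the output of page.get_links().
    """
    return Counter(link.get("kind") for link in links)


def named_links_per_page(path):
    """Return {page_number: number of LINK_NAMED links} for pages that have any."""
    # Imported lazily so summarize_link_kinds() stays usable without PyMuPDF.
    import pymupdf

    result = {}
    with pymupdf.open(path) as doc:
        for page in doc:
            count = summarize_link_kinds(page.get_links())[pymupdf.LINK_NAMED]
            if count:
                result[page.number] = count
    return result
```

Calling `named_links_per_page("file2.pdf")` returns an empty dict when no page carries a named link; any non-empty result points at the pages whose links would be ignored by the merge.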

@pemmadi
Author

pemmadi commented Sep 17, 2024

@JorjMcKie - I am using XSL to read data from XML and create a ToC with internal links, and then converting the result to PDF. Below is the code snippet that generates the links:

<xsl:template name="make-tableofcontents">
    <h2>
        <a name="toc">Table of Contents</a>
    </h2>
    <ul>
        <xsl:for-each select="n1:component/n1:structuredBody/n1:component/n1:section/n1:title">
                <li>
                    <a href="#{generate-id(.)}">
                        <xsl:value-of select="."/>
                    </a>
                </li>
        </xsl:for-each>
    </ul>
</xsl:template>

The generate-id() function in XSLT does not directly create a named destination link; it only generates a unique identifier for an XML node.

Can you help me fix this internal link issue? Is there any other way to create links without using named destinations?

@JorjMcKie
Collaborator

You can convert named links to GoTo links using PyMuPDF. This script does work:

import pymupdf

doc1 = pymupdf.open("file1.pdf")
doc2 = pymupdf.open("file2.pdf")
for page in doc2:
    links = page.get_links()
    for link in links:  # replace NAMED by GOTO links
        if link["kind"] != pymupdf.LINK_NAMED:
            continue
        nlink = {
            "kind": pymupdf.LINK_GOTO,
            "from": link["from"],
            "to": link["to"],
            "page": link["page"],
            "zoom": link["zoom"],
        }
        page.delete_link(link)  # delete named link
        page.insert_link(nlink)  # insert its GOTO version
    page = doc2.reload_page(page)  # important: finalize page updates!
doc1.insert_pdf(doc2)
doc1.ez_save("merged.pdf")
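
The same conversion can be folded into the list-based `merge_pdfs` function from the original report. This is only a sketch under assumptions: `goto_fields` and `merge_with_goto_links` are hypothetical names, and the guard that skips named links lacking a resolved target is an addition that is not in the script above.

```python
GOTO_KEYS = ("from", "to", "page", "zoom")


def goto_fields(link):
    """Copy the fields a GOTO link needs from a resolved NAMED link dict.

    Returns None if the link has no resolved target (assumed guard, not from the thread).
    """
    if any(key not in link for key in GOTO_KEYS):
        return None
    return {key: link[key] for key in GOTO_KEYS}


def merge_with_goto_links(pdf_list, output):
    """Merge PDFs, first replacing NAMED links with GOTO links in each source."""
    import pymupdf  # lazy import; older releases may need `import fitz as pymupdf`

    merged = pymupdf.open()
    for path in pdf_list:
        with pymupdf.open(path) as src:
            for page in src:
                for link in page.get_links():
                    if link["kind"] != pymupdf.LINK_NAMED:
                        continue
                    fields = goto_fields(link)
                    if fields is None:
                        continue  # unresolved named link: leave it untouched
                    page.delete_link(link)  # delete named link
                    page.insert_link({"kind": pymupdf.LINK_GOTO, **fields})
                page = src.reload_page(page)  # finalize page updates
            merged.insert_pdf(src)
    merged.save(output)
    merged.close()
```

It would be called the same way as the original function, for example `merge_with_goto_links(['file1.pdf', 'file2.pdf'], 'merged_output.pdf')`. The source files are only modified in memory, since they are never saved.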

@pemmadi
Author

pemmadi commented Sep 17, 2024

@JorjMcKie - Thanks man, it's working, really appreciated.
