The current code in marker/output.py splits the file name at the first period, which causes problems with scientific papers. These papers often have file names in the format '1706.03762.pdf', so the folder name becomes '1706' instead of '1706.03762'. As a result, different papers whose file names share the same prefix are written to the same folder and can overwrite each other.
Example to illustrate
File name: '1706.03762.pdf'
Current folder name: '1706'
Desired folder name: '1706.03762'
This change ensures that the folder name is generated correctly and no data is lost.
Advantage
Stripping only the final '.pdf' extension, rather than everything after the first period, ensures that scientific papers are saved in the correct folders and that no data is overwritten.
Important note
This is especially important for scientific papers such as those found on arxiv.org.
Current code (output.py line 6):
subfolder_name = fname.split('.')[0]
Suggested new code:
subfolder_name = fname.rsplit('.', 1)[0]
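To make the difference concrete, here is a minimal standalone sketch comparing the two expressions. The helper function names are hypothetical and are not part of marker; the second file name, '1706.03999.pdf', is an invented example used only to show the collision. The `os.path.splitext` variant is an additional option from the standard library, not something the current code uses.

```python
import os


def subfolder_first_dot(fname: str) -> str:
    # Behavior quoted above as the current code: cut at the FIRST period.
    return fname.split('.')[0]


def subfolder_last_dot(fname: str) -> str:
    # Suggested change: cut only at the LAST period (the extension).
    return fname.rsplit('.', 1)[0]


def subfolder_splitext(fname: str) -> str:
    # Possible alternative (an assumption, not in the issue): let the
    # standard library strip the extension.
    return os.path.splitext(fname)[0]


if __name__ == "__main__":
    # The second name is a made-up paper ID sharing the '1706' prefix.
    for name in ("1706.03762.pdf", "1706.03999.pdf"):
        print(
            name,
            subfolder_first_dot(name),
            subfolder_last_dot(name),
            subfolder_splitext(name),
        )
```

With the first-period split, both file names map to the same folder name, which is the overwrite described above; with either of the other two, each keeps its full identifier. One difference worth noting between the alternatives: for a name that starts with a period and has no other period, `rsplit('.', 1)[0]` returns an empty string, while `os.path.splitext` returns the name unchanged.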