
Bug: can't download PDF of demo ATBD on live APT #545

Closed
Tracked by #588 ...
deborahUAH opened this issue Aug 2, 2022 · 53 comments
Labels: bug (Something isn't working), PDF (Issues in or related to PDF exporting)

Comments

@deborahUAH
Collaborator

deborahUAH commented Aug 2, 2022

[Screenshot of error]
The image above shows the error I get when I try to download the PDF of the demo ATBD located on earthdata.nasa.gov/apt.
Please identify the error and fix it.

Success Criteria:
user can download the public Demo ATBD as a PDF

@deborahUAH deborahUAH added the bug Something isn't working label Aug 2, 2022
@deborahUAH deborahUAH added this to the PI 22.4 APT Milestone milestone Aug 2, 2022
@aboydnw

aboydnw commented Aug 3, 2022

@bwbaker1 Here is the ticket Deborah created about the PDF generation bug we discussed today. If you are able to add an abstract, or even create a new demo ATBD, and still are not able to generate a PDF, that would be a useful troubleshooting step to help narrow down what could be causing this.

@bwbaker1
Collaborator

bwbaker1 commented Aug 3, 2022 via email

@bwbaker1
Collaborator

bwbaker1 commented Aug 3, 2022

@aboydnw I tried adding missing content, updating to a new minor version, and updating to a new major version, and received the same error shown in the screenshot. It looks like it will need to be a completely new document.

@bwbaker1
Collaborator

bwbaker1 commented Aug 4, 2022

A new document was created, all the original content was migrated over, and inline equations, references, etc. were updated. The published document is now downloadable.

@aboydnw

aboydnw commented Aug 5, 2022

Thanks @bwbaker1 I am sure that was a ton of work! There might have been some error when we migrated data (cc @leothomas @naomatheus to check my diagnosis there)

However, I get an error when I try to download this document (which I assume is the one you just made).

I wonder if there is an issue with downloading documents as an unauthorized user? Here is a screenshot of the error I receive. If it is unrelated to the original issue, we can always create a new ticket.
[Screenshot of error]

@bwbaker1
Collaborator

bwbaker1 commented Aug 5, 2022

@aboydnw Yeah, that is a completely new document that I created. I was able to download it yesterday, but I got the same error as you today when I tried to download it.

Also, attached is the PDF I downloaded yesterday. I just tested to see if it would download and did not look at the PDF. There are some strange things about it. First, only a few pages were in the PDF (intro, historical perspective, and algorithm description). Second, in-text references show up as "(?, ?)." Third, there are extra spaces around variables (e.g., E d).
demo-only-mod-20-instantaneous-p-v1-0.pdf

@aboydnw

aboydnw commented Aug 5, 2022

@naomatheus @oliverroick the demo ATBD we've been looking at above has been accidentally deleted. If I remember correctly, this action in APT is a soft delete, right? Is there any way for you to recover this deleted ATBD? It would have "[Demo]" in the title and probably be the most recently created document.

@aboydnw

aboydnw commented Aug 8, 2022

@bwbaker1 Unfortunately, this action is a hard delete, so there is no way of recovering the deleted ATBD. Which is a bummer, knowing how much work you put into it. The tiniest silver lining might be that it gives us a chance to confirm whether the download functionality is broken. Also, the team is working on an update to fix document search, and we should probably wait for it to be completed before publishing a new ATBD. Would it be possible to wait to publish a new demo ATBD until that update has gone through?
It is this ticket, for reference: #500

@bwbaker1
Collaborator

bwbaker1 commented Aug 8, 2022

@aboydnw I will confirm with Deborah but I don't see a problem with waiting to publish the demo (as long as it is soon). I will go ahead and recreate it and have it ready to publish when the document search is fixed.

Thanks for checking on recovering the ATBD.

@aboydnw

aboydnw commented Aug 8, 2022

@bwbaker1 sounds good. We expect this fix to be rolled out by end of week.

@aboydnw

aboydnw commented Aug 12, 2022

No description provided.

@deborahUAH
Collaborator Author

@bwbaker1 please update your status on the demo ATBD

@deborahUAH
Collaborator Author

The demo ATBD is ready to publish. @aboydnw @wrynearson, can we do so? Per the messages above, we were waiting for the document search fix. Please let Brad know as soon as we can.

ALSO - both Brad and I get the error at the top of this ticket when trying to print either the journal PDF or the document PDF. This must be fixed by 9/16. Work with Brad next week please to get the problem resolved.

@wrynearson
Member

Hi @deborahUAH, correct, please wait until we fix the document search before publishing. We'll let you know as soon as that's done.

I assume that the PDF export error is on https://www.earthdata.nasa.gov/apt/?

@naomatheus could you please look into the backend logs for this?

@wrynearson
Member

@deborahUAH, just out of curiosity, why is 9/16 the deadline?

@deborahUAH
Collaborator Author

deborahUAH commented Sep 8, 2022 via email

@wrynearson
Member

Hi @leothomas, has @naomatheus been in touch with you about this issue? If not, would you be able to help us look into this bug?

I wasn't sure if @naomatheus had access to the logs in the prod environment. For context for reproducing, this was in reference to the demo ATBD on production. This was later accidentally deleted, and we're having APT colleagues wait to recreate a demo ATBD until we get the OpenSearch changes to prod (#500), but they are still having PDF download issues on other reports. I am unable to reproduce this because my account (will@developmentseed.org) hasn't been approved yet. Does someone, maybe @bwbaker1 or @kaulfusa, have permissions to approve my account?

cc: @TimMcCauley

@bwbaker1
Collaborator

@wrynearson Your account is now approved.
So I get two different errors when I try to download ATBD PDFs. When I am logged in as a contributor and go to download, I get this error:
[Screenshot of error]

When I try to download while logged in as a curator, I get a different error:
[Screenshot of error]

It doesn't seem to matter what status the ATBD is in (draft, review, etc.).

@leothomas
Collaborator

Hey there! I'm currently unable to access MCP (likely due to an update to Kion that was run last night), but I'm happy to check out the logs once my access is restored. I've seen the first error ('NoneType' object has no attribute 'get') when trying to generate a PDF that is missing one of the required sections (e.g., Abstract). Ideally this would be handled a bit more gracefully, but I would check to make sure all the required sections of the ATBD have content.
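Handling a missing section "a bit more gracefully" could look something like the minimal sketch below. It assumes `document_data` is a dict keyed by section name; `REQUIRED_SECTIONS`, `MissingSectionError`, and `check_required_sections` are hypothetical names, not APT's actual code, and the Abstract is the only required section named in this thread.

```python
from collections.abc import Mapping, Sequence
from typing import Any

# Hypothetical: the Abstract is the only required section named in this thread.
REQUIRED_SECTIONS = ("abstract",)


class MissingSectionError(ValueError):
    """Readable error to show the user instead of a bare AttributeError."""


def check_required_sections(document_data: Mapping[str, Any],
                            required: Sequence[str] = REQUIRED_SECTIONS) -> None:
    """Raise MissingSectionError naming every required section that is absent or None."""
    missing = [name for name in required if document_data.get(name) is None]
    if missing:
        raise MissingSectionError(
            "Cannot generate PDF, missing required section(s): " + ", ".join(missing)
        )
```

Running a check like this before LaTeX generation would turn the 'NoneType' crash into a message that tells the author which section to fill in.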

@bwbaker1
Copy link
Collaborator

@leothomas This is the ATBD I'm trying to download: "[DEMO Only] MOD20: Instantaneous Photosynthetically Available Radiation and Absorbed Radiation by Phytoplankton." As best I can tell, all of the required sections are completed. Some are just placeholders, but there is still content. But I still get the ('NoneType' object has no attribute 'get') error.

@aboydnw

aboydnw commented Oct 3, 2022

@leothomas @naomatheus I thought we had removed the requirement for sections to be completed to generate a PDF? Did those changes get reverted or not pushed to production? Or am I losing my mind with those changes.

@naomatheus
Collaborator

@aboydnw that sounds familiar Anthony. Do you happen to know which issue that was covered under?

@aboydnw

aboydnw commented Oct 3, 2022

I see that this release has a note saying the Abstract should be optional. Maybe that is not the bug here, but it could be useful information:
https://github.com/NASA-IMPACT/nasa-apt/releases/tag/v2.4.0-beta

@naomatheus
Collaborator

Hi @bwbaker1 , could you approve my account also? My username/email are RobinsMK12/matthew@developmentseed.org

@bwbaker1
Collaborator

bwbaker1 commented Oct 3, 2022

@naomatheus Your account is approved.

@leothomas
Collaborator

I logged in as a curator and tried downloading the [DEMO Only] Mod20: ... ATBD and got an error which isn't being picked up by the error handling logic, but looking through the full LaTeX output, the error is:

```
[15]
Runaway argument?
{\nolinkurl {https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/ \ETC.
! Paragraph ended before \href@split was complete.
<to be read again> 
                   \par 
l.728 
```

Which is due to a space in this url:
[Screenshot of the URL]

(between archive/allData/ and 61/MOD05_L2/?...).

As a curator, I don't have the ability to edit that document, and I'd rather not do it through the database. Can someone try removing the space and seeing if it resolves the error?
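Because a stray space in a link target only surfaces as a LaTeX error deep in the compile log, one option is to check link targets before compiling and report the offending URL to the author. A minimal sketch, assuming the link targets can be collected from the document as plain strings; `find_bad_urls` is a hypothetical helper, not part of APT.

```python
import re
from collections.abc import Iterable

# Any whitespace character (space, tab, newline) inside a link target.
_WHITESPACE = re.compile(r"\s")


def find_bad_urls(urls: Iterable[str]) -> list[str]:
    """Return the link targets that contain whitespace, in their original order."""
    return [url for url in urls if _WHITESPACE.search(url)]
```

The result could be returned to the frontend as a validation message pointing at the specific link, instead of a failed download.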

@bwbaker1 I wasn't able to replicate the 'NoneType' object has no attribute 'get' error when downloading the [DEMO Only] Mod20:... document, but the screenshot you posted shows a document titled "request review test". Can you confirm whether the NoneType error also happens with the [DEMO Only] document once the URL space issue is resolved?

@naomatheus
Collaborator

What's going on here is that when an ATBD does not have content, the backend receives some null values in the document_data parameter. (see app/pdf/generator.generate_latex).

This can be resolved by a backend restructuring (similar to what was just done for SNWG PDF generation).

However I wanted to ask @danielfdsilva is there something that could be implemented on the front end that places some placeholder values into an empty ATBD when we call the PDF download and generation endpoint?

@danielfdsilva
Collaborator

@naomatheus As far as I know, the server sends the values to the PDF generator. The frontend only requests a PDF. There should be placeholders for almost all fields, but that should be handled on the server.
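Following the point that placeholders belong on the server, here is a minimal sketch of filling them in before LaTeX generation. It assumes `document_data` is JSON-like (nested dicts and lists); `fill_placeholders` and the placeholder wording are hypothetical, since the thread does not settle on text for empty sections.

```python
from typing import Any

# Hypothetical wording; the thread does not specify placeholder text for empty sections.
PLACEHOLDER = "Content not provided."


def fill_placeholders(value: Any, placeholder: str = PLACEHOLDER) -> Any:
    """Return a copy of a JSON-like structure with every None replaced by placeholder."""
    if value is None:
        return placeholder
    if isinstance(value, dict):
        return {key: fill_placeholders(item, placeholder) for key, item in value.items()}
    if isinstance(value, list):
        return [fill_placeholders(item, placeholder) for item in value]
    return value
```

Returning a copy rather than mutating `document_data` in place keeps the stored document unchanged; only the PDF sees the placeholders.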

@bwbaker1
Collaborator

bwbaker1 commented Nov 4, 2022

@naomatheus That makes sense, but the ATBDs we have tested have all of the content there (sometimes it is just "TBD"). That is my confusion on this ticket because this has been mentioned before.

@naomatheus
Collaborator

@danielfdsilva thanks Daniel. Yeah, you're right on that. After looking into this, the error was raised not because of "missing" values, but because some types have changed since the PDF generation logic was originally written.
For example, we now have attributes of PDF documents that are just titles, so only strings.
The old logic was expecting "heavier" data structures and didn't know how to handle things like standalone titles - hence "no attribute get."
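The type mismatch described above can be illustrated with a minimal sketch that accepts both shapes: a bare title string and a dict-like "heavier" node. The `"title"` key and the `section_title` helper are assumptions for illustration, not APT's actual schema.

```python
from collections.abc import Mapping
from typing import Any, Union

# A section entry may now be a bare title string or an older, dict-like node.
SectionNode = Union[str, Mapping[str, Any]]


def section_title(node: SectionNode) -> str:
    """Return the title of a section entry regardless of which shape it has."""
    if isinstance(node, str):
        return node  # standalone title: there is nothing to call .get() on
    if isinstance(node, Mapping):
        return str(node.get("title", ""))  # hypothetical "title" key
    raise TypeError(f"Unsupported section entry: {type(node).__name__}")
```

Calling `.get` directly on a bare string would raise `AttributeError: 'str' object has no attribute 'get'`; dispatching on the type first avoids that and produces a clearer error for anything unexpected.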

Pending review of PR #592, this will be pushed to staging for testing. @bwbaker1 @wrynearson

@wrynearson
Member

Thanks @naomatheus. #592 will probably only be reviewed early next week. Once it is, we will let @bwbaker1 know so that he is able to test it on staging.

@naomatheus
Collaborator

Hi @bwbaker1 #592 is merged into staging. cc @wrynearson

@wrynearson
Member

Hi @bwbaker1, @naomatheus found a bug related to images not displaying in downloaded PDFs. When you test this ticket in staging, could you test for images as well? @naomatheus made a new issue: #593

@deborahUAH
Collaborator Author

deborahUAH commented Nov 16, 2022

I just got an email from a user regarding PDF error. She needs to be able to download the PDF. This is her error, just putting it here if this helps shed any more light on the various PDF issues. @naomatheus

```
Latexmk: Examining 'omps-hcho-v1-0.log'
=== TeX engine is 'XeTeX'
Latexmk: Found bibliography file(s) [/tmp/44/pdf/omps-hcho-v1-0.bib]
Latexmk: Summary of warnings from last run of *latex:
Latex found 16 multiply defined reference(s)
Latex failed to resolve 73 citation(s)
Collected error summary (may duplicate other messages):
xelatex: Command for 'xelatex' gave return code 1
Refer to 'omps-hcho-v1-0.log' for details
Latexmk: Use the -f option to force complete processing,
unless error was exceeding maximum runs, or warnings treated as errors.
Latexmk: Errors, so I did not complete making targets
```
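Both the URL problem earlier in this thread and this user's failure were only visible in the raw LaTeX output. A minimal sketch of pulling the useful lines out of the `.log` file so they could be surfaced in the error message; it relies on TeX's convention of starting error lines with "! " and on the standard "Citation ... undefined" warning text. `summarize_latex_log` is a hypothetical helper, and warnings that TeX wraps across lines will be missed.

```python
import re

# TeX writes fatal errors on lines beginning with "! ".
# LaTeX (and natbib) report unresolved citations as: Citation `key' on page N undefined ...
_CITATION = re.compile(r"Citation `(?P<key>[^']+)' on page \S+ undefined")


def summarize_latex_log(log_text: str) -> dict[str, list[str]]:
    """Collect error lines and unique undefined citation keys from a LaTeX .log file."""
    errors = [line for line in log_text.splitlines() if line.startswith("! ")]
    citations = sorted({m.group("key") for m in _CITATION.finditer(log_text)})
    return {"errors": errors, "undefined_citations": citations}
```

Listing the unresolved keys would tell an author which of the (here, 73) citations failed, rather than only the count that latexmk prints.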

@deborahUAH
Collaborator Author

@wrynearson @naomatheus @bwbaker1 I just tested the PDF download on staging. I added some images with captions and some references to the DeborahATBD draft to test both functions. Here is what I found:

  1. I did get both a journal PDF and a regular PDF when requested!! I downloaded 4 times and it worked every time. Yay!
  2. I did not get any images showing in either type of PDF (and no captions).

[Screenshots: PDFs without images or captions]

  3. I did not get any references in either type of PDF: neither in-text references nor anything in the reference list.

[Screenshots: PDFs without references]

  4. I was able to successfully upload my bibtex file (it was put through a cleaner first). Yay!
  5. However, the uploaded references do not have any DOIs, even though the bibtex file contains them.

[Screenshots: uploaded references without DOIs]

Since no references show up in the PDFs, I cannot tell what else may be going wrong with them. I will add this information to the bibtex ticket as well.
  6. As expected, the top of the PDF says "This ATBD was downloaded from the NASA Algorithm Publication Tool (APT)". However, it also says "Manuscript submitted to Earth and Space Science", which it should not, since I have not yet told it I am going to submit to the journal.

[Screenshot: PDF header]

So the PDF is still not working correctly. Please prioritize this repair. PDF function is a P1 bug. Thanks!
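Item 6 suggests the journal line in the header should depend on whether the author has chosen to submit to the journal. A minimal sketch of that condition; the `submitting_to_journal` flag and the `header_lines` helper are hypothetical, and only the two header strings come from this thread.

```python
DOWNLOAD_NOTE = "This ATBD was downloaded from the NASA Algorithm Publication Tool (APT)"
JOURNAL_NOTE = "Manuscript submitted to Earth and Space Science"


def header_lines(submitting_to_journal: bool) -> list[str]:
    """Header text for a generated PDF; the journal line is opt-in."""
    lines = [DOWNLOAD_NOTE]
    if submitting_to_journal:  # hypothetical per-ATBD flag set by the author
        lines.append(JOURNAL_NOTE)
    return lines
```

Making the journal line opt-in means a plain download never claims a submission the author has not made.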

@naomatheus
Collaborator

@wrynearson
I noticed the references not showing also. I discussed this functionality with @leothomas as well.

@naomatheus
Collaborator

@wrynearson

PDF behavior question:
This might be a bit of a "meta" question, but nonetheless.

  • What sort of behavior would we want if there are no captions for an image? For instance, create a space by inserting an "empty caption", or insert a placeholder.
  • What behavior would we want if an image cannot be found for some reason? For instance, just leave it blank, or insert a placeholder "image not found."

@wrynearson
Member

  1. I'd think a placeholder "No caption provided.", but let's ask @deborahUAH and @bwbaker1
  2. I'd think a placeholder "Image not found.", but let's ask @deborahUAH and @bwbaker1

@deborahUAH
Collaborator Author

@naomatheus 1) after thinking about this a bit, I think having "no caption provided" would be best as it would indicate to the user that they forgot to enter one. They could just enter a space if they choose to have no caption, but I figure nearly all scientists will have a caption.
2) If a user uploaded an image but for some reason it does not show up in the PDF, then an "Image not found" message is appropriate. Ideally, the message would include more information, something like: "User image successfully uploaded to APT, image not showing in PDF. Error information: xxxxx", where the error information helps with tracking down what is wrong.
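A minimal sketch of the figure behavior agreed on above, assuming the generator emits a standard LaTeX `figure` environment with `\includegraphics` (graphicx). `latex_figure`, `escape_latex`, and the `exists` parameter are hypothetical. Following Deborah's note, a missing or empty caption gets "No caption provided.", while a caption consisting of a space is kept as an intentional blank. User-supplied text is escaped, because an unescaped `_` or `%` in a caption or error message would itself break compilation.

```python
import os
from collections.abc import Callable
from typing import Optional

NO_CAPTION = "No caption provided."
IMAGE_NOT_FOUND = "Image not found."

# Characters with special meaning in LaTeX text mode and their escaped forms.
_LATEX_SPECIAL = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape user text one character at a time so replacements are not re-escaped."""
    return "".join(_LATEX_SPECIAL.get(ch, ch) for ch in text)


def latex_figure(image_path: Optional[str], caption: Optional[str], error: str = "",
                 exists: Callable[[str], bool] = os.path.exists) -> str:
    """Build a figure environment, using placeholders for a missing caption or image."""
    # None or "" means the author never entered a caption; " " is an intentional blank.
    caption_text = caption if caption else NO_CAPTION
    if image_path and exists(image_path):
        body = "\\includegraphics[width=\\linewidth]{" + image_path + "}"
    else:
        message = IMAGE_NOT_FOUND
        if error:
            message += " Error information: " + error
        body = "\\textit{" + escape_latex(message) + "}"
    return ("\\begin{figure}[htbp]\n\\centering\n" + body + "\n"
            + "\\caption{" + escape_latex(caption_text) + "}\n\\end{figure}")
```

Keeping the placeholder inside the figure (rather than dropping the figure) preserves figure numbering, so in-text figure references still line up.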

@deborahUAH
Collaborator Author

deborahUAH commented Nov 29, 2022

@naomatheus
Here is a previous PDF generated by APT in 2020:
nasa-atbd.pdf

Here is a recent skeletal PDF:
deborah-atbd-v1-0.pdf

I provide both so you can see the differences. Ask if you have any questions, as both have issues and neither is perfect.
But each should show you the order of content in the ATBD template.

@naomatheus
Collaborator

naomatheus commented Dec 1, 2022

@deborahUAH @wrynearson
I deployed the updates in this first PDF download fix to the production environment (earthdata url).
I've encountered errors with some of the ATBDs, but many can be downloaded. As you know, formatting is not correct and we are currently reviewing updates to resolve all PDF generation/download issues.

cc: @bwbaker1

@wrynearson
Member

wrynearson commented Dec 5, 2022

@naomatheus thanks for the update. Can we identify a pattern of which PDFs are not downloading? @bwbaker1 said the main demo ATBD can't be downloaded. Do you have any idea why this is occurring on prod when it wasn't on staging?

@naomatheus
Collaborator

@wrynearson
I am looking into this, and I need a review on PR #595, which resolves PDF images, captions, all sections, and most content types. Once #595 goes to staging, these issues should be resolved.

The same errors did not appear in staging due to differences in content between the PDFs on the prod site and those on the staging site.

@deborahUAH
Collaborator Author

@naomatheus please provide the current status of this ticket. We still cannot download a PDF of the Demo ATBD in production. I would like to close this ticket, but we need to get PDF download working. I see other tickets mentioned herein; please summarize the status of the effort here.

@naomatheus
Collaborator

naomatheus commented Jan 5, 2023

@deborahUAH Please discuss with @wrynearson regarding what to do with the status of this ticket. This particular ticket has been open for a long time, and the state of PDF has improved significantly. To summarize, I'll have at least two additional team members assisting me with APT starting in one and two weeks respectively. Our pace will accelerate as we'll be able to focus on PDF download availability and PDF formatting requirements.

State of PDF download:

I am able to download most (not all) PDFs in both production (with my own user role) and staging (with the contributor role). Regarding the PDFs that do not download successfully in production, I believe an additional deployment of the staging code base to production will resolve the download errors more completely. The remaining reason some PDFs do not download successfully has to do with the images. I won't go into detail here as we're still discussing it. We identified the issue today, and it will be assigned by @wrynearson.

I am able to successfully download the following PDFs in production.

  • bib-te-x-video-v1-0-journal.pdf
  • imp-sci-mcp-v1-0-journal.pdf
  • iop-v1-0-journal.pdf
  • opera-disturbance-product-v1-0-journal.pdf
  • request-review-test-v1-0-journal.pdf
  • tempo-formaldehyde-v1-0-journal.pdf
  • tempo-l-0-1-b-processor-v1-0-journal.pdf
  • tempo-nitrogen-dioxide-v1-0-journal.pdf

I am able to download all PDFs that I have access to in staging.

Recent efforts

Recently my efforts have been in the area of PDF generation, specifically the formatting of PDFs. These are issues like those described in the 3-4 most recent issues created by @bwbaker1 on the main issues page. It is appropriate to track formatting issues separately, even though they overlap with PDF generation in general: a formatting quirk can block the download of a particular PDF.
The team and I will be focusing on both the availability of PDF generation in general and the specific formatting requirements.

@wrynearson wrynearson added the PDF Issues in or related to PDF exporting label Jan 17, 2023
@wrynearson wrynearson assigned naomatheus and unassigned wrynearson Jan 17, 2023
@wrynearson
Member

wrynearson commented Jan 17, 2023

This ticket is for the PDF generation succeeding or not. Other PDF issues are tagged with the PDF Issues in or related to PDF exporting tag, and will be addressed separately.

@wrynearson
Member

Closing this, as document PDFs are now downloadable.

7 participants