Commas in email addresses break guestbook CSV files #8193

Closed
jggautier opened this issue Oct 27, 2021 · 5 comments · Fixed by #8343

@jggautier
Contributor

jggautier commented Oct 27, 2021

What steps does it take to reproduce the issue?

  • Create and publish a Dataverse collection and add a guestbook
  • In that Dataverse collection, create and publish a dataset with at least one file and add the guestbook to that dataset
  • Download one of the dataset's files and, in the guestbook form's Email field, enter an email address that contains a comma anywhere in the string
  • Go back to the Dataverse collection and download the guestbook CSV file
  • Open the CSV file in any spreadsheet application and see that the comma in the email address is interpreted as a delimiter: the string is split across two or more columns in the row, which pushes the rest of that response's answers out of the columns they're supposed to be in

When does this issue occur?
When downloading a guestbook (through the UI or the newer API endpoint) that has any responses where a downloader entered one or more commas in the Email field

Which page(s) does it occur on?
NA

What happens?
The comma in the email address that the downloader entered into the guestbook form is interpreted as a delimiter, so the string is split across two or more columns in the row, which pushes the rest of that response's answers one column over for each comma
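To make the failure mode concrete, here is a minimal Java sketch that joins a response's fields with commas and no quoting, then counts the columns a reader sees when it splits on every comma. It assumes a simplified four-column row (Name, Email, Institution, Position) with illustrative values; the real guestbook CSV has more columns, and the class and method names are hypothetical, not Dataverse code.

```java
import java.util.List;

// Illustrates the bug: fields joined with commas and nothing quoted,
// so a comma inside a value becomes an extra delimiter on read-back.
class NaiveCsvDemo {

    // Hypothetical writer: joins the fields with commas, no quoting.
    static String naiveRow(List<String> fields) {
        return String.join(",", fields);
    }

    // Counts the columns a reader sees when it splits on every comma.
    static int columnsSeen(String row) {
        // limit -1 keeps trailing empty fields
        return row.split(",", -1).length;
    }

    public static void main(String[] args) {
        List<String> response = List.of(
            "Julian", "juliangautier@g.harvard.edu,in", "Harvard", "Staff");
        String row = naiveRow(response);
        System.out.println(row);
        // 4 fields were written, but 5 columns are read back
        System.out.println(response.size() + " fields written, "
            + columnsSeen(row) + " columns read");
    }
}
```

The "in" fragment lands in the Institution column and every later answer shifts right by one, matching the screenshot described below.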

To whom does it occur (all users, curators, superusers)?
All users

What did you expect to happen?
The email addresses with commas in them wouldn't be split into two or more columns.

Maybe this could be avoided by wrapping each field's data in double quotes ("), so that the content isn't misplaced. This was suggested in #4671, and it's what pgAdmin does when exporting database query results to CSV.
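A minimal Java sketch of the quote-every-field approach, assuming the RFC 4180 convention: wrap each value in double quotes and double any double quote inside it. The class and method names are hypothetical, and treating a missing answer as an empty field is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

// "Quote every field" style (as pgAdmin does): each value is wrapped in
// double quotes, and embedded double quotes are doubled (RFC 4180).
class QuotedCsvDemo {

    static String quote(String value) {
        // Assumption: a missing answer is written as an empty field.
        String v = (value == null) ? "" : value;
        return "\"" + v.replace("\"", "\"\"") + "\"";
    }

    static String toCsvRow(List<String> fields) {
        return fields.stream()
                .map(QuotedCsvDemo::quote)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> response = List.of(
            "Julian", "juliangautier@g.harvard.edu,in", "Harvard", "Staff");
        // The comma inside the email now sits inside quotes, so a
        // spreadsheet keeps it in the Email column.
        System.out.println(toCsvRow(response));
    }
}
```

Quoting every field, rather than only the ones that contain commas, keeps the writer trivial and also covers commas in the other guestbook fields.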

I'm not sure whether commas should even be allowed in the Email field, so perhaps field validation could also be added for it. But the guestbooks in the Harvard Dataverse Repository already contain around 1,621 responses in 30 guestbooks (including the default guestbook) with one or more commas in the Email field, so something would have to be done with those. Adding the quotes would help spreadsheet applications interpret and display the guestbook data correctly.

Sometimes it's obvious that people entered commas in their email addresses in place of periods, and other times people entered multiple email addresses in a single field. So even if validation were added to the Email field to ensure that commas couldn't be entered, existing emails would still break the CSV.
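A hypothetical sketch of the kind of field validation mentioned above, in Java. It rejects an Email value that contains a comma or lacks exactly one "@" with text on both sides; the class name and the exact rules are assumptions, and it is deliberately loose rather than a full email-address validator. As noted above, a check like this only affects new entries, so existing responses would still need quoting when the CSV is written.

```java
// Hypothetical check for new guestbook entries; not Dataverse code.
class GuestbookEmailCheck {

    static boolean isAcceptable(String email) {
        if (email == null) {
            return false;
        }
        String e = email.trim();
        // Reject empty values and any value containing a comma.
        if (e.isEmpty() || e.indexOf(',') >= 0) {
            return false;
        }
        int at = e.indexOf('@');
        // Exactly one '@', not at the start or the end.
        return at > 0 && at == e.lastIndexOf('@') && at < e.length() - 1;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("juliangautier@g.harvard.edu"));    // true
        System.out.println(isAcceptable("juliangautier@g.harvard.edu,in")); // false
        System.out.println(isAcceptable("a@b.edu, c@d.edu"));               // false
    }
}
```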

Which version of Dataverse are you using?
v5.6 and v5.7

Any related open or closed issues to this bug report?
#4671 and #3449

Screenshots:

[Screenshot: Screen Shot 2021-10-27 at 6 41 18 PM]

I entered "juliangautier@g.harvard.edu,in" in my guestbook's Email field. "in" is put in the next column, pushing the rest of the answers (Institution and Position) out of the columns they're supposed to be in.

Here's the pictured guestbook CSV file:
Testing_guestbooks_216_GuestbookReponses.csv

@jggautier
Contributor Author

jggautier commented Oct 27, 2021

In those 1,621 guestbook entries, sometimes people entered multiple email addresses in the Email field, and sometimes the email addresses themselves contain commas (is that a thing?)

@djbrooke
Contributor

djbrooke commented Nov 3, 2021

@jggautier thanks for the report, for discussing it just now, and for explaining that this is blocking self-service for guestbooks in both the UI and API. Will prioritize.

@jggautier
Contributor Author

jggautier commented Nov 3, 2021

Thanks Danny.

For more context: for commas in other guestbook fields, the solution was to replace each comma with a space. @pdurbin mentioned in an old issue that this was kind of weird, but I think there were no big objections. This solution would be even weirder for commas in email addresses. If I saw an email address with a space and didn't know that commas were being replaced with spaces, I'd wonder whether I had the right email address.
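To compare the two approaches discussed above on a multi-address entry, here is a Java sketch that applies each one and reads the row back with a minimal single-line CSV reader written only for illustration (it does not handle line breaks inside quoted fields). The names and sample addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Comma-to-space replacement vs. quoting, read back by a minimal reader.
class CommaHandlingComparison {

    // The workaround used for other guestbook fields: commas become spaces.
    static String replaceCommas(String value) {
        return value.replace(',', ' ');
    }

    // RFC 4180-style quoting: wrap in quotes, double inner quotes.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Minimal reader for one CSV record without embedded line breaks.
    static List<String> parseRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // doubled quote -> literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c); // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString()); // delimiter ends the field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        String email = "a@example.edu,b@example.edu";
        // One column, but the reader sees a space where the comma was.
        System.out.println(parseRecord(replaceCommas(email) + ",Harvard"));
        // One column, and the value comes back exactly as entered.
        System.out.println(parseRecord(quote(email) + ",Harvard"));
    }
}
```

Both approaches keep the columns aligned, but only quoting preserves what the downloader actually typed.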

@djbrooke
Contributor

djbrooke commented Dec 8, 2021

  • We should escape the commas in the CSV; this will affect email addresses and also other fields in the guestbook responses

@qqmyers
Member

qqmyers commented Dec 8, 2021

FWIW:

String name = "\"" + dataset.getCurrentName().replace("\"", "\"\"") + "\"";
does this escaping. It's fairly simple, so perhaps it's time for a util method?
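One possible shape for such a util method, sketched in Java. The class and method names are hypothetical, and this variant adds quotes only when a value contains a comma, a double quote, or a line break, whereas the one-liner above always quotes. Both forms are valid under RFC 4180. Mapping null to an empty field is an assumption.

```java
// Hypothetical shared helper generalizing the one-liner above.
final class CsvQuoting {

    private CsvQuoting() {}

    static String escapeCsvField(String value) {
        if (value == null) {
            return ""; // assumption: a missing answer becomes an empty field
        }
        boolean needsQuotes = value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        // Same escaping as the one-liner: double each quote, then wrap.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escapeCsvField("Harvard"));                        // Harvard
        System.out.println(escapeCsvField("juliangautier@g.harvard.edu,in")); // "juliangautier@g.harvard.edu,in"
    }
}
```

Quoting only when needed keeps most fields readable in a plain text editor; always quoting, as the one-liner does, is simpler and equally correct for spreadsheet applications.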

@sekmiller sekmiller self-assigned this Jan 12, 2022
sekmiller added two commits that referenced this issue Jan 21, 2022
kcondon added a commit that referenced this issue Jan 25, 2022