Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable metadata_fields parameter on search query. #7942

Merged
merged 31 commits into from
Feb 3, 2022

Conversation

FNI18300
Copy link
Contributor

The search API is enhanced to add a new parameter called "metadata_fields"
The metadata_fields could take value such as :

  • metadata_fields=<metadatablock>:*

or

  • metadata_fields=<metadatablock>:<field1>&metadata_fields=:<field2>

When the parameter is added in request, the requested fields are returned in a "metadataBlocks" part.
The content of this block is for exemple as follow :

GET /api/search?q=*&metadata_fields=rudi:rudi_summary

"metadatafields": {
	"rudi": {
		"displayName": "Rudi Metadata",
		"fields": [
			{
				"typeName": "rudi_summary",
				"multiple": true,
				"typeClass": "compound",
				"value": [
					{
						"rudi_summary_language": {
							"typeName": "rudi_summary_language",
							"multiple": false,
							"typeClass": "primitive",
							"value": "cs-CZ"
						},
						"rudi_summary_text": {
							"typeName": "rudi_summary_text",
							"multiple": false,
							"typeClass": "primitive",
							"value": "string"
						}
					}
				]
			}
		]
	}
}

Limitation : If the requested field is a a compound one, all the compound properties are returned.
It's is not possible to request a field single from a compound objet.

What this PR does / why we need it:

The PR add a new parameter in search API to customize the returned items.
This way, to display dedicated attributes of a dataset only one request is needed.
Without this enhancement we need to make a first query to get results then for each result a second one to get the requested attributes.

Which issue(s) this PR closes:

Closes #7863

Special notes for your reviewer:

Suggestions on how to test this:
A usual call to the API with the new parameter is sufficient using for exemple a field from geagraphic metadatablock

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No user interface changes.
The added attribute for search API is not mandatory
In SolrSearchResult class when a parameter has been added to methods, the original one is keept.

Is there a release notes update needed for this change?:
no

The metadata_fields could take value such as :
- metadata_fields=<metadatablock>:*
- metadata_fields=<metadatablock>:<field1>&metadata_fields=<metadatablock>:<field2>
@coveralls
Copy link

coveralls commented Jun 14, 2021

Coverage Status

Coverage decreased (-0.009%) to 19.008% when pulling 47da150 on FNI18300:search_api_metadatafields into 389373d on IQSS:develop.

@sekmiller sekmiller self-assigned this Jun 28, 2021
return toJsonObject(showRelevance, showEntityIds, showApiUrls, null);
}

public JsonObject toJsonObject(boolean showRelevance, boolean showEntityIds, boolean showApiUrls, List<String> metadataFields) {
return json(showRelevance, showEntityIds, showApiUrls).build();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't working because you're passing the metadataFields to the toJsonObject method, but not to the json method at line 414

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right : I forget this part of code when I report my code from a previous git repository.
The correction is done

Copy link
Contributor

@sekmiller sekmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. There's one code issue that I added as a single comment and also if you could take a shot at updating the documentation to show examples and output similar to what you put in the issue/PR

Also, please update your branch from develop to make sure there are no conflicts.

@djbrooke
Copy link
Contributor

Thanks @FNI18300, I'll move this back to the review column so we can take another look. cc: @sekmiller

@djbrooke
Copy link
Contributor

Hi @FNI18300 - one more thing that I missed when I moved this back to review. Can you please add some documentation for this, as @sekmiller suggested? It looks like you have a good example above. If you can add this into the documentation, I'll be happy to look at it for grammar/formatting.

@djbrooke djbrooke assigned FNI18300 and unassigned djbrooke Jun 30, 2021
@pdurbin
Copy link
Member

pdurbin commented Jun 30, 2021

If I could also add, I wouldn't say no to some integration tests, if you feel like writing them @FNI18300 . You could add them to SearchIT.java and if you have any questions, I'm happy to help.

@FNI18300
Copy link
Contributor Author

FNI18300 commented Jul 6, 2021

It's a good suggestion to add documentation about this.
You think probably about this page https://guides.dataverse.org/en/latest/api/search.html
But how can I contribute to this ? Is there a github project for this too ?

@pdurbin
Copy link
Member

pdurbin commented Jul 6, 2021

@FNI18300 Docs are all in the same repo. Here's the source for that Search API page: https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/api/search.rst

You can find some tips on contributing to documentation here: https://guides.dataverse.org/en/5.5/developers/documentation.html

If you run into trouble, just ask. 😄

@pdurbin
Copy link
Member

pdurbin commented Sep 30, 2021

@FNI18300 hi! The following comment from https://groups.google.com/g/dataverse-community/c/l41Cp2KEf9M/m/bAsmR0exAgAJ by @jggautier reminded me to come back to this pull request:

Phil, can the Search API be used to get all of the metadata of each dataset? I've always seen that the Search API returns a few metadata fields (description, subject, keywords) but definitely not everything. See https://demo.dataverse.org/api/search?q=*&type=dataset.

It looks like you added docs back in July (!) in e50df25. Are you ready for this pull request to go back into code review? Thanks!

@FNI18300
Copy link
Contributor Author

Hi,

yes i'am ready to go back into code review.
In fact we already use a version of Dataverse with this enhancement for our projet... which is in pre-production.

I did not found time to add tests (sorry about that) so it will be very kind that my PR will be reviewed "as is"...

Do you think it's possible ?

Thanks in advance.

@pdurbin
Copy link
Member

pdurbin commented Sep 30, 2021

@FNI18300 yep. No problem. I just put it back in review. Thanks for the contribution! When you're ready to go live, please do get in touch to add your installation to the map! 😄 You can just create an issue at https://github.com/IQSS/dataverse-installations/issues

@djbrooke djbrooke self-assigned this Oct 4, 2021
@djbrooke
Copy link
Contributor

djbrooke commented Oct 5, 2021

Hi @FNI18300 - sorry about the delay in getting to this. Can you update from develop? I'll be happy to update the formatting of the docs as well.

@pdurbin
Copy link
Member

pdurbin commented Jan 7, 2022

As @kcondon and I have been discussing, I'm not able to reproduce the null pointer exception but he's going to give it a shot. I've been testing with a branch created from this pull request plus the latest from develop merged in: https://github.com/IQSS/dataverse/tree/7942-tweaks

I tried with the dataset he mentioned in our Slack conversation as well as the "all fields" dataset (dataset-create-new-all-default-fields.json). (This JSON didn't "just work" to create a dataset, by the way. I got an error and had to take out an item from a controlled vocabulary. Weird.)

@pdurbin
Copy link
Member

pdurbin commented Jan 7, 2022

@FNI18300 I should add that if the stacktrace above helps you reproduce the null pointer by all means please go ahead and fix it.

@pdurbin pdurbin self-assigned this Jan 10, 2022
@kcondon
Copy link
Contributor

kcondon commented Jan 10, 2022

@pdurbin @FNI18300 I was able to reproduce the issue by pasting the url from the stack trace into my browser:
https://dataverse-internal.iq.harvard.edu/api/v1/search?q=*&type=dataset&metadata_fields=citation:*

If I put a / before the first * in the url or before the last, I get different responses but neither fails. Is there some logic that counts the occurrence of *?

https://dataverse-internal.iq.harvard.edu/api/v1/search?q=\*&type=dataset&metadata_fields=citation:*
{"status":"OK","data":{"q":"\*","total_count":0,"start":0,"spelling_alternatives":{},"items":[],"count_in_response":0}}

https://dataverse-internal.iq.harvard.edu/api/v1/search?q=*&type=dataset&metadata_fields=citation:\*
{"status":"OK","data":{"q":"*","total_count":704,"start":0,"spelling_alternatives":{},"items":[{"name":"Version 1","type":"dataset","url":"https://doi.org/10.70122/FK2/3AUDHA","global_id":"doi:10.70122/FK2/3AUDHA","description":"1","published_at":"2020-06-11T18:07:08Z","publisher":"Local file1","citationHtml":"Admin, Dataverse, 2001,
...

and to complete the various combinations:
https://dataverse-internal.iq.harvard.edu/api/v1/search?q=\*&type=dataset&metadata_fields=citation:\*
{"status":"OK","data":{"q":"\*","total_count":0,"start":0,"spelling_alternatives":{},"items":[],"count_in_response":0}}

I'm not 100% sure why I was using the various combinations but I think I was following the testing instructions to make a usual call to the api with the new parameter and I saw this in the history of the guides:
c1d3eb4#diff-e723b4c884f9ceaa942a5aff9cf07982b6e82c5d3d434ab83b33b8e36d410a1f

and the formatting in the guides around the parameter was a bit confusing to me on a quick scan, the example url does show. the format that would work q=* for first occurrence, then citation:* but the sentence below it says, The above example metadata_fields=citation:* returns under "metadataBlocks" all fields from the "citation" metadata block.
[Kevin] it seems that when the doc is rendered, the * appears as *, so that is likely where I got it from

So, I see two issues: 1. what is the correct syntax and does the doc agree, 2. in the case of the incorrect syntax, should it be throwing a null ptr?

In discussing with Phil on Friday, he was not able to reproduce the null ptr with the above url on his local build. I was wondering whether dev environments still build and run in debug like they used to and so perhaps automatically initialize all vars as opposed to running in my test env?

@kcondon kcondon removed their assignment Jan 10, 2022
@pdurbin
Copy link
Member

pdurbin commented Jan 11, 2022

@FNI18300 hi, I've been playing around with your code and I'm not sure why I was unable to reproduce the null pointer before but now I can. I can reproduce the problem both on the 7942-tweaks branch I mentioned (which is just the "develop" branch merged in) and this pull request itself, the tip of which is d02c7f8 as of this writing.

This is how I've reproduced it:

  • Create a dataset in the root.
  • Publish it.
  • Use the database id of the dataset (e.g. 2 in this example) like this: curl "http://localhost:8080/api/v1/search?q=entityId:2&type=dataset&metadata_fields=citation:*"
  • Observer a 500 error and NullPointerException at edu.harvard.iq.dataverse.search.SolrSearchResult.json(SolrSearchResult.java:729)

I'm including a full stack trace and database dump:

Can you please see if you can reproduce (and hopefully fix) this?

One thing I observed during my troubleshooting is the error seems to happen on the field publication which is strange because I'm not actively filling in this field. (It's the parent field for Related Publication, which has multiple subfields.) I used this query to confirm for myself that the dataset version includes publication (though I'm a little shaky in this area):

dvndb=> select datasetfield.id, name from datasetfield join datasetfieldtype on datasetfieldtype_id = datasetfieldtype.id where datasetversion_id = (select id from datasetversion where dataset_id
= 2);
 id |      name      
----+----------------
 14 | title
  3 | author
  4 | datasetContact
  1 | dsDescription
  7 | subject
  2 | publication
  6 | depositor
 12 | dateOfDeposit
(8 rows)

Please let us know if you need more help troubleshooting. Once you fix the bug, please be sure the merge the latest from "develop" into your branch before requesting another review. Thanks!

@pdurbin pdurbin assigned FNI18300 and unassigned pdurbin Jan 11, 2022
@FNI18300
Copy link
Contributor Author

Well it seems that the behavior of jsonification has changed.
So I fix the null pointer which appears when a null array is added into the json object.

I merge develop into my FNI18300:search_api_metadatafields branch

Hoping this times is the right one for merging (this way I could upgrade the project to the new version ;-)

@djbrooke
Copy link
Contributor

Hey @FNI18300 - thanks for the updates! We'll take another look.

@landreev landreev removed their assignment Feb 2, 2022
@kcondon
Copy link
Contributor

kcondon commented Feb 2, 2022

@FNI18300 Please refresh this branch from develop since a db change that dropped a column makes this throw exceptions when creating datasets. Thanks!

@kcondon kcondon self-assigned this Feb 2, 2022
@FNI18300
Copy link
Contributor Author

FNI18300 commented Feb 3, 2022

Hi,
I merge develop into the FNI18300:search_api_metadatafields branch
The continuous integration looks good

@kcondon kcondon merged commit 3f109e6 into IQSS:develop Feb 3, 2022
@pdurbin pdurbin added this to the 5.10 milestone Feb 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API Search enhancement - Returning metadatafields
8 participants