Native API doesn't show Categories or File Tags (at least for "FITS file" and "Image file") #3067

Closed
raprasad opened this issue Apr 7, 2016 · 17 comments

@raprasad
Contributor

raprasad commented Apr 7, 2016

Dataset with FITS file tag:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:10904/10065

API call used: https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=hdl:10904/10065

See this gist for the resulting JSON, which lacks the `FITS file` and `Image file` tags:
https://gist.github.com/raprasad/1387646984f7c8402278d69f793e5379#example-fits-file
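
For anyone scripting this check, the API call above can be assembled from a base URL and a persistent identifier. This is only a minimal sketch: the helper name `dataset_url` is made up for illustration, and it assumes the identifier should be sent unescaped (with `:` and `/` kept literal), as in the URL quoted above.

```python
from urllib.parse import urlencode


def dataset_url(base_url, persistent_id):
    """Build the native API URL for a dataset looked up by persistent ID.

    `dataset_url` is a hypothetical helper, not part of Dataverse.
    ':' and '/' are left unescaped so the result matches the URL
    format used in this issue.
    """
    query = urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{base_url.rstrip('/')}/api/datasets/:persistentId?{query}"


if __name__ == "__main__":
    print(dataset_url("https://dataverse.harvard.edu", "hdl:10904/10065"))
```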

@kcondon
Contributor

kcondon commented Apr 21, 2016

@raprasad does this happen for all file tags or only FITS?

@raprasad
Contributor Author

As far as I can tell, all tags, including Code, Data, Documentation, etc.

@raprasad raprasad changed the title from "Native API doesn't show File Tags (at least for "FITS file" and "Image file")" to "Native API doesn't show Categories or File Tags (at least for "FITS file" and "Image file")" Oct 26, 2016
raprasad added a commit that referenced this issue Oct 26, 2016
pdurbin added a commit that referenced this issue Jan 3, 2017
pdurbin added a commit that referenced this issue Jan 3, 2017
@pdurbin
Member

pdurbin commented Jan 3, 2017

This was requested in https://groups.google.com/d/msg/dataverse-community/obUKSqR4GRA/MWKo7bb1EgAJ and the fix is trivial, I think, so I went ahead and made pull request #3553. I'll put this in Code Review at https://waffle.io/IQSS/dataverse

@pdurbin
Member

pdurbin commented Jan 3, 2017

Here's how I tested locally, with a dataset that has two files, one with tags and one without tags:

curl -s 'http://localhost:8080/api/datasets/:persistentId?persistentId=doi:10.5072/FK2/G3KNOU' | jq '.data.latestVersion.files'

[
  {
    "label": "50by1000.tab",
    "directoryLabel": "example",
    "version": 1,
    "datasetVersionId": 43,
    "tags": [
      "Code",
      "Data"
    ],
    "dataFile": {
      "id": 16,
      "filename": "50by1000.tab",
      "contentType": "text/tab-separated-values",
      "storageIdentifier": "159231994f1-b1509642e2a5",
      "originalFileFormat": "application/x-stata",
      "originalFormatLabel": "Stata Binary",
      "UNF": "UNF:6:x10r+Q9EK6aF/BMi+eKzGw==",
      "md5": "003b8c67fbdfa6df31c0e43e65b93f0e",
      "checksum": {
        "type": "MD5",
        "value": "003b8c67fbdfa6df31c0e43e65b93f0e"
      }
    }
  },
  {
    "description": "",
    "label": "trees.png",
    "directoryLabel": "trees",
    "version": 1,
    "datasetVersionId": 43,
    "tags": [],
    "dataFile": {
      "id": 12,
      "filename": "trees.png",
      "contentType": "image/png",
      "storageIdentifier": "159042c66b3-01aa5fa3c874",
      "originalFormatLabel": "UNKNOWN",
      "md5": "0386269a5acb2c57b4eade587ff4db64",
      "checksum": {
        "type": "MD5",
        "value": "0386269a5acb2c57b4eade587ff4db64"
      },
      "description": ""
    }
  }
]
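
The same check can be scripted instead of eyeballing the `jq` output. The sketch below is not part of the fix: it embeds a trimmed copy of the two file entries shown above, and the helper name `tags_by_label` is hypothetical. A missing `tags` key (the behaviour originally reported in this issue) is treated the same as an empty list.

```python
import json

# Trimmed copy of the response shown above (only the fields used here).
SAMPLE_FILES = json.loads("""
[
  {"label": "50by1000.tab", "tags": ["Code", "Data"], "dataFile": {"id": 16}},
  {"label": "trees.png", "tags": [], "dataFile": {"id": 12}}
]
""")


def tags_by_label(files):
    """Map each file label to its list of tags.

    `tags_by_label` is a hypothetical helper. Entries without a
    "tags" key are reported with an empty list.
    """
    return {entry["label"]: list(entry.get("tags", [])) for entry in files}


if __name__ == "__main__":
    for label, tags in tags_by_label(SAMPLE_FILES).items():
        print(label, tags or "(no tags)")
```

In a real test, the list passed in would be the `data.latestVersion.files` array from the API response rather than the embedded sample.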

@landreev
Contributor

OK, I have worked on the 2290 branch and was talking to Phil while he was working on this task (adding the categories and tags), so it was easy to review.
It all looks great. The only not-entirely-straightforward thing here was the distinction between the "categories" and "tabular tags": the former go into the filemetadata sections, the latter into the top-level datafile section. And it's all implemented properly now.
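
To illustrate that distinction from a client's point of view, here is a rough sketch of reading both kinds of label from one file entry. The key names `categories` and `tabularTags`, the helper `split_file_labels`, and the sample entry are all assumptions made for illustration; the JSON posted earlier in this thread only shows a `tags` key, so check the actual response before relying on these names.

```python
def split_file_labels(file_entry):
    """Return (categories, tabular_tags) for one file entry.

    Assumed layout (not confirmed by this thread): categories sit at
    the file-metadata level under "categories", and tabular tags sit
    inside the nested "dataFile" object under "tabularTags".
    """
    categories = list(file_entry.get("categories", []))
    tabular_tags = list(file_entry.get("dataFile", {}).get("tabularTags", []))
    return categories, tabular_tags


if __name__ == "__main__":
    # Hypothetical entry, made up for this example.
    entry = {
        "label": "survey.tab",
        "categories": ["Data"],
        "dataFile": {"id": 1, "tabularTags": ["Survey"]},
    }
    print(split_file_labels(entry))
```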

The non-trivial part of the logistics here is that while the issue is "in review" on waffle, there is no pull request for it yet, because Phil added this fix to the 2290 branch and 2290 is still being finished. So I can't formally review it on GitHub yet. Phil and I agreed that I'll move the issue back into dev, and then, when Steve and Raman make a pull request for the 2290 branch, he'll link this issue to that PR and move it into QA...

@kcondon
Contributor

kcondon commented Jan 26, 2017

OK, working. Closing.

@pdurbin
Member

pdurbin commented Jan 26, 2017

@kcondon as I mentioned, there's a harvesting component that could be tested. Please see this comment for details: #3067 (comment)

@kcondon kcondon self-assigned this Jan 26, 2017
@kcondon kcondon reopened this Jan 27, 2017
@kcondon
Contributor

kcondon commented Jan 27, 2017

@pdurbin JSON harvest is failing; DDI and DC work.

@kcondon kcondon removed their assignment Jan 27, 2017
@pdurbin
Member

pdurbin commented Jan 27, 2017

Huh, I only tried JSON harvesting and I just got the stacktrace below as of e8dd342 when I clicked the "Run Harvesting" button. This may be because I pulled the latest code without doing a clean and build. Let me restart Glassfish and play with this some more.

[2017-01-27T09:07:37.972-0500] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=184 _ThreadName=Thread-9] [timeMillis: 1485526057972] [levelValue: 1000] [[
  java.lang.RuntimeException: java.lang.IllegalStateException: This web container has not yet been started
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1455)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
        at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2979)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:489)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
        at edu.harvard.iq.dataverse.harvest.client.FastGetRecord.harvestRecord(FastGetRecord.java:138)
        at edu.harvard.iq.dataverse.harvest.client.FastGetRecord.<init>(FastGetRecord.java:97)
        at edu.harvard.iq.dataverse.harvest.client.oai.OaiHandler.runGetRecord(OaiHandler.java:227)
        at edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean.processRecord(HarvesterServiceBean.java:326)
        at edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean.harvestOAI(HarvesterServiceBean.java:272)
        at edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean.doHarvest(HarvesterServiceBean.java:186)
        at edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean.doAsyncHarvest(HarvesterServiceBean.java:106)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1081)
        at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:1153)
        at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4786)
        at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:656)
        at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
        at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
        at org.jboss.weld.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:55)
        at org.jboss.weld.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
        at sun.reflect.GeneratedMethodAccessor1292.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
        at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
        at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:608)
        at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
        at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
        at sun.reflect.GeneratedMethodAccessor1293.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:883)
        at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:822)
        at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:369)
        at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4758)
        at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4746)
        at com.sun.ejb.containers.EjbAsyncTask.call(EjbAsyncTask.java:101)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: This web container has not yet been started
        at org.glassfish.web.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1674)

@pdurbin pdurbin self-assigned this Jan 27, 2017
@pdurbin
Member

pdurbin commented Jan 27, 2017

@kcondon ok, I stopped Glassfish, did mvn clean and mvn package and now harvesting seems to work via JSON. I'm still on e8dd342. Screenshots below.

[Image: screen shot 2017-01-27 at 9 21 49 am]
[Image: screen shot 2017-01-27 at 9 21 19 am]

@pdurbin
Member

pdurbin commented Jan 27, 2017

I just tested harvesting file tags specifically (I'm still on e8dd342) and that seems to work too. Here's a screenshot:

[Image: screen shot 2017-01-27 at 10 01 59 am]

I guess the thumbnails themselves aren't harvested, which is why they probably show as broken images above. This is what it shows in the log:

[2017-01-27T10:03:01.502-0500] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter] [tid: _ThreadID=130 _ThreadName=http-listener-1(5)] [timeMillis: 1485529381502] [levelValue: 800] [[
  Failed to read in an image from /Users/pdurbin/dataverse/files/10.5072/FK2/APWMAE/159e06dfdb2-36c59112d220: Can't read input file!]]

[2017-01-27T10:03:01.502-0500] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.dataaccess.ImageThumbConverter] [tid: _ThreadID=130 _ThreadName=http-listener-1(5)] [timeMillis: 1485529381502] [levelValue: 800] [[
  Failed to read in an image from /Users/pdurbin/dataverse/files/10.5072/FK2/YKZ42A/15984ad1eee-d5b57e8c8e81: Can't read input file!]]

@kcondon kcondon assigned kcondon and unassigned pdurbin Jan 27, 2017
@kcondon
Contributor

kcondon commented Jan 27, 2017

This is working. It was a config issue involving certs and the siteUrl setting that needed to be documented. Closing.

@pdurbin
Member

pdurbin commented Jan 27, 2017

With a self-signed cert, this was the error I saw on my laptop in glassfish/domains/domain1/logs/harvest_tab_2017-01-27T11-51-33.log:

<message>Exception processing harvest, server= https://dev1.dataverse.org/oai,format=dataverse_json com.lyncode.xoai.serviceprovider.exceptions.InvalidOAIResponse com.lyncode.xoai.serviceprovider.exceptions.HttpException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target</message>


6 participants