File upload to S3 bucket fails #9285

Closed
golsch opened this issue Jan 13, 2023 · 16 comments

@golsch

golsch commented Jan 13, 2023

What steps does it take to reproduce the issue?
I am using a fresh instance and PowerScale OneFS 9.0 as the backend.
The bucket is configured as follows:

dataverse.files.default.type=s3
dataverse.files.default.label=default
dataverse.files.default.bucket-name=bucket
dataverse.files.default.custom-endpoint-url=<host>:<port>
dataverse.files.default.profile=default
dataverse.files.default.path-style-access=false
dataverse.files.default.payload-signing=false
dataverse.files.default.chunked-encoding=false

As soon as I try to upload a text file via HTTP upload, Dataverse reports an error, even though the file is successfully uploaded to the bucket:

  javax.faces.el.EvaluationException: javax.ejb.EJBException: Unable to complete transfer: Index 119 out of bounds for length 103
	at com.sun.faces.application.MethodBindingMethodExpressionAdapter.invoke(MethodBindingMethodExpressionAdapter.java:76)
	at com.sun.faces.application.ActionListenerImpl.getNavigationOutcome(ActionListenerImpl.java:82)
	at com.sun.faces.application.ActionListenerImpl.processAction(ActionListenerImpl.java:71)
	at javax.faces.component.UICommand.broadcast(UICommand.java:222)
	at javax.faces.component.UIViewRoot.broadcastEvents(UIViewRoot.java:847)
	at javax.faces.component.UIViewRoot.processApplication(UIViewRoot.java:1396)
	at com.sun.faces.lifecycle.InvokeApplicationPhase.execute(InvokeApplicationPhase.java:58)
	at com.sun.faces.lifecycle.Phase.doPhase(Phase.java:76)
	at com.sun.faces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:177)
	at javax.faces.webapp.FacesServlet.executeLifecyle(FacesServlet.java:707)
	at javax.faces.webapp.FacesServlet.service(FacesServlet.java:451)
	at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1637)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:331)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at org.glassfish.tyrus.servlet.TyrusServletFilter.doFilter(TyrusServletFilter.java:282)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:226)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:257)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:160)
	at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:757)
	at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:577)
	at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:158)
	at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:372)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:239)
	at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:520)
	at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:217)
	at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:182)
	at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:156)
	at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:201)
	at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:95)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:260)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:177)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:109)
	at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:88)
	at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:53)
	at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:524)
	at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:89)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:94)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:33)
	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:114)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: javax.ejb.EJBException: Unable to complete transfer: Index 119 out of bounds for length 103
	at com.sun.ejb.containers.EJBContainerTransactionManager.processSystemException(EJBContainerTransactionManager.java:723)
	at com.sun.ejb.containers.EJBContainerTransactionManager.completeNewTx(EJBContainerTransactionManager.java:652)
	at com.sun.ejb.containers.EJBContainerTransactionManager.postInvokeTx(EJBContainerTransactionManager.java:482)
	at com.sun.ejb.containers.BaseContainer.postInvokeTx(BaseContainer.java:4601)
	at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2134)
	at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2104)
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:220)
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:90)
	at com.sun.proxy.$Proxy408.saveAndAddFilesToDataset(Unknown Source)
	at edu.harvard.iq.dataverse.ingest.__EJB31_Generated__IngestServiceBean__Intf____Bean__.saveAndAddFilesToDataset(Unknown Source)
	at edu.harvard.iq.dataverse.EditDatafilesPage.save(EditDatafilesPage.java:1063)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.sun.el.util.ReflectionUtil.invokeMethod(ReflectionUtil.java:163)
	at com.sun.el.parser.AstValue.invoke(AstValue.java:261)
	at com.sun.el.MethodExpressionImpl.invoke(MethodExpressionImpl.java:237)
	at org.jboss.weld.module.web.util.el.ForwardingMethodExpression.invoke(ForwardingMethodExpression.java:40)
	at org.jboss.weld.module.web.el.WeldMethodExpression.invoke(WeldMethodExpression.java:50)
	at com.sun.faces.facelets.el.TagMethodExpression.invoke(TagMethodExpression.java:65)
	at com.sun.faces.application.MethodBindingMethodExpressionAdapter.invoke(MethodBindingMethodExpressionAdapter.java:66)
	... 46 more
Caused by: com.amazonaws.AmazonClientException: Unable to complete transfer: Index 119 out of bounds for length 103
	at com.amazonaws.services.s3.transfer.internal.AbstractTransfer.unwrapExecutionException(AbstractTransfer.java:286)
	at com.amazonaws.services.s3.transfer.internal.AbstractTransfer.rethrowExecutionException(AbstractTransfer.java:265)
	at com.amazonaws.services.s3.transfer.internal.AbstractTransfer.waitForCompletion(AbstractTransfer.java:103)
	at edu.harvard.iq.dataverse.dataaccess.S3AccessIO.savePath(S3AccessIO.java:329)
	at edu.harvard.iq.dataverse.ingest.IngestServiceBean.saveAndAddFilesToDataset(IngestServiceBean.java:227)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:588)
	at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:408)
	at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4835)
	at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:665)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:834)
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:615)
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
	at jdk.internal.reflect.GeneratedMethodAccessor284.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:615)
	at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:72)
	at org.jboss.weld.module.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
	at jdk.internal.reflect.GeneratedMethodAccessor281.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
	at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:375)
	at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4807)
	at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4795)
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
	... 61 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 119 out of bounds for length 103
	at com.amazonaws.util.Base16Codec.pos(Base16Codec.java:96)
	at com.amazonaws.util.Base16Codec.decode(Base16Codec.java:87)
	at com.amazonaws.util.Base16Lower.decode(Base16Lower.java:53)
	at com.amazonaws.util.BinaryUtils.fromHex(BinaryUtils.java:48)
	at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1878)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1821)
	at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInOneChunk(UploadCallable.java:169)
	at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:149)
	at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:115)
	at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:45)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	... 1 more

Which version of Dataverse are you using?
Dataverse v5.12.1

@qqmyers
Member

qqmyers commented Jan 13, 2023

I'm not sure, but this may be a ~known issue with Dell's S3 implementation. Working with one of the community members, I think we discovered that Dell doesn't correctly handle tags in signed URLs (treating our two tags as one or something like that). I'll try to bring this issue to their attention to get some details and any status on the Dell side. I think we did discuss a possible workaround/optional change we could make to support the way Dell currently works, but no work has been done afaik.

@nightowlaz

nightowlaz commented Jan 13, 2023

Hi there .. we tried for literally a year to use OneFS S3 and have still not been able to successfully transfer files to it using direct upload. We worked with a Dell Solutions Specialist and currently have a bug-fix request outstanding, waiting for Dell to resolve the bug that @qqmyers described above. We started on v9.1.0.5, and our first issue was related to a bug on the Isilon side where it did not send back the full certificate chain. Then there was another bug where it did not send the correct certificate (the specific cert for our server) but just used the default Isilon cert.

Dell sent us simulators to install in a lab environment, and we were able to test with both v 9.1 and v9.2 and see that the bug was present in v9.1 and fixed in v9.2. So, our support team upgraded the Isilon to v9.2.1.0 and we were able to upload files via dataverse successfully using the UI (not direct upload). We could literally see the wrong cert being sent in v9.1 and the correct one sent in v9.2.

Next, we realized that when the bucket was configured with direct upload, files would not upload again. After a LOT of testing and making API calls (using the Direct DataFile Upload/Replace API) to examine the errors, with Jim's expert help we identified that the API call to PUT the file to the bucket was not working because Dell was not decoding "%3B" to ";". The call wouldn't work unless I manually changed the “%3B” to “;” and sent the call again .. then the files would upload. The Dell specialist found a related bug where the Dell server doesn't percent-decode X-Amz-SignedHeaders correctly, which seems to be the same issue. They submitted a bug-fix request to Dell, but as far as I know it has not yet been fixed.
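To make the encoding issue concrete, here is a minimal sketch of what standard percent-decoding of the X-Amz-SignedHeaders query parameter should yield. The URL, host, object key, and signature below are hypothetical placeholders, and the `host;x-amz-tagging` header list is an assumption based on the description in this thread, not a captured request.

```javascript
// Hypothetical presigned URL; host, key, and signature values are placeholders.
const signedUrl =
  'https://s3.example.org/bucket/some/object-key' +
  '?X-Amz-Algorithm=AWS4-HMAC-SHA256' +
  '&X-Amz-SignedHeaders=host%3Bx-amz-tagging' +
  '&X-Amz-Signature=0000';

// Raw value exactly as it appears in the URL (still percent-encoded).
function rawQueryValue(url, name) {
  const query = url.split('?')[1] || '';
  for (const pair of query.split('&')) {
    const [key, value = ''] = pair.split('=');
    if (key === name) return value;
  }
  return null;
}

// Value after standard percent-decoding, which is what a server is
// expected to work with when it validates the signature.
function decodedQueryValue(url, name) {
  return new URL(url).searchParams.get(name);
}

console.log(rawQueryValue(signedUrl, 'X-Amz-SignedHeaders'));     // host%3Bx-amz-tagging
console.log(decodedQueryValue(signedUrl, 'X-Amz-SignedHeaders')); // host;x-amz-tagging
```

If the server compares against the raw form instead of the decoded one, the signed header list no longer matches, which would be consistent with the behavior described above.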

Our OneFS server is now upgraded to v9.3.0.6 and the problem still exists. Dell recommended that we ask the Dataverse dev team to add code that replaces the %3B in X-Amz-SignedHeaders with ";" or to stop adding x-amz-tagging .. I put an issue in and they were looking into some possible workarounds, but we actually kind of gave up on the Isilon storage and went back to AWS S3.

That's our experience .. it doesn't sound like the bugs we were seeing generated the exact same error messages that you are seeing, and you are able to actually copy a file, so I'm not sure if this even matches. We never tried uploading on v9.0 but know that there may be issues if you upgrade. We would be very interested to hear if you are able to successfully utilize the Dell storage! =)

@golsch
Author

golsch commented Jan 17, 2023

We are currently using OneFS 9.1.0.12. Unfortunately, I am dependent on a third-party vendor and cannot update it myself, but I have raised the upgrade for the future.

@nightowlaz I can confirm your experience with direct upload: decoding the header value solves the problem.

What is still a problem for me is the checksum, which is not generated correctly for a file. If I upload a file directly via the API, I get the correct checksum. Via the provided link, I get a value that I cannot explain: a 32-character sum with consecutive zeros. The checksum is not static and seems to depend on the key/object. Has anyone had any experience with this?

@qqmyers
Member

qqmyers commented Jan 17, 2023

For direct upload, the checksum is calculated by the sender. In the UI (and dvwebloader), that is JavaScript in the browser. For DVUploader, it is in Java. If something is happening in the UI, the browser console may have more info about any error.

In both cases, the checksum is sent to Dataverse and the S3 bucket does not have a checksum per se (i.e. fixity is calculated per part in multi-part uploads, etc.)
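As a rough illustration of the sender-side checksum described above, the following sketch computes a hex digest with Node's built-in `crypto` module. The helper name `checksumHex` is hypothetical, and the MD5 default is an assumption (it matches the 32-character value mentioned earlier in the thread); it is not taken from the Dataverse UI code.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hex digest of the data the sender is about to upload.
// The algorithm should match whatever the Dataverse installation is configured to use.
function checksumHex(data, algorithm = 'md5') {
  return crypto.createHash(algorithm).update(data).digest('hex');
}

// MD5 of empty input is a well-known constant, handy as a sanity check.
console.log(checksumHex(''));                 // d41d8cd98f00b204e9800998ecf8427e
console.log(checksumHex('', 'md5').length);   // 32
```

The point of the sketch is only that the value is computed from the bytes on the client side and then reported to Dataverse; the bucket itself is not asked for it.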

I haven't tried it, but I'd expect it is possible to create a proxy that could decode the signed URL. It also shouldn't be technically hard to provide a work-around in the Dataverse code.

@golsch
Author

golsch commented Jan 17, 2023

@qqmyers By direct upload I meant uploading directly to an S3 bucket via the signed URL... was perhaps a bit imprecisely expressed.

I also think the encoding problem should not be hard to solve.

@nightowlaz

nightowlaz commented Jan 17, 2023

So, you mean, using the API you got the same encoding issue and were able to fix it by changing the code in the response from the server and sending it? Does the direct upload from the UI work? We assumed that the reason the upload from the UI didn't work either when using direct upload was the same issue, and to fix that it has to be fixed in the core code (I think). Or, were you able to fix it and make it work in the UI as well?

@golsch
Author

golsch commented Jan 17, 2023

I think the approaches work independently. Upload via UI does not work and leads to the initial problem. Direct upload into bucket and direct download from bucket via Dataverse API works (except checksum problem), as long as the %3B in the signed URL is decoded (code adjustment on Dataverse side).

@nightowlaz

Okay, great, that is the same as our experience. We needed to be able to utilize direct upload via the UI so were never able to actually enable the Isilon storage. We didn't get far enough to experience the checksum issue, so good to know that is a thing, too.

@qqmyers
Member

qqmyers commented Jan 17, 2023

@golsch - The only place a checksum is used is in https://guides.dataverse.org/en/latest/developers/s3-direct-upload-api.html#adding-the-uploaded-file-to-the-dataset and in the API you are responsible for creating it. The only thing related to fixity coming from S3 itself are the ETags in a multipart upload. Are those what are incorrect?

@qqmyers
Member

qqmyers commented Jan 17, 2023

FWIW: Making the UI work could be handled two ways: change the output of the API call that requests the URL(s), or edit the fileupload.js script used in the UI (for the single part call and the multipart Ajax calls). Both are in the Dataverse codebase, but the JavaScript change could be done in the expanded war file on disk as a local fix.

@nightowlaz

@qqmyers would you be willing to provide instructions for doing so? Would be good to test to see if that is the only issue, and if fixing that would indeed resolve the issue in the UI.

@qqmyers
Member

qqmyers commented Jan 17, 2023

I'd suggest replacing

with url: this.urls.url.replace('%3B',';'), . I think that should work for the single part case. (I haven't tried this to even see if I have a typo.) I think the multipart case can't be handled this way - it's possible it just works as the tag is added in a call made by Dataverse rather than from JavaScript so the escaping could be correct already. In any case, if the above fixes the single part case, it's useful info. If this really is the only thing, and multipart doesn't work, getting a small fix into 5.13 for Dell could probably be done.
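A slightly more defensive variant of the replacement suggested above might look like the sketch below. Note that `String.prototype.replace` with a string pattern only replaces the first match anywhere in the URL, so this version restricts the change to the X-Amz-SignedHeaders parameter and handles every `%3B` inside it. The function name `decodeSignedHeaders` and the example URL are hypothetical; this has not been tested against fileupload.js or a OneFS endpoint.

```javascript
// Decode "%3B" only inside the X-Amz-SignedHeaders query parameter of a signed URL.
// Everything else in the URL, including the signature, is left untouched.
function decodeSignedHeaders(signedUrl) {
  return signedUrl.replace(
    /([?&]X-Amz-SignedHeaders=)([^&#]*)/i,
    (match, prefix, value) => prefix + value.replace(/%3B/gi, ';')
  );
}

// Placeholder URL for illustration only.
const example =
  'https://s3.example.org/bucket/key' +
  '?X-Amz-SignedHeaders=host%3Bx-amz-tagging&X-Amz-Signature=0000';

console.log(decodeSignedHeaders(example));
// https://s3.example.org/bucket/key?X-Amz-SignedHeaders=host;x-amz-tagging&X-Amz-Signature=0000
```

In the UI, the result of such a helper would be passed wherever the signed URL is currently used for the single part upload call.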

@golsch
Author

golsch commented Jan 18, 2023

@qqmyers Uploading a file directly via the UI works fine with the fix. I'm not quite clear how to trigger multipart via the UI.

Edit: Multipart via UI seems to work as well, the file is uploaded in parts and can be downloaded. My default part-min-size is 5 MB and not one gigabyte as described in the documentation.

Edit: The checksum problem has magically disappeared.

Edit: I'm not that familiar with S3, but why is the x-amz-tagging header not needed with Multipart?

@qqmyers
Member

qqmyers commented Jan 18, 2023

Multipart is automatic when the file size is over the part size configured for the store. So any file over 5 MB in your case. (The browser console should let you confirm this.)

The x-amz-tagging header is used for multipart, but it is set by the Dataverse Java software before the browser is sent the signed URLs to upload the parts. For multipart, the Dataverse server makes the calls to start and end the multipart upload, and the browser only sends the parts. For Dell, this means it could be that multipart would have a problem with the tag that can't be fixed in the JavaScript, but it is also possible that the Java code is not sending %3B in the first place.
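The size rule described above can be sketched as follows. This is a simplified model, not the Dataverse implementation: the function name `planUpload` is hypothetical, and treating a file exactly equal to the part size as single-part is an assumption based on the wording "over the part size".

```javascript
// Decide between single-part and multipart upload and list the byte ranges
// (start inclusive, end exclusive) that the browser would send.
function planUpload(fileSize, partSize) {
  if (fileSize <= partSize) {
    return { multipart: false, parts: [{ start: 0, end: fileSize }] };
  }
  const parts = [];
  for (let start = 0; start < fileSize; start += partSize) {
    parts.push({ start, end: Math.min(start + partSize, fileSize) });
  }
  return { multipart: true, parts };
}

const MiB = 1024 * 1024;
const plan = planUpload(12 * MiB, 5 * MiB);
console.log(plan.multipart);    // true
console.log(plan.parts.length); // 3
```

With a 5 MB part size, a 12 MB file would be split into two full parts and one smaller final part, while anything at or below 5 MB would go through the single part path.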

@golsch
Author

golsch commented Jan 19, 2023

Small addendum to the upload to S3 via Dataverse.
According to section 3.2 of https://www.delltechnologies.com/asset/en-us/products/storage/industry-market/h18292-dell-emc-powerscale-onefs-s3-overview.pdf, ETags are not calculated by default in OneFS 9.0 and OneFS 9.1, as is the case in the actual implementation. Here the value must be sent with the upload.
The S3 client validates the ETag by default and assumes it is an MD5 hex string, which then leads to problems. According to aws/aws-sdk-java#1371, the validation can be turned off as a workaround. I will test it.

Edit: It works! Maybe there is a possibility to include the sum in Dataverse when uploading.
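One plausible reading of the original "Index 119 out of bounds for length 103" error, given the explanation above, is that the client tried to hex-decode an ETag containing a non-hex character. The sketch below only mimics that idea: the 103-entry table size is inferred from the error message ('f' has character code 102), and the function names are hypothetical rather than part of the AWS SDK.

```javascript
// Mimics a hex lookup table indexed by character code and sized 'f' + 1.
// The sizing is an assumption inferred from "length 103" in the stack trace.
const TABLE_SIZE = 'f'.charCodeAt(0) + 1;

// Returns the first character whose code falls outside such a table, or null.
function firstOutOfTableChar(etag) {
  for (const ch of etag) {
    const code = ch.charCodeAt(0);
    if (code >= TABLE_SIZE) return { char: ch, code };
  }
  return null;
}

// True when the ETag looks like a 32-character hex MD5 digest.
function looksLikeMd5Hex(etag) {
  return /^[0-9a-fA-F]{32}$/.test(etag);
}

console.log(TABLE_SIZE);                // 103
console.log(String.fromCharCode(119));  // w
console.log(looksLikeMd5Hex('d41d8cd98f00b204e9800998ecf8427e')); // true
console.log(firstOutOfTableChar('abcw')); // { char: 'w', code: 119 }
```

Under this reading, index 119 corresponds to the letter "w" appearing in the returned ETag, which a strict hex decoder cannot handle. A check like `looksLikeMd5Hex` shows the kind of guard a client could apply before attempting to compare the ETag against a locally computed MD5.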

@pdurbin
Member

pdurbin commented Oct 8, 2023

Edit: It works!

Great! Closing.

Maybe there is a possibility to include the sum in Dataverse when uploading.

You are very welcome to open a fresh issue for this.

pdurbin closed this as completed Oct 8, 2023