Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] R SOMACollection printer doesn't get context #2899

Open
ebezzi opened this issue Aug 15, 2024 · 3 comments
Open

[Bug] R SOMACollection printer doesn't get context #2899

ebezzi opened this issue Aug 15, 2024 · 3 comments
Assignees
Labels
bug Something isn't working priority r-api

Comments

@ebezzi
Copy link
Member

ebezzi commented Aug 15, 2024

Describe the bug
When inspecting a SOMACollection in the Census (using R), an AWS S3 error is raised, which usually happens when the context doesn't have region information.

To Reproduce

> library("cellxgene.census")
> census <- open_soma()
The stable Census release is currently 2024-07-01. Specify census_version = "2024-07-01" in future calls to open_soma() to ensure data consistency.
> census
<SOMACollection>
  uri: s3://cellxgene-census-public-us-west-2/cell-census/2024-07-01/soma/
Error: S3: Error while listing with prefix 's3://cellxgene-census-public-us-west-2/cell-census/2024-07-01/soma/__schema/' and delimiter '/'[Error Type: 100] [HTTP Response Code: 301] [Exception: PermanentRedirect] [Remote IP: ...] [Request ID: ...] [Headers: 'content-type' = 'application/xml' 'date' = 'Thu, 15 Aug 2024 17:25:56 GMT' 'server' = 'AmazonS3' 'transfer-encoding' = 'chunked' 'x-amz-bucket-region' = 'us-west-2' 'x-amz-id-2' = ... 'x-amz-request-id' = ...] : Unable to parse ExceptionName: PermanentRedirect Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

Also note:

> census$tiledbsoma_ctx
<SOMATileDBContext>
...   vfs.s3.region: us-west-2

Versions (please complete the following information):

  • TileDB-SOMA version: 1.13.0
  • Language and language version: R 4.4
  • OS (e.g. MacOS, Ubuntu Linux): MacOS
@johnkerl johnkerl self-assigned this Aug 15, 2024
@johnkerl
Copy link
Member

A repro script:

$ unset AWS_ACCESS_KEY_ID
$ unset AWS_SECRET_ACCESS_KEY
$ unset AWS_DEFAULT_REGION
Rscript cen.r

where cen.r is

#!/usr/bin/env Rscript

library(tiledbsoma)

exp_uri  <- "s3://cellxgene-data-public/cell-census/2024-08-12/soma/census_data/homo_sapiens/"

ctx <- tiledbsoma::SOMATileDBContext$new(c(
  vfs.s3.region="us-west-2",
  vfs.s3.aws_access_key_id="redacted",
  vfs.s3.aws_secret_access_key="redacted"
))

exp <- SOMAExperimentOpen(exp_uri, tiledbsoma_ctx=ctx)
print(exp)

Output:

<SOMAExperiment>
  uri: s3://cellxgene-data-public/cell-census/2024-08-12/soma/census_data/homo_sapiens/
Error: S3: Error while listing with prefix 's3://cellxgene-data-public/cell-census/2024-08-12/soma/census_data/homo_sapiens/__schema/' and delimiter '/'[Error Type: 100] [HTTP Response Code: 301] [Exception: PermanentRedirect] [Remote IP: 52.216.33.234] [Request ID: JXV207CJX6Z3WGQ4] [Headers: 'content-type' = 'application/xml' 'date' = 'Thu, 15 Aug 2024 18:47:49 GMT' 'server' = 'AmazonS3' 'transfer-encoding' = 'chunked' 'x-amz-bucket-region' = 'us-west-2' 'x-amz-id-2' = 'vokf+lXJDe7zd2Y8cLHpnK/e9acppnv8eSDRUH0pkp9mfssmY7FdqMVaU1jWAY89SxXfdEhxIdE=' 'x-amz-request-id' = 'JXV207CJX6Z3WGQ4'] : Unable to parse ExceptionName: PermanentRedirect Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
Execution halted

Analysis diff:

diff --git a/apis/r/R/Factory.R b/apis/r/R/Factory.R
index 92706f8e..064fbcef 100644
--- a/apis/r/R/Factory.R
+++ b/apis/r/R/Factory.R
@@ -390,6 +390,7 @@ SOMAExperimentOpen <- function(
   tiledbsoma_ctx = NULL,
   tiledb_timestamp = NULL
 ) {
+  cat("001 SOMAExperimentOpen", tiledbsoma_ctx$get('vfs.s3.region'), "\n")
   exp <- SOMAExperiment$new(
     uri,
     platform_config,
diff --git a/apis/r/R/SOMACollectionBase.R b/apis/r/R/SOMACollectionBase.R
index fdc510e6..263c015a 100644
--- a/apis/r/R/SOMACollectionBase.R
+++ b/apis/r/R/SOMACollectionBase.R
@@ -21,6 +21,7 @@ SOMACollectionBase <- R6::R6Class(
     #' as `new()` is considered internal and should not be called directly.
     initialize = function(uri, platform_config = NULL, tiledbsoma_ctx = NULL, tiledb_timestamp = NULL,
                           internal_use_only = NULL) {
+      cat("002 SOMACollectionBase initialize", tiledbsoma_ctx$get('vfs.s3.region'), "\n")
       super$initialize(uri=uri, platform_config=platform_config,
                        tiledbsoma_ctx=tiledbsoma_ctx, tiledb_timestamp = tiledb_timestamp,
                        internal_use_only=internal_use_only)
diff --git a/apis/r/R/TileDBObject.R b/apis/r/R/TileDBObject.R
index 9f94c763..89ed337e 100644
--- a/apis/r/R/TileDBObject.R
+++ b/apis/r/R/TileDBObject.R
@@ -16,6 +16,7 @@ TileDBObject <- R6::R6Class(
     #' as `new()` is considered internal and should not be called directly.
     initialize = function(uri, platform_config = NULL, tiledbsoma_ctx = NULL,
                           tiledb_timestamp = NULL, internal_use_only = NULL) {
+      cat("003 TileDBObject initialize", tiledbsoma_ctx$get('vfs.s3.region'), "\n")
       if (is.null(internal_use_only) || internal_use_only != "allowed_use") {
         stop(paste("Use of the new() method is for internal use only. Consider using a",
                    "factory method as e.g. 'SOMADataFrameOpen()'."), call. = FALSE)
@@ -111,7 +112,12 @@ TileDBObject <- R6::R6Class(
       } else {
         stop("Unknown object type", call. = FALSE)
       }
-      get_tiledb_object_type(self$uri, ctx = soma_context()) %in% expected_type
+      cat("004 TileDBObject exists", private$.tiledbsoma_ctx$get('vfs.s3.region'), "\n")
+      cat("005 TileDBObject exists", typeof(private$.tiledbsoma_ctx), "\n")
+      cat("006 TileDBObject exists", typeof(soma_context()), "\n")
+      #cat("007 TileDBObject exists", soma_context()$get('vfs.s3.region'), "\n")
+      #get_tiledb_object_type(self$uri, ctx = soma_context()) %in% expected_type
+      get_tiledb_object_type(self$uri, ctx = private$.tiledbsoma_ctx) %in% expected_type
     }
   ),

Output:

> source("cen1.r")
001 SOMAExperimentOpen us-west-2
002 SOMACollectionBase initialize us-west-2
003 TileDBObject initialize us-west-2
<SOMAExperiment>
  uri: s3://cellxgene-data-public/cell-census/2024-08-12/soma/census_data/homo_sapiens/
004 TileDBObject exists us-west-2
005 TileDBObject exists environment
006 TileDBObject exists externalptr

The construction-time context is saved here:
https://github.com/single-cell-data/TileDB-SOMA/blob/1.13.0/apis/r/R/TileDBObject.R#L38
but when we go to print the experiment, there is a fresh context:

https://github.com/single-cell-data/TileDB-SOMA/blob/1.13.0/apis/r/R/TileDBObject.R#L38

and the naïve developer (myself) cannot simply replace

      get_tiledb_object_type(self$uri, ctx = soma_context()) %in% expected_type

with

      get_tiledb_object_type(self$uri, ctx = private$.tiledbsoma_ctx) %in% expected_type

since these are not of the same type.

@johnkerl johnkerl assigned eddelbuettel and unassigned johnkerl Aug 15, 2024
@johnkerl johnkerl added bug Something isn't working r-api labels Aug 15, 2024
@johnkerl
Copy link
Member

[sc-53021]

@johnkerl johnkerl changed the title [Bug] R SOMACollection repr doesn't get context [Bug] R SOMACollection printer doesn't get context Sep 5, 2024
@eddelbuettel
Copy link
Member

This is being addressed. We will continue a package-global config object until we have a clean way to derive this from platform_config. But simple demo / proof of existence:

> library(tiledbsoma)     # it is immaterial that it is built against 'dev' core, the key is the tiledb-soma branch
TileDB-SOMA R package 1.13.99.8 with TileDB Embedded 2.27.0 on Ubuntu 24.04.1 LTS.
See https://github.com/single-cell-data for more information about the SOMA project.
> sctx <- soma_context(config = c(vfs.s3.region = "us-west-2"))   # package-global core config object, cached
> library("cellxgene.census")
> census <- open_soma()
The stable Census release is currently 2024-07-01. Specify census_version = "2024-07-01" in future calls to open_soma() to ensure data consistency.
> census
<SOMACollection>
  uri: s3://cellxgene-census-public-us-west-2/cell-census/2024-07-01/soma/ 
> 

So the setting of us-west-2 now percolates through as one would expect. "My" default is still us-east-1.

The branch containing the fixes is here, this should be merged relatively quickly. The full integration into the existing configuration scheme is a different task that will follow later.

@johnkerl johnkerl assigned johnkerl and beroy and unassigned eddelbuettel Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority r-api
Projects
None yet
Development

No branches or pull requests

4 participants