Update schema for table used by mapping1-manual notebook #21

Closed
ianfore opened this issue Jan 24, 2021 · 6 comments · Fixed by #23
Labels
FASPHackathon2 January 2021 FASP Hackathon search Related to GA4GH Search

Comments

@ianfore
Collaborator

ianfore commented Jan 24, 2021

To illustrate effective use of Search schemas, please update the schema on the public DNAstack server for search_cloud.cshcodeathon.organoid_profiling_pc_subject_phenotypes_gru to one derived from the dbGaP XML data dictionary.

See fast/data/dbgap for the data_dict.xml to use.

The notebook that uses that table is https://github.com/ga4gh/fasp-scripts/blob/master/notebooks/search/mapping1-manual.ipynb

@ianfore ianfore added FASPHackathon2 January 2021 FASP Hackathon search Related to GA4GH Search labels Jan 24, 2021
@jfuerth

jfuerth commented Jan 26, 2021

I will work on this tomorrow.

@jfuerth

jfuerth commented Jan 26, 2021

Sorry, I was swamped today. Bumping to first thing tomorrow.

@jfuerth

jfuerth commented Jan 27, 2021

Here is the data dictionary in JSON Schema format. Next I will try to insert it into the Search implementation so it is returned in API responses for the correct table.

{
  "$id": "phs001611.v1.pht009160.v1.Organoid_Profiling_PC_Subject_Phenotypes",
  "$schema": "http://json-schema.org/draft-07/schema",
  "description": null,
  "properties": {
    "age": {
      "$comment": "UNIT 'Years'",
      "description": "Subject's age",
      "maximum": 92.0,
      "minimum": 24.0,
      "oneOf": [
        {
          "const": "N/A",
          "title": "Not vailable"
        }
      ],
      "type": "integer, encoded value"
    },
    "race": {
      "description": "Race of participant",
      "oneOf": [
        {
          "const": "AA",
          "title": "African American"
        },
        {
          "const": "A",
          "title": "Asian"
        },
        {
          "const": "W",
          "title": "White, Caucasian"
        },
        {
          "const": "H",
          "title": "Hispanic"
        },
        {
          "const": "N/A",
          "title": "Not vailable"
        }
      ],
      "type": "string"
    },
    "sex": {
      "description": "Sex of participant",
      "oneOf": [
        {
          "const": "F",
          "title": "Female"
        },
        {
          "const": "N/A",
          "title": "Not Applicable"
        },
        {
          "const": "M",
          "title": "Male"
        }
      ],
      "type": "string"
    },
    "subject_id": {
      "description": "De-identified Subject ID",
      "type": "string"
    }
  },
  "type": "object"
}
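
For readers who want to see how a schema like the one above could be derived mechanically, here is a minimal sketch. It is not the code used for this issue. The XML element names (variable, name, description, type, unit, logical_min, logical_max, value with a code attribute) are assumptions about the data_dict.xml layout, and SAMPLE_XML and data_dict_to_schema are hypothetical names made up for illustration. Values are copied through unchanged, including the dictionary's own type strings and spellings.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample. The real data_dict.xml may be laid out differently.
SAMPLE_XML = """
<data_table>
  <variable>
    <name>sex</name>
    <description>Sex of participant</description>
    <type>string</type>
    <value code="F">Female</value>
    <value code="M">Male</value>
  </variable>
  <variable>
    <name>age</name>
    <description>Subject's age</description>
    <type>integer, encoded value</type>
    <unit>Years</unit>
    <logical_min>24</logical_min>
    <logical_max>92</logical_max>
    <value code="N/A">Not vailable</value>
  </variable>
</data_table>
"""


def data_dict_to_schema(xml_text, schema_id):
    """Build a JSON-Schema-shaped dict from a data dictionary (assumed layout)."""
    root = ET.fromstring(xml_text.strip())
    properties = {}
    for var in root.iter("variable"):
        prop = {
            "description": var.findtext("description"),
            # Copied verbatim; no attempt to normalize the type string here.
            "type": var.findtext("type"),
        }
        unit = var.findtext("unit")
        if unit:
            prop["$comment"] = f"UNIT '{unit}'"
        for tag, key in (("logical_min", "minimum"), ("logical_max", "maximum")):
            text = var.findtext(tag)
            if text is not None:
                prop[key] = float(text)
        # Each coded value becomes a const/title pair, as in the schema above.
        coded = [{"const": v.get("code"), "title": v.text} for v in var.findall("value")]
        if coded:
            prop["oneOf"] = coded
        properties[var.findtext("name")] = prop
    return {
        "$id": schema_id,
        "$schema": "http://json-schema.org/draft-07/schema",
        "type": "object",
        "properties": properties,
    }
```

Calling data_dict_to_schema(SAMPLE_XML, "some-id") returns a dict that can be passed to json.dumps to get output in the same shape as the schema posted above.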

@ianfore
Collaborator Author

ianfore commented Jan 27, 2021

Thanks. Look forward to seeing it in the Search implementation.

In this case there's a typo that occurs twice, "Not vailable", but that tracks back to the data_dict.xml. I don't think we should try to fix that in the transform from the XML data dictionary to JSON Schema, though. The first intent is to show the description as provided by the investigator and curated under the current mechanisms. That's the intent represented in the notebook referred to above.

There's a need to address variations in type across data_dicts. I have some code which addresses that. Those typos could perhaps be handled by the same route. I started a separate issue on that: #22
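
As an illustration of what handling type variations might involve, the schema above carries the type string "integer, encoded value", which is not a JSON Schema type name. The sketch below is not the code mentioned in this comment. normalize_dbgap_type and its mapping table are hypothetical, and treating "encoded value" as a string code is an assumption.

```python
# Assumed mapping from data dictionary type words to JSON Schema type names.
TYPE_MAP = {
    "integer": "integer",
    "decimal": "number",
    "string": "string",
    "encoded value": "string",  # assumption: codes such as "N/A" are strings
}


def normalize_dbgap_type(raw_type):
    """Turn a comma-separated type string into a JSON Schema 'type' value."""
    types = []
    for part in raw_type.split(","):
        key = part.strip().lower()
        if key not in TYPE_MAP:
            raise ValueError(f"unrecognized type: {part.strip()!r}")
        if TYPE_MAP[key] not in types:
            types.append(TYPE_MAP[key])
    # JSON Schema accepts either a single name or a list of names.
    return types[0] if len(types) == 1 else types
```

With this mapping, "integer, encoded value" would become a two-element list of type names, while a plain "string" stays a single name.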

@jfuerth

jfuerth commented Jan 27, 2021

The above schema (mechanically derived from the dbGaP XML data dictionary) is now returned from https://ga4gh-search-adapter-presto-public.prod.dnastack.com/table/search_cloud.cshcodeathon.organoid_profiling_pc_subject_phenotypes_gru/info. Let me know if this is what you were hoping for!
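
A minimal sketch of reading that schema from a client, using only the Python standard library. The URL pattern is taken from the link above. The assumption that the response JSON carries the schema under a "data_model" key follows the GA4GH Search table-info convention as I understand it, and should be checked against the actual response. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://ga4gh-search-adapter-presto-public.prod.dnastack.com"
TABLE = "search_cloud.cshcodeathon.organoid_profiling_pc_subject_phenotypes_gru"


def table_info_url(base_url, table_name):
    """Build the table info URL in the form shown in the comment above."""
    return f"{base_url.rstrip('/')}/table/{table_name}/info"


def fetch_table_info(base_url, table_name, timeout=30):
    """Fetch and decode the table info document (requires network access)."""
    with urlopen(table_info_url(base_url, table_name), timeout=timeout) as resp:
        return json.load(resp)


def coded_values(table_info, column):
    """Return {code: label} for one column, assuming a 'data_model' key."""
    prop = table_info["data_model"]["properties"][column]
    return {item["const"]: item["title"] for item in prop.get("oneOf", [])}
```

For example, coded_values(fetch_table_info(BASE_URL, TABLE), "race") would be expected to give the code-to-label pairs for the race column if the server is still available and the response has the assumed shape.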

@ianfore
Collaborator Author

ianfore commented Jan 27, 2021

That works. Thanks. I updated the notebook to remove the workaround. I have also added a capability to the Python client to generate a template in which a mapping could be created.
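
One way such a mapping template could look is sketched below. This is not the code added to the Python client. mapping_template is a hypothetical function that assumes the schema shape posted earlier in this thread, where coded values appear as const/title pairs under oneOf.

```python
def mapping_template(schema):
    """Return {column: {code: None}} for every column that has coded values.

    The None placeholders are meant to be filled in by hand with the
    target term for each source code.
    """
    template = {}
    for name, prop in schema.get("properties", {}).items():
        codes = [item["const"] for item in prop.get("oneOf", [])]
        if codes:
            template[name] = {code: None for code in codes}
    return template
```

Columns without coded values, such as subject_id, are left out of the template because there is nothing to map.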

@ianfore ianfore linked a pull request Jan 27, 2021 that will close this issue