Autocomplete implementation #15

Merged: merged 21 commits on May 21, 2024
105 changes: 104 additions & 1 deletion README.md
@@ -1 +1,104 @@
# PDataViewer

PDataViewer is a web application that lets you explore the PD data landscape and identify cohort datasets that suit your research needs.

- [PDataViewer](#pdataviewer)
  - [Introduction](#introduction)
  - [Requirements](#requirements)
  - [Installation](#installation)
    - [Clone the Repository](#clone-the-repository)
    - [Install the Backend Requirements](#install-the-backend-requirements)
    - [Install the Frontend Requirements](#install-the-frontend-requirements)
  - [Usage](#usage)
    - [Starting the Backend Locally](#starting-the-backend-locally)
    - [Run the Backend via Docker](#run-the-backend-via-docker)
    - [Starting the Frontend Locally](#starting-the-frontend-locally)
    - [Run the Frontend via Docker](#run-the-frontend-via-docker)


## Introduction
Data collected in cohort studies lay the groundwork for a plethora of Parkinson’s disease (PD) research endeavors. PDataViewer lets you explore this PD data landscape and identify cohort datasets that suit your research needs. We accessed and curated major PD cohort datasets in a purely data-driven manner with the aim of:

1) characterizing their underlying data,
2) assessing the quantity and availability of data, and
3) evaluating the interoperability across these distinct cohort datasets.

## Requirements
- Python >= 3.10
- [Angular = 17.1.0](https://angular.io/guide/setup-local)
- [Node.js (LTS) >= 18.13](https://nodejs.org/en/download/package-manager)
- TypeScript >= 5.2.0, < 5.4.0

## Installation
### Clone the Repository

```bash
git clone https://github.com/SCAI-BIO/PDataViewer
cd PDataViewer
```

### Install the Backend Requirements

```bash
cd backend
pip install -r requirements.txt
```

### Install the Frontend Requirements

```bash
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
# Restart your terminal (or source your shell profile) so that the nvm command is available
nvm install 20
npm install -g @angular/cli
```

## Usage

### Starting the Backend Locally
The backend functionality is exposed through a REST API. <br>
Run the backend API on port 5000:

```bash
cd backend
uvicorn api.routes:app --reload --port 5000
```
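Once the server is running, the endpoints can be queried from any HTTP client. The following is a minimal sketch using only the Python standard library. It assumes the API listens on `http://localhost:5000` as started above; the helper names `build_url` and `get_json` are illustrative and not part of PDataViewer. The `/autocompletion` endpoint takes its `text` argument as a query parameter.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: backend started locally with uvicorn on port 5000 (the Docker image defaults to 8000)
BASE_URL = "http://localhost:5000"


def build_url(path: str, params: Optional[Dict[str, str]] = None) -> str:
    """Join the base URL, an endpoint path and optional query parameters."""
    url = f"{BASE_URL}{path}"
    if params:
        # urlencode escapes the values, e.g. spaces become "+"
        url += "?" + urlencode(params)
    return url


def get_json(path: str, params: Optional[Dict[str, str]] = None):
    """Send a GET request to the API and decode the JSON response body."""
    with urlopen(build_url(path, params)) as response:
        return json.load(response)
```

For example, `get_json("/version")` returns the API version and `get_json("/autocompletion", {"text": "Dia"})` returns the suggested feature names as a Python list.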

### Run the Backend via Docker
The API can also be run via Docker. <br>
You can either build the Docker image locally or pull the latest build from the PDataViewer GitHub package registry.

```bash
# Option 1: build the image locally
docker build . -t ghcr.io/scai-bio/pdataviewer/backend:latest
# Option 2: pull the latest published image
docker pull ghcr.io/scai-bio/pdataviewer/backend:latest
```

After building or pulling the image, start the container. By default, the PDataViewer API is then available at [localhost:8000](http://localhost:8000/):

```bash
docker run -p 8000:80 ghcr.io/scai-bio/pdataviewer/backend:latest
```

### Starting the Frontend Locally
You can deploy a local version of the web application via Angular. <br>
The web application is then available at [localhost:4200](http://localhost:4200):

```bash
cd frontend
npm install
ng serve
```

### Run the Frontend via Docker
You can deploy a local version of the web application via Docker. <br>
You can either build the Docker image locally or pull the latest build from the PDataViewer GitHub package registry.

```bash
# Option 1: build the image locally
docker build . -t ghcr.io/scai-bio/pdataviewer/frontend:latest
# Option 2: pull the latest published image
docker pull ghcr.io/scai-bio/pdataviewer/frontend:latest
```

After building or pulling the image, start the container. By default, the PDataViewer web application is then available at [localhost:8080](http://localhost:8080/):

```bash
docker run -p 8080:80 ghcr.io/scai-bio/pdataviewer/frontend:latest
```
42 changes: 39 additions & 3 deletions backend/README.md
@@ -1,13 +1,49 @@
# Backend

- [Backend](#backend)
  - [Tutorial](#tutorial)
  - [Requirements](#requirements)
  - [Installation](#installation)
  - [Usage](#usage)
    - [Starting the Backend Locally](#starting-the-backend-locally)
    - [Run the Backend via Docker](#run-the-backend-via-docker)


## Tutorial

https://fastapi.tiangolo.com/tutorial/

## Requirements

- Python >= 3.10

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Starting the Backend Locally
The backend functionality is exposed through a REST API. <br>
Run the backend API on port 5000:

```bash
uvicorn api.routes:app --reload --port 5000
```

### Run the Backend via Docker
The API can also be run via Docker. <br>
You can either build the Docker image locally or pull the latest build from the PDataViewer GitHub package registry.

```bash
# Option 1: build the image locally
docker build . -t ghcr.io/scai-bio/pdataviewer/backend:latest
# Option 2: pull the latest published image
docker pull ghcr.io/scai-bio/pdataviewer/backend:latest
```

After building or pulling the image, start the container. By default, the PDataViewer API is then available at [localhost:8000](http://localhost:8000/):

```bash
docker run -p 8000:80 ghcr.io/scai-bio/pdataviewer/backend:latest
```
46 changes: 44 additions & 2 deletions backend/api/routes.py
@@ -9,7 +9,9 @@

import pandas as pd

from thefuzz import process, fuzz

from fastapi import FastAPI, HTTPException
from fastapi.responses import RedirectResponse

from starlette.middleware.cors import CORSMiddleware
@@ -60,47 +62,87 @@ def swagger_redirect():

@app.get("/version", tags=["info"])
def get_current_version():
    """
    Get the version of the API.
    """
    return app.version


@app.get("/cdm", tags=["info"])
def get_cdm():
    """
    Get PASSIONATE CDM.
    """
    cdm = merge_modalities()
    return cdm.to_dict()


@app.get("/cdm/cohorts", tags=["info"])
def get_cohorts():
    """
    Get all cohorts available in PASSIONATE.
    """
    cdm = merge_modalities()
    cdm = clean_extra_columns(cdm)
    return {idx: cohort for idx, cohort in enumerate(cdm.columns)}


@app.get("/cdm/features", tags=["info"])
def get_features():
    """
    Get all features available in PASSIONATE.
    """
    features = merge_modalities(usecols=["Feature"])
    return features.to_dict()


@app.get("/cdm/modalities", tags=["info"])
def get_modalities():
    """
    Get all modalities available in PASSIONATE.
    """
    files = [file.replace(".csv", "") for file in os.listdir("./cdm") if file.endswith(".csv")]
    return {idx: file for idx, file in enumerate(files)}


@app.get("/cdm/modalities/{modality}", tags=["search"])
def get_modality(modality: str):
    """
    Get all features of a modality.
    """
    if not os.path.exists(f"./cdm/{modality}.csv"):
        raise HTTPException(status_code=404, detail="Modality not found")
    mappings = pd.read_csv(f"./cdm/{modality}.csv", keep_default_na=False)
    return mappings.to_dict()


@app.post("/visualization/chords/{modality}", tags=["visualization"])
def get_chords(modality: str, cohorts: list[str]):
    """
    Generate links between mappings to visualize with a chord diagram.
    """
    if not os.path.exists(f"./cdm/{modality}.csv"):
        raise HTTPException(status_code=404, detail="Modality not found")
    chords, decoder = generate_chords(modality, cohorts)
    return chords, decoder


@app.post("/studypicker/rank", tags=["studypicker"])
def get_ranked_cohorts(features: list[str]):
    """
    Rank cohorts based on the availability of the given features.
    """
    ranked_cohorts = rank_cohorts(features)
    return ranked_cohorts.to_dict()
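Because `features: list[str]` is neither a path parameter nor declared with `Query`, FastAPI reads it from the request body, so a client sends the requested features as a JSON array. A minimal standard-library sketch of such a request follows; the base URL and the helper names `build_rank_request` and `rank_cohorts_remote` are assumptions for illustration.

```python
import json
from urllib.request import Request, urlopen

# Assumption: backend running locally on port 5000
BASE_URL = "http://localhost:5000"


def build_rank_request(features: list[str]) -> Request:
    """Build a POST request whose JSON body is the list of requested features."""
    body = json.dumps(features).encode("utf-8")
    return Request(
        f"{BASE_URL}/studypicker/rank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rank_cohorts_remote(features: list[str]) -> dict:
    """Send the request and decode the ranked cohorts returned as JSON."""
    with urlopen(build_rank_request(features)) as response:
        return json.load(response)
```

The response mirrors `ranked_cohorts.to_dict()`, i.e. one dictionary per column keyed by row index.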


@app.get("/autocompletion", tags=["autocompletion"])
def autocomplete(text: str):
    """
    Autocomplete the user's query.
    """
    features = merge_modalities(usecols=["Feature"])
    features = features["Feature"].to_list()
    threshold = 50
    suggestions = process.extract(text, features, scorer=fuzz.partial_token_set_ratio, limit=10)
    return [match for match, score in suggestions if score >= threshold]

Member

Just tested this with the prompt "Dia" for the /autocompletion endpoint. I get:

[
  "Diagnosis",
  "American Indian/Alaskan Native",
  "Diabetes",
  "MoCA - Digit Span Test (Forward)",
  "MoCA - Delayed Recall (Daisy)",
  "MoCA - Orientation (Date)",
  "MoCA - Orientation (Day)",
  "IDEA - Day of Week",
  "MDS-UPDRS - Daytime Sleepiness",
  "RBDSQ - Sleep Is Disturbed"
]

This should only return ["Diagnosis", "Diabetes"]. Please adapt this and write a test case for this input.

Member

If the query is "dia", autocomplete should only return terms that start with "dia".

And even if "American InDIAn/Alaskan Native" contains the term, it should definitely rank below "Diabetes" in terms of similarity.

Member

We should probably not use fuzzy matching in autocompletion at all; maybe just match it with a string-based regex instead.

Member

For the query "Di" we get:

[
  "Diagnosis",
  "MoCA - Digit Span Test (Forward)",
  "MoCA - Digit Span Test (Backward)",
  "FAQ - Pay Attention, Understand, Discuss",
  "Consortium to Establish a Registry for Alzheimer's Disease",
  "ESS - Sitting and Reading",
  "Modified Schwab & England Activities of Daily Living",
  "MDS-UPDRS - Lightheadedness on Standing",
  "MDS-UPDRS - Rigidity Neck",
  "MDS-UPDRS - Rigidity Right Upper Extremity"
]

Here "Diabetes" would not even be suggested to the user.
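The reviewers suggest replacing fuzzy matching with a regex-based prefix match. A minimal sketch of that idea follows, assuming the feature names are already loaded into a list (in the endpoint they come from `merge_modalities(usecols=["Feature"])`). `autocomplete_prefix` is a hypothetical helper; the case-insensitive prefix rule and the default limit of 10 are assumptions. A test for the "Dia" input requested in the review is included.

```python
import re


def autocomplete_prefix(text: str, features: list[str], limit: int = 10) -> list[str]:
    """Return up to `limit` features whose name starts with `text`, ignoring case."""
    query = text.strip()
    if not query:
        return []
    # re.escape treats the user's input literally; pattern.match only matches at the start
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    # Keep the original order of the feature list (no fuzzy scoring involved)
    return [feature for feature in features if pattern.match(feature)][:limit]


def test_autocomplete_dia():
    # Subset of the feature names reported in the review comments
    features = [
        "Diagnosis",
        "American Indian/Alaskan Native",
        "Diabetes",
        "MoCA - Digit Span Test (Forward)",
        "MDS-UPDRS - Daytime Sleepiness",
    ]
    assert autocomplete_prefix("Dia", features) == ["Diagnosis", "Diabetes"]
    assert autocomplete_prefix("Di", features) == ["Diagnosis", "Diabetes"]
```

With a strict prefix rule, entries such as "MoCA - Digit Span Test (Forward)" are no longer suggested for "Di"; matching the start of each word as a secondary rule would be one possible extension.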
6 changes: 4 additions & 2 deletions backend/preprocessing/studypicker.py
@@ -54,9 +54,11 @@ def rank_cohorts(features: list[str], folder: str="./cdm") -> pd.DataFrame:
        ranked_cohorts.loc[len(ranked_cohorts.index)] = [column, found_features, missing_features]
    # Sort values based on the number of successfully found features
    ranked_cohorts.sort_values(by="Successfully found", ascending=False, inplace=True)
    # Reset the indices; otherwise, this causes an issue in the JSON output
    ranked_cohorts.reset_index(drop=True, inplace=True)
    # Calculate the percentage of features found
    percentage_found = ((ranked_cohorts["Successfully found"] / total_features) * 100).round(2)
    # Format the "Successfully found" column so that it displays the data in
    # "(found_features)/(total_features) (percentage_found)" format
    ranked_cohorts["Successfully found"] = ranked_cohorts["Successfully found"].astype(str) + "/" + str(total_features) + " (" + percentage_found.astype(str) + "%)"
    return ranked_cohorts
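The ranking and formatting steps above can be sketched in plain Python without pandas. This assumes a hypothetical input format, a mapping from cohort name to the set of features it covers; the real `rank_cohorts` reads the CDM CSV files from `./cdm` instead. The keys `cohort` and `missing` are illustrative; only "Successfully found" matches a column name from the diff.

```python
def rank_cohorts_sketch(features: list[str], cohorts: dict[str, set[str]]) -> list[dict]:
    """Rank cohorts by how many of the requested features they cover (hypothetical helper)."""
    total_features = len(features)
    rows = []
    for cohort, available in cohorts.items():
        found = [f for f in features if f in available]
        missing = [f for f in features if f not in available]
        rows.append({"cohort": cohort, "found": len(found), "missing": missing})
    # Sort by the number of found features, highest first (list.sort is stable for ties)
    rows.sort(key=lambda row: row["found"], reverse=True)
    for row in rows:
        found_count = row.pop("found")
        # Guard against an empty feature list instead of dividing by zero
        percentage = round(found_count / total_features * 100, 2) if total_features else 0.0
        # Same "(found)/(total) (percentage%)" layout as the "Successfully found" column
        row["Successfully found"] = f"{found_count}/{total_features} ({percentage}%)"
    return rows
```

Returning a list in ranked order keeps the sort explicit; in the pandas version, `reset_index(drop=True)` plays a similar role by renumbering the rows after sorting.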
6 changes: 6 additions & 0 deletions backend/requirements.txt
@@ -5,17 +5,23 @@ fastapi==0.110.2
h11==0.14.0
httptools==0.6.1
idna==3.7
iniconfig==2.0.0
numpy==1.26.4
packaging==24.0
pandas==2.2.2
pluggy==1.5.0
pydantic==2.7.0
pydantic_core==2.18.1
pytest==8.1.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.1
PyYAML==6.0.1
rapidfuzz==3.8.1
six==1.16.0
sniffio==1.3.1
starlette==0.37.2
thefuzz==0.22.1
typing_extensions==4.11.0
tzdata==2024.1
uvicorn==0.29.0
38 changes: 36 additions & 2 deletions frontend/README.md
@@ -1,8 +1,23 @@
# Frontend

- [Frontend](#frontend)
  - [Tutorial](#tutorial)
  - [Requirements](#requirements)
  - [Installation](#installation)
  - [Usage](#usage)
    - [Starting the Frontend Locally](#starting-the-frontend-locally)
    - [Run the Frontend via Docker](#run-the-frontend-via-docker)


## Tutorial

https://angular.io/tutorial/

## Requirements

- [Angular = 17.1.0](https://angular.io/guide/setup-local)
- [Node.js (LTS) >= 18.13](https://nodejs.org/en/download/package-manager)
- TypeScript >= 5.2.0, < 5.4.0

## Installation

@@ -12,6 +27,25 @@ npm install

## Usage

### Starting the Frontend Locally
You can deploy a local version of the web application via Angular. <br>
The web application is then available at [localhost:4200](http://localhost:4200):

```bash
ng serve
```

### Run the Frontend via Docker
You can deploy a local version of the web application via Docker. <br>
You can either build the Docker image locally or pull the latest build from the PDataViewer GitHub package registry.

```bash
# Option 1: build the image locally
docker build . -t ghcr.io/scai-bio/pdataviewer/frontend:latest
# Option 2: pull the latest published image
docker pull ghcr.io/scai-bio/pdataviewer/frontend:latest
```

After building or pulling the image, start the container. By default, the PDataViewer web application is then available at [localhost:8080](http://localhost:8080/):

```bash
docker run -p 8080:80 ghcr.io/scai-bio/pdataviewer/frontend:latest
```