Working Environment

Note: All commands in this document should be executed from the root directory of the repository.

To install all required packages for running the files, execute the following command:

pip install -r requirements.txt

Datasets

Training datasets

To obtain the training datasets, run: python datamodule.py

The resulting training datasets can be found in the folder: Datasets/Training_Datasets
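Once datamodule.py has run, a quick way to check what it produced is to list the folder contents. The sketch below relies only on the folder path given above. The helper name list_dataset_files is hypothetical, and the file names and formats inside the folder depend on datamodule.py, which is not described here.

```python
from pathlib import Path


def list_dataset_files(folder: str = "Datasets/Training_Datasets") -> list[str]:
    """Return the sorted paths (relative to `folder`) of all files under it.

    The default path is relative, so call this from the repository root.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; run datamodule.py first")
    # rglob("*") walks subfolders too; keep files only, not directories.
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
```

For example, print(list_dataset_files()) from the repository root prints the generated files, and list_dataset_files("Datasets/Deid_Datasets") does the same for the deidentified datasets.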

Deidentified datasets

To obtain the deidentified datasets, run: python deid.py

The resulting deidentified datasets can be found in the folder: Datasets/Deid_Datasets

Deidentification methods: IDF, IDF-table-aware, Lexical, NER
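The exact implementations of these methods live in deid.py. As an illustration only, the sketch below shows one plausible reading of three of them. IDF masks the k distinct words of a document with the highest inverse document frequency. IDF-table-aware uses the same ranking but only considers words that also appear in the person's profile table. Lexical masks every document word that appears in the profile. The function names, the "<mask>" token, whitespace-style tokenization and the parameter k are all hypothetical. The smoothed IDF formula is the one scikit-learn uses with smooth_idf=True, which may differ from deid.py. NER masking needs a named-entity tagger and is omitted.

```python
import math
from collections import Counter
from typing import Optional

MASK = "<mask>"  # hypothetical mask token


def compute_idf(corpus: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF over tokenized documents: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    df = Counter(word for doc in corpus for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + c)) + 1 for w, c in df.items()}


def idf_redact(doc: list[str], idf: dict[str, float], k: int,
               candidates: Optional[set[str]] = None) -> list[str]:
    """Mask the k distinct words of `doc` with the highest IDF.

    If `candidates` is given (e.g. the words of the profile table), only
    those words may be masked: a table-aware variant.
    """
    vocab = {w for w in doc if candidates is None or w in candidates}
    # Words unseen in the corpus get the maximum IDF (they are the rarest).
    default = max(idf.values(), default=1.0)
    # Sort by descending IDF; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-idf.get(w, default), w))
    to_mask = set(ranked[:k])
    return [MASK if w in to_mask else w for w in doc]


def lexical_redact(doc: list[str], profile_words: set[str]) -> list[str]:
    """Mask every word of `doc` that also appears in the profile."""
    return [MASK if w in profile_words else w for w in doc]
```

With a toy corpus such as [["alice", "was", "born", "in", "paris"], ["bob", "was", "born", "in", "rome"]], the IDF variant masks "alice" and "paris" first (they occur in a single document), while "was" (present everywhere) is masked last.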

Training Reidentification Models

First Approach (finetuning.py)

This method is based on the paper "Unsupervised Text Deidentification" by John X. Morris, Justin T. Chiu, Ramin Zabih, and Alexander M. Rush. The reidentification model used in this script is RoBERTa-RoBERTa. However, this approach did not produce satisfactory results.
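In this setting, reidentification means matching a (possibly deidentified) document to the correct person's profile. Our reading of "RoBERTa-RoBERTa" is one encoder for documents and one for profiles; the source does not spell this out, so treat it as an assumption. Under that reading, training reduces to a contrastive cross-entropy over document-profile similarity scores. The sketch below shows only that scoring step on precomputed embedding vectors, in plain Python. The encoders, the function names and the convention that document i matches profile i are hypothetical.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def reid_logits(doc_embs: list[list[float]], profile_embs: list[list[float]]) -> list[list[float]]:
    """Score every document against every profile: logits[i][j] = <d_i, p_j>."""
    return [[dot(d, p) for p in profile_embs] for d in doc_embs]


def log_softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def contrastive_loss(logits: list[list[float]]) -> float:
    """Mean cross-entropy, assuming document i's true profile is profile i."""
    return -sum(log_softmax(row)[i] for i, row in enumerate(logits)) / len(logits)


def reidentify(logits: list[list[float]]) -> list[int]:
    """Predicted profile index for each document (argmax over profiles)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]
```

A lower loss means the reidentifier ranks the true profile higher. A deidentification method is then judged by how much it raises this loss or lowers reidentify accuracy.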

Second Approach (model.py)

We trained our networks directly, using scripts similar to model.py. Note: these scripts were run on GPUs. You may need to adapt the model input, since depending on the model it takes either tokens or embeddings.
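One way to handle both input types is sketched below. It assumes the networks are Hugging Face transformers models, which the source does not confirm. In that library, token ids and precomputed embeddings are passed through the mutually exclusive input_ids and inputs_embeds keyword arguments of the model's forward call. The hypothetical helper below picks the right argument from the shape of the batch, so the rest of the training loop does not change. It uses nested lists to stay dependency-free.

```python
def nesting_depth(x) -> int:
    """Number of nested list/tuple levels, following the first element."""
    depth = 0
    while isinstance(x, (list, tuple)) and len(x) > 0:
        x = x[0]
        depth += 1
    return depth


def build_model_inputs(batch, attention_mask=None) -> dict:
    """Return forward() keyword arguments for token ids or embeddings.

    batch: [batch, seq_len] token ids, or [batch, seq_len, hidden] embeddings.
    """
    depth = nesting_depth(batch)
    if depth == 2:
        kwargs = {"input_ids": batch}
    elif depth == 3:
        kwargs = {"inputs_embeds": batch}
    else:
        raise ValueError(f"expected a 2-D or 3-D batch, got depth {depth}")
    if attention_mask is not None:
        kwargs["attention_mask"] = attention_mask
    return kwargs
```

With PyTorch tensors the same check would use batch.dim() instead of nesting_depth, and the result can be unpacked directly as model(**build_model_inputs(batch, mask)).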

Results

To obtain the results, run: python result.py

Note: The last model is not yet functional in this script.

Appendices

Imports: imports.py
Functions: functions.py
Paths: locations.py
Parameters: parameters.py

EynardM/unsupervised-deidentification

Folders and files

Latest commit

History

Repository files navigation

Working Environment

Datasets

Training datasets

Deididentified datasets

Training Reidentification Models

Results

Appendices

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages