Tooling to determine original dcm file from anonymised path #1280

rkm · 2022-08-29T15:01:19Z

When investigating issues with an anonymised file in an extraction, it is often useful to review the original file for comparison. This is currently difficult to do as there is no direct link from the anonymised file back to the source file.

A tool, or a new application in the smi binary, could achieve this by looking up the original path either:

- in the CohortPackager database for the extraction, or
- in the metadata database.
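One possible shape for such a tool is sketched below in Python. It is a minimal sketch under stated assumptions. Each database is modelled as a hypothetical "source" callable that maps an anonymised path to an original path, or returns `None` if it has no record. The tool queries the sources in order, so it could consult the CohortPackager database first and then fall back to the metadata database. `find_original_path` and `PathSource` are illustrative names only, not existing smi code.

```python
from typing import Callable, Iterable, Optional

# Hypothetical: a "source" maps an anonymised path to the original path,
# returning None when it has no record. In a real tool these would query the
# CohortPackager database or the metadata database.
PathSource = Callable[[str], Optional[str]]


def find_original_path(anon_path: str, sources: Iterable[PathSource]) -> Optional[str]:
    """Return the original path from the first source that knows it, else None."""
    for source in sources:
        original = source(anon_path)
        if original is not None:
            return original
    return None


if __name__ == "__main__":
    # In-memory stand-ins for the two databases (illustrative data only).
    cohort_packager = {"extract1/1.2.3.4-an.dcm": "/data/study/series/1.2.3.4.dcm"}
    metadata = {"extract2/5.6.7.8-an.dcm": "/data/other/5.6.7.8.dcm"}

    # dict.get returns None for missing keys, so it works as a PathSource.
    print(find_original_path("extract2/5.6.7.8-an.dcm", [cohort_packager.get, metadata.get]))
```

Because the sources are tried in order, the same tool could draw on both databases without changing its interface.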


tznind · 2022-08-29T16:24:24Z

I think the metadata database would be most powerful. That way it could support either identifiable or anonymous UIDs, and it wouldn't have to rely on an image having been extracted to be able to look it up.

It would also help with other use cases, such as: 'for this image in the SR NLP DB / MongoDB, is it also in the relational database or not?'

tznind · 2022-08-29T16:24:43Z

Nothing stopping it drawing info from both though.

howff · 2023-03-15T08:42:42Z

At the moment I've just got a big text file of filenames which I grep ;-)

Another method might be to see if MongoDB can give you a list of keys in the index (by quickly reading the index rather than slowly reading the database), which you could then grep. If the index only stores hashes, this won't work.

Another method might be to see whether MongoDB can create a computed index: you could create a new index called FileName, computed from Basename(dicomFilePath). Postgres supports computed indexes, so maybe MongoDB does too. You could then replace the -an.dcm in the anonymised filename and look up the result in the computed index.
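The idea can be tried without relying on any particular MongoDB feature. The Python sketch below builds the equivalent of the FileName index in memory, keyed on the basename of each `dicomFilePath`. It then maps an anonymised name such as `X-an.dcm` to `X.dcm` and looks that up. The sketch assumes that the original basename is the anonymised basename with `-an` removed. This is an assumption, not something confirmed in this thread. `build_filename_index` and `candidates_for` are hypothetical helper names.

```python
import posixpath
from typing import Dict, Iterable, List

ANON_SUFFIX = "-an.dcm"  # suffix on anonymised files, as described above


def build_filename_index(dicom_file_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map Basename(dicomFilePath) -> every original path with that basename."""
    index: Dict[str, List[str]] = {}
    for path in dicom_file_paths:
        index.setdefault(posixpath.basename(path), []).append(path)
    return index


def candidates_for(anon_filename: str, index: Dict[str, List[str]]) -> List[str]:
    """Turn 'X-an.dcm' into 'X.dcm' (assumed mapping) and look it up by basename."""
    name = posixpath.basename(anon_filename)
    if not name.endswith(ANON_SUFFIX):
        raise ValueError(f"not an anonymised filename: {anon_filename!r}")
    original_name = name[: -len(ANON_SUFFIX)] + ".dcm"
    return index.get(original_name, [])
```

The lookup returns a list rather than a single path because two different directories could contain files with the same basename. A basename-only index cannot tell them apart.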

Or have I completely misunderstood what you mean by "metadata database"? Were you referring to one of the MySQL or SQL Server databases?

howff · 2023-08-15T06:42:52Z

Unless I'm mistaken, the anonymised path ends with the SOPInstanceUID plus -an.dcm, so adding a MongoDB index on SOPInstanceUID would help immensely. Could we also index the study and series IDs?
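If the filename really is `<SOPInstanceUID>-an.dcm`, the lookup reduces to extracting the UID and querying an indexed field. In MongoDB that would be an ordinary single-field index, for example pymongo's `collection.create_index("SOPInstanceUID")`. The field name used in the SMI MongoDB documents is an assumption here. The Python sketch below stands in for the indexed query with a dictionary keyed on SOPInstanceUID. It stores the study and series UIDs alongside the original path. `ImageRecord`, `sop_uid_from_anon_path` and `lookup` are hypothetical names. The UID check follows the DICOM UID syntax: numeric components separated by dots, no leading zeros, and at most 64 characters.

```python
import posixpath
import re
from typing import Dict, NamedTuple, Optional

ANON_SUFFIX = "-an.dcm"  # assumed: anonymised filename is <SOPInstanceUID>-an.dcm
# DICOM UID syntax: numeric components separated by '.', no leading zeros
# (except a lone "0"), at most 64 characters in total.
UID_RE = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")


class ImageRecord(NamedTuple):
    # Hypothetical record shape: original path plus the study/series UIDs.
    original_path: str
    study_instance_uid: str
    series_instance_uid: str


def sop_uid_from_anon_path(anon_path: str) -> str:
    """Extract the SOPInstanceUID from '<...>/<SOPInstanceUID>-an.dcm'."""
    name = posixpath.basename(anon_path)
    if not name.endswith(ANON_SUFFIX):
        raise ValueError(f"not an anonymised filename: {anon_path!r}")
    uid = name[: -len(ANON_SUFFIX)]
    if len(uid) > 64 or not UID_RE.fullmatch(uid):
        raise ValueError(f"not a valid DICOM UID: {uid!r}")
    return uid


def lookup(anon_path: str, by_sop_uid: Dict[str, ImageRecord]) -> Optional[ImageRecord]:
    """Dictionary lookup standing in for a query on an indexed SOPInstanceUID field."""
    return by_sop_uid.get(sop_uid_from_anon_path(anon_path))
```

Keeping the study and series UIDs in the same record means one lookup is enough to locate the original series as well as the individual file.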

rkm added the enhancement New feature or request label Aug 29, 2022
