Project Extraction (Data Export)

Instructions


What is a project extraction


The data export section of RDMP allows you to link a cohort with one or more datasets, so that only the records in those datasets that belong to patients in your cohort are extracted.
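The linkage idea can be illustrated with a minimal sketch. It keeps only the dataset rows whose private identifier appears in the cohort. This is not RDMP's implementation: in RDMP this work is done by the SQL query that the extraction configuration generates. The helper `extract_cohort_records`, the `patient_id` column and the sample rows are all hypothetical.

```python
def extract_cohort_records(dataset, cohort, id_column="patient_id"):
    """Keep only rows whose private identifier appears in the cohort (hypothetical helper)."""
    # A set makes each membership test O(1) on average.
    return [row for row in dataset if row[id_column] in cohort]


# Hypothetical dataset rows keyed by a private patient identifier.
dataset = [
    {"patient_id": "P001", "test": "HbA1c", "value": 48},
    {"patient_id": "P002", "test": "HbA1c", "value": 52},
    {"patient_id": "P003", "test": "HbA1c", "value": 41},
]
cohort = {"P001", "P003"}  # private identifiers of the patients in the cohort

extracted = extract_cohort_records(dataset, cohort)
print([row["patient_id"] for row in extracted])  # ['P001', 'P003']
```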

This linkage is configured through the user interface by selecting components from the data catalogue (which datasets you want, which columns, which filters, etc.). The configuration is permanently stored in the Data Export database for reproducibility. It produces an SQL query that fetches the anonymised data table(s).
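The sketch below shows, under simplifying assumptions, how a stored configuration might become a single SQL query. It takes the chosen columns and filters, joins the dataset to the cohort, and returns the release identifier instead of the private one. The `ExtractionConfiguration` class, the `build_extraction_sql` function and all table and column names (`Cohort`, `PrivateID`, `ReleaseID`, `Biochemistry`, `PatientID`, `HealthBoard`) are hypothetical. The SQL that RDMP actually generates will differ.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionConfiguration:
    """Hypothetical, simplified stand-in for a stored extraction configuration."""
    dataset_table: str
    columns: list[str]
    filters: list[str] = field(default_factory=list)
    cohort_table: str = "Cohort"      # hypothetical cohort table name
    private_id: str = "PrivateID"     # hypothetical private identifier column
    release_id: str = "ReleaseID"     # hypothetical release identifier column


def build_extraction_sql(cfg: ExtractionConfiguration, dataset_id_column: str) -> str:
    # Return the release identifier in place of the private identifier,
    # followed by the columns chosen from the catalogue.
    select_list = [f"c.{cfg.release_id}"] + [f"d.{col}" for col in cfg.columns]
    lines = [
        "SELECT " + ", ".join(select_list),
        f"FROM {cfg.dataset_table} d",
        # The inner join drops every row whose patient is not in the cohort.
        f"INNER JOIN {cfg.cohort_table} c "
        f"ON c.{cfg.private_id} = d.{dataset_id_column}",
    ]
    if cfg.filters:
        # In this sketch all filters are simply combined with AND.
        lines.append("WHERE " + " AND ".join(f"({flt})" for flt in cfg.filters))
    return "\n".join(lines)


cfg = ExtractionConfiguration(
    dataset_table="Biochemistry",
    columns=["SampleDate", "TestCode", "Result"],
    filters=["d.HealthBoard = 'Tayside'"],
)
sql = build_extraction_sql(cfg, dataset_id_column="PatientID")
print(sql)
```

The private identifier is used only in the join condition and never appears in the select list. That is what keeps the output table anonymised in this sketch.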

Additionally, RDMP makes a ‘researcher copy’ of supporting documents and lookup tables. It also extracts the descriptive metadata that corresponds to the extracted columns.

Finally, the Data Export Manager supports longitudinal versioning of extracts. This ensures that changes in column or dataset definitions are reconciled when an extract is repeated over an extended period of time.
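One way to picture the versioning problem is to compare snapshots of column definitions taken at two extraction dates. The comparison reports which columns were added, removed or redefined. The sketch below does this with plain dictionaries mapping column name to SQL definition. The snapshot format, the sample definitions and the `diff_column_definitions` helper are hypothetical, and the sketch does not describe how the Data Export database actually stores versions.

```python
def diff_column_definitions(previous, current):
    """Compare two snapshots of {column name: SQL definition} (hypothetical model)."""
    # dict key views support set operations, so the comparison is direct.
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(
            name for name in previous.keys() & current.keys()
            if previous[name] != current[name]
        ),
    }


# Hypothetical snapshots taken at two points in a long-running project.
v1 = {
    "SampleDate": "d.SampleDate",
    "Result": "d.Result",
    "Age": "DATEDIFF(year, d.DOB, d.SampleDate)",
}
v2 = {
    "SampleDate": "d.SampleDate",
    "Result": "ROUND(d.Result, 1)",
    "HealthBoard": "d.HealthBoard",
}

changes = diff_column_definitions(v1, v2)
print(changes)  # {'added': ['HealthBoard'], 'removed': ['Age'], 'changed': ['Result']}
```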

Extraction Configuration Model


Data Export Manager stores all its configuration data for projects in a relational Data Export database. This includes entities such as Projects, Extraction Configurations and Extraction/Release Logs. A data extraction starts by identifying a cohort (see Cohort Manager) and importing it into a cohort database. From the perspective of the Data Export Manager, a cohort is fundamentally a list of private patient identifiers that match those stored in your repository. In addition to its unique private identifier, each patient should have a globally unique, project-specific extraction identifier. This project identifier (also known as the release identifier) is substituted for the private identifier when the data is extracted and given to a researcher.
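The substitution step can be sketched as follows. The cohort is modelled as a mapping from private identifier to release identifier. Each extracted row loses its private identifier and gains the release identifier, and rows for patients outside the cohort are skipped. The `substitute_release_ids` helper, the `PrivateID`/`ReleaseID` column names and the sample identifiers are hypothetical, and this is a minimal in-memory sketch rather than how RDMP performs the substitution.

```python
def substitute_release_ids(rows, cohort, private_col="PrivateID", release_col="ReleaseID"):
    """Replace private identifiers with release identifiers (hypothetical helper).

    `cohort` maps each private identifier to its project-specific release identifier.
    """
    # Each patient must have a distinct release identifier, otherwise two
    # patients would be indistinguishable in the extract.
    if len(set(cohort.values())) != len(cohort):
        raise ValueError("release identifiers must be unique within the cohort")
    anonymised_rows = []
    for row in rows:
        release_id = cohort.get(row[private_col])
        if release_id is None:
            continue  # patient is not in the cohort, so the row is not extracted
        # Copy every column except the private identifier, then add the release identifier.
        anonymised = {k: v for k, v in row.items() if k != private_col}
        anonymised[release_col] = release_id
        anonymised_rows.append(anonymised)
    return anonymised_rows


cohort = {"P001": "R-7F3A", "P003": "R-91C2"}  # hypothetical private -> release mapping
rows = [
    {"PrivateID": "P001", "Result": 48},
    {"PrivateID": "P002", "Result": 52},
    {"PrivateID": "P003", "Result": 41},
]
released = substitute_release_ids(rows, cohort)
print(released)  # [{'Result': 48, 'ReleaseID': 'R-7F3A'}, {'Result': 41, 'ReleaseID': 'R-91C2'}]
```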

Diagram showing data extraction process through Data Export Manager

As part of the extraction process, the Data Export Manager collects artifacts for the dataset(s) being extracted, such as supporting documents and lookup tables. It also runs each extractable aggregate graph against the cohort, so that users can see how their extract compares to the main dataset. These live vs extract graphs let the data analyst and the researcher (who will receive the extract) rapidly identify mistakes in the cohort or extraction filter configuration. Examples include a wrong study window, or a result set that accidentally includes Tayside and Fife records when only Tayside records have governance approval for the project.
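The idea behind a live vs extract comparison can be approximated by counting records per category in both the full dataset and the extract. The sketch then flags categories that should not appear in the extract, using the Tayside/Fife scenario described above. RDMP's actual graphs are configured aggregates in the catalogue. The `compare_live_vs_extract` helper, the `HealthBoard` column and the sample counts here are hypothetical.

```python
from collections import Counter


def compare_live_vs_extract(live_rows, extract_rows, key):
    """Count records per category in the live dataset and in the extract."""
    live = Counter(row[key] for row in live_rows)
    extract = Counter(row[key] for row in extract_rows)
    # Counter returns 0 for missing keys, so every category gets both counts.
    return {k: (live[k], extract[k]) for k in sorted(live.keys() | extract.keys())}


live_rows = [{"HealthBoard": hb} for hb in ["Tayside"] * 3 + ["Fife"] * 2 + ["Lothian"]]
extract_rows = [{"HealthBoard": hb} for hb in ["Tayside"] * 2 + ["Fife"]]

comparison = compare_live_vs_extract(live_rows, extract_rows, "HealthBoard")
print(comparison)  # {'Fife': (2, 1), 'Lothian': (1, 0), 'Tayside': (3, 2)}

# Flag categories that appear in the extract but lack governance approval.
permitted = {"Tayside"}
unexpected = [k for k, (_, n) in comparison.items() if n > 0 and k not in permitted]
print(unexpected)  # ['Fife']
```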
