Content Comparison

...

As well as storing human-readable names and descriptions of what is in the dataset, it is the attachment point for Attachments (SupportingDocument), validation logic, extractable columns, ways of filtering the data, aggregations to help understand the dataset, etc.

...

CatalogueItems can be tied to the underlying database via ExtractionInformation. This means that you can have multiple extraction transforms from the same underlying ColumnInfo, e.g. PatientDateOfBirth / PatientYearOfBirth (each with different governance categories).

...

Cohort identification is achieved by identifying sets of patients and performing set operations on them, e.g. you might identify "all patients who have been prescribed Diazepam" and then EXCEPT "patients who have been prescribed Diazepam before 2000". This gives you the DISTINCT patients who were FIRST prescribed Diazepam AFTER 2000.

A CohortAggregateContainer is a collection of sets (and subcontainers) which are all combined with the given SetOperation (UNION, INTERSECT or EXCEPT).
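
The set logic above can be sketched with Python's built-in sets. This is an illustration only, not how RDMP executes cohort queries: the `Container` class and `evaluate` function are hypothetical stand-ins for the concepts described here, and the patient identifiers are made up.

```python
from dataclasses import dataclass, field
from enum import Enum
from functools import reduce
from typing import List, Set, Union


class SetOperation(Enum):
    UNION = "UNION"
    INTERSECT = "INTERSECT"
    EXCEPT = "EXCEPT"


@dataclass
class Container:
    """Hypothetical stand-in for a CohortAggregateContainer."""
    operation: SetOperation
    # Each child is either a set of patient identifiers or a subcontainer.
    children: List[Union[Set[str], "Container"]] = field(default_factory=list)


def evaluate(node: Union[Set[str], Container]) -> Set[str]:
    """Combine all children of a container, in order, with its operation."""
    if isinstance(node, set):
        return node
    results = [evaluate(child) for child in node.children]
    if not results:
        return set()
    if node.operation is SetOperation.UNION:
        return reduce(set.union, results)
    if node.operation is SetOperation.INTERSECT:
        return reduce(set.intersection, results)
    # EXCEPT: start with the first set and remove every later one.
    return reduce(set.difference, results)


# Made-up patient identifiers for illustration.
ever_prescribed = {"P1", "P2", "P3", "P4"}
prescribed_before_2000 = {"P2", "P4"}

first_prescribed_after_2000 = evaluate(
    Container(SetOperation.EXCEPT, [ever_prescribed, prescribed_before_2000])
)
```

Because sets hold each identifier once, the result is distinct by construction, which matches the DISTINCT behaviour described above. Subcontainers are handled by the recursive call in `evaluate`.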

...

CohortIdentificationConfiguration

...

Records the last known state of a column in an SQL table (see TableInfo).

A ColumnInfo can belong to an anonymisation group (see [ANOTable]), e.g. ANOGPCode. In this case it will be aware not only of its name and datatype in LIVE but also its unanonymised name/datatype during data loading.

...

Records where to store linkage cohorts (see ExtractableCohort).

Since every agency handles cohort management differently, RDMP is built to support diverse cohort DBMS table schemas. There are no fixed datatypes / columns for cohort databases.

...

An ExternalDatabaseServer is not required in order to reference datasets you want to link/extract; these should be referenced by TableInfo / Catalogue instead.

Servers can have usernames/passwords or use integrated security (Windows account). Passwords are encrypted in the same fashion as in DataAccessCredentials.

...

ExtractableCohort

Records the location and ID of a cohort in an ExternalCohortTable database.

This allows RDMP to record which cohorts are part of which ExtractionConfiguration in a Project without having to move the identifiers into the RDMP application database.

Each ExtractableCohort has an OriginID. This field is the id of the cohort in the CohortDefinition table of the ExternalCohortTable. Effectively, OriginID is the id of the cohort in your cohort database, while the ID property of the ExtractableCohort is the RDMP ID assigned to the cohort. This allows you to have two different cohort sources that both contain a cohort with id 10 while RDMP is still able to tell them apart. It also covers the unfortunate situation in which you delete a cohort in your cohort database and leave the ExtractableCohort orphaned - under such circumstances you will at least still have your RDMP configuration and know the location of the original cohort even if it doesn't exist anymore.
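
The distinction between ID and OriginID can be sketched as follows. The class name, field names, source names and the `is_orphaned` helper are hypothetical and exist only for this illustration; the point is that the pair (cohort source, OriginID) identifies a cohort, and that an orphaned reference still records where the cohort used to live.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class ExtractableCohortRef:
    """Hypothetical stand-in for an ExtractableCohort record."""
    id: int                     # RDMP-assigned ID
    external_cohort_table: str  # which cohort source the cohort lives in
    origin_id: int              # id in that source's CohortDefinition table


# Two cohort sources that both contain a cohort with id 10.
refs = [
    ExtractableCohortRef(id=1, external_cohort_table="CohortSourceA", origin_id=10),
    ExtractableCohortRef(id=2, external_cohort_table="CohortSourceB", origin_id=10),
]

# Made-up contents of each source's CohortDefinition table.
# Cohort 10 has been deleted from CohortSourceB.
live_definitions: Dict[str, Set[int]] = {
    "CohortSourceA": {10, 11},
    "CohortSourceB": {11},
}


def is_orphaned(ref: ExtractableCohortRef, definitions: Dict[str, Set[int]]) -> bool:
    """True if the referenced cohort no longer exists in its source."""
    return ref.origin_id not in definitions.get(ref.external_cohort_table, set())


orphans = [ref for ref in refs if is_orphaned(ref, live_definitions)]
```

Here `orphans` contains only the reference into CohortSourceB, and that reference still carries the source name and OriginID of the deleted cohort.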

...

Represents a collection of datasets (see Catalogue), ExtractableColumns, ExtractionFilters etc. and a single ExtractableCohort for a data extraction Project. You can have multiple active ExtractionConfigurations at a time. For example, a Project might have two cohorts, 'Cases' and 'Controls', and you would have two ExtractionConfigurations, possibly containing the same datasets and filters but with different cohorts.

Once you have executed, extracted and released an ExtractionConfiguration it becomes 'frozen' (IsReleased) and it is not possible to edit it. This is intended to ensure that once data has gone out the door the configuration that generated the data is immutable. If you need to perform a repeat extraction (e.g. an update of data 5 years on) then you should 'Clone' the ExtractionConfiguration in the Project and give it a new name e.g. 'Cases - 5 year update'.
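
The freeze-then-clone workflow can be sketched like this. The class below is a hypothetical model written for this illustration, not RDMP's own implementation; only the names IsReleased and 'Clone' come from the description above.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class ExtractionConfigurationSketch:
    """Hypothetical model of a configuration that freezes on release."""
    name: str
    cohort: str
    datasets: List[str] = field(default_factory=list)
    is_released: bool = False

    def add_dataset(self, dataset: str) -> None:
        # A released configuration must stay exactly as it was extracted.
        if self.is_released:
            raise PermissionError(f"'{self.name}' is released and cannot be edited")
        self.datasets.append(dataset)

    def clone(self, new_name: str) -> "ExtractionConfigurationSketch":
        # The clone is editable and gets its own copy of the dataset list.
        return replace(self, name=new_name, datasets=list(self.datasets),
                       is_released=False)


cases = ExtractionConfigurationSketch("Cases", cohort="Cases", datasets=["Prescribing"])
cases.is_released = True  # executed, extracted and released

update = cases.clone("Cases - 5 year update")
update.add_dataset("Demography")  # allowed: the clone is not frozen
```

Copying the dataset list in `clone` matters: without it, editing the clone would silently change the released configuration too.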

...

Describes a column to extract in a single line of SELECT SQL. This can be either the fully qualified name of an underlying ColumnInfo or a transform upon it. Adding an ExtractionInformation to a CatalogueItem makes it extractable in a linkage Project.

Every ExtractionInformation has an ExtractionCategory which lets you flag the sensitivity of the data being extracted e.g. SpecialApprovalRequired. One (or more) ExtractionInformation in a Catalogue can be flagged as IsExtractionIdentifier. This is the column(s) which will be joined against cohorts in data extraction linkages.
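
As a rough sketch of these ideas, the example below models each ExtractionInformation as one line of SELECT SQL plus a category and an identifier flag, then builds a query from the permitted lines. Everything here is an assumption made for illustration: the class, the `build_select` helper, the `Patient` table, the `Core` category name, and the use of SQLite (chosen only so the example runs without a server).

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ExtractionInformationSketch:
    select_sql: str  # a column name or a transform upon a column
    alias: str
    category: str    # e.g. SpecialApprovalRequired
    is_extraction_identifier: bool = False


infos = [
    ExtractionInformationSketch("chi", "PatientId", "Core",
                                is_extraction_identifier=True),
    ExtractionInformationSketch("date_of_birth", "PatientDateOfBirth",
                                "SpecialApprovalRequired"),
    # A second transform of the same underlying column, less sensitive.
    ExtractionInformationSketch("CAST(strftime('%Y', date_of_birth) AS INTEGER)",
                                "PatientYearOfBirth", "Core"),
]


def build_select(infos: List[ExtractionInformationSketch], table: str,
                 allowed_categories: Set[str]) -> str:
    """Build a SELECT from the lines whose category is permitted."""
    columns = [f"{i.select_sql} AS {i.alias}" for i in infos
               if i.category in allowed_categories]
    return f"SELECT {', '.join(columns)} FROM {table}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patient (chi TEXT, date_of_birth TEXT)")
conn.execute("INSERT INTO Patient VALUES ('1234567890', '1980-05-17')")

query = build_select(infos, "Patient", allowed_categories={"Core"})
rows = conn.execute(query).fetchall()
identifier_aliases = [i.alias for i in infos if i.is_extraction_identifier]
```

With only `Core` permitted, the full date of birth is left out and the year-of-birth transform is extracted instead, while `identifier_aliases` names the column that would be joined against a cohort.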

...

When joining between datasets on different DBMS, IsExtractionIdentifier columns are compatible as long as the distinct datatypes are semantically similar (e.g. Oracle varchar2(10) could be INTERSECTED with SQL Server varchar(10) - providing a query cache was used).

...

JoinInfo

Records how to join two TableInfo together.

A JoinInfo can include multiple columns. Each JoinInfo has a direction (e.g. LEFT / RIGHT) and optional collation (for resolving collation conflicts during joins).
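
To make the pieces of a JoinInfo concrete, the sketch below renders one as a SQL JOIN clause. The class, its field names and the table and column names are hypothetical, and the SQL that RDMP actually generates may differ; `Latin1_General_BIN` is used only as an example collation name.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class JoinInfoSketch:
    """Hypothetical model of a join between two tables."""
    left_table: str
    right_table: str
    column_pairs: List[Tuple[str, str]]  # (left column, right column)
    direction: str = "LEFT"
    collation: Optional[str] = None

    def to_sql(self) -> str:
        # The optional collation is applied to every column comparison.
        suffix = f" COLLATE {self.collation}" if self.collation else ""
        conditions = [
            f"{self.left_table}.{left} = {self.right_table}.{right}{suffix}"
            for left, right in self.column_pairs
        ]
        return f"{self.direction} JOIN {self.right_table} ON " + " AND ".join(conditions)


join = JoinInfoSketch(
    left_table="Prescribing",
    right_table="Demography",
    column_pairs=[("chi", "chi"), ("practice_code", "practice_code")],
    direction="LEFT",
    collation="Latin1_General_BIN",
)
join_sql = join.to_sql()
```

A join over several columns becomes several conditions combined with AND, each carrying the same collation when one is set.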

...

A LoadMetadata contains at least one ProcessTask, which is an ETL step, e.g. Unzip files called *.zip / Download all files from FTP server X.

...

Each Pipeline is composed of a sequence of PipelineComponents which can each perform specific jobs e.g. 'clean strings', 'substitute column X for column Y by mapping values off of remote server B'.
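
A minimal sketch of this idea treats each component as a function from rows to rows and runs them in sequence. The function names, the row format and the in-memory mapping are assumptions for illustration; in particular, the mapping here is a plain dictionary rather than a lookup on a remote server.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Component = Callable[[List[Row]], List[Row]]


def clean_strings(rows: List[Row]) -> List[Row]:
    """Strip surrounding whitespace from every string value."""
    return [
        {key: value.strip() if isinstance(value, str) else value
         for key, value in row.items()}
        for row in rows
    ]


def substitute_column(source: str, target: str, mapping: Dict[object, object]) -> Component:
    """Build a component that replaces column `source` with column `target`."""
    def component(rows: List[Row]) -> List[Row]:
        result = []
        for row in rows:
            new_row = {key: value for key, value in row.items() if key != source}
            new_row[target] = mapping[row[source]]
            result.append(new_row)
        return result
    return component


def run_pipeline(components: List[Component], rows: List[Row]) -> List[Row]:
    """Pass the rows through each component in order."""
    for component in components:
        rows = component(rows)
    return rows


pipeline = [
    clean_strings,
    substitute_column("chi", "release_id", mapping={"123": "R-001"}),
]
output = run_pipeline(pipeline, [{"chi": " 123 ", "drug": "Diazepam "}])
```

Order matters in this sketch: the substitution only finds `"123"` in the mapping because the cleaning step has already stripped the padding.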

...

Blueprint for a specific task that can be run in a Pipeline. A component has one of the following roles:

...

The ProjectNumber must match the project number of the ExtractableCohort in your ExternalCohortTable.

...

ProcessTask

Describes a specific operation carried out during a LoadMetadata execution (DLE run). This could be 'unzip all files called *.zip in for loading' or 'after loading the data to live, call sp_clean_table1' or 'Connect to webservice X and download 1,000,000 records which will be serialized into XML'.

...

This can be used as an alternative to defining Lookups or to extract other useful administrative data etc. to be provided to researchers.

...

Describes an SQL table (or table valued function) on a given DBMS Server from which you intend to extract and/or load / curate data. A TableInfo represents a cached state of the live database table schema. You can synchronize a TableInfo at any time to handle schema changes (e.g. dropping columns).
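
The idea of a cached schema that is synchronized against the live table can be sketched as below. This is not RDMP's synchronization code: the helper names and the `Prescribing` table are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3
from typing import Dict, List, Tuple


def live_columns(conn: sqlite3.Connection, table: str) -> Dict[str, str]:
    """Read column name -> declared type from the live table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}


def synchronize(cached: Dict[str, str],
                live: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return the refreshed cache and a report of what changed."""
    report = {
        "added": sorted(live.keys() - cached.keys()),
        "removed": sorted(cached.keys() - live.keys()),
        "retyped": sorted(name for name in cached.keys() & live.keys()
                          if cached[name] != live[name]),
    }
    return dict(live), report


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Prescribing (chi TEXT, drug TEXT, dose REAL)")
cached = live_columns(conn, "Prescribing")  # the cached state

# Simulate a schema change: 'dose' is dropped and 'quantity' is added.
conn.execute("DROP TABLE Prescribing")
conn.execute("CREATE TABLE Prescribing (chi TEXT, drug TEXT, quantity INTEGER)")

cached, report = synchronize(cached, live_columns(conn, "Prescribing"))
```

After the call, `cached` matches the live table again and `report` lists the dropped and added columns, which is the kind of information a synchronization step needs before updating dependent metadata.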

...

Version	Old Version 22	New Version 23
Changes made by	Magalie Guignard-Duff	Magalie Guignard-Duff
Saved on	Mar 31, 2020	Mar 31, 2020

