...

Anonymisation occurs at ColumnInfo level during data load. Each ANOTable points to a corresponding table on an ANO server in which mappings are persisted. This server should be part of your normal backup strategy.

...

Catalogue

...

The central class for the RDMP, a Catalogue is a virtual dataset e.g. 'Hospital Admissions'.

...

Catalogues are always flat views although they can be built from multiple relational data tables underneath.

...

CatalogueItem

...

A 'virtual' column that is made available to researchers. Each Catalogue has 1 or more CatalogueItems, which store the column's description as well as any outstanding/resolved issues.

CatalogueItems can be tied to the underlying database via ExtractionInformation. This means that you can have multiple extraction transforms from the same underlying ColumnInfo e.g. PatientDateOfBirth / PatientYearOfBirth (each with different governance categories).

...

CohortAggregateContainer

...

Cohort identification is achieved by identifying sets of patients and performing set operations on them e.g. you might identify "all patients who have been prescribed Diazepam" and then EXCEPT "patients who have been prescribed Diazepam before 2000". This gives you DISTINCT patients who were FIRST prescribed Diazepam AFTER 2000.
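As an illustration only, the Diazepam example can be modelled with Python sets. The patient identifiers and variable names below are invented for the sketch, and RDMP itself works by building SQL queries rather than by holding sets in memory.

```python
# Hypothetical patient identifiers; in RDMP these sets would come from SQL queries.
prescribed_diazepam = {"P001", "P002", "P003", "P004"}
prescribed_before_2000 = {"P002", "P004"}

# EXCEPT: keep patients in the first set who do not appear in the second.
first_prescribed_after_2000 = prescribed_diazepam - prescribed_before_2000

print(sorted(first_prescribed_after_2000))
```

Because sets hold each identifier once, the result is already DISTINCT.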

...

CohortIdentificationConfiguration

...

Describes a configuration for identifying patients fitting given study criteria, e.g. "I want all patients who have been prescribed Diazepam for the first time after 2000 and who are still alive today".

Each new project/cohort to identify should result in a new CohortIdentificationConfiguration. It is the entry point for cohort generation and includes a high level description of what the cohort requirements are, an optional ticket and the query that should be used to identify patients.

...

ColumnInfo

...

Records the last known state of a column in an SQL table (see TableInfo).

A ColumnInfo can belong to an anonymisation group (see [ANOTable]) e.g. ANOGPCode. In this case it will be aware not only of its name and datatype in LIVE but also its unanonymised name/datatype during data loading.

...

ConnectionStringKeyword

Describes a specific key/value pair that should always be used in connection strings to servers of the given DatabaseType by RDMP.

For example you could specify Encrypt = true to force all connections made to go through SSL (requires certificates / certificate validation etc). Be careful when creating these as they apply to all users of the system and can make servers unreachable if invalid or unresolvable connection strings are created.

...

DataAccessCredentials

Stores a username and encrypted password.

...

A feature that is compatible with multiple DBMS is one which works regardless of the database engine hosting the data and may support drawing data from multiple providers/instances at once.

...

ExternalCohortTable

Records where to store linkage cohorts (see ExtractableCohort).

...

You can have multiple ExternalCohortTable sources in your database for example if you need to support different identifier datatypes / formats.

...

...

ExternalDatabaseServer

Records information about a server. This can be an RDMP platform database e.g. a Logging database or it could be a generic database you use to hold data (e.g. lookups).

...

Servers can have usernames/passwords or use integrated security (Windows account). Passwords are encrypted in the same fashion as in DataAccessCredentials.

...

ExtractableCohort

Records the location and ID of a cohort in an ExternalCohortTable database.

...

Each ExtractableCohort has an OriginID. This field represents the id of the cohort in the CohortDefinition table of the ExternalCohortTable. Effectively this number is the id of the cohort in your cohort database, while the ID property of the ExtractableCohort (as opposed to OriginID) is the RDMP ID assigned to the cohort. This allows you to have two different cohort sources, both of which have a cohort id 10, while the RDMP software is still able to tell the difference. In addition it allows for the unfortunate situation in which you delete a cohort in your cohort database and leave the ExtractableCohort orphaned. Under such circumstances you will at least still have your RDMP configuration and know the location of the original cohort even if it doesn't exist anymore.
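The distinction between the two identifiers can be sketched as follows. This is a simplified model, not RDMP's actual class: the record type, its field names and the source names "CohortDbA" / "CohortDbB" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractableCohortRecord:
    """Hypothetical stand-in for an ExtractableCohort row."""
    id: int               # RDMP-assigned ID, unique across all cohort sources
    external_source: str  # name of the ExternalCohortTable holding the cohort
    origin_id: int        # id of the cohort inside that source's CohortDefinition table


# Two cohort sources that both happen to contain a cohort with id 10.
cohorts = [
    ExtractableCohortRecord(id=1, external_source="CohortDbA", origin_id=10),
    ExtractableCohortRecord(id=2, external_source="CohortDbB", origin_id=10),
]

# Keying on the RDMP ID keeps the two cohorts apart despite the shared origin id.
by_rdmp_id = {cohort.id: cohort for cohort in cohorts}
```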

...

ExtractionConfiguration

Represents a collection of datasets (see Catalogue), ExtractableColumns, ExtractionFilters etc and a single ExtractableCohort for a data extraction Project. You can have multiple active ExtractionConfigurations at a time, for example a Project might have two cohorts 'Cases' and 'Controls' and you would have two ExtractionConfigurations, possibly containing the same datasets and filters but with different cohorts.

Once you have executed, extracted and released an ExtractionConfiguration it becomes 'frozen' (IsReleased) and it is not possible to edit it. This is intended to ensure that once data has gone out the door the configuration that generated the data is immutable. If you need to perform a repeat extraction (e.g. an update of data 5 years on) then you should 'Clone' the ExtractionConfiguration in the Project and give it a new name e.g. 'Cases - 5 year update'.
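The freeze-then-clone behaviour can be pictured with a small model. This is a sketch under assumptions: the class, its `rename` and `clone` methods and the choice of `PermissionError` are invented for illustration and do not reflect RDMP's real API.

```python
from dataclasses import dataclass, replace


@dataclass
class ExtractionConfigurationSketch:
    """Hypothetical model of a configuration that freezes once released."""
    name: str
    datasets: tuple
    is_released: bool = False

    def rename(self, new_name):
        # Any edit to a released configuration is refused.
        if self.is_released:
            raise PermissionError("released configurations are frozen")
        self.name = new_name

    def clone(self, new_name):
        # The clone copies the settings but starts out unreleased and editable.
        return replace(self, name=new_name, is_released=False)


released = ExtractionConfigurationSketch("Cases", ("Prescribing",), is_released=True)
update = released.clone("Cases - 5 year update")
```

The original object is left untouched by `clone`, which mirrors the idea that the released configuration remains an immutable record of what went out the door.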

...

...

ExtractionFilter

Defines a single line of WHERE SQL. This is a way of reducing the scope of a data extraction / aggregation etc.

...

At query building time RDMP resolves all the various containers, subcontainers, filters and parameters into one extraction SQL query.
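A minimal sketch of how nested containers might collapse into a single WHERE expression is shown below. The `Filter` and `FilterContainer` types, the AND/OR `operation` field and the example SQL fragments are assumptions made for illustration; parameter handling is omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Filter:
    where_sql: str  # one line of WHERE SQL


@dataclass
class FilterContainer:
    operation: str  # assumed to be "AND" or "OR"
    filters: list = field(default_factory=list)
    subcontainers: list = field(default_factory=list)


def build_where(container):
    """Recursively flatten a container tree into one boolean SQL expression."""
    parts = [f"({f.where_sql})" for f in container.filters]
    for sub in container.subcontainers:
        sub_sql = build_where(sub)
        if sub_sql:  # skip subcontainers that hold no filters
            parts.append(f"({sub_sql})")
    return f" {container.operation} ".join(parts)


root = FilterContainer(
    "AND",
    filters=[Filter("DrugName = 'Diazepam'")],
    subcontainers=[
        FilterContainer(
            "OR",
            filters=[Filter("PrescribedDate >= '2000-01-01'"),
                     Filter("PrescribedDate IS NULL")],
        )
    ],
)
where_clause = build_where(root)
```

Wrapping every filter and subcontainer in parentheses keeps operator precedence explicit regardless of how deeply the containers are nested.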

...

...

ExtractionInformation

Describes a single line of SELECT SQL. This can be either the fully qualified name or a transform upon an underlying ColumnInfo. Adding an ExtractionInformation to a CatalogueItem makes it extractable in a linkage Project.

Every ExtractionInformation has an ExtractionCategory which lets you flag the sensitivity of the data being extracted e.g. SpecialApprovalRequired. One (or more) ExtractionInformation in a Catalogue can be flagged as IsExtractionIdentifier. These are the column(s) which will be joined against cohorts in data extraction linkages.

...

...

GovernanceDocument

Contains the path to a useful file which reflects either a request or a granting of governance e.g. a letter from your local healthboard authorising you to host/use 1 or more datasets for a given period of time.

Also includes a name (which should really match the file name) and a description which should be a plain summary of what is in the document such that lay users can appreciate what the document contains/means for the system.

...

...

GovernancePeriod

Tracks the fact that a given set of Catalogues require external approval for your agency to hold.

...

When joining between datasets on different DBMS, IsExtractionIdentifier columns are compatible as long as the distinct datatypes are semantically similar (e.g. Oracle varchar2(10) could be INTERSECTED with SQL Server varchar(10), providing a query cache was used).

...

JoinInfo

Records how to join two TableInfo together.

A JoinInfo can include multiple columns. Each JoinInfo has a direction (e.g. LEFT / RIGHT) and optional collation (for resolving collation conflicts during joins). 

...

...

LoadMetadata

Records how to load data into one or more Catalogues. This includes name, description, scheduled start dates etc.

A LoadMetadata contains at least one ProcessTask which is an ETL step e.g. Unzip files called *.zip / Download all files from FTP server X.

...

...

Lookup

Describes a relationship between 3 ColumnInfos. Two are from a lookup table (e.g. z_drugName): a primary key (e.g. DrugCode) and a description (e.g. HumanReadableDrugName). The third ColumnInfo is a foreign key (e.g. DrugPrescribed) from a different table (e.g. Prescribing).
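To make the three-column relationship concrete, the sketch below turns the example columns into a join. The function name and the choice of LEFT JOIN are assumptions for illustration, not a description of the SQL that RDMP generates.

```python
def lookup_join_sql(foreign_key, primary_key, description):
    """Build a join that fetches the description for each foreign key value.

    Each argument is a 'Table.Column' name; the table is taken from the prefix.
    """
    main_table = foreign_key.rsplit(".", 1)[0]
    lookup_table = primary_key.rsplit(".", 1)[0]
    return (
        f"SELECT {foreign_key}, {description} "
        f"FROM {main_table} "
        f"LEFT JOIN {lookup_table} ON {foreign_key} = {primary_key}"
    )


sql = lookup_join_sql(
    foreign_key="Prescribing.DrugPrescribed",
    primary_key="z_drugName.DrugCode",
    description="z_drugName.HumanReadableDrugName",
)
```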

...

Personally Identifiable Information (PII) is information that could be used to uniquely identify a person. RDMP is designed (when properly configured) to prevent PII being released in extracts.

...

...

Pipeline

Controls the flow of data from a source to a destination (e.g. extracting linked cohort data into a flat file).

...

Pipeline components can include user written plugins (e.g. for imaging operations).

...

Project

All extractions through RDMP must be done through Projects. A Project has a name, extraction directory and optionally Tickets (if you have a ticketing system configured). A Project should never be deleted, even after all its ExtractionConfigurations have been executed, as it serves as an audit and a cloning point if you ever need to clone any of the ExtractionConfigurations (e.g. to do an update of project data 5 years on).

The ProjectNumber must match the project number of the ExtractableCohort in your ExternalCohortTable.

...

ProcessTask

Describes a specific operation carried out during a LoadMetadata execution (DLE run). This could be 'unzip all files called *.zip in the ForLoading directory' or 'after loading the data to live, call sp_clean_table1' or 'Connect to webservice X and download 1,000,000 records which will be serialized into XML'.

A ProcessTask has a ProcessTaskType which defines how it is run by RDMP. These include C# classes (which can include plugin components) such as Attachers and DataProviders or traditional ETL steps such as SQL scripts or launching standalone processes.

...

...

SupportingDocument

Describes a document (e.g. PDF / Excel file etc) which is useful for understanding a given dataset (Catalogue). This can be marked as Extractable in which case every time the dataset is extracted the file will also be bundled along with it (so that researchers can also benefit from the file). You can also mark SupportingDocuments as Global in which case they will be provided (if Extractable) to researchers regardless of which datasets they have selected e.g. a PDF on data governance or a copy of an empty 'data use contract document'.

...

SupportingSQLTable

Describes an SQL query that can be run to generate useful information for the understanding of a given Catalogue.

...

If the Global flag is set then the SQL will be run and the result provided to every researcher regardless of what datasets they have asked for in an extraction. This is useful for large lookups like ICD / SNOMED CT which are likely to be used by many datasets.

...

TableInfo

Describes an SQL table (or table valued function) on a given DBMS Server from which you intend to either extract and/or load / curate data. A TableInfo represents a cached state of the live database table schema. You can synchronize a TableInfo at any time to handle schema changes (e.g. dropping columns).
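The idea of comparing a cached schema with the live one can be sketched using an in-memory SQLite database as a stand-in for the live server. The function names and the cached column set are hypothetical, and this does not reproduce RDMP's own synchronization logic.

```python
import sqlite3


def live_columns(conn, table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def find_schema_drift(cached_columns, conn, table):
    """Compare a cached column list against the live table definition."""
    live = live_columns(conn, table)
    return {
        "missing_in_live": cached_columns - live,  # e.g. columns dropped since caching
        "new_in_live": live - cached_columns,      # columns added since caching
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Prescribing (PatientId TEXT, DrugPrescribed TEXT)")

# Stale cache: 'Dose' was recorded earlier but has since been dropped from the table.
cached = {"PatientId", "DrugPrescribed", "Dose"}
drift = find_schema_drift(cached, conn, "Prescribing")
```

A synchronization step would then update the cached record so that it no longer refers to the dropped column.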

...

UNION

Mathematical set operation which matches unique (distinct) identifiers in any datasets being combined (e.g. SetA UNION SetB returns any patient in either SetA or SetB).

...

...

INTERSECT

Mathematical set operation which matches unique (distinct) identifiers only if they appear in all datasets being combined (e.g. SetA INTERSECT SetB returns patients who appear in both SetA and SetB).

...

...

EXCEPT

Mathematical set operation which matches unique (distinct) identifiers in the first dataset only if they do not appear in any of the subsequent datasets being combined (e.g. SetA EXCEPT SetB returns patients who appear in SetA but not SetB).
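The UNION, INTERSECT and EXCEPT definitions can be tried out against a throwaway SQLite database. The table names follow the SetA / SetB wording of the definitions, while the patient identifiers and the helper function are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SetA (PatientId TEXT);
    CREATE TABLE SetB (PatientId TEXT);
    -- 'P2' appears twice in SetA to show that results are distinct.
    INSERT INTO SetA VALUES ('P1'), ('P2'), ('P2'), ('P3');
    INSERT INTO SetB VALUES ('P3'), ('P4');
""")


def run_set_operation(operation):
    """Combine SetA and SetB with UNION, INTERSECT or EXCEPT."""
    sql = (
        f"SELECT PatientId FROM SetA {operation} "
        "SELECT PatientId FROM SetB ORDER BY PatientId"
    )
    return [row[0] for row in conn.execute(sql)]


union_result = run_set_operation("UNION")          # patients in either set
intersect_result = run_set_operation("INTERSECT")  # patients in both sets
except_result = run_set_operation("EXCEPT")        # patients in SetA but not SetB
```

Note that the duplicate 'P2' row in SetA contributes a single identifier to each result, matching the "unique (distinct)" wording of the definitions.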

...