...
A Pipeline can be missing a source, a destination, or both. Such a pipeline can only be used in a context where the missing source/destination is already fixed (for example, if the user is bulk inserting a CSV file, the destination might be a fixed instance of DataTableUploadDestination initialized with the specific server/database that the user picked in the user interface).
...
PipelineComponent
Blueprint for a specific task that can be run in a Pipeline. A component has one of the following roles:
Role | Description | Example
Source | Produces data | Executing linkage SQL on a server
Middle | Transforms / audits data | Substituting values in a column for an anonymous mapping
Destination | Consumes data | Writing records out to disk
Pipeline components can include user-written plugins (e.g. for imaging operations).
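The three roles can be illustrated with a minimal Python sketch. This is not RDMP's implementation (RDMP components are C# classes); the names `fixed_source`, `anonymise`, `collect` and `run_pipeline` are hypothetical and exist only to show how a source, middle components and a destination chain together, including the case where the source and destination are supplied by the calling context.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict


def fixed_source() -> Iterator[Row]:
    """Source role: produces data (hard-coded, made-up rows for illustration)."""
    yield {"patient_id": "P001", "value": 5}
    yield {"patient_id": "P002", "value": 7}


def anonymise(rows: Iterable[Row], mapping: dict) -> Iterator[Row]:
    """Middle role: substitutes identifiers using an anonymous mapping."""
    for row in rows:
        yield {**row, "patient_id": mapping[row["patient_id"]]}


def collect(rows: Iterable[Row]) -> List[Row]:
    """Destination role: consumes data (here it simply materialises a list)."""
    return list(rows)


def run_pipeline(
    source: Callable[[], Iterable[Row]],
    middles: List[Callable[[Iterable[Row]], Iterable[Row]]],
    destination: Callable[[Iterable[Row]], List[Row]],
) -> List[Row]:
    """Feed the source through each middle component, then into the destination."""
    data = source()
    for middle in middles:
        data = middle(data)
    return destination(data)


mapping = {"P001": "ANON_A", "P002": "ANON_B"}
# The pipeline definition holds only the middle components; the source and
# destination are fixed by the context that runs it.
pipeline_middles = [lambda rows: anonymise(rows, mapping)]
result = run_pipeline(fixed_source, pipeline_middles, collect)
```

Keeping the middle components separate from the source and destination mirrors the idea that the same pipeline can be reused wherever the context fixes where data comes from and where it goes.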
...
Project
All extractions through RDMP must be done through Projects. A Project has a name, an extraction directory and optionally Tickets (if you have a ticketing system configured). A Project should never be deleted, even after all of its ExtractionConfigurations have been executed, because it serves as an audit record and as a cloning point if you ever need to clone any of the ExtractionConfigurations (e.g. to do an update of project data 5 years on).
The ProjectNumber must match the project number of the ExtractableCohort in your ExternalCohortTable.
...
ProcessTask
Describes a specific operation carried out during a LoadMetadata execution (DLE run). This could be 'unzip all files called *.zip in the ForLoading directory', 'after loading the data to live, call sp_clean_table1' or 'connect to webservice X and download 1,000,000 records which will be serialized into XML'.
A ProcessTask has a ProcessTaskType which defines how it is run by RDMP. These include C# classes (which can include plugin components) such as Attachers and DataProviders or traditional ETL steps such as SQL scripts or launching standalone processes.
...
SupportingDocument
Describes a document (e.g. PDF / Excel file etc) which is useful for understanding a given dataset (Catalogue). This can be marked as Extractable in which case every time the dataset is extracted the file will also be bundled along with it (so that researchers can also benefit from the file). You can also mark SupportingDocuments as Global in which case they will be provided (if Extractable) to researchers regardless of which datasets they have selected e.g. a PDF on data governance or a copy of an empty 'data use contract document'.
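The Extractable and Global flags combine as sketched below. This is an illustrative Python model only: the `SupportingDocument` dataclass, its field names and `documents_to_bundle` are assumptions made for the example, not RDMP's actual types.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SupportingDocument:
    name: str
    catalogue: Optional[str]  # dataset the document describes (hypothetical field)
    extractable: bool = False
    is_global: bool = False


def documents_to_bundle(
    documents: Iterable[SupportingDocument],
    selected_catalogues: Iterable[str],
) -> List[str]:
    """Return names of documents that accompany an extraction.

    A document is bundled only if it is Extractable, and then either because
    it is Global or because its Catalogue is among the datasets extracted.
    """
    selected = set(selected_catalogues)
    return [
        doc.name
        for doc in documents
        if doc.extractable and (doc.is_global or doc.catalogue in selected)
    ]


docs = [
    SupportingDocument("biochemistry_notes.pdf", "Biochemistry", extractable=True),
    SupportingDocument("internal_memo.docx", "Biochemistry", extractable=False),
    SupportingDocument("data_governance.pdf", None, extractable=True, is_global=True),
    SupportingDocument("prescribing_codes.xlsx", "Prescribing", extractable=True),
]
bundled = documents_to_bundle(docs, ["Biochemistry"])
```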
SupportingSQLTable
Describes an SQL query that can be run to generate useful information for the understanding of a given Catalogue.
If it is marked as Extractable then it will be bundled along with the Catalogue every time it is extracted (for this reason it is important to ensure that no PII data is returned by the query).
This can be used as an alternative to defining Lookups, or to extract other useful administrative data to be provided to researchers.
If the Global flag is set then the SQL will be run and the result provided to every researcher regardless of which datasets they have asked for in an extraction. This is useful for large lookups like ICD / SNOMED CT which are likely to be used by many datasets.
TableInfo
Describes an SQL table (or table-valued function) on a given DBMS server from which you intend to extract data, load / curate data, or both. A TableInfo represents a cached state of the live database table schema. You can synchronize a TableInfo at any time to handle schema changes (e.g. dropped columns).
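Conceptually, synchronization compares the cached column list with the columns currently present in the live table. The Python sketch below shows only that comparison; `diff_schema` is a hypothetical helper, and RDMP's real synchronization logic is not described here and may do considerably more.

```python
from typing import Dict, Iterable, List


def diff_schema(
    cached_columns: Iterable[str], live_columns: Iterable[str]
) -> Dict[str, List[str]]:
    """Report columns that disappeared from, or were added to, the live table."""
    cached, live = set(cached_columns), set(live_columns)
    return {
        "dropped": sorted(cached - live),  # in the cached state but gone from live
        "added": sorted(live - cached),    # in live but not yet in the cached state
    }


# Made-up column names for illustration.
changes = diff_schema(
    cached_columns=["chi", "date_of_birth", "postcode"],
    live_columns=["chi", "date_of_birth", "health_board"],
)
```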
UNION
Mathematical set operation which matches unique (distinct) identifiers in any datasets being combined (e.g. SetA UNION SetB returns any patient in either SetA or SetB).
INTERSECT
Mathematical set operation which matches unique (distinct) identifiers only if they appear in all datasets being combined (e.g. SetA INTERSECT SetB returns patients who appear in both SetA and SetB).
EXCEPT
Mathematical set operation which matches unique (distinct) identifiers in the first dataset only if they do not appear in any of the subsequent datasets being combined (e.g. SetA EXCEPT SetB returns patients who appear in SetA but not SetB).
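The three set operations behave like Python's built-in set methods, which makes for a quick worked example. The patient identifiers below are made up, and the helper names `union`, `intersect` and `except_` are chosen for this illustration only.

```python
from typing import Iterable, Set


def union(*datasets: Iterable[str]) -> Set[str]:
    """Identifiers appearing in any of the datasets."""
    return set().union(*datasets)


def intersect(first: Iterable[str], *rest: Iterable[str]) -> Set[str]:
    """Identifiers appearing in every dataset."""
    return set(first).intersection(*rest)


def except_(first: Iterable[str], *rest: Iterable[str]) -> Set[str]:
    """Identifiers in the first dataset that appear in none of the others."""
    return set(first).difference(*rest)


set_a = ["P001", "P002", "P003", "P003"]  # duplicates collapse to distinct identifiers
set_b = ["P002", "P003", "P004"]
set_c = ["P003", "P005"]

in_either = union(set_a, set_b)
in_both = intersect(set_a, set_b)
only_a = except_(set_a, set_b)
a_not_b_or_c = except_(set_a, set_b, set_c)
```

Note that order matters only for EXCEPT: swapping the first dataset with a later one changes the result, whereas UNION and INTERSECT give the same answer in any order.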