Instructions
What is a Catalogue?
A Catalogue is RDMP's representation of one of your datasets e.g. 'Hospital Admissions'. A Catalogue consists of:
Human readable names/descriptions of what the dataset is and what it contains
A collection of items mapped to underlying columns in your database.
Validation rules for each of the extractable items in the dataset
Graph definitions for viewing the contents of the dataset (and testing filters / cohorts built)
Attachments which help understand the dataset (e.g. a pdf file)
Each item in the collection:
Can be extractable or not, or extractable only with Special Approval
Can involve a transform on the underlying column (e.g. hash on extraction, UPPER etc.)
Has a human readable name/description of the column/transform
Can have curated WHERE filters defined on it, which can be reused for project extraction, cohort generation etc.
A Catalogue can be part of project extraction configurations and can be used in cohort identification configurations. Catalogues can also be marked as Deprecated, Internal etc.
The separation of dataset and underlying table allows you to have multiple datasets that all draw data from the same table. It also makes it easier to handle moving a table (e.g. to a new server or database), renaming it etc.
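The benefit of this separation can be pictured with a small Python sketch. This is not RDMP's API: the classes `TableRef`, `ColumnRef`, `Item` and `Dataset` are hypothetical stand-ins, and the server, database, table and column names are invented for illustration. Two datasets draw from the same table, and moving the table only requires updating one shared reference.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableRef:
    """Hypothetical reference to a physical table (not an RDMP class)."""
    server: str
    database: str
    name: str


@dataclass
class ColumnRef:
    """Hypothetical reference to one column of a table."""
    table: TableRef
    name: str

    def qualified_name(self) -> str:
        return f"{self.table.database}.{self.table.name}.{self.name}"


@dataclass
class Item:
    """A named dataset item mapped to an underlying column."""
    name: str
    column: ColumnRef


@dataclass
class Dataset:
    """A dataset is a collection of items, not a table."""
    name: str
    items: List[Item] = field(default_factory=list)


# One physical table (all names here are invented for the example)
admissions_table = TableRef(server="old-server", database="Hospital", name="Admissions")
patient_id = ColumnRef(admissions_table, "PatientId")
admission_date = ColumnRef(admissions_table, "AdmissionDate")

# Two datasets that both draw data from the same table
all_admissions = Dataset(
    "Hospital Admissions",
    [Item("Patient", patient_id), Item("Admitted On", admission_date)],
)
admission_dates = Dataset("Admission Dates Only", [Item("Admitted On", admission_date)])

# Moving the table: update the single shared reference,
# and every dataset item follows without being edited itself
admissions_table.server = "new-server"
admissions_table.database = "HospitalArchive"

print(all_admissions.items[1].column.qualified_name())
print(admission_dates.items[0].column.qualified_name())
```

Because the dataset items hold a reference to the column rather than a copy of the table name, neither dataset needs to change when the table moves.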
If you expand a Catalogue (e.g. biochemistry) you can see the ‘Catalogue Items’ node. These are the extractable columns in the dataset ‘biochemistry’. If you expand a Catalogue Item, you can see two nodes:
the first is the Extraction Information logic for the column
the second is the underlying database column reference (ColumnInfo).
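The tree described above can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the class names mirror the node names in the text, but the fields (`select_sql`, `category`), the `render_tree` helper and the example column names are hypothetical and are not taken from RDMP itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnInfo:
    """Reference to the underlying database column."""
    name: str


@dataclass
class ExtractionInformation:
    """Extraction logic for a column (fields are assumptions)."""
    select_sql: str              # may wrap the column in a transform, e.g. UPPER(...)
    category: str = "Extractable"


@dataclass
class CatalogueItem:
    name: str
    extraction_information: ExtractionInformation
    column_info: ColumnInfo


@dataclass
class Catalogue:
    name: str
    items: List[CatalogueItem] = field(default_factory=list)


def render_tree(catalogue: Catalogue) -> List[str]:
    """Return the expanded tree as indented lines (hypothetical helper)."""
    lines = [catalogue.name, "  Catalogue Items"]
    for item in catalogue.items:
        lines.append(f"    {item.name}")
        # First child node: the extraction logic for the column
        lines.append(f"      Extraction Information: {item.extraction_information.select_sql}")
        # Second child node: the underlying column reference
        lines.append(f"      ColumnInfo: {item.column_info.name}")
    return lines


# Example column names are invented for illustration
biochemistry = Catalogue(
    "biochemistry",
    [
        CatalogueItem(
            "Test Code",
            ExtractionInformation("UPPER(TestCode)"),
            ColumnInfo("TestCode"),
        ),
        CatalogueItem(
            "Patient Identifier",
            ExtractionInformation("PatientId", category="Special Approval"),
            ColumnInfo("PatientId"),
        ),
    ],
)

print("\n".join(render_tree(biochemistry)))
```

Each Catalogue Item carries exactly two children here, matching the two nodes seen when an item is expanded: the extraction logic and the column reference.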