Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The demonstration showed how fast and efficient SQL can be, but didn’t go into any details on the querying language itself. This post aims to be a getting ‘getting started guideguide’, and will link to more advanced topics as they come up later. We’ll also focus on the kind of data HIC provide, and try to include examples which are relevant to as many TRE users as possible.

Table of Contents
stylenone

How to Play Along

Whether your TRE workspace is Windows or Ubuntu there are software tools that you can use for SQL. If you have a Windows workspace, we recommend using Microsoft SQL Management Studio - this should be installed by default on all Windows TRE workspaces. For Ubuntu workspaces, Azure Data Studio should be installed.

If you can’t find SQL Management Studio or Azure Data Studio on your workspace, please contact HICSupport@dundee.ac.uk and we will install it for you.

We have a guide on how to connect to the TRE SQL Server, which covers both of these tools: How do I connect to my study's SQL database? Once you’ve logged in, you’ll be able to run some queries by selecting your database and clicking “New Query” from the menu. This will open an editor which you can type your query into. Towards the top, there will be a button to execute the query. A results pane (or error message, if the query has an issue) will be displayed below the editor.

If you can’t find SQL Management Studio or Azure Data Studio on your workspace, please contact HICSupport@dundee.ac.uk and we will install it for youWhilst we demonstrated this with synthetic data generated by HIC Data, focusing on a Prescribing dataset, you will have to edit table and variable names specific to your project dataset.

Returning Results with SELECT

The SELECT query will return the values, columns, or aggregates you request. To start, let’s return 100 entries from the Prescribing table.

...

After clicking Execute, the first 100 results will be returned from the Prescribing table. Note that while keywords are not case sensitive, the names of tables and columns are. You can try changing Prescribing to prescribing which will produce the error “Invalid object name prescribing”, but changing SELECT to select would work fine.

Returning all results (i.e.g removing TOP(100) from the query) is usually possible, but is often unlikely to facilitate your understanding and layout of the schema - instead, it’ll just take longer (sometimes ages!) to load the results. If you do this and it’s taking ages, click on the editor window and then click on the 🟥 icon, which will stop the query.

...

JOIN allows you to find a related row in another table, for each row the current table - it joints joins the columns together based on a provided condition. There are quite a few types of JOIN, but learning just a couple of them will probably work for you 90% of the time. In this example, we’ll stick with the example above and select the first 100 results of prescribed date and quantity, but we’ll link it to the Demography table and return the patient’s sex.

...

The condition specified in the braces brackets can be expanded if necessary. Some tables might use several columns as their primary key (this is called a composite key).

...

The difference is that now every row in the right table (Demography_Current) is being linked to every row in the left table (Prescribing). Whether this makes a difference or not will depend on whether it is a one-to-one mapping, a many-to-one mapping or a one-to-many mapping. For example, the there might be multiple rows for the same PROCHI in the demography table, which would make it appear as though there are more rows in Prescribing than there really are.

...

Working with SQL can feel a bit verbose sometimes - typing “Demography_Current” with the correct case can get tiresome. Instead, SQL supports something called table aliasing. This allows the table to be referenced by a shorter name. The example above can be rewritten to make it more compact (and sometimes, more readable!).

Code Block
languagesql
SELECT TOP(100) prescribed_date, quantity, sex FROM Prescribing P
  LEFT JOIN Demography_Current D ON (P.PROCHI = D.PROCHI)

...

One of the most important features of SQL databases, like Microsoft SQL Server, is the ability to add indexes to columns. This builds a cache of values which enable quicker retrieval from the table by value. Some columns will be indexed by default, this is always the case for the primary key (e.g. PROCHI in the Prescribing table). Additional indexes can be added by the data HIC Data team if your queries are running slower than expected. We’re unable to allow everyone to add indexes as databases are provisioned in the TRE read-only. This avoids accidental modification or deletion of data. We offer a read/write database suffixed with “_Working” which you can use for temporary tables or duplicating the data to add indexes, if necessary.

...