# Building a sample OMOP-CDM database

## Prerequisites

- Install R
- Install Java
- You will need a database to load the data into. For testing purposes, you can run PostgreSQL in Docker:

`docker run --name demo_omop -e POSTGRES_PASSWORD=lollypop -e POSTGRES_USER=postgres -e POSTGRES_DB=demo_omop -p 5432:5432 -v ${PWD}/postgres:/var/lib/postgresql/data -v ${PWD}/backup:/backup -d postgres`

## Generating synthetic data

To generate synthetic data we use the [Synthea<sup>TM</sup> patient generator](https://github.com/synthetichealth/synthea).

Synthea<sup>TM</sup> generates synthetic patients together with their medical histories. It aims to produce high-quality, realistic patient data and associated health records that are free of privacy and security constraints.

One of the strengths of Synthea<sup>TM</sup> is its library of more than 90 modules, each modeling a different disease or type of medical observation. Most of these modules depend on one another, however, so restricting generation to a subset of them is not recommended.

Download [synthea-with-dependencies.jar](https://github.com/synthetichealth/synthea/releases/download/master-branch-latest/synthea-with-dependencies.jar) or download the provided [data.zip](https://github.com/alabarga/pybcn22-modern-data-stack/blob/main/synthea/data.zip).

The basic command line to generate data, in Synthea<sup>TM</sup> v3.2.0, is the following:

```
# -p sets the population size (here, 1000 patients)
java -jar synthea-with-dependencies.jar -c synthea.properties -p 1000
```

To export the data in CSV format, set `exporter.csv.export = true` in `synthea.properties`.
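
As a minimal sketch, the setting can be written to a fresh `synthea.properties` from the shell. This assumes no existing file needs to be preserved (the redirect overwrites it) and covers only the one property named above.

```shell
# Write a minimal synthea.properties in the current directory.
# Assumption: there is no existing file to keep; '>' overwrites it.
cat > synthea.properties <<'EOF'
exporter.csv.export = true
EOF

# Print the file so the setting can be double-checked.
cat synthea.properties
```

The file is then passed to Synthea with `-c synthea.properties`, as in the command above.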

To restrict generation to specific modules, pass the `-m` option with the names of the modules you want. See the [Synthea wiki page on the -M feature](https://github.com/synthetichealth/synthea/wiki/The--M-Feature) for an example.
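
Once generation finishes, it can help to confirm that the CSV export worked before moving on. The sketch below defines a hypothetical helper, `check_synthea_csv`, which is not part of Synthea. The file names are an illustrative subset of the CSV export, and the directory is whatever output path your run used.

```shell
# Hypothetical helper: verify that a Synthea CSV export directory
# contains a few of the expected files and that none of them is empty.
# The list below is an illustrative subset, not everything the ETL reads.
check_synthea_csv() {
  dir="$1"
  missing=0
  for name in patients encounters conditions medications observations procedures; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$name.csv" ]; then
      echo "missing or empty: $dir/$name.csv" >&2
      missing=1
    fi
  done
  # Return 0 when every file was found, 1 otherwise
  return "$missing"
}
```

For example, `check_synthea_csv /tmp/synthea/output/csv` returns a non-zero status and lists any of those files that are missing or empty.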

## Import synthetic data into a relational database

To import the data into a database, we use the [ETL-Synthea](https://github.com/OHDSI/ETL-Synthea) R package.

First, install the package from GitHub:

```r
# Requires the devtools package
devtools::install_github("OHDSI/ETL-Synthea")
```

Then run the following code:

```r
library(ETLSyntheaBuilder)

# Connection details for the PostgreSQL container started above
cd <- DatabaseConnector::createConnectionDetails(
  dbms         = "postgresql",
  server       = "localhost/demo_omop",  # host/database
  user         = "postgres",
  password     = "lollypop",             # must match POSTGRES_PASSWORD
  port         = 5432,
  pathToDriver = "..../drivers"          # folder containing the PostgreSQL JDBC driver
)

cdmSchema      <- "cdm"       # schema for the OMOP CDM tables
cdmVersion     <- "5.4"
syntheaVersion <- "2.7.0"     # should match the Synthea version that generated the CSV files
syntheaSchema  <- "native"    # schema for the raw Synthea tables
syntheaFileLoc <- "/tmp/synthea/output/csv"
vocabFileLoc   <- "/tmp/Vocabulary_20181119"  # folder with the OMOP vocabulary CSV files

# Create the empty CDM and Synthea tables
ETLSyntheaBuilder::CreateCDMTables(connectionDetails = cd, cdmSchema = cdmSchema, cdmVersion = cdmVersion)
ETLSyntheaBuilder::CreateSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaVersion = syntheaVersion)

# Load the Synthea CSV files and the vocabulary
ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)
ETLSyntheaBuilder::LoadVocabFromCsv(connectionDetails = cd, cdmSchema = cdmSchema, vocabFileLoc = vocabFileLoc)

# Populate the CDM event tables from the Synthea tables
ETLSyntheaBuilder::LoadEventTables(connectionDetails = cd, cdmSchema = cdmSchema, syntheaSchema = syntheaSchema, cdmVersion = cdmVersion, syntheaVersion = syntheaVersion)
```