Create Tutorial authored May 22, 2024 by alabarga
# Building a sample OMOP-CDM database
## Prerequisites
- Install R
- Install Java
- You will need a database to load the data into. For testing purposes, you can run PostgreSQL in Docker:
`docker run --name demo_omop -e POSTGRES_PASSWORD=lollypop -e POSTGRES_USER=postgres -p 5432:5432 -v ${PWD}/postgres:/var/lib/postgresql/data -v ${PWD}/backup:/backup -d postgres`
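The R code later in this tutorial connects to a database named `demo_omop` and writes to the `cdm` and `native` schemas, while the container started above only provides the default `postgres` database. A minimal sketch of creating them up front, assuming that the database and both schemas must exist before the ETL runs (the variable names are illustrative, not part of any tool):

```shell
# Names taken from this tutorial; adjust them if you changed the docker command.
CONTAINER=demo_omop   # container name from the docker run command
DB_NAME=demo_omop     # database name used later in server = "localhost/demo_omop"
SCHEMA_SQL="CREATE SCHEMA IF NOT EXISTS cdm; CREATE SCHEMA IF NOT EXISTS native;"

# Only talk to the database if the container is actually running.
if docker ps --format '{{.Names}}' 2>/dev/null | grep -qx "$CONTAINER"; then
  # CREATE DATABASE reports an error if the database already exists; that is harmless here.
  docker exec "$CONTAINER" psql -U postgres -c "CREATE DATABASE ${DB_NAME};"
  docker exec "$CONTAINER" psql -U postgres -d "$DB_NAME" -c "$SCHEMA_SQL"
else
  echo "Container $CONTAINER is not running; start it first." >&2
fi
```

PostgreSQL may need a few seconds after `docker run` before it accepts connections, so rerun the snippet if the first attempt is refused.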
## Generating Synthetic data
To generate synthetic data we use the [Synthea<sup>TM</sup> patient generator](https://github.com/synthetichealth/synthea).
Synthea<sup>TM</sup> simulates the medical histories of synthetic patients. It aims to produce high-quality, realistic patient data and associated health records that are free of privacy and security constraints.
One of the main strengths of Synthea<sup>TM</sup> is its collection of more than 90 modules, each modeling a different disease or medical observation. However, most of these modules depend on one another, so restricting generation to a subset of them is not recommended.
Download [synthea-with-dependencies.jar](https://github.com/synthetichealth/synthea/releases/download/master-branch-latest/synthea-with-dependencies.jar) or use the provided [data.zip](https://github.com/alabarga/pybcn22-modern-data-stack/blob/main/synthea/data.zip).
The basic command to generate data with Synthea<sup>TM</sup> v3.2.0 is the following, where `-c` points to a local configuration file and `-p` sets the population size:
```
java -jar synthea-with-dependencies.jar -c synthea.properties -p 1000
```
To export the data in CSV format, set `exporter.csv.export = true` in `synthea.properties`.
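A minimal sketch of that configuration step, assuming the local `synthea.properties` sits in the directory where you run the jar and only needs to override the settings you care about. Only the `exporter.csv.export` key comes from this tutorial; the file location is an assumption.

```shell
# Write a minimal local configuration file for Synthea.
# Note: this overwrites any synthea.properties already present in the current directory.
cat > synthea.properties <<'EOF'
exporter.csv.export = true
EOF

# Show the setting that was written.
grep 'exporter.csv.export' synthea.properties
```

The file is then passed to Synthea with the `-c synthea.properties` argument shown in the generation command.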
To generate data from specific modules, use the `-m` option with the names of the modules. See the [example in the Synthea wiki](https://github.com/synthetichealth/synthea/wiki/The--M-Feature).
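As a rough sketch, a run restricted to selected modules could look like the following. The module pattern `metabolic*` and the population size are illustrative assumptions, not values from this tutorial; check the modules shipped with your Synthea version for the real names, and keep the dependency caveat above in mind.

```shell
# Hypothetical values for illustration; replace them with your own.
MODULES='metabolic*'   # assumed module name pattern, quoted so the shell does not expand it
POPULATION=100
JAR=synthea-with-dependencies.jar

if [ -f "$JAR" ]; then
  java -jar "$JAR" -c synthea.properties -p "$POPULATION" -m "$MODULES"
else
  echo "Missing $JAR; download it first." >&2
fi
```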
## Import synthetic data to a relational database
To import the data to a database, we use the [ETL Synthea repo](https://github.com/OHDSI/ETL-Synthea).
First, install the package from GitHub (this requires the `devtools` package):
```r
devtools::install_github("OHDSI/ETL-Synthea")
```
Then run the following code:
```r
library(ETLSyntheaBuilder)

# Connection to the PostgreSQL container started in the prerequisites.
# The password must match POSTGRES_PASSWORD from the docker run command.
cd <- DatabaseConnector::createConnectionDetails(
  dbms         = "postgresql",
  server       = "localhost/demo_omop",
  user         = "postgres",
  password     = "lollypop",
  port         = 5432,
  pathToDriver = "..../drivers"
)

cdmSchema      <- "cdm"     # schema for the CDM and vocabulary tables
cdmVersion     <- "5.4"
syntheaVersion <- "2.7.0"   # must match the Synthea version that produced the CSV files
syntheaSchema  <- "native"  # schema for the raw Synthea tables
syntheaFileLoc <- "/tmp/synthea/output/csv"
vocabFileLoc   <- "/tmp/Vocabulary_20181119"

# Create the empty CDM and Synthea tables.
ETLSyntheaBuilder::CreateCDMTables(connectionDetails = cd, cdmSchema = cdmSchema, cdmVersion = cdmVersion)
ETLSyntheaBuilder::CreateSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaVersion = syntheaVersion)

# Load the Synthea CSV files and the vocabulary files.
ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)
ETLSyntheaBuilder::LoadVocabFromCsv(connectionDetails = cd, cdmSchema = cdmSchema, vocabFileLoc = vocabFileLoc)

# Transform the Synthea data into the CDM event tables.
ETLSyntheaBuilder::LoadEventTables(connectionDetails = cd, cdmSchema = cdmSchema, syntheaSchema = syntheaSchema, cdmVersion = cdmVersion, syntheaVersion = syntheaVersion)
```
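Once the load has finished, a quick sanity check is to count the rows in a few CDM tables. The sketch below assumes the container and database are both named `demo_omop` and the CDM schema is `cdm`, as elsewhere in this tutorial; `count_sql` is a hypothetical helper, and the three tables are just a sample of the standard OMOP CDM tables.

```shell
CONTAINER=demo_omop
DB_NAME=demo_omop
CDM_SCHEMA=cdm

# Hypothetical helper: build a row-count query for one CDM table.
count_sql() {
  printf "SELECT '%s', COUNT(*) FROM %s.%s;" "$1" "$CDM_SCHEMA" "$1"
}

# Only query the database if the container is actually running.
if docker ps --format '{{.Names}}' 2>/dev/null | grep -qx "$CONTAINER"; then
  for table in person visit_occurrence condition_occurrence; do
    docker exec "$CONTAINER" psql -U postgres -d "$DB_NAME" -t -c "$(count_sql "$table")"
  done
else
  echo "Container $CONTAINER is not running; start it first." >&2
fi
```

Non-zero counts in `person` suggest the event tables were populated; empty tables usually point to a problem in one of the earlier load steps.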