Commit 23321e97 authored by María Morales Martínez
Histone marks distribution tool

# Histone marks distribution in chromatin beds
<div style="text-align: justify">
This pipeline estimates how histone marks (or other factors, such as transcription factors) are distributed across defined regions of the nucleus. The chromatin is divided into artificial radial regions running from the nuclear periphery (lamina) to the center of the nucleus, and the mean and standard deviation of each acetylation/methylation mark are calculated for every region. The regions are defined by their distance from the Lamin B1 protein.
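As a minimal sketch of the per-region statistics described above, the snippet below computes the mean and standard deviation for each region. The function name `summarize_by_region`, the input layout (a dictionary of region label to a list of values) and the toy numbers are assumptions made for illustration; they are not taken from the pipeline's scripts.

```python
from statistics import mean, stdev


def summarize_by_region(values_by_region):
    """Return {region: (mean, std)} for per-region mark values.

    `values_by_region` is a hypothetical layout: region label -> list of
    numeric values (for example, overlap fractions of one histone mark).
    """
    summary = {}
    for region, values in values_by_region.items():
        if not values:
            # No data for this region: skip it instead of raising an error.
            continue
        # The sample standard deviation needs at least two values.
        std = stdev(values) if len(values) > 1 else 0.0
        summary[region] = (mean(values), std)
    return summary


# Toy values for illustration only; they are not real results.
example = {
    "0kb-250kb": [0.10, 0.20, 0.30],
    "Center": [0.50, 0.70],
    "250kb-1000kb": [],
}
summary = summarize_by_region(example)
```

Whether the pipeline uses the sample or the population standard deviation is not stated here; the sketch uses the sample version.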
## Requirements:
Before running the pipeline, the following are required:
### Data preparation
#### Histone ChIP-seq
Zipped ChIP-seq data must be stored in */data/chip-seq/unzip*.
A list of the ChIP-seq bed files to be used in the analysis is also required. Thanks to this list, it does not matter if there are several replicates of the same sample or more ChIP-seq files than needed, because only the files named in the list are selected.
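The selection step can be pictured with the short sketch below. It assumes the list file holds one sample ID per line and that each bed file name starts with its ID followed by the extensions; both conventions, and the names `parse_id_list` and `select_files`, are hypothetical and may differ from the actual scripts.

```python
def parse_id_list(text):
    """Return the set of sample IDs in `text`, one ID per line (assumed format)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def select_files(filenames, selected_ids):
    """Keep only the files whose ID (name before the first dot) is in the list."""
    return [name for name in filenames if name.split(".")[0] in selected_ids]


# Hypothetical file names and list content, for illustration only.
available = [
    "sampleA_rep1.bed.gz",
    "sampleA_rep2.bed.gz",
    "sampleB_rep1.bed.gz",
]
selected_ids = parse_id_list("sampleA_rep1\n\nsampleB_rep1\n")
selected = select_files(available, selected_ids)
```

Extra replicates such as `sampleA_rep2` are simply ignored because their ID is not in the list.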
#### Input Chromatin Bed
The input chromatin bed must be stored in */data/chromatin_bed/[category]/* (a folder you create).
Using separate categories avoids mixing different types of bed files.
### Installation
This tool was developed with python==3.7.8.
To make sure that all packages and dependencies install correctly, first activate the virtual environment:
source venv/bin/activate
The complete list of environment packages is available in **requirements.txt**. The packages can be installed directly with the following command:
pip install -r requirements.txt
## Usage:
Run the following command, supplying the required arguments:
python [-h] -c CAT -b BED -s SMK -l LIST
- **Category (CAT)**: category or type of the input bed. This argument identifies the type of mapped DNA elements to which the chosen input bed belongs.
- **Input bed (BED)**: this bed is divided into ranges of genomic distance from the periphery to the center of the nucleus. The ranges are: in contact with Lamin B1, 0kb-250kb, 250kb-1000kb, 1000kb-2500kb, 2500kb-5000kb, and Center. The acetylation/methylation content of the bed in each range is evaluated from the marks that overlap it.
- **Snakemake workflow (SMK)**: snakefile with the processing rules applied to the data. Located in the snakemake folder.
- **List of selected bed files (LIST)**: selected histone mark samples to be overlapped with the input bed. Located in the marklists folder.
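The distance ranges listed for the input bed can be expressed as a small lookup, sketched below. It assumes the distance to Lamin B1 is given in base pairs, that a distance of 0 means contact, and that each upper bound belongs to the lower range; these conventions and the name `assign_region` are illustrative assumptions, not the pipeline's confirmed behavior.

```python
from bisect import bisect_left

KB = 1000  # base pairs per kilobase

# Upper bounds (in bp) of the four intermediate ranges, in increasing order.
UPPER_BOUNDS_BP = [250 * KB, 1000 * KB, 2500 * KB, 5000 * KB]
LABELS = ["0kb-250kb", "250kb-1000kb", "1000kb-2500kb", "2500kb-5000kb", "Center"]


def assign_region(distance_bp):
    """Map a distance to Lamin B1 (in bp) to one of the six region labels."""
    if distance_bp < 0:
        raise ValueError("distance must be non-negative")
    if distance_bp == 0:
        # Assumption: a distance of zero means the interval touches Lamin B1.
        return "In contact with Lamin B1"
    # bisect_left places a distance equal to a bound in the lower range
    # (upper bound inclusive); anything beyond 5000kb falls in "Center".
    return LABELS[bisect_left(UPPER_BOUNDS_BP, distance_bp)]


regions = [assign_region(d) for d in (0, 1, 250 * KB, 250 * KB + 1, 6000 * KB)]
```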
*Quick summary:*
python -h
For more detailed information, consult the attached documentation.
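For readers who want to see how such an interface is typically declared, here is a minimal `argparse` sketch that mirrors the four required options of the usage line. The `dest` names, help texts and the `build_parser` function are assumptions made for illustration; the tool's real parser may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser with the four required options from the usage line."""
    parser = argparse.ArgumentParser(
        description="Histone marks distribution in chromatin beds"
    )
    parser.add_argument("-c", dest="cat", metavar="CAT", required=True,
                        help="category or type of the input bed")
    parser.add_argument("-b", dest="bed", metavar="BED", required=True,
                        help="input chromatin bed within the category")
    parser.add_argument("-s", dest="smk", metavar="SMK", required=True,
                        help="snakefile with the processing rules")
    parser.add_argument("-l", dest="list", metavar="LIST", required=True,
                        help="list of selected histone mark bed files")
    return parser


# Parse the argument values used in this README's example command.
args = build_parser().parse_args(
    ["-c", "nre", "-b", "hg38_NRE_filt5", "-s", "processing.smk",
     "-l", "selected_files_ids.txt"]
)
```

Marking every option as `required=True` makes `argparse` print the usage line and exit when one is missing, which matches the `[-h] -c CAT -b BED -s SMK -l LIST` summary.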
## Example
The tool includes an example that users can run to check that it executes correctly.
The following example command specifies the category of chromatin bed to be divided into regions (neutral regions), the bed file within that category, the Snakemake workflow to run, and the list of elements used in the analysis:
python -c nre -b hg38_NRE_filt5 -s processing.smk -l selected_files_ids.txt
A second input data file, for highly expressed genes in the transcription category, is also provided and can be run in the same way.
## Next updates:
- Upload complete documentation for all scripts (scheme required).
- Restructure the data to separate the histone ChIP-seq types according to the source of each dataset.
- Replace the command-line parameters with a config file containing all the parameters.