Commit bd49ad62 authored by María Morales Martínez

To allow multiple histone chip-seq datasets

parent 23321e97
# Histone marks distribution in chromatin beds
<div style="text-align: justify">
This pipeline estimates how histone marks (or other factors, such as transcription factors) are distributed across predefined regions of the nucleus. Chromatin is divided into "artificial" radial regions from the periphery (Lamin) to the center of the nucleus, and statistics (mean and standard deviation) of the distribution of each acetylation/methylation mark in those regions are calculated. The regions are defined by their distance from the Lamin B1 protein.
</div>
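
As a rough illustration of the per-region statistics described above, the following sketch bins records by their distance to Lamin B1 and computes the mean and standard deviation per region. The range boundaries are the ones listed for the input bed in this README; the function names, the `(distance_kb, mark_value)` record layout, the inclusive upper edges, and the use of the population standard deviation are assumptions made for the example, not the pipeline's actual implementation.

```python
import bisect
import statistics

# Upper edges (in kb) of the distance ranges named in this README.
# Treating each upper edge as inclusive is an assumption.
EDGES_KB = [250, 1000, 2500, 5000]
LABELS = ["0kb-250kb", "250kb-1000kb", "1000kb-2500kb", "2500kb-5000kb", "Center"]


def region_for(distance_kb):
    """Return the region label for a distance (kb) from Lamin B1 (hypothetical helper)."""
    if distance_kb <= 0:
        # Assumption: a distance of 0 means the element touches Lamin B1.
        return "In contact with Lamin B1"
    return LABELS[bisect.bisect_left(EDGES_KB, distance_kb)]


def summarize(records):
    """Group (distance_kb, mark_value) pairs by region and return {region: (mean, std)}."""
    groups = {}
    for distance_kb, mark_value in records:
        groups.setdefault(region_for(distance_kb), []).append(mark_value)
    # Population standard deviation, so single-value regions yield 0.0.
    return {
        region: (statistics.mean(values), statistics.pstdev(values))
        for region, values in groups.items()
    }


if __name__ == "__main__":
    example = [(0, 1.0), (100, 2.0), (200, 4.0), (6000, 3.0)]
    for region, (mean, std) in summarize(example).items():
        print(region, mean, std)
```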
@@ -12,13 +12,13 @@ Before running the pipeline is necessary:
#### Histone ChIP-seq
Zipped Chip-seq data must be storaged in */data/chip-seq/unzip*.
Zipped ChIP-seq data (".bed.gz" format) must be stored in */data/marks/all*.
List of chip-seq bed files that are going been used to check the chromatin. Grace to this list, it does not mutter if you have several replicates from same sample o more chip-seq files than they required, because only will be selected those which are in the list.
List of ChIP-seq bed files that will be used to check the chromatin. Thanks to this list, it does not matter if you have several replicates of the same sample or more ChIP-seq files than required, because only the files on the list are selected.
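
The selection step can be pictured with a short sketch like the one below. It assumes the list file holds one file ID per line and that each ID matches the name of a `.bed.gz` file without its extension; `select_marks` and both of those conventions are hypothetical and may differ from what the pipeline actually does.

```python
from pathlib import Path

SUFFIX = ".bed.gz"


def select_marks(marks_dir, list_file):
    """Return the .bed.gz files in marks_dir whose ID appears in list_file.

    Hypothetical helper: assumes one ID per line and that the ID equals the
    file name without the ".bed.gz" suffix.
    """
    lines = Path(list_file).read_text().splitlines()
    wanted = {line.strip() for line in lines if line.strip()}
    selected = []
    for path in sorted(Path(marks_dir).glob("*" + SUFFIX)):
        file_id = path.name[: -len(SUFFIX)]
        if file_id in wanted:
            selected.append(path)
    return selected
```

Files that are present in the directory but absent from the list (extra replicates, for instance) are simply ignored.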
#### Input Chromatin Bed
The chromatin input bed must be stored in */data/chromatin_bed/[category]/* (created by you).
Establishing different categories avoids mixing different types of bed files.
@@ -26,7 +26,7 @@ Establishing different categories avoid mix different types of bed files.
This tool is programmed under python==3.7.8.
To ensure the success of the installation of all packages and dependencies, it is necesary activate the virtual environment.
To ensure that all packages and dependencies install successfully, it is necessary to activate the virtual environment.
```
source venv/bin/activate
@@ -43,40 +43,37 @@ pip install -r requirements.txt
Run the following command, supplying the required information for each argument.
```
python run_smk.py [-h] -c CAT -b BED -s SMK -l LIST
python run_smk.py [-h] -c CONFIG -s SMK
```
- **Category (CAT)**: category or type of input bed. This argument identifies the type of mapped DNA elements to which the chosen input bed belongs.
- **Input bed (BED)**: This bed is divided into different ranges of genomic distance from the periphery to the center of the nuclei. The genomic distance that defines these ranges is: In contact with Lamin B1, 0kb-250kb, 250kb-1000kb, 1000kb-2500kb, 2500kb-5000kb, Center. The acetylation/methylation content of this bed in each range will be evaluated according to the marks that overlap with the input bed.
- **Configuration (CONFIG)**: Input file with all the parameters required to run the pipeline: the type of chromatin regions to compare (chromatin_category), the bed inside this category to be analysed, the histone ChIP-seq dataset with all samples, and the list of histone ChIP-seq files to use in the analysis. Save new config files (".json") in the config folder.
- **Snakemake workflow (SMK)**: snakefile with the processing rules applied to the data. In sanakemake folder.
- **List of selected bed files (LIST)**: selected hisntone mark samples to be overlapped with the input bed. In marklists folder.
- **Snakemake workflow (SMK)**: snakefile describing the workflow, with the rules that determine the order of data processing. Placed in the snakemake folder.
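
To make the role of the configuration file more concrete, here is a minimal sketch of loading and checking such a JSON file. Only `chromatin_category` is named in this README; the other key names (`input_bed`, `marks_dataset`, `marks_list`) and the `load_config` helper are assumptions chosen for illustration, so the real config files may use different keys.

```python
import json

# Only "chromatin_category" appears in the README; the remaining key names
# are hypothetical placeholders for the parameters it describes.
REQUIRED_KEYS = ("chromatin_category", "input_bed", "marks_dataset", "marks_list")


def load_config(path):
    """Load a JSON config file and fail early if an expected key is missing."""
    with open(path) as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("missing config keys: " + ", ".join(missing))
    return config
```

Checking the keys up front means a mistyped parameter is reported before any workflow rule starts running.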
*Quick summary:*
```
python run_smk.py -h
python run_config.py -h
```
For more detailled information, consult attached documentation.
For more detailed information, consult the attached documentation in *pipelines/documentation*.
## Example
The tool includes an example that users can run to check that the pipeline executes correctly.
In the next command example, it is indicated the type of chromatin bed that is goingo to be divided in regions are neutral regions, the bed inside this category, the order of pipeline running and the list with the elements that will be used in the analysis
The following example command specifies the configuration file and the snakefile that contains the workflow instructions for the analysis.
```
python run_smk.py -c nre -b hg38_NRE_filt5 -s processing.smk -l selected_files_ids.txt
python run_config.py -c encode_nre.json -s analysis.smk
```
There is another input data file for highly expressed genes in category transcription that can be run by the same user.
## Next updates:
Another configuration file, *encode_transcription.json*, covers the top 3000 highly expressed genes and can be run in the same way.
- Upload complete documentation of all scripts (scheme required)
## Next updates:
- Change structure to separate the types of histone ChIP-seq used according to the source of each dataset.
- Add graphics that support the understanding of the internal analysis processes.
- Change command line parameters to config file with all the parameters.
- Add Hi-C bed to evaluate the real 3D chromatin distribution of histone marks.
- Add test to find bias among histone marks in different LAD regions.
"https://www.encodeproject.org/metadata/?type=Experiment&files.assembly=GRCh38&files.output_type=pseudo-replicated+peaks&files.output_type=replicated+peaks&files.file_type=bed+narrowPeak" -X GET -H "Accept: text/tsv" -H "Content-Type: application/json" --data '{"elements": ["/experiments/ENCSR880SUY/", "/experiments/ENCSR364BKW/", "/experiments/ENCSR991BTY/", "/experiments/ENCSR130EML/", "/experiments/ENCSR395USV/", "/experiments/ENCSR631RJR/", "/experiments/ENCSR687FDK/", "/experiments/ENCSR476KTK/", "/experiments/ENCSR003SSR/", "/experiments/ENCSR739BZR/", "/experiments/ENCSR928HYM/", "/experiments/ENCSR442DGE/", "/experiments/ENCSR848DUN/", "/experiments/ENCSR301HRV/", "/experiments/ENCSR019SQX/", "/experiments/ENCSR858BOF/", "/experiments/ENCSR768FWX/", "/experiments/ENCSR186OBR/", "/experiments/ENCSR326NQF/", "/experiments/ENCSR462XRE/", "/experiments/ENCSR161NON/", "/experiments/ENCSR057BTG/", "/experiments/ENCSR216OGD/", "/experiments/ENCSR925LJZ/", "/experiments/ENCSR728SZE/", "/experiments/ENCSR813CTI/", "/experiments/ENCSR883AQJ/", "/experiments/ENCSR322MEI/", "/experiments/ENCSR271TFS/", "/experiments/ENCSR891KGZ/", "/experiments/ENCSR952GVX/", "/experiments/ENCSR496DCY/", "/experiments/ENCSR675FLJ/", "/experiments/ENCSR038PFD/", "/experiments/ENCSR443YAS/", "/experiments/ENCSR703KXH/", "/experiments/ENCSR441UHO/", "/experiments/ENCSR000APZ/", "/experiments/ENCSR815LBP/", "/experiments/ENCSR000ANC/", "/experiments/ENCSR814XPE/", "/experiments/ENCSR000ALU/", "/experiments/ENCSR000ANB/", "/experiments/ENCSR000APY/", "/experiments/ENCSR000AMG/", "/experiments/ENCSR000ANP/", "/experiments/ENCSR000ANA/", "/experiments/ENCSR000AMH/", "/experiments/ENCSR000AND/"]}'
https://www.encodeproject.org/files/ENCFF045CUG/@@download/ENCFF045CUG.bed.gz
https://www.encodeproject.org/files/ENCFF315NAV/@@download/ENCFF315NAV.bed.gz
https://www.encodeproject.org/files/ENCFF434PWT/@@download/ENCFF434PWT.bed.gz
https://www.encodeproject.org/files/ENCFF711LQB/@@download/ENCFF711LQB.bed.gz
https://www.encodeproject.org/files/ENCFF250GSY/@@download/ENCFF250GSY.bed.gz
https://www.encodeproject.org/files/ENCFF558IKG/@@download/ENCFF558IKG.bed.gz
https://www.encodeproject.org/files/ENCFF098JFF/@@download/ENCFF098JFF.bed.gz
https://www.encodeproject.org/files/ENCFF093YFN/@@download/ENCFF093YFN.bed.gz
https://www.encodeproject.org/files/ENCFF744ORJ/@@download/ENCFF744ORJ.bed.gz
https://www.encodeproject.org/files/ENCFF997ZDN/@@download/ENCFF997ZDN.bed.gz
https://www.encodeproject.org/files/ENCFF411ESN/@@download/ENCFF411ESN.bed.gz
https://www.encodeproject.org/files/ENCFF385DSV/@@download/ENCFF385DSV.bed.gz
https://www.encodeproject.org/files/ENCFF808BIC/@@download/ENCFF808BIC.bed.gz
https://www.encodeproject.org/files/ENCFF202EZL/@@download/ENCFF202EZL.bed.gz
https://www.encodeproject.org/files/ENCFF408FCY/@@download/ENCFF408FCY.bed.gz
https://www.encodeproject.org/files/ENCFF964FVB/@@download/ENCFF964FVB.bed.gz
https://www.encodeproject.org/files/ENCFF053HKF/@@download/ENCFF053HKF.bed.gz
https://www.encodeproject.org/files/ENCFF156RHD/@@download/ENCFF156RHD.bed.gz
https://www.encodeproject.org/files/ENCFF987CBQ/@@download/ENCFF987CBQ.bed.gz
https://www.encodeproject.org/files/ENCFF628KOM/@@download/ENCFF628KOM.bed.gz
https://www.encodeproject.org/files/ENCFF068IZP/@@download/ENCFF068IZP.bed.gz
https://www.encodeproject.org/files/ENCFF642IBN/@@download/ENCFF642IBN.bed.gz
https://www.encodeproject.org/files/ENCFF084QDP/@@download/ENCFF084QDP.bed.gz
https://www.encodeproject.org/files/ENCFF736WAN/@@download/ENCFF736WAN.bed.gz
https://www.encodeproject.org/files/ENCFF439CWL/@@download/ENCFF439CWL.bed.gz
https://www.encodeproject.org/files/ENCFF443ZMH/@@download/ENCFF443ZMH.bed.gz
https://www.encodeproject.org/files/ENCFF348GGB/@@download/ENCFF348GGB.bed.gz
https://www.encodeproject.org/files/ENCFF441MSJ/@@download/ENCFF441MSJ.bed.gz
https://www.encodeproject.org/files/ENCFF694ENI/@@download/ENCFF694ENI.bed.gz
https://www.encodeproject.org/files/ENCFF760EFQ/@@download/ENCFF760EFQ.bed.gz
https://www.encodeproject.org/files/ENCFF343GTP/@@download/ENCFF343GTP.bed.gz
https://www.encodeproject.org/files/ENCFF422MTS/@@download/ENCFF422MTS.bed.gz
https://www.encodeproject.org/files/ENCFF237OAY/@@download/ENCFF237OAY.bed.gz
https://www.encodeproject.org/files/ENCFF397EFT/@@download/ENCFF397EFT.bed.gz
https://www.encodeproject.org/files/ENCFF277AOQ/@@download/ENCFF277AOQ.bed.gz
https://www.encodeproject.org/files/ENCFF160CEI/@@download/ENCFF160CEI.bed.gz
https://www.encodeproject.org/files/ENCFF806NNI/@@download/ENCFF806NNI.bed.gz
https://www.encodeproject.org/files/ENCFF654ZZO/@@download/ENCFF654ZZO.bed.gz
https://www.encodeproject.org/files/ENCFF851FNE/@@download/ENCFF851FNE.bed.gz
https://www.encodeproject.org/files/ENCFF836LZM/@@download/ENCFF836LZM.bed.gz
https://www.encodeproject.org/files/ENCFF456NIF/@@download/ENCFF456NIF.bed.gz
https://www.encodeproject.org/files/ENCFF296RYM/@@download/ENCFF296RYM.bed.gz
https://www.encodeproject.org/files/ENCFF813VFV/@@download/ENCFF813VFV.bed.gz
https://www.encodeproject.org/files/ENCFF344MEX/@@download/ENCFF344MEX.bed.gz
https://www.encodeproject.org/files/ENCFF668YOE/@@download/ENCFF668YOE.bed.gz
https://www.encodeproject.org/files/ENCFF162HPV/@@download/ENCFF162HPV.bed.gz
https://www.encodeproject.org/files/ENCFF238YJA/@@download/ENCFF238YJA.bed.gz
https://www.encodeproject.org/files/ENCFF997CKL/@@download/ENCFF997CKL.bed.gz
https://www.encodeproject.org/files/ENCFF436XTS/@@download/ENCFF436XTS.bed.gz