4. Extract secondary information from metrics (dTAD, stripeTAD, etc.)
4.1 Quick start
# dense matrix
h1d call stripe ./test_data/GSE104334_Ctrl.chr21.matrix.gz \
50000 chr21 -o testname
# .hic
h1d call stripe ./test_data/GSE104334_Ctrl.hic \
50000 chr21 -o testname --datatype rawhic --gt ./test_data/hg19_genome_table.txt
# .cool
h1d call stripe ./test_data/GSE104334_Ctrl.50000.cool \
50000 chr21 -o testname --datatype cool --gt ./test_data/hg19_genome_table.txt
The output will be: testname_stripe.csv
4.2 Parameters
h1d call -h
usage: __main__.py call [-h] [-o OUTNAME] [-c CONTROLMATRIX]
[--datatype DATATYPE] [--gt GT] [-p PARAMETER]
mode matrix resolution chromosome
Required parameters:
mode
, Running mode,,should be one of {dTAD,stripe,stripeTAD, TAD,hubs}data
, Path of matrix file or raw .hic file.resolution
, resolution (50000, i.e.) of given contact matrix, or choosed resolution for analyzing.hic
file.chromosome
, selected chromosome to be analyzed.
Optional parameters:
-o
, output name, default: defaultname-c
, contact matrix or .hic file of control sample, which is required when using “dTAD” mode.--datatype
, type of input data: “matrix” (default) or “rawhic”.--gt
, genome table file when using raw .hic data.-p
, parameters for particular mode:
mode | Which parameter | Default |
---|---|---|
dTAD | for DRF | 200000-5000000 |
stripe | for 'strong IAS peak' | 0.02 |
stripeTAD | for IAS | 300000 |
TAD | for IS | 300000 |
Hubs | for IF | 0.05 |
!! Please note that “stripe” is different from “stripeTAD”: Stripe is the regions with “stripe” structure, whereas stripe-TAD is asymmetric TAD (may contain many stripes). Thus stripe-TAD is the classfication from all TAD.
4.3 dTAD
h1d provide the function to call dTAD as
h1d call dTAD ./test_data/GSE104334_Rad21KD.chr21.matrix.gz \
50000 chr21 -c ./test_data/GSE104334_Ctrl.chr21.matrix.gz \
--datatype matrix -o testname -p 200000-5000000
The output will be testname_leftdTAD.csv
and testname_rightdTAD.csv
, as:
chr | TADstart | TADend |
---|---|---|
chr21 | 17900000 | 18250000 |
chr21 | 26800000 | 27700000 |
... | ... | ... |
4.4 stripe
This function automatically identify all regions with stripe ‘structure’. The key idea is to find the sharp, strong IAS peaks. We used the similar strategy which use insulation score to identify TAD boundaries. For the stripe calling, after extracting the local maximum positions of IAS, only positions with IAS > IASmean is retained. Then, similar to Crane et.al, Nature 2015, we calculate a delta vector of IAS for each bin to extract only strong IAS peaks. To avoid clustered small peaks, the IAS value of a ‘stripe’ position should be higher than any position around 100kb.
h1d call stripe ./test_data/GSE104334_Ctrl.chr21.matrix.gz 50000 chr21 \
--datatype matrix -o testname
This will output the summit of IAS signal, i.e. the stripes:
chr | start | end | IAS |
---|---|---|---|
chr21 | 15450000 | 15500000 | 2.333717 |
chr21 | 15900000 | 15950000 | 2.561192 |
chr21 | 16300000 | 16350000 | 1.892257 |
... | ... | ... | ... |
4.5 stripe-TAD
This function simply divide all TAD into “loop”, “left-stripe”, “right-stripe” and “other” TAD:
h1d call stripeTAD ./test_data/GSE104334_Ctrl.chr21.matrix.gz 50000 chr21 \
--datatype matrix -o testname -p 300000
The output will be testname_stripe.csv
, as:
chr | TADstart | TADend | TADtype |
---|---|---|---|
chr21 | 9500000 | 10250000 | loopTAD |
chr21 | 14800000 | 15600000 | leftStripe |
chr21 | 15600000 | 16850000 | otherTAD |
... | ... | ... | ... |
4.6 Hubs
This function extract chromatin Hubs as described in PMID: 26272203
h1d call hubs ./test_data/GSE104334_Ctrl.hic 50000 chr21 \
--datatype rawhic --gt ./test_data/hg19_genome_table.txt -o testname -p 0.05
The output will be testname_hubs.csv
in .bed
style, as :
chr21 | 15450000 | 15750000 |
---|---|---|
chr21 | 15850000 | 16000000 |
... | ... | ... |
4.7 TAD
This function will use Insulation Score to simply call TAD:
h1d call TAD ./test_data/GSE104334_Ctrl.chr21.matrix.gz 50000 chr21 \
--datatype matrix -o testname -p 300000