4. Extract secondary information from metrics (dTAD, stripeTAD, etc.)

4.1 Quick start

# dense matrix
h1d call stripe ./test_data/GSE104334_Ctrl.chr21.matrix.gz \
	50000 chr21 -o testname

# .hic
h1d call stripe ./test_data/GSE104334_Ctrl.hic \
	50000 chr21 -o testname --datatype rawhic --gt ./test_data/hg19_genome_table.txt
	
# .cool
h1d call stripe ./test_data/GSE104334_Ctrl.50000.cool \
	50000 chr21 -o testname --datatype cool --gt ./test_data/hg19_genome_table.txt

The output will be testname_stripe.csv.

4.2 Parameters

h1d call -h
usage: __main__.py call [-h] [-o OUTNAME] [-c CONTROLMATRIX]
                        [--datatype DATATYPE] [--gt GT] [-p PARAMETER]
                        mode matrix resolution chromosome

Required parameters:
- mode, running mode; should be one of {dTAD, stripe, stripeTAD, TAD, hubs}.
- matrix, path of the contact matrix file, raw .hic file or .cool file.
- resolution, resolution of the given contact matrix (e.g. 50000), or the resolution chosen for analyzing a .hic file.
- chromosome, the chromosome to be analyzed.
Optional parameters:
- -o, output name; default: defaultname.
- -c, contact matrix or .hic file of the control sample; required in “dTAD” mode.
- --datatype, type of input data: “matrix” (default), “rawhic” or “cool”.
- --gt, genome table file; required for raw .hic input and, as in the Quick start, also passed with .cool input.
- -p, mode-specific parameter:

mode	-p applies to	default
dTAD	DRF	200000-5000000
stripe	'strong IAS peak'	0.02
stripeTAD	IAS	300000
TAD	IS	300000
hubs	IF	0.05
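The distance-valued defaults (dTAD, stripeTAD and TAD) are written in bp, so at a given resolution they correspond to a whole number of bins. The sketch below makes that arithmetic explicit. The helper names are hypothetical, and it assumes these values are plain bp distances (a single window or a "low-high" range), which is not confirmed by h1d's code; the stripe and hubs values (0.02, 0.05) are cutoffs rather than distances and are not handled.

```python
def bp_to_bins(bp: int, resolution: int) -> int:
    """Convert a distance in bp to whole bins at the given resolution."""
    if bp % resolution:
        raise ValueError(f"{bp} bp is not a multiple of {resolution} bp")
    return bp // resolution


def distance_param_to_bins(value: str, resolution: int):
    """Hypothetical helper: turn a distance-valued -p string into bins.

    Accepts a single distance ('300000') or a 'low-high' range
    ('200000-5000000'), both assumed to be in bp.
    """
    if "-" in value:
        low, high = (int(part) for part in value.split("-", 1))
        return bp_to_bins(low, resolution), bp_to_bins(high, resolution)
    return bp_to_bins(int(value), resolution)


print(distance_param_to_bins("300000", 50000))          # 6
print(distance_param_to_bins("200000-5000000", 50000))  # (4, 100)
```

At 50 kb resolution, the default 300 kb window therefore spans 6 bins, and the dTAD range covers 4 to 100 bins.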

!! Please note that “stripe” is different from “stripeTAD”: “stripe” calls regions with a stripe structure, whereas a stripe-TAD is an asymmetric TAD (which may contain many stripes). stripeTAD is therefore a classification of all TADs.

4.3 dTAD

h1d provides a function to call dTADs:

h1d call dTAD ./test_data/GSE104334_Rad21KD.chr21.matrix.gz \
	50000 chr21 -c ./test_data/GSE104334_Ctrl.chr21.matrix.gz \
	--datatype matrix -o testname -p 200000-5000000

The output consists of two files, testname_leftdTAD.csv and testname_rightdTAD.csv, in the format:

chr	TADstart	TADend
chr21	17900000	18250000
chr21	26800000	27700000
...	...	...

4.4 stripe

This function automatically identifies all regions with a ‘stripe’ structure. The key idea is to find sharp, strong IAS peaks, using a strategy similar to the one that uses the insulation score to identify TAD boundaries. After extracting the local maxima of IAS, only positions with IAS > IASmean are retained. Then, similar to Crane et al., Nature 2015, we calculate a delta vector of IAS for each bin to keep only strong IAS peaks (the cutoff for a ‘strong IAS peak’ is set with -p, default 0.02). To avoid clustered small peaks, the IAS value at a stripe position must be higher than that of any other position within 100 kb.
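The four filtering steps above can be sketched in Python as follows, assuming the IAS track is already available as one value per bin (h1d computes it internally). The function name `call_stripe_summits`, the 3-bin delta window and the exact strength definition (highest delta just upstream of the peak minus lowest delta just downstream, in the spirit of Crane et al.'s boundary strength) are illustrative assumptions, not h1d's actual code; `strength_cutoff` stands in for what we assume -p controls.

```python
import numpy as np


def call_stripe_summits(ias, resolution=50_000, delta_window=3,
                        strength_cutoff=0.02, exclusion_bp=100_000):
    """Return bin indices of sharp, strong, isolated IAS peaks.

    Hypothetical helper: delta_window and the strength definition are
    illustrative assumptions, not taken from the h1d source.
    """
    ias = np.asarray(ias, dtype=float)
    n, w = len(ias), delta_window

    # Delta vector (Crane et al. 2015 style): mean IAS of the w downstream bins
    # minus mean IAS of the w upstream bins; it turns from + to - across a peak.
    delta = np.full(n, np.nan)
    for i in range(w, n - w):
        delta[i] = ias[i + 1:i + 1 + w].mean() - ias[i - w:i].mean()

    ias_mean = ias.mean()
    radius = exclusion_bp // resolution  # 100 kb -> 2 bins at 50 kb
    summits = []
    for i in range(w + 1, n - w - 1):
        # 1) local maximum of IAS
        if not (ias[i] > ias[i - 1] and ias[i] >= ias[i + 1]):
            continue
        # 2) keep only peaks above the mean IAS
        if ias[i] <= ias_mean:
            continue
        # 3) strength: upstream delta maximum minus downstream delta minimum
        strength = np.nanmax(delta[i - w:i]) - np.nanmin(delta[i + 1:i + 1 + w])
        if strength < strength_cutoff:
            continue
        # 4) isolation: IAS must exceed every other bin within exclusion_bp
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        neighbours = np.delete(ias[lo:hi], i - lo)
        if np.all(ias[i] > neighbours):
            summits.append(i)
    return summits


# Toy track: flat background, a strong peak at bin 10, a smaller clustered
# peak at bin 12 and a separate peak at bin 20.
track = np.ones(30)
track[10], track[12], track[20] = 3.0, 2.0, 2.5
summits = call_stripe_summits(track)
print(summits)  # [10, 20]
print([(s * 50_000, (s + 1) * 50_000) for s in summits])
```

Bin 12 passes the first three checks but is dropped by the 100 kb isolation rule because the higher peak at bin 10 lies within 2 bins of it.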

h1d call stripe ./test_data/GSE104334_Ctrl.chr21.matrix.gz 50000 chr21 \
	--datatype matrix -o testname

This outputs the summits of the IAS signal, i.e. the stripes:

chr	start	end	IAS
chr21	15450000	15500000	2.333717
chr21	15900000	15950000	2.561192
chr21	16300000	16350000	1.892257
...	...	...	...

4.5 stripe-TAD

This function classifies all TADs into four types: “loop”, “left-stripe”, “right-stripe” and “other” TADs:

h1d call stripeTAD ./test_data/GSE104334_Ctrl.chr21.matrix.gz 50000 chr21 \
	--datatype matrix -o testname -p 300000

The output will be testname_stripe.csv, as:

chr	TADstart	TADend	TADtype
chr21	9500000	10250000	loopTAD
chr21	14800000	15600000	leftStripe
chr21	15600000	16850000	otherTAD
...	...	...	...
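To summarise a stripeTAD result, the rows can be tallied per TADtype with the standard library. This is a minimal sketch, assuming the columns are tab-separated as displayed above (pass `delimiter=","` if the file is comma-separated); `summarize_tad_types` is a hypothetical helper, and the example uses only the three rows shown above.

```python
import csv
import io
from collections import Counter


def summarize_tad_types(handle, delimiter="\t"):
    """Count TADs and the total bp they cover per TADtype.

    Assumes columns chr, TADstart, TADend, TADtype separated by `delimiter`.
    """
    counts, covered_bp = Counter(), Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        tad_type = row["TADtype"]
        counts[tad_type] += 1
        covered_bp[tad_type] += int(row["TADend"]) - int(row["TADstart"])
    return counts, covered_bp


# The three example rows shown above; for real output use
# open("testname_stripe.csv", newline="") instead of io.StringIO.
example = io.StringIO(
    "chr\tTADstart\tTADend\tTADtype\n"
    "chr21\t9500000\t10250000\tloopTAD\n"
    "chr21\t14800000\t15600000\tleftStripe\n"
    "chr21\t15600000\t16850000\totherTAD\n"
)
counts, covered_bp = summarize_tad_types(example)
print(counts)
print(covered_bp)
```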

4.6 Hubs

This function extracts chromatin hubs as described in PMID: 26272203.

Computation of hubs requires raw .hic input (--datatype rawhic).

h1d call hubs ./test_data/GSE104334_Ctrl.hic 50000 chr21 \
	--datatype rawhic --gt ./test_data/hg19_genome_table.txt -o testname -p 0.05

The output will be testname_hubs.csv in BED-like format:

chr21	15450000	15750000
chr21	15850000	16000000
...	...	...

4.7 TAD

This function calls TADs using the insulation score (IS):

h1d call TAD ./test_data/GSE104334_Ctrl.chr21.matrix.gz 50000 chr21 \
	--datatype matrix -o testname -p 300000
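A minimal Python sketch of insulation-score TAD calling in the spirit of Crane et al., Nature 2015, assuming -p is the insulation window in bp (default 300000, i.e. 6 bins at 50 kb). The helper names, the upstream/downstream square and the rule that boundaries are strict local minima of the log2-normalised score are simplifying assumptions; h1d may apply additional normalisation or boundary filtering not shown here.

```python
import numpy as np


def insulation_score(mat, resolution=50_000, window_bp=300_000):
    """Insulation score per bin, log2-normalised to the chromosome mean.

    For bin i, average the contacts between the w bins upstream [i-w, i)
    and the w bins downstream [i, i+w), with w = window_bp // resolution.
    Bins too close to the ends get NaN.
    """
    mat = np.asarray(mat, dtype=float)
    n = mat.shape[0]
    w = window_bp // resolution
    score = np.full(n, np.nan)
    for i in range(w, n - w + 1):
        score[i] = mat[i - w:i, i:i + w].mean()
    with np.errstate(divide="ignore"):  # empty windows give -inf
        return np.log2(score / np.nanmean(score))


def call_tads(mat, resolution=50_000, window_bp=300_000):
    """Boundaries are strict local minima of the IS; TADs lie between them."""
    score = insulation_score(mat, resolution, window_bp)
    n = len(score)
    # Comparisons with NaN are False, so unscored edge bins never become boundaries.
    bounds = [i for i in range(1, n - 1)
              if score[i] < score[i - 1] and score[i] < score[i + 1]]
    # Treat the chromosome ends as the outer boundaries.
    edges = [0] + bounds + [n]
    return [(a * resolution, b * resolution) for a, b in zip(edges, edges[1:])]


# Toy matrix: three 10-bin domains with dense internal contacts.
mat = np.ones((30, 30))
for s in (0, 10, 20):
    mat[s:s + 10, s:s + 10] = 10.0
print(call_tads(mat, resolution=50_000, window_bp=100_000))
# [(0, 500000), (500000, 1000000), (1000000, 1500000)]
```

The toy call uses a 100 kb window (2 bins) so that the small matrix still has scored bins on both sides of each domain edge; with real data the default 300 kb window would be used.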