前期文章
一个超简单的转录组项目全过程--iMac+RNA-Seq(一)
一个超简单的转录组项目全过程--iMac+RNA-Seq(二)QC
一个超简单的转录组项目全过程--iMac+RNA-Seq(三)Alignment 比对
4 featureCounts
比对结束后需要进行计数咯!
老规矩,先看说明书。
featureCounts -h ## 好像发现了什么不得了的
$featureCounts -h
featureCounts: invalid option -h
Version 1.6.2
Usage:
featureCounts [options] -a
-o
input
file1 [input
file2] ...
-T
number of threads
-p
If specified, fragments (or templates) will be counted instead of reads. This option is only applicable for
paired-end reads
.
-t
Specify feature type in GTF annotation.
'exon' by default
. Features used for read counting will be extracted from annotation using the provided value.
-g
Specify attribute type in GTF annotation. '
gene_id' by default
. Meta-features used for read counting will be extracted from annotation using the provided value.
-a
Name of an annotation file.
GTF/GFF format by defaul
t. See -F option for more format information. Inbuilt annotations (SAF format) is available in 'annotation' directory of the package. Gzipped file is also accepted.
-o
N
ame of the output file including read counts
. A separate file including summary statistics of counting results is also included in the output ('
.summary')
用法一
## 教程代码,使用for循环
mkdir $wkd/align #这样会生成单个的,每个都有自己的名字
cd $wkd/align
source activate rna
for fn in {512..520}
do
featureCounts
-T 4 -p -t exon -g gene_id \
-a /Users/bioinformatic/reference/hg38/hg38.gtf \
-o counts.SRR1039$fn.txt /home/rna/clean/SRR1039$fn.hisat.bam
done
source deactivate
用法二:酷酷的
vim count.sh ## 创建一个脚本叫做count
#!/bin/bash
set -e
set -u
set -o pipefail
# set PATH 设置路径
HUMAN=/Users/bioinformatic/reference/hg38
OUTDIR
=/Users/Desktop/project/clean/align
# source activate rna
cd $OUTDIR # 打开目标文件夹
pwd ## 显示当前路径
ls *bam | cut -d"_" -f 1| sort -u |\
while read id
do # 用echo语句来提示脚本运行进度
echo "processing ${id}_combined.hisat.bam"
if [ ! -f ${id}.hisat.done ]
then
featureCounts -T 4 -p \
-t exon -g gene_id -a $HUMAN/hg38
.gtf \
-o counts.${id}.txt \
$OUTDIR/${id}_combined.hisat.bam && touch ${id}.hisat.done
fi
done
# source deactivate
Note:脚本很长的时候,记得使用“\”符号加回车,让行文变得整洁易读。
运行脚本
nohup bash count.sh &
查看结果
cat nohup.out就可以看到featureCounts的运行日志,请注意以下两个数据
26834745 (92.89%) aligned concordantly exactly 1 time
97.89% overall alignment rate
小结:本章所需知识背景和技能
(1)基因表达量的计算到底在算什么?
(2)TPM, RPKM, FPKM这几个计量方式的区别
(3)学会查看帮助文档,了解软件用法
(4)nohup是什么,nohup和&的区别等等
用法二的知识点
在酷酷的用法二里面有几个很酷的地方:
#!/bin/bash
set -e
set -u
set -o pipefail
# touch的用法
# if then fi 语句的使用
shell脚本和正则表达式:必学