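Fun fact: a single 10X single-cell transcriptome sample can have as many as 84 fastq files!
We have shared notes on the cellranger pipeline many times on 单细胞天地, and readers can look them up there.
The pipeline takes the fastq files of a 10X single-cell transcriptome library as input, and those files must follow a naming convention!
If a sample is spread across several libraries and flowcells, it can end up with 84 fastq files, and I happened to come across a paper whose data look exactly like this. The study was published in Nature Communications in March 2021 under the title "Time-resolved single-cell analysis of Brca1 associated mammary tumourigenesis reveals aberrant differentiation of luminal progenitors": https://www.nature.com/articles/s41467-021-21783-3
One of its samples, SIGAA11, has a full 84 fastq files, listed here: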
SIGAA11_S37_L003_R1_001.fastq.gz
SIGAA11_S37_L003_R2_001.fastq.gz
SIGAA11_S37_L003_I1_001.fastq.gz
SIGAA11_S37_L004_R1_001.fastq.gz
SIGAA11_S37_L004_R2_001.fastq.gz
SIGAA11_S37_L004_I1_001.fastq.gz
SIGAA11_S37_L005_R1_001.fastq.gz
SIGAA11_S37_L005_R2_001.fastq.gz
SIGAA11_S37_L005_I1_001.fastq.gz
SIGAA11_S37_L006_R1_001.fastq.gz
SIGAA11_S37_L006_R2_001.fastq.gz
SIGAA11_S37_L006_I1_001.fastq.gz
SIGAA11_S37_L007_R1_001.fastq.gz
SIGAA11_S37_L007_R2_001.fastq.gz
SIGAA11_S37_L007_I1_001.fastq.gz
SIGAA11_S37_L008_R1_001.fastq.gz
SIGAA11_S37_L008_R2_001.fastq.gz
SIGAA11_S37_L008_I1_001.fastq.gz
SIGAA11_S37_L009_R1_001.fastq.gz
SIGAA11_S37_L009_R2_001.fastq.gz
SIGAA11_S37_L009_I1_001.fastq.gz
SIGAA11_S38_L003_R1_001.fastq.gz
SIGAA11_S38_L003_R2_001.fastq.gz
SIGAA11_S38_L003_I1_001.fastq.gz
SIGAA11_S38_L004_R1_001.fastq.gz
SIGAA11_S38_L004_R2_001.fastq.gz
SIGAA11_S38_L004_I1_001.fastq.gz
SIGAA11_S38_L005_R1_001.fastq.gz
SIGAA11_S38_L005_R2_001.fastq.gz
SIGAA11_S38_L005_I1_001.fastq.gz
SIGAA11_S38_L006_R1_001.fastq.gz
SIGAA11_S38_L006_R2_001.fastq.gz
SIGAA11_S38_L006_I1_001.fastq.gz
SIGAA11_S38_L007_R1_001.fastq.gz
SIGAA11_S38_L007_R2_001.fastq.gz
SIGAA11_S38_L007_I1_001.fastq.gz
SIGAA11_S38_L008_R1_001.fastq.gz
SIGAA11_S38_L008_R2_001.fastq.gz
SIGAA11_S38_L008_I1_001.fastq.gz
SIGAA11_S38_L009_R1_001.fastq.gz
SIGAA11_S38_L009_R2_001.fastq.gz
SIGAA11_S38_L009_I1_001.fastq.gz
SIGAA11_S39_L003_R1_001.fastq.gz
SIGAA11_S39_L003_R2_001.fastq.gz
SIGAA11_S39_L003_I1_001.fastq.gz
SIGAA11_S39_L004_R1_001.fastq.gz
SIGAA11_S39_L004_R2_001.fastq.gz
SIGAA11_S39_L004_I1_001.fastq.gz
SIGAA11_S39_L005_R1_001.fastq.gz
SIGAA11_S39_L005_R2_001.fastq.gz
SIGAA11_S39_L005_I1_001.fastq.gz
SIGAA11_S39_L006_R1_001.fastq.gz
SIGAA11_S39_L006_R2_001.fastq.gz
SIGAA11_S39_L006_I1_001.fastq.gz
SIGAA11_S39_L007_R1_001.fastq.gz
SIGAA11_S39_L007_R2_001.fastq.gz
SIGAA11_S39_L007_I1_001.fastq.gz
SIGAA11_S39_L008_R1_001.fastq.gz
SIGAA11_S39_L008_R2_001.fastq.gz
SIGAA11_S39_L008_I1_001.fastq.gz
SIGAA11_S39_L009_R1_001.fastq.gz
SIGAA11_S39_L009_R2_001.fastq.gz
SIGAA11_S39_L009_I1_001.fastq.gz
SIGAA11_S40_L003_R1_001.fastq.gz
SIGAA11_S40_L003_R2_001.fastq.gz
SIGAA11_S40_L003_I1_001.fastq.gz
SIGAA11_S40_L004_R1_001.fastq.gz
SIGAA11_S40_L004_R2_001.fastq.gz
SIGAA11_S40_L004_I1_001.fastq.gz
SIGAA11_S40_L005_R1_001.fastq.gz
SIGAA11_S40_L005_R2_001.fastq.gz
SIGAA11_S40_L005_I1_001.fastq.gz
SIGAA11_S40_L006_R1_001.fastq.gz
SIGAA11_S40_L006_R2_001.fastq.gz
SIGAA11_S40_L006_I1_001.fastq.gz
SIGAA11_S40_L007_R1_001.fastq.gz
SIGAA11_S40_L007_R2_001.fastq.gz
SIGAA11_S40_L007_I1_001.fastq.gz
SIGAA11_S40_L008_R1_001.fastq.gz
SIGAA11_S40_L008_R2_001.fastq.gz
SIGAA11_S40_L008_I1_001.fastq.gz
SIGAA11_S40_L009_R1_001.fastq.gz
SIGAA11_S40_L009_R2_001.fastq.gz
SIGAA11_S40_L009_I1_001.fastq.gz
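Look closely at the names of these 84 fastq files and a pattern emerges. Splitting each name on underscores gives these fields: the sample name (SIGAA11), a sample index (S37 to S40, 4 values), a lane (L003 to L009, 7 values) and a read type (R1, R2 or I1, 3 values), followed by the fixed suffix 001.
That makes 4 x 7 x 3 = 84 fastq files in total.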
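The counting above can be checked with a small script. This is only a sketch: the regular expression and the helper names `parse_fastq_name` and `expected_names` are my own illustration (not part of Cell Ranger), and the file names are regenerated from the pattern seen in the listing rather than read from disk.

```python
import re
from collections import defaultdict

# Illustrative pattern for names such as SIGAA11_S37_L003_R1_001.fastq.gz:
# <sample>_S<index>_L<lane>_<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Split one fastq file name into its sample/index/lane/read fields."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected fastq file name: {name}")
    return match.groupdict()


def expected_names(sample, indices, lanes, reads):
    """Rebuild every file name for the given indices, lanes and read types."""
    return [
        f"{sample}_S{index}_L{lane:03d}_{read}_001.fastq.gz"
        for index in indices
        for lane in lanes
        for read in reads
    ]


# S37..S40 (4 indices), L003..L009 (7 lanes), R1/R2/I1 (3 read types)
names = expected_names("SIGAA11", range(37, 41), range(3, 10), ("R1", "R2", "I1"))

# Collect the distinct values seen in each field
fields = defaultdict(set)
for name in names:
    for key, value in parse_fastq_name(name).items():
        fields[key].add(value)

print(len(names))                                   # 84
print({key: len(values) for key, values in fields.items()})
# {'sample': 1, 'index': 4, 'lane': 7, 'read': 3}
```

To run the same check on a real download, `names` could instead be filled from a directory listing, for example `sorted(p.name for p in pathlib.Path(".").glob("*.fastq.gz"))`.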
Of course, not every 10X sample comes with 84 fastq files. In the vast majority of cases there are just 3 files per sample, like this:
5.6G Jan 21 10:29 YX-Endo-Decidu_S1_L001_I1_001.fastq.gz
44G Jan 21 10:33 YX-Endo-Decidu_S1_L001_R1_001.fastq.gz
118G Jan 21 10:44 YX-Endo-Decidu_S1_L001_R2_001.fastq.gz
2.6G Jan 21 10:44 YX-PBMC-Decidu_S1_L001_I1_001.fastq.gz
21G Jan 21 10:46 YX-PBMC-Decidu_S1_L001_R1_001.fastq.gz
56G Jan 21 10:51 YX-PBMC-Decidu_S1_L001_R2_001.fastq.gz
2.3G Jan 21 12:31 ZZX-PBMC_S1_L001_I1_001.fastq.gz
17G Jan 21 12:32 ZZX-PBMC_S1_L001_R1_001.fastq.gz
50G Jan 21 12:38 ZZX-PBMC_S1_L001_R2_001.fastq.gz
2.6G Jan 21 12:38 ZZX-yuan-2_S1_L001_I1_001.fastq.gz
19G Jan 21 12:40 ZZX-yuan-2_S1_L001_R1_001.fastq.gz
56G Jan 21 12:46 ZZX-yuan-2_S1_L001_R2_001.fastq.gz
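As shown above, each sample has 3 files, marked R1, R2 and I1!
In the extreme case, 2 files per sample (R1 and R2 only) are also fine, and the cellranger pipeline runs without problems: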
14G Mar 1 16:19 test_L3_X37-1.R1.fastq.gz
11G Mar 1 16:41 test_L3_X37-1.R2.fastq.gz
14G Mar 1 16:21 test_L3_X37-2.R1.fastq.gz
11G Mar 1 16:43 test_L3_X37-2.R2.fastq.gz
14G Mar 1 16:24 test_L3_X37-3.R1.fastq.gz
11G Mar 1 16:37 test_L3_X37-3.R2.fastq.gz
13G Mar 1 16:27 test_L3_X37-4.R1.fastq.gz
11G Mar 1 16:36 test_L3_X37-4.R2.fastq.gz
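As shown above, each sample here has only two files, R1 and R2. Note, however, that these file names do not follow the Cell Ranger naming convention, so the files need to be renamed first!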
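One possible way to plan the renaming is sketched below. It rests on several assumptions of mine: the whole prefix before `.R1`/`.R2` is kept as the sample name, and the sample index `S1` and lane `L001` are placeholders, since the original names do not say which values are correct. The function `cellranger_name` is a hypothetical helper, not a Cell Ranger tool, and the script only prints the mapping without touching any files.

```python
import re

# Hypothetical pattern for the names shown above: <prefix>.<R1|R2>.fastq.gz
VENDOR_NAME = re.compile(r"^(?P<prefix>.+)\.(?P<read>R[12])\.fastq\.gz$")


def cellranger_name(old_name, sample_index=1, lane=1):
    """Return a name of the form <sample>_S<index>_L<lane>_<read>_001.fastq.gz.

    sample_index and lane are placeholder values (an assumption), because
    the original file names do not carry that information explicitly.
    """
    match = VENDOR_NAME.match(old_name)
    if match is None:
        raise ValueError(f"unexpected file name: {old_name}")
    prefix = match.group("prefix")
    read = match.group("read")
    return f"{prefix}_S{sample_index}_L{lane:03d}_{read}_001.fastq.gz"


old_names = [
    "test_L3_X37-1.R1.fastq.gz",
    "test_L3_X37-1.R2.fastq.gz",
]

# Old name -> new name; nothing is renamed on disk here
rename_plan = {old: cellranger_name(old) for old in old_names}
for old, new in rename_plan.items():
    print(old, "->", new)
# test_L3_X37-1.R1.fastq.gz -> test_L3_X37-1_S1_L001_R1_001.fastq.gz
# test_L3_X37-1.R2.fastq.gz -> test_L3_X37-1_S1_L001_R2_001.fastq.gz
```

Once the plan looks right, the files could be renamed or, more safely, symlinked under the new names (for example with `os.symlink(old, new)`), which leaves the original data untouched.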
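So, do you now understand the raw sequencing data of a 10X single-cell transcriptome experiment?
In July 2020 I saw that Cell Ranger had been updated to V4 and wrote a summary of it, see "cellranger更新到4啦(全新使用教程)" (cellranger has been updated to 4: a brand-new tutorial). It was soon upgraded again, to Cell Ranger - 5.0.1 (December 16, 2020), and it is now at V6, but the notes remain largely applicable!
Finishing a Cell Ranger run only gets you the expression matrix files. For 10x single-cell transcriptome data, the expression matrix of each sample consists of 3 files, as shown below: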