R绘图系列-ComplexHeatmap使用记录

这篇文章主要学习了使用ComplexHeatmap来绘制想要的热图的过程,首先以绘制差异表达基因的热图为例学习了常规热图的绘制方法,后续又详细学习了使用oncoPrint绘制突变谱的方法

输入数据类型

最好是矩阵,如果是dataframe可能会出现warning

1
2
Warning message:
The input is a data frame, convert it to the matrix.

如果想颠倒横纵坐标画图,可以直接使用t(matrix)进行转置。


设置颜色

设置颜色有多种方式,可以使用ComplexHeatmap推荐的colorRamp2函数,其优点在于可以设置不同的breaks对应的颜色值

1
2
3
4
5
library(circlize)
red_green=colorRamp2(breaks=c(min(diff_genes_matrix),
mean(diff_genes_matrix),
max(diff_genes_matrix)),
colors=c("red", "black", "green"))


设置legend

legend在热图的右侧:

1
2
3
4
5
heatmap_legend_param = list(color_bar = "continuous", 
at = quantile(diff_genes_matrix, c(0,1)),
labels = c("Min", "Max"),
legend_height = unit(4, "cm"),
title_position = "leftcenter-rot")

  • color_bar: 连续的
  • at和后面的labels连用,在什么地方标注什么
  • legend_height设置legend高度
  • title_position:设置legend名称的位置,这个需要和Heatmap的参数name连用来控制legend名称,在垂直的legend下可用的包括'topleft', 'topcenter', 'leftcenter-rot' and 'lefttop-rot'
  • legend_direction = "horizontal":设置legend横置

legend在热图的下方:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
heatmap_legend_param = list(color_bar = "continuous", 
at = quantile(diff_genes_matrix_scale, c(0,1)),
labels = c("Min", "Max"),legend_width = unit(4, "cm"),
# 当legend方向为横向的时候,title_position只能取'topleft', 'topcenter', 'lefttop' and 'leftcenter'
title_position = "topcenter",
legend_direction ="horizontal")


# 设置legend的位置
ht=Heatmap(
...,
heatmap_legend_param
)
draw(ht,heatmap_legend_side="bottom")

参考链接:


设置row和col名称的顺序

在设置rowcol名称的顺序时,行和列的聚类自动会去除

1
2
3
4
5
6
# row1、row10、row11、row2...
row_order = sort(rownames(diff_genes_matrix))

# 直接指定顺序:
my_order=c("row1","row2","row10","row11")
row_order = my_order

参考链接:

需要注意的是:如果在设置了顺序的同时也设置了row_split,那么手动设置的row_order就会失效;在设置了cluster_rows = FALSE的情况下,默认的顺序就是matrix行名的顺序,参考How to avoid reordering of matrix components with Split?,所以正确的做法是:

  • 先修改matrix的顺序,可以参考R系列之向量、矩阵、数组、数据框和列表
  • 关闭cluster_rows参数,直接绘图即可
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    # 原始的矩阵顺序
    rownames(diff_genes_matrix)
    [1] "BE_W_1" "BE_W_3" "BE_W_6" "BE_C_2" "BE_C_4" "BE_C_5" "Mutation1" "Mutation2" "Mutation3"

    # 先修改matrix顺序
    diff_genes_matrix_sort=diff_genes_matrix[c("BE_W_1","BE_W_3","BE_W_6","Mutation1","Mutation2","Mutation3","BE_C_2","BE_C_4","BE_C_5"),]
    rownames(diff_genes_matrix_sort)
    [1] "BE_W_1" "BE_W_3" "BE_W_6" "Mutation1" "Mutation2" "Mutation3" "BE_C_2" "BE_C_4" "BE_C_5"

    Heatmap(diff_genes_matrix_sort, show_column_names = FALSE,
    show_row_dend = FALSE, show_column_dend = F,
    col=red_green,
    name="Expression level",

    row_names_side = "left",
    cluster_rows = FALSE,
    heatmap_width = unit(24, "cm"), heatmap_height = unit(8, "cm"),
    row_split = split,
    row_title = NULL,
    row_gap = unit(0.5, "mm"),
    heatmap_legend_param = list(color_bar = "continuous",
    at = quantile(diff_genes_matrix, c(0,1)),
    labels = c("Min", "Max"),legend_height = unit(4, "cm"),
    title_position = "leftcenter-rot")
    )

complexheatmap_order_row_and_split.png


设置rowname的左右

默认情况下rowname是在右边,而聚类是在左边,如果想修改这顺序可以使用:

1
row_names_side = "left", row_dend_side = "right"


设置cell的大小

ComplexHeatmap没有直接设置cell大小的参数,但是可以通过设置整个heatmap的大小来达到目的:

1
Heatmap(mat, ..., heatmap_height = unit(1, "cm")*nrow(mat))

widthheatmap_widthheightheatmap_height 都用于控制热图的大小。默认情况下,所有热图组件都具有固定的宽度或高度,例如 行树形图的宽度为 1cmheatmap_widthheatmap_height 控制整个热图的宽度/高度包括所有热图组件(不包括图例),而 widthheight 仅控制 heamtap 主体的宽度/高度.


对某些行名和列名进行注释

有时候显示全部的行名或者列名可能会比较杂乱,特别是行或者列很多的情况下,这个时候可以选择一些比较重要的进行展示:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# 在Heatmap中添加如下
# link_height:控制线的长短
labels = "Spp1"
top_annotation = columnAnnotation(link = anno_mark(at = c(33),
labels = labels,
link_height = unit(4, "mm"))
)

# 批量的做法
mark_genes=c("Spp1")
gene_pos <- match(mark_genes,colnames(diff_genes_matrix))
top_annotation = columnAnnotation(link = anno_mark(at = gene_pos,
labels = mark_genes,
link_height = unit(4, "mm"))
)

注意:这个只能放在top_annotation,如果是bottom_annotation会发现那个连接线会出现在下方,而不是个热图连接起来,其实还是比较适合在行上进行注释,会比较好看一些

其他参数:

  • padding:控制相邻label之间的距离,默认为padding = unit(1, "mm")

complexheatmap_anno_col_row_names.png


在行或者列添加注释块

在做热图中不可避免地会遇到需要将多个重复或者一类的样本进行标记的需求,这里就学习在行或者列上添加这种注释块

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# 先得到注释块的大小和颜色等信息
# 这里的a、b、c会显示成legend的名称
# gap:注释块与热图之间的距离
df = data.frame(type = c(rep("a", 3), rep("b", 3), rep("c", 3)))
ha = rowAnnotation(df = df, col = list(type = c("a" = "red", "b" = "blue","c"="black")),show_legend = F,
width = unit(1, "cm"),show_annotation_name = F, gap = unit(1, "points"))

# 在Heatmap中添加如下
left_annotation = ha

# 完整版
df = data.frame(type = c(rep("a", 3), rep("b", 3), rep("c", 3)))
ha = rowAnnotation(df = df, col = list(type = c("a" = "red", "b" = "blue","c"="black")),show_legend = F,
width = unit(1, "cm"),show_annotation_name = F,gap = unit(1, "points"))

Heatmap(diff_genes_matrix, show_column_names = FALSE,
show_row_dend = FALSE, show_column_dend = F,
col=red_green,
name="Expression level",
row_order = sort(rownames(diff_genes_matrix)),
row_names_side = "left",
heatmap_width = unit(24, "cm"), heatmap_height = unit(8, "cm"),
top_annotation = columnAnnotation(link = anno_mark(at = c(33),
labels = labels,
link_height = unit(4, "mm")
)
),
left_annotation = ha,
heatmap_legend_param = list(color_bar = "continuous",
at = quantile(diff_genes_matrix, c(0,1)),
labels = c("Min", "Max"),legend_height = unit(4, "cm"),
title_position = "leftcenter-rot")
)

complexheatmap_anno_block.png


将不同的注释块分开

在上一步得到不同的注释块了之后可以进一步地将不同的注释块之间分开:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# 设置分开的dataframe
split = data.frame(rep(c("A", "B","C"), each=3))

# Heatmap中相关设置
## 设置分隔
row_split = split,
## 去除分隔后添加的title:A、B、C
row_title = NULL,
## 设置分隔的间隔大小
row_gap = unit(5, "mm")

# 完整设置
split = data.frame(rep(c("A", "B","C"), each=3))
Heatmap(diff_genes_matrix, show_column_names = FALSE,
show_row_dend = FALSE, show_column_dend = F,
col=red_green,
name="Expression level",
row_order = sort(rownames(diff_genes_matrix)),
row_names_side = "left",
heatmap_width = unit(24, "cm"), heatmap_height = unit(8, "cm"),
top_annotation = columnAnnotation(link = anno_mark(at = c(33),
labels = labels,
link_height = unit(4, "mm")
)
),
left_annotation = ha,
row_split = split,
row_title = NULL,
row_gap = unit(5, "mm"),
heatmap_legend_param = list(color_bar = "continuous",
at = quantile(diff_genes_matrix, c(0,1)),
labels = c("Min", "Max"),legend_height = unit(4, "cm"),
title_position = "leftcenter-rot")
)

complexheatmap_anno_block_split.png


完整脚本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
library(ComplexHeatmap)
library(circlize)

# 对counts做了变换,log2(normalization + 1)
diff_genes_matrix=t(as.matrix(log2(diff_genes_df+1)))
diff_genes_matrix
Cdh4 Sema6b Meox1 Col1a1 Ell2 Tcea3 Arrdc2 Casq1 Bcl2l1 Piezo1 Mov10l1
BE_W_1 5.674636 8.992203 8.271125 10.80915 9.010202 10.84969 8.550702 6.719983 10.534402 9.251238 9.815931
BE_W_3 6.278332 8.937379 7.755463 10.48085 8.901625 10.94128 8.931936 6.613638 10.654491 9.340083 10.054614
BE_W_6 6.053069 9.046550 8.143926 10.77555 9.108047 10.90593 9.411026 6.555065 10.923830 9.369440 10.122322
BE_C_2 6.970798 8.712448 7.825648 10.61191 8.854695 10.90589 8.221351 6.415780 10.006843 9.411790 9.938315
......

labels = "Spp1"

red_green=colorRamp2(breaks=c(min(diff_genes_matrix),
mean(diff_genes_matrix),
max(diff_genes_matrix)),
colors=c("red", "black", "green"))

df = data.frame(type = c(rep("a", 3), rep("b", 3), rep("c", 3)))
ha = rowAnnotation(df = df, col = list(type = c("a" = "red", "b" = "blue","c"="black")),show_legend = F,
width = unit(1, "cm"),show_annotation_name = F,gap = unit(1, "points"))

split = data.frame(rep(c("A", "B","C"), each=3))

Heatmap(diff_genes_matrix, show_column_names = FALSE,
show_row_dend = FALSE, show_column_dend = F,
col=red_green,
name="Expression level",
row_order = sort(rownames(diff_genes_matrix)),
row_names_side = "left",
heatmap_width = unit(24, "cm"), heatmap_height = unit(8, "cm"),
top_annotation = columnAnnotation(link = anno_mark(at = c(33),
labels = labels,
link_height = unit(4, "mm")
)
),
left_annotation = ha,
row_split = split,
row_title = NULL,
row_gap = unit(2, "mm"),
heatmap_legend_param = list(color_bar = "continuous",
at = quantile(diff_genes_matrix, c(0,1)),
labels = c("Min", "Max"),legend_height = unit(4, "cm"),
title_position = "leftcenter-rot")
)

最终版的图见上!


oncoPrint-适合展示突变

ComplexHeatmap专门做了个oncoPrint函数绘制突变相关信息,感觉和maftools的作用很类似,后续可以重点学习和使用一下。

输入数据

使用分隔符分割的多种突变名称

1
2
3
4
5
6
7
8
9
10
11
12
# 示例数据
mat = read.table(textConnection(
"s1,s2,s3
g1,snv;indel,snv,indel
g2,,snv;indel,snv
g3,snv,,indel;snv"), row.names = 1, header = TRUE, sep = ",", stringsAsFactors = FALSE)
mat = as.matrix(mat)
mat
s1 s2 s3
g1 "snv;indel" "snv" "indel"
g2 "" "snv;indel" "snv"
g3 "snv" "" "indel;snv"

使用0/1表示的多个variant列表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
mat_list = list(snv = matrix(c(1, 0, 1, 1, 1, 0, 0, 1, 1), nrow = 3),
indel = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 0), nrow = 3))
rownames(mat_list$snv) = rownames(mat_list$indel) = c("g1", "g2", "g3")
colnames(mat_list$snv) = colnames(mat_list$indel) = c("s1", "s2", "s3")
mat_list
$snv
s1 s2 s3
g1 1 1 0
g2 0 1 1
g3 1 0 1

$indel
s1 s2 s3
g1 1 0 1
g2 0 1 0
g3 0 0 0

oncoPrint()要求mat_list中的所有元素都有相同的行名和列名,并且绘图也不用添加get_type参数。


得到突变类别

1
2
3
4
5
# 示例数据中一个基因可能同时存在多种突变类别,多种突变类别是使用;分割的
# 得到突变类别的方法
get_type_fun = function(x) strsplit(x, ";")[[1]]
get_type_fun(mat[1, 1])
[1] "snv" "indel"

针对常见的一些分隔符,如;:,|oncoPrint()都会自动的将其分割,不用单独指定;如果自己数据中的分隔符不在常见分隔符中,需要自定义get_type_fun,并将其传递给oncoPrint()get_type参数。


设置variant的颜色

不同类型variant的颜色可以使用oncoPrint()的col参数进行设置,col参数是一个带名称的向量:

1
2
3
4
5
# 带名称的向量
col = c(snv = "red", indel = "blue")
col
snv indel
"red" "blue"


设置variant的形状

一个样本的一个基因可能会存在多种突变形式,在画图的时候如果不做区别就会出现重叠的现象,oncoPrint()提供了alter_fun参数来修改不同突变的展示形状,便于区分。

alter_fun可以是一个包含了逐层添加图形的函数列表(比如:先画snv的图层,接着画indel的图层),这种情况下alter_fun中的函数应该包含四个参数:

  • xy表示坐标位置:positions of the grids on the oncoPrint (x and y)
  • wh表示宽度和高度:widths and heights of the grids (w and h, which is measured in npc unit)

同时,alter_fun也可以通过一个单一的函数来指定,这种情况下alter_fun不再是一个函数的列表,而是一个包含了五个参数的函数,增加的参数是v,这个参数是一个逻辑值,用于判断某个样本中的某个基因是不是包含了某个突变,其也是一个带名称的向量:

1
2
3
4
# 某个样本的某个基因的v参数示例
# 只包括一个snv
snv indel
TRUE FALSE

使用函数列表以及使用单个函数的示例:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# 使用函数列表,包含四个参数
oncoPrint(mat,
alter_fun = list(
snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9,
gp = gpar(fill = col["snv"], col = NA)),
indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4,
gp = gpar(fill = col["indel"], col = NA))
), col = col)

# 使用单个函数,添加了v作为逻辑值来判断
oncoPrint(mat,
alter_fun = function(x, y, w, h, v) {
if(v["snv"]) grid.rect(x, y, w*0.9, h*0.9, # v["snv"] is a logical value
gp = gpar(fill = col["snv"], col = NA))
if(v["indel"]) grid.rect(x, y, w*0.9, h*0.4, # v["indel"] is a logical value
gp = gpar(fill = col["indel"], col = NA))
}, col = col)

关于grid的位置参数

因为绘图需要涉及到x、y、w、h四个grid绘图语法的参数,因为自身对grid语法并不是很了解,所以这里补充一下:

1
2
3
4
5
library(grid)
grid.show.viewport(viewport(
x = 0.6, y = 0.6,
w = unit(1, "inches"), h = unit(1, "inches")
))

complexheatmap_oncoprint_grid.png

从图形中可以看出,(x,y)其实是单个grid的中心点,其实这个主要是由参数just控制的,默认justcentre表明(x,y)位于grid的中心位置。

修改参数just值可以很明显看出来区别:

1
2
3
4
5
6
# 设置(x,y)位于top,也就是上中
grid.show.viewport(viewport(
x = 0.6, y = 0.6,
w = unit(1, "inches"), h = unit(1, "inches"),
just="top"
))

complexheatmap_oncoprint_grid_2.png


按比例绘图

使用单个的函数来绘制图形可以更方便的定制化:

1
2
3
4
5
6
7
8
9
10
# 按比例画图
# 可以直接将mat替换为前面出现过的mat_list
oncoPrint(mat,
alter_fun = function(x, y, w, h, v) {
n = sum(v) # how many alterations for current gene in current sample
h = h*0.9
# use `names(which(v))` to correctly map between `v` and `col`
if(n) grid.rect(x, y - h*0.5 + 1:n/n*h, w*0.9, 1/n*h,
gp = gpar(fill = col[names(which(v))], col = NA), just = "top")
}, col = col)

解释一下:

  • n = sum(v):某个样本在某个基因上的突变数目(不进行类别上的区分)
  • h = h*0.9:设置的是每个图形显示的大小,后面的那个h应该是某个grid的完整大小,0.9倍用于分隔不同的基因。
  • 最后的if语句:假定是一个存在一个snv和一个indel的基因,此时的n=2,另个小grid的坐标分别是(x,y)(x,y+h/2),需要注意的是这里的just参数的值是top,也就是这里的坐标都是上中的坐标

complexheatmap_oncoprint_pro_1.png


绘制三角形图案

三角形图案的优点在于更加直观相比于成比例的块状图案,但是其表征的variant种类会更加少一些,因为一个grid分为两个三角形(更复杂的图案就比较难了),而块状图案可以按比例分为很多小块;并且使用的话也就更加复杂一些,首先是要使用到grid.polygon这个函数,其主要是用于绘制多边形,用其来绘制三角形图案的时候需要提供的xy都是三个参数的向量,所以比较复杂。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
oncoPrint(mat,
alter_fun = list(
background = function(x, y, w, h) {
grid.polygon(
# unit.c(x - 0.5*w, x - 0.5*w, x + 0.5*w)是一组x的坐标,其中每个与后面的unit.c(y - 0.5*h, y + 0.5*h, y - 0.5*h)一一对应
unit.c(x - 0.5*w, x - 0.5*w, x + 0.5*w),
unit.c(y - 0.5*h, y + 0.5*h, y - 0.5*h),
gp = gpar(fill = "grey", col = "white"))
grid.polygon(
unit.c(x + 0.5*w, x + 0.5*w, x - 0.5*w),
unit.c(y + 0.5*h, y - 0.5*h, y + 0.5*h),
gp = gpar(fill = "grey", col = "white"))
},
snv = function(x, y, w, h) {
grid.polygon(
unit.c(x - 0.5*w, x - 0.5*w, x + 0.5*w),
unit.c(y - 0.5*h, y + 0.5*h, y - 0.5*h),
gp = gpar(fill = col["snv"], col = "white"))
},
indel = function(x, y, w, h) {
grid.polygon(
unit.c(x + 0.5*w, x + 0.5*w, x - 0.5*w),
unit.c(y + 0.5*h, y - 0.5*h, y + 0.5*h),
gp = gpar(fill = col["indel"], col = "white"))
}
), col = col)

complexheatmap_oncoprint_tri.png


用于支持更多variant类型

某个样本的某个基因出现多种突变类型是很常见的事,所以如何在单个grid中表征更多的突变类型就非常关键,oncoPrint针对这种情况设计了test_alter_fun()函数来查看设计的alter_fun形状。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
alter_fun = list(
mut1 = function(x, y, w, h)
grid.rect(x, y, w, h, gp = gpar(fill = "red", col = NA)),
mut2 = function(x, y, w, h)
grid.rect(x, y, w, h, gp = gpar(fill = "blue", col = NA)),
mut3 = function(x, y, w, h)
grid.rect(x, y, w, h, gp = gpar(fill = "yellow", col = NA)),
mut4 = function(x, y, w, h)
grid.rect(x, y, w, h, gp = gpar(fill = "purple", col = NA)),
mut5 = function(x, y, w, h)
grid.rect(x, y, w, h, gp = gpar(fill = NA, lwd = 2)),
mut6 = function(x, y, w, h)
grid.points(x, y, pch = 16),
mut7 = function(x, y, w, h)
grid.segments(x - w*0.5, y - h*0.5, x + w*0.5, y + h*0.5, gp = gpar(lwd = 2))
)
test_alter_fun(alter_fun)

虽然下面这种给出了14中不同的组合类型,但是其实存在一些问题,比如前4中mut类型之间如果存在重叠的话就会出现相互遮挡的问题,比较好的解决方法就是使用前面的按比例分配。所以个人的想法是:纯色的用来表征不同的突变类型,然后剩下的几种框、点、线用于表征其他信息,比如表达值、CNV信息。

complexheatmap_oncoprint_test_alter_1.png


设置grid背景颜色

在以函数列表形式设置alter_fun参数的时候,不同的图形是按顺序加上了,所以如果想给每个grid添加背景的话可以在函数列表中第一个就添加一个background函数用于设置grid的背景颜色。

1
2
3
4
5
6
7
8
9
10
# 设置背景颜色为light green,并且添加了框线
oncoPrint(mat,
alter_fun = list(
background = function(x, y, w, h) grid.rect(x, y, w, h,
gp = gpar(fill = "#00FF0020")),
snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9,
gp = gpar(fill = col["snv"], col = NA)),
indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4,
gp = gpar(fill = col["indel"], col = NA))
), col = col)

complexheatmap_oncoprint_bg_2.png

如果想去除任何的框线以及颜色背景,可以将background设置为NULL:

1
2
3
4
5
6
7
8
9
oncoPrint(mat,
alter_fun = list(
# 设置背景为NULL
background = function(...) NULL,
snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9,
gp = gpar(fill = col["snv"], col = NA)),
indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4,
gp = gpar(fill = col["indel"], col = NA))
), col = col)

complexheatmap_oncoprint_bg_1.png


按重要性绘制图层

在绘图时有些信息是主要信息,有些信息可能是次要信息,但是都是必须要展示的图形,这种情况下其实就是前面用于支持更多variant类型最后我提到的用纯色表示不同的突变类型,然后剩下的几种框、点、线用于表征其他信息,比如表达值、CNV信息。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
# 生成虚拟数据
set.seed(123)
x1 = sample(c("", "snv"), 100, replace = TRUE, prob = c(8, 2))
x2 = sample(c("", "indel"), 100, replace = TRUE, prob = c(8, 2))
x2[x1 == "snv"] = ""
x3 = sample(c("", "intronic"), 100, replace = TRUE, prob = c(5, 5))
x4 = sample(c("", "exonic"), 100, replace = TRUE, prob = c(5, 5))
x3[x1 == "" & x2 == ""] = ""
x4[x1 == "" & x2 == ""] = ""
x4[x3 == "intronic"] = ""
x = apply(cbind(x1, x2, x3, x4), 1, function(x) {
x = x[x != ""]
paste(x, collapse = ";")
})
m = matrix(x, nrow = 10, ncol = 10, dimnames = list(paste0("g", 1:10), paste0("s", 1:10)))
m[1:4, 1:4]
s1 s2 s3 s4
g1 "" "snv;intronic" "snv;intronic" "snv"
g2 "" "" "" "snv;intronic"
g3 "" "" "" ""
g4 "snv" "indel;exonic" "snv" ""

# 设置alter_fun
alter_fun = list(
background = function(x, y, w, h)
grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "#CCCCCC", col = NA)),
# red rectangles
snv = function(x, y, w, h)
grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "red", col = NA)),
# blue rectangles
indel = function(x, y, w, h)
grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "blue", col = NA)),
# dots
intronic = function(x, y, w, h)
grid.points(x, y, pch = 16),
# crossed lines
exonic = function(x, y, w, h) {
grid.segments(x - w*0.4, y - h*0.4, x + w*0.4, y + h*0.4, gp = gpar(lwd = 2))
grid.segments(x + w*0.4, y - h*0.4, x - w*0.4, y + h*0.4, gp = gpar(lwd = 2))
}
)

# 最终绘图
# we only define color for snv and indel, so barplot annotations only show snv and indel
ht = oncoPrint(m, alter_fun = alter_fun, col = c(snv = "red", indel = "blue"))
draw(ht, heatmap_legend_list = list(
Legend(labels = c("intronic", "exonic"), type = "points", pch = c(16, 28))
))

complexheatmap_oncoprint_intron_exon_1.png

在单个函数中实现上述类似的用法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
oncoPrint(mat_list,
alter_fun = function(x, y, w, h, v) {
n = sum(v) # how many alterations for current gene in current sample
h = h*0.9
# use `names(which(v))` to correctly map between `v` and `col`
if(n) grid.rect(x, y - h*0.5 + 1:n/n*h, w*0.9, 1/n*h,
gp = gpar(fill = col[names(which(v))], col = NA), just = "top")
# 判断总的variant的数目,用于得到正确的坐标
flag=1
# 判断snv类型是不是存在,如果存在indel就会在上层,如果不存在indel就会出现在下层(存在第三种突变的情况)
flag_log=0
# 因为原始的数据中,snv出现在indel之前,所以在indel上增加各种操作会比较麻烦
for (i in names(v)){
if (i =="snv" & v[i]) {
grid.rect(x, y - h*0.5 + flag/n*h, w*0.9, 1/n*h,
gp = gpar(fill = NA, lwd = 2), just = "top")
# 在snv上实现画斜线还是很简单的
grid.segments(x - w*0.9*0.5,
y - h*0.5,
x + w*0.9*0.5,
y - h*0.5 + 1/n*h,
gp = gpar(lwd = 2))
flag=flag+1
flag_log=1
}
# 在indel上实现画斜线就很复杂
else if(flag_log){
if (i =="indel" & v[i] & n==2) {
grid.segments(x - w*0.9*0.5,
y - h*0.5 + 1/n*h,
x + w*0.9*0.5,
y + h*0.5,
gp = gpar(lwd = 2))
}
else if (i =="indel" & v[i]) {
grid.segments(x - w*0.9*0.5,
y - h*0.5 + 1/n*h,
x + w*0.9*0.5,
y + h*0.5 - 1/n*h,
gp = gpar(lwd = 2))
}
}
else if(flag_log==0){
if (i =="indel" & v[i] & n==1) {
grid.segments(x - w*0.9*0.5,
y - h*0.5,
x + w*0.9*0.5,
y + h*0.5,
gp = gpar(lwd = 2))
}
else if (i =="indel" & v[i] & n>1) {
grid.segments(x - w*0.9*0.5,
y - h*0.5,
x + w*0.9*0.5,
y - h*0.5 + 1/n*h,
gp = gpar(lwd = 2))
}
}
}
}, col = col)

complexheatmap_oncoprint_intron_exon_2.png

在indel上仅添加框线会更加简单一些:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
oncoPrint(mat_list,
alter_fun = function(x, y, w, h, v) {
n = sum(v) # how many alterations for current gene in current sample
h = h*0.9
# use `names(which(v))` to correctly map between `v` and `col`
if(n) grid.rect(x, y - h*0.5 + 1:n/n*h, w*0.9, 1/n*h,
gp = gpar(fill = col[names(which(v))], col = NA), just = "top")
flag=1
for (i in names(v)){
if (i =="snv" & v[i]) {
grid.rect(x, y - h*0.5 + flag/n*h, w*0.9, 1/n*h,
gp = gpar(fill = NA, lwd = 2), just = "top")
grid.segments(x - w*0.9*0.5,
y - h*0.5,
x + w*0.9*0.5,
y - h*0.5 + 1/n*h,
gp = gpar(lwd = 2))
flag=flag+1
}
else if (i =="indel" & v[i]){
grid.rect(x, y - h*0.5 + flag/n*h, w*0.9, 1/n*h,
gp = gpar(fill = NA, lwd = 2), just = "top")
}
}
}, col = col)

complexheatmap_oncoprint_intron_exon_3.png


其他相关参数

  • 控制列名(通常是样本名):默认情况是不会显示的,可以通过参数show_column_names = TRUE来显示
  • 控制行名(通常是基因名):默认情况是显示的,可以通过参数show_row_names来控制
  • 控制百分比:默认情况是显示的,可以通过参数show_pct来控制
  • 控制行名和百分比的显示方位pct_siderow_names_side
  • 控制百分比的小数点位数pct_digits
  • 控制barplotanno_oncoprint_barplot(),具体示例请看帮助文档。
  • oncoPrint也是热图,所以所有热图的参数这里也能使用,比如控制legend的参数heatmap_legend_param
  • 删除没有突变的行或者列:默认情况下即使某行或者某列是没有突变也会显示出来,但是可以通过设置remove_empty_columnsremove_empty_rowsTRUE来进行删除。删除空行或者空列的操作最好在使用oncoPrint函数之前进行操作,防止在将oncoPrint结果和其他热图进行并列显示的时候出现问题。
  • 对行或者列重新排序:和前面热图的函数相同,可以使用row_ordercolumn_order来控制进行控制。s

复杂oncoprint示例

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
# 输入数据准备
mat = read.table(system.file("extdata", package = "ComplexHeatmap",
"tcga_lung_adenocarcinoma_provisional_ras_raf_mek_jnk_signalling.txt"),
header = TRUE, stringsAsFactors = FALSE, sep = "\t")
mat[is.na(mat)] = ""
rownames(mat) = mat[, 1]
mat = mat[, -1]
mat= mat[, -ncol(mat)]
mat = t(as.matrix(mat))
mat[1:3, 1:3]
TCGA-05-4384-01 TCGA-05-4390-01 TCGA-05-4425-01
KRAS " " "MUT;" " "
HRAS " " " " " "
BRAF " " " " " "
# 设置alter_fun
col = c("HOMDEL" = "blue", "AMP" = "red", "MUT" = "#008000")
alter_fun = list(
background = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"),
gp = gpar(fill = "#CCCCCC", col = NA))
},
# big blue
HOMDEL = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"),
gp = gpar(fill = col["HOMDEL"], col = NA))
},
# bug red
AMP = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"),
gp = gpar(fill = col["AMP"], col = NA))
},
# small green
MUT = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h*0.33,
gp = gpar(fill = col["MUT"], col = NA))
}
)

column_title = "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling"
heatmap_legend_param = list(title = "Alternations", at = c("HOMDEL", "AMP", "MUT"),
labels = c("Deep deletion", "Amplification", "Mutation"))
# 绘图
oncoPrint(mat,
alter_fun = alter_fun, col = col,
remove_empty_columns = TRUE, remove_empty_rows = TRUE,
# 这里添加了三个注释,一个是bar图
# 一个是1到172的热图,其实172是样本的数目
top_annotation = HeatmapAnnotation(cbar = anno_oncoprint_barplot(),
foo1 = 1:172,
bar1 = anno_points(1:172)),
left_annotation = rowAnnotation(foo2 = 1:26),
# 26为基因的数目
right_annotation = rowAnnotation(bar2 = anno_barplot(1:26)),
column_title = column_title, heatmap_legend_param = heatmap_legend_param)

complexheatmap_oncoprint_complex_1.png



参考链接



-----本文结束感谢您的阅读-----

本文标题:R绘图系列-ComplexHeatmap使用记录

文章作者:showteeth

发布时间:2019年12月19日 - 21:13

最后更新:2020年06月23日 - 22:33

原始链接:http://showteeth.tech/posts/31202.html

许可协议: 署名-非商业性使用-禁止演绎 4.0 国际 转载请保留原文链接及作者。

0%