Showing posts from September, 2017

UpSet plots as a replacement for Venn diagrams

I previously posted about different ways to obtain Venn diagrams, but what if you have more than four lists to intersect? Venn diagrams with that many sets become messy and hard to read. One alternative that has become popular is the UpSet plot. There is an excellent summary of the philosophy behind this approach in this article and in the academic paper here. An example plot is below:

In this post, I'll describe how to go from lists of genes in text files to an UpSet plot in R. As with most R packages, loading the data is the hardest part, and data import is also the least documented aspect.
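UpSetR's `upset()` function plots from a table with one row per element and one 0/1 column per set. This is the same shape that UpSetR's `fromList()` builds in R from a named list of vectors. The shell function below is a minimal sketch that builds such a table directly from the list files. It is not the method used in this post. It assumes each list is a plain text file with one gene ID per line and a `.txt` extension. The function name `membership_table` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): build a tab-separated
# 0/1 membership table from gene lists, assuming one gene ID per line
# and a .txt extension on each list file.
# Usage: membership_table set1.txt set2.txt ... > membership.tsv
membership_table() {
  awk '
    BEGIN { OFS = "\t" }
    FNR == 1 {                      # first line of a new list file
      nfile++
      name = FILENAME
      sub(/\.txt$/, "", name)       # column header = file name minus .txt
      header[nfile] = name
    }
    NF > 0 {                        # skip blank lines
      if (!($1 in seen)) { seen[$1] = 1; order[++ngene] = $1 }
      member[$1, nfile] = 1
    }
    END {
      line = "gene"
      for (f = 1; f <= nfile; f++) line = line OFS header[f]
      print line
      for (g = 1; g <= ngene; g++) {
        line = order[g]
        for (f = 1; f <= nfile; f++)
          line = line OFS (((order[g], f) in member) ? 1 : 0)
        print line
      }
    }
  ' "$@"
}
```

The resulting `membership.tsv` should be readable in R with `read.delim()` and then passed to `upset()`. Note that a completely empty list file never triggers the `FNR == 1` rule, so it would get no column. Check that every list has at least one gene.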

First, I'll generate some random gene lists using a quick-and-dirty shell script. My complete list contains 58,302 genes and looks like this:
$ head -5 Homo_sapiens.GRCh38.90.gnames.txt
ENSG00000000003_TSPAN6
ENSG00000000005_TNMD
ENSG00000000419_DPM1
ENSG00000000457_SCYL3
ENSG00000000460_C1orf112
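If you just need some test data, the function below is a minimal sketch of the same idea using GNU coreutils `shuf`. It is not the author's script. It assumes GNU `shuf` (for the `-n` and `-i` options) and writes hypothetical files named `set1.txt`, `set2.txt`, and so on. Each file is a random sample of lines drawn without replacement from the full gene list.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the author's script): write N_LISTS random,
# non-repeating subsets of GENE_FILE to set1.txt, set2.txt, ...
# Assumes GNU coreutils shuf. Usage:
#   make_random_lists GENE_FILE N_LISTS [MAX_SIZE]
make_random_lists() {
  local genes="$1" n_lists="$2" max_size="${3:-5000}"
  local i size
  for i in $(seq 1 "$n_lists"); do
    # random list size between 1 and max_size (inclusive)
    size=$(shuf -i 1-"$max_size" -n 1)
    # sample that many lines without replacement
    shuf -n "$size" "$genes" > "set${i}.txt"
  done
}

# Example (file name taken from the head command above):
# make_random_lists Homo_sapiens.GRCh38.90.gnames.txt 6 5000
```

Drawing the size with `shuf -i` keeps it uniform over 1 to MAX_SIZE. This avoids relying on bash's `$RANDOM`, which only goes up to 32767.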
This is the script which generates random subsets of genes with the suffix "…