For this type of analysis we could load our data into R and perform the analysis ourselves, but for a simple experiment design with 2 sample groups in triplicate without batch effects or sample pairing I want to share with you an easy solution. DEB is a online service provided by the Interdisciplinary Center for Biotechnology Research (ICBR) University of Florida that will analyse the count matrix for you with either DESeq, edgeR or baySeq. Their Bioinformation paper is also worth a look.
As with all aspects of bioinformatics, format is critical. You need to follow the specified format exactly. Here is what the head of my count matrix looks like:
gene UNTR1 UNTR2 UNTR3 AZA1 AZA2 AZA3
ENSG00000185650_ZFP36L1 479 467 481 89 77 96
ENSG00000116032_GRIN3B 62 45 62 43 54 52
ENSG00000236345_RP11-59D5__B.2 608 512 600 433 406 417
ENSG00000138069_RAB1A 4141 3568 3571 3250 3259 3189
ENSG00000127666_TICAM1 719 650 677 615 648 566
ENSG00000155393_HEATR3 1672 1291 1570 1511 1634 1641
ENSG00000146731_CCT6A 9400 7432 7743 8923 9322 9471
ENSG00000153012_LGI2 273 275 214 299 264 272
ENSG00000253716_RP13-582O9.5 78 53 62 75 77 60
I then uploaded the count matrix to DEB to perform differential gene expression analysis using all three statistical algorithms with a significance threshold of 5% after false discovery rate (FDR) correction.This shows that overall, the three algorithms were very similar, although edgeR was the least conservative and baySeq was the most conservative.
Then I wanted to know what the overlap was between the three algorithms, so I drew Venn diagrams with Venny.
Here is the smear plot of the effect of azacitidine using DESeq. The version generated by edgeR was virtually identical.
Below is the top 20 up- and down-expressed genes by azacitidine.
In the next post, we will draw a heatmap.