BACKGROUND: In this paper, we address the problem of identifying and quantifying polymorphisms in RNA-seq data when no reference genome is available, without assembling the full transcripts. Based on the fundamental idea that each polymorphism corresponds to a recognisable pattern in a De Bruijn graph constructed from the RNA-seq reads, we propose a general model for all polymorphisms in such graphs. We then introduce an exact algorithm, called KISSPLICE, to extract alternative splicing events. RESULTS: We show that KISSPLICE enables to identify more correct events than general purpose transcriptome assemblers. Additionally, on a 71 M reads dataset from human brain and liver tissues, KISSPLICE identified 3497 alternative splicing events, out of which 56% are not present in the annotations, which confirms recent estimates showing that the complexity of alternative splicing has been largely underestimated so far. CONCLUSIONS: We propose new models and algorithms for the detection of polymorphism in RNA-seq data. This opens the way to a new kind of studies on large HTS RNA-seq datasets, where the focus is not the global reconstruction of full-length transcripts, but local assembly of polymorphic regions. KISSPLICE is available for download at http://alcovna.genouest.org/kissplice/.
PubMed ID: 22537044
Projects: CeSGO project
Publication type: Not specified
Journal: BMC Bioinformatics
Citation: BMC Bioinformatics. 2012 Apr 19;13 Suppl 6:S5. doi: 10.1186/1471-2105-13-S6-S5.
Date Published: 2nd May 2012
Registered Mode: Not specified
Created: 28th Sep 2017 at 15:04