Improving allele-specific transcript inference using genome graphs

Name of applicant

Jonas Andreas Sibbesen

Amount

DKK 450,685

Year

2018

Type of grant

Internationalisation Fellowships

What?

Current methods for analysing RNA-seq data are generally based on comparing the reads to a linear reference genome. However, this approach is biased towards the reference: transcripts in regions where the sequenced individual differs markedly from the reference are harder to analyse than transcripts in regions where the two are more similar. This effect is known as mapping bias, and it can affect downstream analyses such as allele-specific transcript expression estimation. The problem can, however, be mitigated by comparing the reads to a genome graph that contains not only the linear reference but also known genetic variation. The aim of this project is thus to develop a method that improves the estimation of allele-specific expression by reducing mapping bias using genome graphs.

Why?

In allele-specific expression (ASE) analysis, the expression levels of genes or transcripts on the maternal and paternal alleles are estimated independently. It is therefore essential that reads from the different alleles are mapped without bias towards either one. Analyses of ASE can, among other things, be used to investigate genomic imprinting or the effect of genomic variation on the expression of genes and transcripts. Genomic imprinting is a specific type of ASE in which a gene is expressed from only one allele, depending on whether that allele is maternal or paternal; disruption of this process has been shown to be associated with developmental diseases and cancer. Methods that handle mapping bias better are therefore needed in order to get more sensitive estimates of ASE.
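
As a rough illustration of why this matters, the sketch below shows how unequal mapping rates alone can make a gene with balanced expression appear allele-specific. It is a toy model and not part of the project's method; the function name and all numbers (true allelic fraction, mapping rates) are hypothetical values chosen only for the example.

```python
def observed_ref_fraction(true_ref_fraction, ref_map_rate, alt_map_rate):
    """Fraction of successfully mapped reads that carry the reference allele.

    true_ref_fraction: true share of transcripts from the reference allele.
    ref_map_rate / alt_map_rate: assumed probability that a read from the
    reference / alternative allele is mapped (hypothetical values).
    """
    ref_mapped = true_ref_fraction * ref_map_rate
    alt_mapped = (1.0 - true_ref_fraction) * alt_map_rate
    return ref_mapped / (ref_mapped + alt_mapped)


# A gene expressed equally from both alleles (true fraction 0.5).
# Equal mapping rates recover the true fraction.
unbiased = observed_ref_fraction(0.5, ref_map_rate=0.95, alt_map_rate=0.95)

# If reads from the non-reference allele map less often (assumed 0.85
# versus 0.95), the observed fraction shifts towards the reference.
biased = observed_ref_fraction(0.5, ref_map_rate=0.95, alt_map_rate=0.85)

print(f"unbiased: {unbiased:.3f}, biased: {biased:.3f}")
```

In this toy setting the shift is caused entirely by the difference in mapping rates, which is the kind of artefact that bias-aware mapping aims to remove.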

How?

This project will address the problem of RNA-seq mapping bias by developing a spliced genome graph reference structure. This graph will contain not only known variants and haplotypes, but also transcriptomic information, such as known splice sites and transcripts. Just as known haplotypes can be used to guide the mapping of genome sequencing reads, haplotype-specific transcripts will be used to improve the mapping of RNA-seq reads. The project will be based on extending the variation graph toolkit (vg) so that it can map and analyse RNA-seq data. vg is a collaborative effort to create a common framework for methods that work on genome graphs. The method will be benchmarked using simulations and long reads from Oxford Nanopore direct RNA sequencing.
