Lipovich lab contributes to comprehensive Nature-published gene expression atlas of human long non-coding RNAs

March 01, 2017

Wayne State University School of Medicine Associate Professor of Molecular Medicine, Genetics, and of Neurology Leonard Lipovich, Ph.D., is part of an international, multidisciplinary team of scientists that contributed to an atlas published today in Nature. The atlas reveals how thousands of long non-coding RNAs in humans, once considered genetic “junk,” are directly involved in diseases and other genetic traits.

Dr. Lipovich is a member of the WSU Center for Molecular Medicine and Genetics, and has been studying long non-coding RNA genes, or lncRNAs, since the late 1990s.

“Two-thirds of human genes encode lncRNAs, not proteins. However, lncRNAs remain poorly understood, because much of the genomics community has been focusing on the genome and epigenetics, instead of the transcriptome, in the decade and a half since the Human Genome Project was completed,” Dr. Lipovich said.

The atlas, which contains 27,919 long non-coding RNAs, summarizes for the first time their expression patterns across the major human cell types and tissues. Intersecting the atlas with genomic and genetic data suggests that 19,175 of these RNAs might be functional, hinting that functional non-coding RNAs could be as numerous as, or even outnumber, the approximately 20,000 protein-coding genes in the human genome.

“There is strong debate in the scientific community on whether the thousands of long non-coding RNAs generated from our genomes are functional or simply byproducts of a noisy transcriptional machinery,” said Professor Alistair Forrest, Ph.D., of the Harry Perkins Institute of Medical Research at the University of Western Australia and senior visiting scientist at the RIKEN Center for Life Science Technologies. Dr. Forrest is one of the corresponding authors of the paper.

“By integrating the improved gene models with data from gene expression, evolutionary conservation and genetic studies, we find compelling evidence that the majority of these long non-coding RNAs appear to be functional, and for nearly 2,000 of them we reveal their potential involvement in diseases and other genetic traits,” he added.

While it was once believed that genes regulated biological functions almost exclusively by being transcribed to coding RNAs that were then translated into proteins, it is now known that the picture is much more complex. Studies examining the association between genes and diseases have shown that most disease variants are found outside of protein-coding genes.

For nearly two decades, the Japan-based international FANTOM (Functional Annotation of Mammalian cDNA) Consortium has provided a counterpoint to the mainstream, protein-centric narrative of post-genomic biology, focusing on the transcriptome with an emphasis on ncRNAs. In a study published in Nature in 2014 and co-authored by Dr. Lipovich, FANTOM used cutting-edge Cap Analysis of Gene Expression (CAGE) technology to catalogue the entire human promoterome, including lncRNA promoters, at single-nucleotide resolution in 1,000 human and 500 mouse tissues.

The FANTOM Consortium, led by RIKEN – Japan's largest research institute for basic and applied research – pioneered the discovery of non-coding RNAs more than a decade ago, revealing the complexity of the transcriptional landscape in mammalian genomes for the first time. The consortium remains at the leading edge of studies into the origins and functions of non-coding RNAs. The latest work published today, “An Atlas of Human Long Non-Coding RNAs with Accurate 5’ ends” (Hon et al., 2017, Nature), has generated a comprehensive atlas of human long non-coding RNAs with substantially improved gene models, which allows better assessment of the diversity and functionality of these RNAs. Resources of the lncRNA atlas are available at

Dr. Lipovich joined the FANTOM Consortium in 2004, when he was a postdoctoral fellow at the Genome Institute of Singapore. He co-authored four high-profile FANTOM3 papers in 2005 and 2006, focusing on evolutionary non-conservation of human and mouse lncRNAs.

In the early years of the FANTOM5 project, Dr. Lipovich contributed in several ways: cataloguing full-length long non-coding RNA genes using the FANTOM5 CAGE data; reconciling CAGE lncRNA signals with lncRNA sequences from pre-existing public sources by manual annotation; providing feedback to the developer team in charge of a web-based CAGE data viewing resource; and interpreting the CAGE lncRNA data to better understand lncRNA biology, “in particular the central finding of the paper that so-called ‘enhancers’ are often, actually, promoters of lncRNA genes, a major insight that should lead biologists to reassess the very meaning of what an enhancer is and how it functions,” he said.

Attempts to draw a map of RNA transcription rely on sequencing technologies that do not always accurately identify the beginnings, or 5’ ends, of RNA transcripts. The atlas team instead used CAGE, a technology developed at RIKEN, to build an atlas of human long non-coding RNAs with accurate 5’ ends, precisely pinpointing where in the genome their transcription is initiated.

The majority of long non-coding RNAs appear to be generated from enhancer elements, added RIKEN Division of Genomic Technologies Research Scientist Chung-Chau Hon, Ph.D., the paper’s first author. “It deepens our understanding toward the largely heterogeneous origins of long non-coding RNAs,” he said.

Piero Carninci, Ph.D., deputy director of the RIKEN Center for Life Science Technologies, said, “The improved gene models and the broad functional hints of human long non-coding RNAs derived from this atlas could serve as the Rosetta Stone for us to experimentally investigate their functional relevance, which is ongoing in the upcoming edition of the FANTOM consortium. We anticipate these results would further push the boundary of our understanding towards the functions of the non-coding portion of our genome.”

The FANTOM project is a RIKEN initiative launched in 1998 by genomics pioneer Professor Yoshihide Hayashizaki. The first FANTOM aimed to build a complete gene catalogue with cDNA technologies. FANTOM5, the latest stage, aimed to provide the first holistic view of transcriptional regulatory network models for the majority of the cell types that make up a human. RIKEN organizers recruited a multidisciplinary network of experts in primary cell biology and bioinformatics to achieve this.