PacBio Announces a New Informatics Analysis Method for Highly Homologous Genes
PacBio, a leading developer of highly accurate sequencing solutions, has announced a new informatics method that genotypes gene paralogs and pseudogenes with high accuracy. The new computational tool, named “Paraphase,” enables variant calling, copy number analysis and phasing by identifying the full gene sequence of each haplotype for all genes and pseudogenes of the same gene family. Many medically relevant genes fall into segmental duplications and thus have highly similar gene family members or pseudogenes. This sequence similarity often leads to error-prone read alignment and variant calling.
“Through the use of Paraphase, we are able to identify the full sequence of each copy of a gene and, importantly, identify the number of functional and non-functional copies of a gene,” said Mike Eberle, Vice President of Computational Biology at PacBio. “This will allow researchers to conduct more accurate carrier analyses and provide a framework for studying the underlying genetics of these complex genomic regions. We believe that applying this method to larger, diverse, population data will enable researchers to better understand medically important problems, such as silent carriers for spinal muscular atrophy.”
Paraphase has been used on several medically relevant genes with highly similar paralogs or pseudogenes, including CYP21A2 (21-hydroxylase-deficient congenital adrenal hyperplasia), TNXB (Ehlers-Danlos syndrome), STRC (hereditary hearing loss and deafness), and SMN1 and SMN2 (spinal muscular atrophy). SMN1 is more than 99.9 percent similar in sequence to its paralog, SMN2, and both genes have variable copy numbers across populations. Mutations in SMN1 cause spinal muscular atrophy (SMA), a leading cause of early infant death.
High-throughput, detailed sequence analysis of complete genes is challenging with existing technologies, and identifying silent carriers (individuals with two copies of SMN1 on one chromosome and zero copies on the other, who account for 27 percent of carriers in African populations) is impossible without pedigree information. In a recent peer-reviewed publication, Paraphase detected these pathogenic variants for SMA. The study also identified major SMN1 and SMN2 sequence haplogroups and characterized their co-segregation through pedigree-based analyses. In addition, the authors identified a pair of haplotypes that can serve as a genetic marker for alleles carrying two copies of SMN1 in African populations, demonstrating the potential of haplotype-based screening of silent carriers.
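The silent-carrier problem described above can be illustrated with a minimal Python sketch. It is not Paraphase code: the function names and category labels are hypothetical, the model considers only per-chromosome SMN1 copy counts and ignores pathogenic sequence variants, and treating any "two or more plus zero" arrangement as a silent carrier is a simplifying assumption beyond the 2+0 case named in the text.

```python
from typing import Tuple


def classify_smn1(haplotype_copies: Tuple[int, int]) -> str:
    """Classify a sample from SMN1 copies on each chromosome (hypothetical helper).

    Simplified illustration only: real carrier analysis also considers
    pathogenic sequence variants, which this sketch ignores.
    """
    low, high = sorted(haplotype_copies)
    if low > 0:
        return "non-carrier"      # at least one copy on each chromosome
    if high == 0:
        return "affected"         # no SMN1 copy on either chromosome
    if high == 1:
        return "carrier"          # 1+0: visible as a total copy number of 1
    return "silent carrier"       # 2+0 (assumed to extend to 3+0 and above)


def total_copy_number(haplotype_copies: Tuple[int, int]) -> int:
    """Total SMN1 copies, which is all that an unphased dosage count reports."""
    return sum(haplotype_copies)


if __name__ == "__main__":
    for sample in [(1, 1), (2, 0), (1, 0), (0, 0)]:
        print(sample, total_copy_number(sample), classify_smn1(sample))
```

In this sketch the samples (1, 1) and (2, 0) both have a total copy number of two, yet only the second is a silent carrier. Telling them apart requires knowing which chromosome each copy sits on, which is the information that phasing or a pedigree provides.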
“The fact that HiFi sequencing not only allows access to the most difficult regions of the human genome, but also enables calling of all known variant types, such as SNVs, Indels, and SVs including CNVs, plus phasing of these loci, keeps a great promise for applications in rare disease research,” said Alexander Hoischen, Ph.D., Associate Professor for Genomic Technologies and Immuno-Genomics at Radboud University Medical Centre.
Paraphase is being extended into a genome-wide generalized paralog caller as more highly homologous genes are included. Paraphase works on whole-genome sequencing and hybrid capture-based enrichment data. It can also be adapted to work with amplicon sequencing data when the full regions of interest are captured or amplified.