CRISPR-VAE: An Interpretable and Efficiency-aware gRNA Sequence Generator

Ahmad Obeid

Hasan AlMarzouqi

Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, 127788, UAE

Abstract

Deep learning has shown significant potential for predicting gRNA efficiency, which can guide the design of engineered gRNAs and broaden the application of CRISPR-Cas systems in genome editing. However, these deep learning models typically act as black boxes that offer little insight into which factors drive efficiency. Addressing this limitation could substantially expand the use of CRISPR-Cas systems across various domains. We introduce CRISPR-VAE, a framework for interpreting gRNA efficiency predictions and identifying the factors that enhance gRNA performance, which we apply to CRISPR/Cas12a (formerly known as CRISPR/Cpf1). The framework expresses these factors as position-specific k-mer rules. Our approach first builds an efficiency-aware gRNA sequence generator, trained on real-world data, that produces a large volume of synthetic sequences exhibiting desirable traits. These synthetic sequences then serve as the basis for explaining gRNA predictions. In addition, CRISPR-VAE can be used as a standalone sequence generator that gives users fine-grained control over the generated sequences. This versatile framework can be integrated with various CRISPR-Cas tools and datasets, and we demonstrate its efficacy. The complete code implementation of CRISPR-VAE is available at github.com/AhmadObeid/CRISPR-VAE.
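As a rough illustration of what a position-specific k-mer rule can look like, the Python sketch below compares how often each k-mer occurs at each position in two sets of sequences, for example synthetic sequences generated for high versus low efficiency. This is not the CRISPR-VAE rule-extraction procedure, which the abstract does not specify. The function names `positional_kmer_freqs` and `kmer_rules`, the frequency-difference score, and the toy sequences are hypothetical and were chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def positional_kmer_freqs(seqs: List[str], k: int) -> Dict[Tuple[int, str], float]:
    """Fraction of sequences that contain each k-mer starting at each position.

    Hypothetical helper for illustration; assumes all sequences share one coordinate frame.
    """
    if not seqs:
        return {}
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[(i, s[i:i + k])] += 1
    n = len(seqs)
    return {key: c / n for key, c in counts.items()}


def kmer_rules(high: List[str], low: List[str], k: int = 2, top: int = 5) -> List[Tuple[int, str, float]]:
    """Rank (position, k-mer) pairs by their frequency gap between a high- and a low-efficiency set.

    The scoring rule (simple frequency difference) is an assumption, not the paper's method.
    """
    f_high = positional_kmer_freqs(high, k)
    f_low = positional_kmer_freqs(low, k)
    keys = set(f_high) | set(f_low)
    scored = [
        (pos, kmer, f_high.get((pos, kmer), 0.0) - f_low.get((pos, kmer), 0.0))
        for pos, kmer in keys
    ]
    # Largest enrichment first; ties broken by position, then k-mer, for a deterministic order.
    scored.sort(key=lambda t: (-t[2], t[0], t[1]))
    return scored[:top]


if __name__ == "__main__":
    high = ["TTTAGC", "TTTAGG"]  # toy "high-efficiency" sequences (hypothetical)
    low = ["TTTCCC", "TTTCCA"]   # toy "low-efficiency" sequences (hypothetical)
    print(kmer_rules(high, low, k=2, top=3))
```

On the toy input this prints `[(2, 'TA', 1.0), (3, 'AG', 1.0), (4, 'GC', 0.5)]`, which reads as rules of the form "TA at position 2 is enriched in high-efficiency sequences". Shared k-mers such as the leading `TT` score zero because they appear equally often in both sets.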