Automation of Genetic Screening and Reporting with Snakemake


Kalyon O., Müslümanoğlu M. H. , Yılmaz A.

12th International Symposium on Health Informatics and Bioinformatics (HIBIT) 2019, İzmir, Turkey, 17 - 18 October 2019, pp.219

  • Publication Type: Conference Paper / Summary Text
  • City: İzmir
  • Country: Turkey
  • Page Numbers: pp.219

Abstract

Next Generation Sequencing (NGS) technology has boosted genetic research, especially by allowing
fast and accurate diagnosis. However, analyzing multiple samples by passing them through multiple steps on
the command line is tedious and error-prone. In this study we aim to facilitate the analysis of NGS data with
Snakemake [1], a Python-based workflow management system. Additionally, Snakemake makes use of
conda environments, so installing and configuring the numerous software tools involved becomes effortless.
We modified an existing Snakemake workflow inspired by the GATK best practices [2] and
integrated the ENSEMBL Variant Effect Predictor (VEP) [3] into the annotation step of the workflow. Additionally,
we integrated the Integrative Genomics Viewer (IGV) [4] into the final report via a JavaScript library [5]. The final report also
includes interactive HTML tables generated by an R script, so that the end user can explore the results dynamically.
As a result, any user can clone our code and then, starting from any raw FASTQ file, easily run the mapping,
annotation and report generation steps. This approach is reproducible and portable: it can be run
on a personal laptop, a server or even a cluster. Since Snakemake can run jobs in parallel, the SNP analysis can be
carried out in parallel if multiple CPUs are available. Such an approach allows a user or a genetic analysis
center to save time by running analyses and generating reports in an automated way. More importantly, using a
workflow approach helps prevent errors even when a large number of samples are processed.
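
The idea of running independent per-sample analyses in parallel can be sketched in plain Python. This is only an illustration and not the authors' Snakemake workflow: the step names, the function names `process_sample` and `run_all`, and the `cores` parameter are hypothetical, and the steps are placeholders that record what would be run rather than calling any real tool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical step names, loosely following the stages named in the abstract.
STEPS = ("mapping", "annotation", "report")


def process_sample(sample: str) -> list[str]:
    """Run the steps for one sample in order and return the steps completed."""
    completed = []
    for step in STEPS:
        # Placeholder: a real workflow would invoke the corresponding tool here.
        completed.append(f"{sample}:{step}")
    return completed


def run_all(samples: list[str], cores: int = 2) -> dict[str, list[str]]:
    """Process independent samples concurrently, using up to `cores` workers."""
    with ThreadPoolExecutor(max_workers=cores) as pool:
        # pool.map preserves the input order of the samples.
        results = pool.map(process_sample, samples)
        return dict(zip(samples, results))
```

Within a sample the steps stay ordered, while different samples do not depend on each other and can therefore be scheduled side by side. A workflow manager such as Snakemake additionally tracks input and output files to decide which jobs need to run.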

References:
[1] Köster J, Rahmann S. “Snakemake - a scalable bioinformatics workflow engine”. Bioinformatics, 2012.
28(19):2520-2522, doi:10.1093/bioinformatics/bts480
[2] https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling
[3] McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, Flicek P, Cunningham F. “The Ensembl
Variant Effect Predictor”. Genome Biology, 2016. 17(1):122, doi:10.1186/s13059-016-0974-4
[4] Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. “Integrative
Genomics Viewer”. Nature Biotechnology, 2011. 29:24-26
[5] https://github.com/igvteam/igv-reports