Clydesdale: Structured data processing on Hadoop
Andrey Balmin, Tim Kaldewey, et al.
SIGMOD 2012
Computational workloads for genome-wide association studies (GWAS) are growing in scale and complexity, outpacing the capabilities of single-threaded software designed for personal computers. The BlueSNP R package implements GWAS statistical tests in the R programming language and executes the calculations across computer clusters configured with Apache Hadoop, a de facto standard framework for distributed data processing using the MapReduce formalism. BlueSNP makes computationally intensive analyses, such as estimating empirical p-values via data permutation and searching for expression quantitative trait loci over thousands of genes, feasible for large genotype-phenotype datasets. © The Author(s) 2012. Published by Oxford University Press.