Publications
- Landau, W., Niemi, J., and Nettleton, D., “Fully Bayesian analysis of RNA-seq counts for the detection of gene expression heterosis”. Journal of the American Statistical Association, https://doi.org/10.1080/01621459.2018.1497496.
- Landau, W. (2018), “The drake R package: a pipeline toolkit for reproducibility and high-performance computing”. Journal of Open Source Software, 3(21), 550, https://doi.org/10.21105/joss.00550.
- Niemi, J., Mittman, E., Landau, W., and Nettleton, D. (2015), “Empirical Bayes Analysis of RNA-seq Data for Detection of Gene Expression Heterosis,” Journal of Agricultural, Biological, and Environmental Statistics, 20, 1-15. Available at link.springer.com.
- Landau, W. and Liu, P. (2013), “Dispersion Estimation and Its Effect on Test Performance in RNA-seq Data Analysis: A Simulation-Based Comparison of Methods,” PLOS One, 8. Available at journals.plos.org.
- Ratliff, B., Womack, C., Tang, X., Landau, W., Butler, L., and Szpunar, D. (2010), “Modeling the Rovibrationally Excited C2H4OH Radicals from the Photodissociation of 2-Bromoethanol at 193 nm,” Journal of Physical Chemistry, 114, 4934-4945. Available at ncbi.nlm.nih.gov.
Presentations
- Landau, W. (2019) “Reproducible workflows at scale with drake”, rOpenSci Community Call, https://ropensci.org/commcalls/2019-09-24/.
- Landau, W. (2019) “Machine learning workflow management with drake”, invited 4-hour workshop, R/Pharma Conference.
- Landau, W. (2019) “Reproducible Computation at Scale in R”, Harvard DataFest.
- Landau, W. (2018) “The drake R package: reproducible data analysis at scale”, R/Pharma Conference.
- Landau, W., and Niemi, J. (2016), “A Fully Bayesian Strategy for High-Dimensional Hierarchical Modeling Using Massively Parallel Computing”. Joint Statistical Meetings, Section on Statistical Computing, Section on Statistical Graphics, Statistical Computing and Graphics Student Awards — Topic Contributed Papers. https://ww2.amstat.org/meetings/jsm/2013/onlineprogram/AbstractDetails.cfm?abstractid=307645.
- Landau, W., and Liu, P. (2013), “Dispersion Estimation and Its Effect on Test Performance in RNA-Seq Data Analysis”. Joint Statistical Meetings, Biometrics Section, contributed poster. https://ww2.amstat.org/meetings/jsm/2013/onlineprogram/AbstractDetails.cfm?abstractid=307645.
Open Source Software
- drake, an R-focused pipeline toolkit for reproducible computation and high-performance computing. Part of rOpenSci.
- txtq, a minimalist, serverless, socketless message queue for interprocess communication.
- proffer, a pprof-based profiler for R code.
- R packages fbseq, fbseqCUDA, and fbseqOpenMP. A toolkit for the fully Bayesian analysis of genomic count data.
- downsize, an R package to toggle between the test and production versions of large workflows.
Awards
- 2020 Lilly Innovator Award. Awarded for the development of efficient clinical trial simulation software needed for complex designs and trials of potential COVID-19 treatments.
- 2019 NumFOCUS New Contributor Recognition. Awarded for inclusive and collaborative work with rOpenSci. https://numfocus.org/blog/2019-numfocus-awards.
- 2017 Lilly Innovator Award. Awarded for leading a successful team effort to modernize Lilly’s internal process for contributing open source software.
- Student Paper Award, American Statistical Association Section on Statistical Computing, Jan 2016. Awarded for an early draft of the preprint at arxiv.org/abs/1606.06659.
- Vince Sposito Statistical Computing Award, Iowa State University, Aug 2013.
- GlaxoSmithKline Industrial Scholarship, Iowa State University, Sep 2011.
- Alumni Scholarship, Iowa State University, Aug 2011.
Skills
- Reproducible research, statistical computing, hierarchical models, Bayesian methods, Markov chain Monte Carlo, high-dimensional data analysis, genomics data analysis, exploratory analysis, visualization, linear and nonlinear models, data mining, machine learning, predictive modeling, multivariate analysis.
- High-performance computing, R, R package development, general-purpose graphics processing unit (GPU) computing, CUDA, shell scripting, LaTeX, HTML, CSS.
- Past experience with C/C++, MPI, OpenMP, Python, JavaScript, AWK, Fortran.
Research statistician
- October 2016 - Present
- Eli Lilly and Company
- Developed internal statistical tools and capabilities for the design, simulation, and analysis of clinical trials.
- Served as the lead statistician in early-phase autoimmune asset teams.
- Supported late-phase clinical trial teams with advanced analytics, including clinical program simulation and tailored therapeutics.
- Published open-source software packages drake, txtq, and proffer.
Research assistant
- May 2013 - Aug 2016
- RNA-sequencing Working Group, Department of Statistics, Iowa State University.
- Funded by NIH grant R01GM109458 with Drs. Dan Nettleton and Jarad Niemi.
- Developed a new fully Bayesian analysis method for high-dimensional genomic datasets using hierarchical models.
- Implemented massively parallelized Markov chain Monte Carlo.
- Created the fbseq R package to distribute the analysis method.
- Implemented and distributed parallel computing backends for CUDA GPUs (fbseqCUDA) and OpenMP (fbseqOpenMP).
- Created the remakeGenerator, parallelRemake, and downsize packages to manage and accelerate computationally heavy reproducible workflows that are under heavy development.
Seminar instructor
Course instructor
Grader
- Aug - Dec, 2011.
- Department of Statistics, Iowa State University.
- STAT 231: Engineering Probability.
- STAT 105: Introduction to Engineering Statistics.
Leadership at Eli Lilly and Company
Leadership at Iowa State University
- Founder and leader, Cloud Computing Working Group, Sep - Dec 2015.
- Member, Computation Advisory Committee, Sep 2015 - May 2016.
- Volunteer instructor, Office of Precollegiate Programs for Talented and Gifted (OPPTAG), Mar 13, 2014.
- Fellow, Preparing Future Faculty, Aug 2013 - May 2014.
- Assistant Coach, Boxing Club, Aug 2013 - Dec 2013.
References
Hobbies
- Climbing, Brazilian Jiu-Jitsu, sailing, windsurfing