My research falls in several disjoint areas.

Genomic Data Science

A genome is the totality of an organism’s DNA and effectively the “instructions” or software that control the development of an organism. One of biology’s central goals is to link changes in physical characteristics (phenotypes) of organisms to causative changes in the DNA. Genomic data, much like compiled software, are not easy for humans to interpret directly. Just as a decompiler can help us reverse engineer compiled software, machine learning can be used to identify, characterize, and annotate features of genomes and make analysis by humans easier. I aim to aid biologists in uncovering new knowledge and insights into the function of genomes by developing new methods that draw on machine learning, data science, and big data techniques.

See the Nowling Lab website for more details on projects, publications, and students.

Data-Intensive Systems

Machine learning systems are incredibly data hungry. Data sets for production systems can easily exceed what can be processed on a single machine. Designing and implementing production systems that incorporate machine learning requires knowledge of data storage systems, scalable data processing software, machine learning, MLOps, microservices, cloud computing, and algorithms and design patterns for batch and streaming analytics. I developed a foundation of knowledge spanning low-level system implementation through deployment and operation of systems with stringent reliability- and performance-related service level agreements (SLAs). That foundation comes from my experience with high-performance computing (HPC) during my Ph.D., full-time positions as a software and data science engineer from June 2014 to August 2018, and recent consulting work as a developer advocate. I’m developing curricula and realistic implementations of example systems for related courses in MSOE’s new graduate programs in machine learning.

See the MSOE DISE project website for related publications, open-source course materials, and software.