DISE Project
Machine learning systems are incredibly data hungry. Data sets for production systems can easily exceed what can be processed on a single machine. Designing and implementing production systems that incorporate machine learning requires knowledge of data storage systems, scalable data processing software, machine learning, MLOps, microservices, cloud computing, and algorithms and design patterns for batch and streaming analytics. Through my experience with high-performance computing (HPC) during my Ph.D., full-time positions as a software and data science engineer from June 2014 to August 2018, and recent consulting work as a developer advocate, I developed a foundation of knowledge spanning low-level system implementation through the deployment and operation of systems with stringent reliability- and performance-related service level agreements (SLAs). I’m developing curricula and realistic implementations of example systems for related courses in MSOE’s new graduate programs in machine learning. These course materials and example system implementations are available under open-source licenses through the MSOE Data-Intensive Systems Education (DISE) project.
The MSOE DISE GitHub organization contains repositories of class materials and example software.
Publications
- RJ Nowling. Experience Report from a Distributed Database Internals Course. Proceedings of the 2025 ACM Southeast Conference (ACMSE 2025). 2025.
- RJ Nowling. Experience Report from a Graduate ML Production Systems Course. Proceedings of the 2024 IEEE International Conference on Electro Information Technology (IEEE EIT 2024). 2024.
- RJ Nowling. ML Production Systems Course at a Polytechnic PUI. Journal of Computing Sciences in Colleges. 39(2):62-71. 2023.
- RJ Nowling and J Vyas. A Domain-Driven, Generative Data Model for Big Pet Store. Proceedings of the IEEE Fourth International Conference on Big Data and Cloud Computing (IEEE BDCloud). 2014.
Course Materials
ML Production Systems
Students will design, implement, deploy, and operate a machine learning-powered service, including components for data processing, model training, model serving, model evaluation, and monitoring. Technologies and design patterns for streaming and batch data processing as well as storage systems will be introduced. This course builds on and integrates previous coursework in offline machine learning and microservices.
GitHub Repo: https://github.com/msoe-dise-project/ml-prod-sys-course
Distributed Storage Systems
In some applications, data storage and processing needs vastly exceed what can be accomplished using a single computer. A number of databases and file systems that use distributed computing techniques to provide enhanced scalability and reliability have become available and been widely adopted. This course will cover software architectures, algorithms, and practical implications of approaches for scaling storage systems to large data sizes and high read/write throughputs, providing elasticity in the face of changing loads, and maintaining reliability in the face of failures. Relevant papers will be reviewed alongside case studies of industry and open-source implementations. Students will complete a term-long project to implement a functional distributed storage system.
GitHub Repo: https://github.com/msoe-dise-project/distributed-database-internals-course