Last night, I gave a talk titled “Real-World Lessons in Machine Learning Applied to Spam Classification” at the MKE Big Data meetup. In my talk, I used spam classification as a use case to communicate some lessons learned from my experience building production machine learning-powered services. In particular, I wanted to get one point across: modeling and algorithm choices are not independent of the requirements of the production system. We need to design our models and choose our algorithms with an eye to how those choices will shape the resulting production system.

You can grab my slides here. My slides and the source code I used to generate my plots are also available on the MKE BD Talks GitHub repo.

A few attendees asked for additional resources on these topics. Martin Zinkevich of Google recently published an excellent guide based on Google's experience, titled Rules of Machine Learning: Best Practices for ML Engineering, which I highly recommend. Vowpal Wabbit is a powerful toolkit for online machine learning that incorporates some of the latest algorithms and techniques.
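For readers new to online learning, here is a minimal sketch of the general idea applied to spam: a logistic-regression classifier that hashes tokens into a fixed-size weight vector (the "hashing trick") and updates its weights one message at a time with stochastic gradient descent. Vowpal Wabbit is built around these two ideas, but this is not Vowpal Wabbit's API or implementation. The class name `OnlineSpamClassifier`, its parameters (`num_buckets`, `learning_rate`), and the toy training messages are all hypothetical and chosen for illustration only.

```python
import math
import zlib


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineSpamClassifier:
    """Hypothetical online logistic-regression spam classifier (illustrative sketch)."""

    def __init__(self, num_buckets=2**18, learning_rate=0.5):
        self.num_buckets = num_buckets
        self.learning_rate = learning_rate
        self.weights = [0.0] * num_buckets  # fixed memory, no vocabulary needed
        self.bias = 0.0

    def _features(self, text):
        # Hashing trick: map each lowercase token to a bucket index.
        # crc32 is stable across processes, unlike Python's salted str hash().
        counts = {}
        for token in text.lower().split():
            idx = zlib.crc32(token.encode("utf-8")) % self.num_buckets
            counts[idx] = counts.get(idx, 0.0) + 1.0
        return counts

    def _score(self, features):
        return self.bias + sum(self.weights[i] * v for i, v in features.items())

    def predict_proba(self, text):
        # Estimated probability that the message is spam.
        return _sigmoid(self._score(self._features(text)))

    def update(self, text, label):
        # One SGD step on the logistic loss for a single (message, label) pair,
        # where label is 1 for spam and 0 for ham.
        features = self._features(text)
        error = _sigmoid(self._score(features)) - label
        for i, v in features.items():
            self.weights[i] -= self.learning_rate * error * v
        self.bias -= self.learning_rate * error


# Toy usage: stream labeled messages through the model a few times.
stream = [
    ("win a free prize now", 1),
    ("meeting notes attached", 0),
    ("claim your free money", 1),
    ("lunch with the team tomorrow", 0),
]
clf = OnlineSpamClassifier()
untrained_p = clf.predict_proba("free prize")
for _ in range(20):
    for text, label in stream:
        clf.update(text, label)
```

Because each update touches only the buckets of the tokens in one message, the model can keep learning from new mail as it arrives without retraining from scratch, and memory stays fixed regardless of vocabulary size. That is one example of an algorithm choice that directly shapes the production system.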