Last night, I gave a talk titled “Real-World Lessons in Machine Learning Applied to Spam Classification” at the MKE Big Data meetup. In my talk, I used spam classification as a use case for communicating some lessons learned from my experience building production machine learning-powered services. In particular, I wanted to get across the point that modeling and algorithm choices are not independent of the requirements of the production system: we need to design our models and choose our algorithms while keeping in mind how those choices will affect the resulting production system.
A few attendees asked for additional resources related to the topics. Martin Zinkevich of Google recently published an excellent guide based on experience there, titled “Rules of Machine Learning: Best Practices for ML Engineering”, which I highly recommend. Vowpal Wabbit is a powerful toolkit for online machine learning that incorporates some of the latest algorithms and techniques.