I got an email from a promoter for Predictive Analytics World, a very expensive conference next month in San Francisco for business applications of data mining / machine learning / predictive analytics. I’m not going because I don’t want to spend $1600 of my own money, but it looks like it has a good lineup and all (Andreas Weigend, Netflix BellKor folks, case studies from interesting companies like Linden Lab, etc.). If you’re a CS/statistics person and want a job, this is probably a good place to meet people. If you’re a businessman and want to hire one, this is probably a bad event, since it’s too damn expensive for grad school types. I am supposed to have access to a promotional code for a 15% discount, so email me if you want such a thing.
John Langford posted a very interesting email interview with one of the organizers for the event, about how machine learning gets applied in the real world. The guy seemed to think that data integration (getting all the data out of different information systems within an organization and into the same place) is the most critical and hardest step. This aligns with my experience. What machine learning people actually study, the algorithms and models, is often the 2nd or 3rd or lower priority concern in an applications realm, at least for creating a new system. (Similar points came up in that Jeff Hammerbacher video: the most important thing for Facebook’s internal analytics efforts was data integration, e.g. clever combinations of Scribe and Hadoop.) An important exception is if the research is creating a new domain that didn’t exist before. But knowing how to improve document classification F-score by another 2% isn’t going to matter too much unless you have a very mature system already.