…is a tongue-in-cheek phrase from Trevor Hastie’s very fun-to-read useR-2009 presentation, from the merry trio of Hastie, Friedman, and Tibshirani, who brought us, among other things, the excellent Elements of Statistical Learning textbook. It’s a joy to read sophisticated but well-presented work like this.
This comes from a slide explaining the impressive speed results for their glmnet regression package. Substantively, I’m interested in their observation that coordinate descent works well for sparse data: if you’re optimizing one feature at a time, and that feature is nonzero in only a small percentage of instances, each update only has to touch that handful of rows, which opens up some neat optimizations.
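To make that concrete, here’s a minimal sketch of the sparse-column trick as I understand it, written as plain lasso coordinate descent in Python with scipy. This is not glmnet’s actual Fortran (which also handles elastic-net penalties, weights, active sets, and so on); the function names `lasso_cd_sparse` and `soft_threshold` are just mine for illustration. The point is simply that with a CSC matrix, both the coordinate update and the residual update for feature j only visit the rows where column j is nonzero.

```python
import numpy as np
from scipy import sparse

def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd_sparse(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.

    X is stored as scipy.sparse CSC, so each column's nonzero entries are
    contiguous and every update touches only the rows where feature j is
    nonzero -- the sparse-data win mentioned above.
    """
    X = sparse.csc_matrix(X)
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()        # r = y - X @ beta, and beta starts at 0
    col_sq = np.asarray(X.multiply(X).sum(axis=0)).ravel() / n  # (1/n) x_j' x_j

    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            start, end = X.indptr[j], X.indptr[j + 1]
            rows, vals = X.indices[start:end], X.data[start:end]
            old = beta[j]
            # rho_j = (1/n) x_j'(r + x_j * beta_j), computed over nonzeros only
            rho = (vals @ resid[rows]) / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                # residual update also touches only column j's nonzero rows
                resid[rows] -= vals * (new - old)
            beta[j] = new
    return beta
```

If a feature appears in, say, 1% of the instances, each pass over that coordinate costs roughly 1% of a dense pass, which is where the speed comes from.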
But mostly, I had a fun time skimming the glmnet code. It dates from 2008, but, yes, the core algorithm is written entirely in Fortran, complete with punchcard-style, fixed-width formatting! (This seems gratuitous to me; I thought modern Fortran 90 had done away with such things.) I’ve felt clever enough making 10x-100x performance gains by switching from R or Python down to C++, but I’m told that’s nothing compared to Fortran compiled with the proprietary Intel compiler; Fortran is, I hear, still the fastest language in the world for numeric computing.
(Hat tip: Revolution Computing pointed out the useR-2009 presentations.)
I really like their prior vs. coefficient line graphs (as in their book). I’ve been thinking about implementing the coordinate descent algorithm ever since I looked over Genkin, Lewis, and Madigan’s Bayesian Regression package.
Another reason people still like Fortran is that it’s a simple enough language that compilers can automatically parallelize its loops.
Somewhat counterintuitively, languages in the ML family can also be super-fast at simple matrix code, because the compiler is able to statically optimize it.