There are an increasing number of systems that attempt to allow the user to specify a probabilistic model in a high-level language — for example, declare a (Bayesian) generative model as a hierarchy of various distributions — then automatically run training and inference algorithms on a data set. Now, you could always learn a good math library, and implement every model from scratch, but the motivation for this approach is you’ll avoid doing lots of repetitive and error-prone programming. I’m not yet convinced that any of them completely achieve this goal, but it would be great if they succeeded and we could use high-level frameworks for everything.
Everyone seems to know about only a few of them, so here’s a meager attempt to list together a bunch that can be freely downloaded. There is one package that is far more mature and has been around much longer than the rest, so let’s start with:
BUGS – Bayesian inference Using Gibbs Sampling. You specify a generative model, then it does inference with a Gibbs sampler, which lets it handle a wide variety of models. The classic version has extensive GUI diagnostics for convergence and the like. BUGS can also be used from R. (The model definition language itself is R-like but not actually R.)
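To make "inference with a Gibbs sampler" concrete, here is a minimal hand-written sketch in Python for one tiny model: y[i] ~ Normal(mu, 1/tau), with a Normal prior on mu and a Gamma prior on tau. This is not BUGS code or BUGS output; the function name, prior settings, and iteration counts are my own assumptions. It is roughly the kind of sampler such a system is meant to save you from writing by hand.

```python
import random
import statistics

def gibbs_normal(y, n_iter=2000, burn_in=500, seed=0,
                 mu0=0.0, tau0=1e-4, a=1e-3, b=1e-3):
    """Gibbs sampler for y[i] ~ Normal(mu, 1/tau), with priors
    mu ~ Normal(mu0, 1/tau0) and tau ~ Gamma(shape=a, rate=b).
    All prior values here are illustrative assumptions."""
    rng = random.Random(seed)
    n = len(y)
    sum_y = sum(y)
    mu, tau = statistics.fmean(y), 1.0  # arbitrary starting point
    samples = []
    for it in range(n_iter):
        # Sample mu from its full conditional given tau and the data.
        prec = tau0 + n * tau
        mean = (tau0 * mu0 + tau * sum_y) / prec
        mu = rng.gauss(mean, prec ** -0.5)
        # Sample tau from its full conditional given mu and the data.
        sse = sum((yi - mu) ** 2 for yi in y)
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        tau = rng.gammavariate(a + n / 2.0, 1.0 / (b + 0.5 * sse))
        if it >= burn_in:
            samples.append((mu, tau))
    return samples
```

Calling `gibbs_normal(data)` returns a list of `(mu, tau)` draws after burn-in; averaging them gives posterior mean estimates. Every new model needs its own conditionals derived and coded like this, which is the repetitive part these systems try to automate.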
BUGS has had many users from a variety of fields. There are many books and zillions of courses and other resources showing how to use it to do Bayesian statistical data analysis. BUGS reputedly becomes too slow once you get to thousands of parameters. The original implementation, WinBUGS, is written in Component Pascal, a variant of Pascal (!); its first release was in 1996. There are also two alternative open-source implementations (OpenBUGS, JAGS).
This is clearly very mature and successful software. Any new system in this space should be compared against BUGS.
Next are systems that are much newer, generally only a few years old. Their languages all fall broadly into the category of probabilistic graphical models, but there are plenty of differences, specializations, and assumptions that would be a project in themselves to understand. In lieu of doing a real synthesis, I’ll just list them with brief explanations.
Factorie focuses on factor graphs and discriminative undirected models. Claims to scale to millions of parameters. Written in Scala. New as of 2009. Its introductory paper is interesting. From Andrew McCallum’s group at UMass Amherst.
Infer.NET. I only just learned of it. New as of 2008. Focuses on message-passing inference. Written in C#. From MSR Cambridge. I actually can’t tell whether you get its source code in the download. All other systems here are clearly open source (except WinBUGS, but OpenBUGS is a real alternative).
Church. Very new (as of 2009?), without much written about it yet. Focuses on generative models. Seems small/limited compared to the first three. Written in Scheme. From MIT.
PMTK – Probabilistic Modeling Toolkit. I actually have no idea whether it does model specification-driven inference, but the author’s previous similar-looking toolkit (BNT) is fairly well-known, so it’s in this list. Written in Matlab. From Kevin Murphy.
HBC – Hierarchical Bayesian Compiler. Similar idea to BUGS, though see the webpage for a clear statement of its somewhat different goals. It compiles the Gibbs sampler to C, so it’s much faster. Seems to be unmaintained. Written in Haskell. From Hal Daumé III.
Finally, there are a few systems that seem to be more specialized. I certainly haven’t listed all of them; see the Factorie paper for a list of a few others.
Alchemy – an implementation of the Markov Logic Network formalism, an undirected graphical model over log-linear-weighted first-order logic. So, unlike BUGS and the above systems, there are no customized probability distributions for anything; everything is a Boltzmann (log-linear) distribution. At least, that’s how I understood it from the original paper. The FOL is essentially a language the user uses to define log-linear features. Alchemy then runs training algorithms to fit their weights to data.
From Pedro Domingos’ group at UWashington. Written in C++. I’ve heard people complain that Alchemy is too slow. But in fairness, all these systems are slower than customized implementations.
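To illustrate the "everything is a log-linear distribution over logical features" idea, here is a toy sketch in Python. It is not Alchemy's input syntax or API; the atoms, formulas, and weights are made up for illustration, and it computes the distribution by brute-force enumeration, which only works for tiny examples. Each possible world gets probability proportional to exp(sum of the weights of the ground formulas it satisfies).

```python
import itertools
import math

# Ground atoms for a single hypothetical constant "A".
ATOMS = ["Smokes(A)", "Cancer(A)"]

# Each feature is (weight, ground formula). Weights are invented here;
# a real system would fit them to data.
FEATURES = [
    (1.5, lambda w: (not w["Smokes(A)"]) or w["Cancer(A)"]),  # Smokes(A) => Cancer(A)
    (-0.5, lambda w: w["Smokes(A)"]),                         # Smokes(A)
]

def world_distribution(atoms, features):
    """Enumerate all truth assignments and normalize exp(weighted count)."""
    worlds = [dict(zip(atoms, values))
              for values in itertools.product([False, True], repeat=len(atoms))]
    scores = [math.exp(sum(wt for wt, formula in features if formula(w)))
              for w in worlds]
    z = sum(scores)  # partition function
    return [(w, s / z) for w, s in zip(worlds, scores)]

def marginal(dist, query):
    """Total probability of the worlds where `query` holds."""
    return sum(p for w, p in dist if query(w))
```

For example, `marginal(dist, lambda w: w["Cancer(A)"])` with `dist = world_distribution(ATOMS, FEATURES)` gives the marginal probability of one atom. The number of worlds doubles with every ground atom, which is why real implementations need approximate inference.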
Dyna is specialized for dynamic programming. The formalism is weighted Horn clauses (weighted Prolog). Implements agenda-based training/inference algorithms that generalize Baum-Welch, PCFG chart parsers, and the like. Written in C++, compiles to C++. Seems unmaintained. From Jason Eisner’s group at Johns Hopkins.
Since it only does dynamic programs, Dyna supports a much more limited set of models than the above systems. But I expect that means it can train and infer with models that the above systems would have no hope of handling, since dynamic programming gives you big-O efficiency gains over more general algorithms. (But on the other hand, even dynamic programming can be too generic and slow compared to direct, customized implementations. That’s the danger of all these systems, of course.)
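As a concrete instance of the kind of dynamic program mentioned above, here is a hand-written Python sketch of the inside algorithm for a PCFG in Chomsky normal form. This is plain Python, not Dyna syntax, and it uses a fixed loop order rather than an agenda; the grammar encoding (dicts keyed by rule tuples) is my own assumption. Each `+=` line plays the role of one weighted rule: an item's value is a sum of products of the values of smaller items.

```python
from collections import defaultdict

def inside(words, unary, binary, start="S"):
    """Inside algorithm for a PCFG in Chomsky normal form.

    unary:  {(X, word): prob}  for rules X -> word
    binary: {(X, Y, Z): prob}  for rules X -> Y Z
    Returns the total probability that `start` derives `words`.
    """
    n = len(words)
    # chart[(X, i, k)] = total probability that X derives words[i:k]
    chart = defaultdict(float)
    for i, w in enumerate(words):
        for (x, word), p in unary.items():
            if word == w:
                chart[(x, i, i + 1)] += p
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            k = i + width
            for j in range(i + 1, k):  # split point
                for (x, y, z), p in binary.items():
                    left = chart.get((y, i, j), 0.0)
                    right = chart.get((z, j, k), 0.0)
                    if left and right:
                        chart[(x, i, k)] += p * left * right
    return chart.get((start, 0, n), 0.0)
```

The chart has O(n^2) cells per nonterminal and each is filled in O(n) splits, so the sum over exponentially many parses costs only cubic time in sentence length. That is the big-O gain a general-purpose sampler would not get for free.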
BLOG – first-order logic with probability, though a fairly different formalism than MLNs. Focuses on problems with unknown objects and unknown numbers of objects. I personally don’t understand the use case very well. Its name stands for “Bayesian logic,” which seems like an unfairly broad characterization given all the other work in this area. From Brian Milch. Seems unmaintained? Written in Java.
An interesting axis of variation of all these is whether the model specification language is Turing-complete or not, and to what extent training and inference can be combined with external code.
- Turing-complete: Factorie, Infer.NET, Church, and Dyna are all Turing-complete. The modeling languages of the first three are embedded in general-purpose programming languages (Scala, C#, and Scheme respectively). Dyna is Turing-complete in two different ways: it has a complete Prolog-ish engine, which is technically Turing-complete but is gonna be a pain to do anything non-Prolog-y in; but also, it compiles to C++.
- Not Turing-complete: BUGS, HBC, Alchemy/MLN, and BLOG use specialized mini-languages. BUGS’ and HBC’s languages are essentially the same as standard probabilistic model notation, though BUGS is imperative. Alchemy and BLOG are logic variants.
- Compiles to Turing-complete: HBC compiles to C, and Dyna compiles to C++, which are then intended to be hacked up and/or embedded in larger programs. I imagine this is a maintainability nightmare, but could be fine for one-off projects.
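To show what a modeling language embedded in a general-purpose host language buys you, here is a small sketch in Python. It is only in the spirit of the embedded systems above, not the API of any of them; the model, the bias values, and the function names are all hypothetical. The model is an ordinary function with an unbounded loop, and inference is plain rejection sampling, which is the simplest thing that works and also the slowest.

```python
import random

def coin_model(rng):
    """Generative model written as ordinary code.
    Returns (fair, n_heads): whether the coin is fair, and how many
    heads came up before the first tail."""
    fair = rng.random() < 0.5        # prior: fair with probability 0.5
    p_heads = 0.5 if fair else 0.9   # hypothetical bias values
    n_heads = 0
    while rng.random() < p_heads:    # unbounded loop: flip until a tail
        n_heads += 1
    return fair, n_heads

def rejection_query(model, condition, query, n_tries=20000, seed=0):
    """Estimate E[query | condition] by running the model forward and
    keeping only the runs that satisfy the condition."""
    rng = random.Random(seed)
    kept = [query(s) for s in (model(rng) for _ in range(n_tries))
            if condition(s)]
    if not kept:
        raise ValueError("no samples satisfied the condition")
    return sum(kept) / len(kept)

def posterior_fair(n_tries=20000, seed=0):
    """P(coin is fair | at least 3 heads before the first tail)."""
    return rejection_query(coin_model,
                           condition=lambda s: s[1] >= 3,
                           query=lambda s: 1.0 if s[0] else 0.0,
                           n_tries=n_tries, seed=seed)
```

Because the model is just host-language code, it can call any library or loop arbitrarily. The flip side is that the inference engine can assume very little about its structure, which is one reason generic inference over such programs tends to be slow.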
Another interesting variation is to what extent the systems handle probabilistic relations. BUGS and HBC don’t really try at all beyond plates; Alchemy, BLOG, and Factorie basically specialize in this; Dyna kind of does in a way; and the rest I can’t tell.
In summary, lots of interesting variation here. Given how many of these things are new and changing, this area will probably look much different in a few years.