Monte Carlo sampling algorithms (MCMC or otherwise) aim to draw samples from a distribution. They can be organized by what inputs or prior knowledge about the distribution they require. This ranges from very little, as in slice sampling (just give it an unnormalized density function), to quite a lot, as in Gibbs sampling (you have to decompose your distribution into its individual conditionals).

Typical inputs include \(f(x)\), an unnormalized density or probability mass function for the target distribution, which returns a real number for a given variable value. \(g()\) and \(g(x)\) denote sample-generation procedures (which output a variable value); some generators require an input state and some do not.

Here are the required inputs for a few algorithms. (For an overview, see e.g. Ch. 29 of MacKay.) There are many more out there, of course. I’m leaving off tuning parameters.

Black-box samplers: Slice sampling, Affine-invariant ensemble

- unnorm density \(f(x)\)
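For concreteness, here’s a minimal Python sketch of a 1D slice sampler (stepping out and shrinking, following Neal 2003). The standard-normal target at the bottom is just my illustration; the sampler itself only ever sees the unnormalized density \(f\):

```python
import math
import random

def slice_sample(f, x0, n, w=1.0):
    """1D slice sampler: the only target-specific input is an
    unnormalized density f.  w is the initial interval width."""
    x, samples = x0, []
    for _ in range(n):
        y = random.uniform(0, f(x))          # auxiliary "height" under f
        # step out: grow an interval around x until it brackets the slice
        left = x - random.uniform(0, w)
        right = left + w
        while f(left) > y:
            left -= w
        while f(right) > y:
            right += w
        # shrink: sample uniformly in the interval until we land on the slice
        while True:
            x_new = random.uniform(left, right)
            if f(x_new) > y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return samples

# illustration: unnormalized standard normal
f = lambda x: math.exp(-x * x / 2)
```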

Metropolis (symmetric proposal)

- unnorm density/pmf \(f(x)\)

- proposal generator \(g(x)\)
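A sketch, with an unnormalized standard normal target and a Gaussian random-walk proposal as stand-ins. The algorithm only touches \(f\) through ratios, so the normalizer never matters:

```python
import math
import random

def metropolis(f, g, x0, n):
    """Metropolis: g is a symmetric proposal, so the acceptance
    test only needs the ratio f(x_new) / f(x)."""
    x, samples = x0, []
    for _ in range(n):
        x_new = g(x)
        if random.random() < min(1.0, f(x_new) / f(x)):
            x = x_new                        # accept; else keep current state
        samples.append(x)
    return samples

# illustration: unnormalized standard normal, random-walk proposal
f = lambda x: math.exp(-x * x / 2)
g = lambda x: x + random.gauss(0, 1)
```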

Hastings (asymmetric proposal)

- unnorm density/pmf \(f(x)\)

- proposal generator \(g(x)\)

- proposal unnorm density/pmf \(q(x'; x)\)

… [the probability that the proposal generator, at state \(x\), generates \(x'\)]
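As a sketch, here’s Hastings with a deliberately asymmetric proposal: a multiplicative log-normal step, targeting an unnormalized Exp(1) density (both choices are mine, for illustration). Note the extra \(q\) evaluations in the acceptance ratio; constants in \(q\) cancel, so an unnormalized \(q\) is fine:

```python
import math
import random

S = 0.7  # proposal step size (tuning parameter)

def g(x):
    """Proposal generator: multiplicative log-normal step from x."""
    return x * math.exp(random.gauss(0, S))

def q(x_new, x):
    """Unnormalized proposal density matching g; the 1/x_new Jacobian
    factor is what makes this proposal asymmetric."""
    z = math.log(x_new) - math.log(x)
    return math.exp(-z * z / (2 * S * S)) / x_new

def metropolis_hastings(f, g, q, x0, n):
    x, samples = x0, []
    for _ in range(n):
        x_new = g(x)
        # asymmetry correction: q appears in both directions
        ratio = (f(x_new) * q(x, x_new)) / (f(x) * q(x_new, x))
        if random.random() < min(1.0, ratio):
            x = x_new
        samples.append(x)
    return samples

# illustration: unnormalized Exp(1) target on x > 0 (mean 1)
f = lambda x: math.exp(-x)
```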

Importance sampling, rejection sampling

- unnorm density/pmf \(f(x)\)

- proposal generator \(g()\)

- proposal unnorm density/pmf \(q(x)\)
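A rejection-sampling sketch (the target and proposal are illustrative). The one extra thing you need beyond the listed inputs is an envelope constant \(M\) with \(f(x) \le M q(x)\) everywhere:

```python
import math
import random

def rejection_sample(f, g, q, M, n):
    """Rejection sampling: draw x from the proposal and accept it with
    probability f(x) / (M * q(x)); requires f <= M * q everywhere."""
    samples = []
    while len(samples) < n:
        x = g()
        if random.random() < f(x) / (M * q(x)):
            samples.append(x)
    return samples

# illustration: unnormalized standard normal via a uniform proposal
# (truncating to [-6, 6] loses a negligible amount of mass)
f = lambda x: math.exp(-x * x / 2)
g = lambda: random.uniform(-6, 6)
q = lambda x: 1.0 / 12.0         # proposal density on [-6, 6]
M = 12.0                         # sup_x f(x)/q(x) = 1 / (1/12)
```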

Independent Metropolis-Hastings: the proposal ignores the current state, but you still have to worry about the asymmetry correction

- unnorm density/pmf \(f(x)\)

- proposal generator \(g()\)

- proposal unnorm density/pmf \(q(x')\)

Hamiltonian Monte Carlo

- unnorm density \(f(x)\)

- unnorm density gradient \(gf(x)\)
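A 1D HMC sketch. One caveat on my part: it’s most natural to write the algorithm in terms of \(\log f\) and its gradient (which you can get from the listed inputs, since \(\nabla \log f(x) = gf(x)/f(x)\)), so that’s what I pass in. The leapfrog step size and step count are illustrative tuning parameters:

```python
import math
import random

def hmc(logf, grad_logf, x0, n, eps=0.1, L=20):
    """Hamiltonian Monte Carlo in 1D: leapfrog dynamics with Gaussian
    momentum, then a Metropolis accept test on the total energy."""
    x, samples = x0, []
    for _ in range(n):
        p = random.gauss(0, 1)                 # fresh momentum
        x_new, p_new = x, p
        p_new += 0.5 * eps * grad_logf(x_new)  # half step for momentum
        for _ in range(L - 1):
            x_new += eps * p_new
            p_new += eps * grad_logf(x_new)
        x_new += eps * p_new
        p_new += 0.5 * eps * grad_logf(x_new)  # final half step
        # accept/reject on the change in total energy
        dH = (logf(x_new) - 0.5 * p_new ** 2) - (logf(x) - 0.5 * p ** 2)
        if math.log(random.random()) < dH:
            x = x_new
        samples.append(x)
    return samples

# illustration: standard normal target, log f(x) = -x^2/2
logf = lambda x: -x * x / 2
grad_logf = lambda x: -x
```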

Gibbs Sampling

- local conditional generators \(g_i(x_{-i})\)

… [which have to give samples from \( p(x_i | x_{-i}) \)]
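As a sketch: for a standard bivariate normal with correlation \(\rho\), each conditional is itself normal, \(x_1 | x_2 \sim N(\rho x_2,\, 1-\rho^2)\), so the local generators are one-liners (the example distribution is mine). Note that no joint density is ever evaluated:

```python
import math
import random

def gibbs(rho, n, x1=0.0, x2=0.0):
    """Gibbs sampling for a standard bivariate normal with correlation
    rho.  Only conditional generators are needed; no joint f appears."""
    s = math.sqrt(1 - rho * rho)
    samples = []
    for _ in range(n):
        x1 = random.gauss(rho * x2, s)       # draw from p(x1 | x2)
        x2 = random.gauss(rho * x1, s)       # draw from p(x2 | x1)
        samples.append((x1, x2))
    return samples
```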

Note that importance and rejection sampling are stateless, while the MCMC algorithms are stateful.

I’m distinguishing a sampling procedure \(g\) from a density evaluation function \(f\) because having the latter doesn’t necessarily give you the former. (In the one-dimensional case, having an inverse CDF does give you a sampler, but the multidimensional case gets harder; that’s part of why all these techniques were invented in the first place!) Shay points out that their relationship is analogous to 3-SAT: it’s easy to evaluate a full variable setting, but hard to generate one. (Specifically, think about a 3-SAT PMF \(p(x) = 1\{\text{\(x\) satisfies the formula}\}\) where only one \(x\) has non-zero probability; PMF evaluation is easy, but the best known sampler takes exponential time.)
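For the record, here’s the one-dimensional inverse-CDF trick, with the exponential distribution as my illustration: draw \(u \sim \text{Uniform}(0,1)\) and solve \(F(x) = u\) for \(x\).

```python
import math
import random

def sample_exponential(lam):
    """Inverse-CDF sampling: F(x) = 1 - exp(-lam * x), which inverts
    to x = -log(1 - u) / lam for u ~ Uniform(0, 1)."""
    u = random.random()
    return -math.log(1.0 - u) / lam
```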

And of course there’s a related organization of optimization algorithms. Here’s a rough look at a few unconstrained optimizers:

Black-box optimizers: Grid search, Nelder-Mead, evolutionary, …

- objective \(f(x)\)

BFGS, CG

- objective \(f(x)\)

- gradient \(gf(x)\)
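Not BFGS itself, but a sketch with the same input signature (objective plus gradient): plain gradient descent with a backtracking (Armijo) line search, in 1D for brevity. The test objective is my own:

```python
def gradient_descent(f, gf, x0, iters=50, lr=1.0):
    """Gradient descent with backtracking line search; uses exactly the
    two inputs listed above, the objective f and its gradient gf."""
    x = x0
    for _ in range(iters):
        g = gf(x)
        step = lr
        # halve the step until it gives sufficient decrease (Armijo rule)
        while f(x - step * g) > f(x) - 0.5 * step * g * g:
            step *= 0.5
        x -= step * g
    return x

# illustration: f(x) = (x - 2)^2, gradient 2(x - 2), minimum at x = 2
f = lambda x: (x - 2) ** 2
gf = lambda x: 2 * (x - 2)
```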

Newton-Raphson

- objective \(f(x)\)

- gradient \(gf(x)\)

- Hessian \(hf(x)\)
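A 1D Newton-Raphson sketch (the quartic test objective is mine). The raw update only calls \(gf\) and \(hf\); in practice you’d also evaluate \(f\) to damp or monitor the steps:

```python
def newton(gf, hf, x0, iters=30):
    """Newton-Raphson in 1D: step by gradient / Hessian."""
    x = x0
    for _ in range(iters):
        x -= gf(x) / hf(x)
    return x

# illustration: f(x) = (x - 3)^4 + (x - 3)^2, minimum at x = 3
# (the Hessian here is always >= 2, so the step is well defined)
gf = lambda x: 4 * (x - 3) ** 3 + 2 * (x - 3)
hf = lambda x: 12 * (x - 3) ** 2 + 2
```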

Simulated annealing

- objective \(f(x)\)

- proposal generator \(g(x)\)
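A simulated-annealing sketch with a geometric cooling schedule; the schedule and the quadratic test objective are my own choices:

```python
import math
import random

def simulated_annealing(f, g, x0, iters=5000, T0=1.0, cool=0.999):
    """Accept uphill moves with probability exp(-delta / T);
    the temperature T decays geometrically each step."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    T = T0
    for _ in range(iters):
        x_new = g(x)
        f_new = f(x_new)
        if f_new < fx or random.random() < math.exp((fx - f_new) / T):
            x, fx = x_new, f_new
            if fx < fbest:                   # track the best point seen
                best, fbest = x, fx
        T *= cool
    return best

# illustration: quadratic objective, Gaussian random-walk proposal
f = lambda x: (x - 1) ** 2
g = lambda x: x + random.gauss(0, 0.5)
```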

SGD

- one-example gradient \(gf_i(x)\)
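An SGD sketch; the dataset and the \(1/(2t)\) step-size schedule are my own illustration. For a squared loss that Robbins-Monro schedule makes the iterate a running average of the visited examples, so it converges to the minimizer \(\text{mean}(a)\). Note the loop never calls an objective function:

```python
import random

def sgd(grad_i, data, x0, epochs=50):
    """SGD with step size 1/(2t); only per-example gradients
    grad_i(x, example) are required, never the full objective."""
    x, t = x0, 0
    order = list(range(len(data)))
    for _ in range(epochs):
        random.shuffle(order)                # visit examples in random order
        for i in order:
            t += 1
            x -= (1.0 / (2 * t)) * grad_i(x, data[i])
    return x

# illustration: objective mean_i (x - a_i)^2, minimized at mean(data)
data = [1.0, 2.0, 3.0, 6.0]
grad_i = lambda x, a: 2.0 * (x - a)
```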

I think it’s neat that Gibbs sampling and SGD don’t always require you to implement a likelihood/objective function. It’s nice to do so, to ensure you’re actually optimizing or exploring the posterior, but strictly speaking the algorithms don’t require it.

Very nice summary.

One question: does SGD actually work in cases in which the objective function doesn’t decompose in an obvious way into the sum of one-example objective functions? My vague recollection of the proofs of convergence is that they make heavy use of the decomposition of the objective functions.

Yeah, SGD requires a decomposition into minibatch gradients that sum to the full gradient (so the gradient from a random minibatch is, in expectation, proportional to the full gradient). I’m not aware of SGD for non-sum situations…
