<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Cosine similarity, Pearson correlation, and OLS coefficients</title>
	<atom:link href="https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/feed/" rel="self" type="application/rss+xml" />
	<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/</link>
	<description>cognition, language, social systems; statistics, visualization, computation</description>
	<lastBuildDate>Tue, 25 Nov 2025 13:11:20 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
	<title>By: Similarity Measures - CSer之声</title>
		<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/#comment-1680694</link>
	<dc:creator>Similarity Measures - CSer之声</dc:creator>
		<pubDate>Tue, 01 Jul 2014 07:17:32 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1199#comment-1680694</guid>
		<description><![CDATA[[...] The intuitive reading is: if y is high where x is high and low where x is low, then the overall inner product is large, which is to say x and y are similar. For example, to find which stretch of a long signal A best matches a short signal a, just slide a along A from the start, taking one inner product per shift; the shift with the largest inner product is the best match. In signal processing, the DFT and DCT likewise use this inner-product operation to compute signal components in different frequency bands (the DFT and DCT use orthonormal bases, so this can also be viewed as projection). Vectors and signals are discrete; for continuous functions, e.g. measuring the similarity of two functions on the interval [-1, 1], the same idea yields (coefficient) components. This applies to approximating a continuous function by polynomials, and also to fitting continuous functions to discrete sample points (the least-squares problem, OLS coefficients). But I digress! [...]]]></description>
		<content:encoded><![CDATA[<p>[...] The intuitive reading is: if y is high where x is high and low where x is low, then the overall inner product is large, which is to say x and y are similar. For example, to find which stretch of a long signal A best matches a short signal a, just slide a along A from the start, taking one inner product per shift; the shift with the largest inner product is the best match. In signal processing, the DFT and DCT likewise use this inner-product operation to compute signal components in different frequency bands (the DFT and DCT use orthonormal bases, so this can also be viewed as projection). Vectors and signals are discrete; for continuous functions, e.g. measuring the similarity of two functions on the interval [-1, 1], the same idea yields (coefficient) components. This applies to approximating a continuous function by polynomials, and also to fitting continuous functions to discrete sample points (the least-squares problem, OLS coefficients). But I digress! [...]</p>
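The sliding-match idea can be sketched in Python/NumPy (illustrative only; the signals here are made up):

```python
import numpy as np

# Slide a short template a along a long signal A, taking one inner
# product per offset; the offset with the largest score is the match.
A = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -1.0, 0.0, 2.0, 4.0, 2.0])
a = np.array([1.0, 2.0, 1.0])

scores = np.array([A[i:i + len(a)] @ a for i in range(len(A) - len(a) + 1)])
best = int(np.argmax(scores))
```

Note that the raw inner product picks the high-energy window at offset 7 over the exact shape match at offset 1, which is exactly why normalizations such as cosine similarity matter.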
]]></content:encoded>
	</item>
	<item>
		<title>By: Building the connection between cosine similarity and correlation in R &#124; Question and Answer</title>
		<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/#comment-1538995</link>
		<dc:creator>Building the connection between cosine similarity and correlation in R &#124; Question and Answer</dc:creator>
		<pubDate>Thu, 22 May 2014 14:24:16 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1199#comment-1538995</guid>
		<description><![CDATA[[...] to some articles (e.g. here) correlation is just a centered version of cosine similarity. I use the following code to calculate [...]]]></description>
		<content:encoded><![CDATA[<p>[...] to some articles (e.g. here) correlation is just a centered version of cosine similarity. I use the following code to calculate [...]</p>
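The quoted claim can be checked directly; a minimal Python/NumPy sketch (the linked question uses R, so this is just an illustration, with made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)

def cosine(u, v):
    # cos of the angle between u and v: <u, v> / (|u| |v|)
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Pearson correlation is cosine similarity applied to centered vectors.
corr_centered = cosine(x - x.mean(), y - y.mean())
corr_pearson = np.corrcoef(x, y)[0, 1]
```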
]]></content:encoded>
	</item>
	<item>
		<title>By: Waylon Flinn</title>
		<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/#comment-758397</link>
		<dc:creator>Waylon Flinn</dc:creator>
		<pubDate>Wed, 11 Dec 2013 04:51:37 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1199#comment-758397</guid>
		<description><![CDATA[Wonderful post. The more I investigate it, the more it looks like every relatedness measure around is just a different normalization of the inner product.
Similar analyses reveal that Lift, the Jaccard index, and even the standard Euclidean metric can be viewed as different corrections to the dot product. It&#039;s not a viewpoint I&#039;ve seen often. It was this post that started my investigation of this phenomenon. For that, I&#039;m grateful to you.

The fact that the basic dot product can be seen to underlie all these similarity measures turns out to be convenient. If you stack all the vectors in your space on top of each other to create a matrix, you can produce all the inner products simply by multiplying the matrix by its transpose. Furthermore, the extra ingredient in every similarity measure I&#039;ve looked at so far involves the magnitudes (or squared magnitudes) of the individual vectors. These drop out of this matrix multiplication as well. Just extract the diagonal.
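A sketch of this construction in Python/NumPy (hypothetical data; incidentally, the matrix of all pairwise inner products X Xᵀ is what linear algebra texts call the Gram matrix of the rows):

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(5, 3))  # 5 vectors stacked as rows

G = X @ X.T              # all pairwise inner products in one multiplication
sq_norms = np.diag(G)    # squared magnitudes fall out of the diagonal

# One possible "correction": divide out both norms to get cosine similarity.
cos = G / np.sqrt(np.outer(sq_norms, sq_norms))
```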

Because of its exceptional utility, I&#039;ve dubbed the symmetric matrix that results from this product the base similarity matrix. I haven&#039;t been able to find many other references which formulate these metrics in terms of this matrix, or the inner product as you&#039;ve done. Known mathematics is both broad and deep, so it seems likely that I&#039;m stumbling upon something that&#039;s already been investigated.

Do you know of other work that explores this underlying structure of similarity measures? Is the construction of this base similarity matrix a standard technique in the calculation of these measures? Does it have a common name?

Thanks again for sharing your explorations of this topic.

P.S. Here&#039;s the other reference I&#039;ve found that does similar work:
http://arxiv.org/pdf/1308.3740.pdf]]></description>
		<content:encoded><![CDATA[<p>Wonderful post. The more I investigate it, the more it looks like every relatedness measure around is just a different normalization of the inner product.<br />
Similar analyses reveal that Lift, the Jaccard index, and even the standard Euclidean metric can be viewed as different corrections to the dot product. It&#8217;s not a viewpoint I&#8217;ve seen often. It was this post that started my investigation of this phenomenon. For that, I&#8217;m grateful to you.</p>
<p>The fact that the basic dot product can be seen to underlie all these similarity measures turns out to be convenient. If you stack all the vectors in your space on top of each other to create a matrix, you can produce all the inner products simply by multiplying the matrix by its transpose. Furthermore, the extra ingredient in every similarity measure I&#8217;ve looked at so far involves the magnitudes (or squared magnitudes) of the individual vectors. These drop out of this matrix multiplication as well. Just extract the diagonal.</p>
<p>Because of its exceptional utility, I&#8217;ve dubbed the symmetric matrix that results from this product the base similarity matrix. I haven&#8217;t been able to find many other references which formulate these metrics in terms of this matrix, or the inner product as you&#8217;ve done. Known mathematics is both broad and deep, so it seems likely that I&#8217;m stumbling upon something that&#8217;s already been investigated.</p>
<p>Do you know of other work that explores this underlying structure of similarity measures? Is the construction of this base similarity matrix a standard technique in the calculation of these measures? Does it have a common name?</p>
<p>Thanks again for sharing your explorations of this topic.</p>
<p>P.S. Here&#8217;s the other reference I&#8217;ve found that does similar work:<br />
<a href="http://arxiv.org/pdf/1308.3740.pdf" rel="nofollow">http://arxiv.org/pdf/1308.3740.pdf</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Machine learning literary genres from 19th century seafaring, horror and western novels &#124; Sub-Subroutine</title>
		<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/#comment-741268</link>
		<dc:creator>Machine learning literary genres from 19th century seafaring, horror and western novels &#124; Sub-Subroutine</dc:creator>
		<pubDate>Sun, 08 Dec 2013 00:52:54 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1199#comment-741268</guid>
		<description><![CDATA[[...] Cosine similarity: Each book is a vector (in the physics sense, not the computer science sense) starting from the origin and with a direction defined by its 30,682 features. The smaller the angle between two vectors, the more similar the texts. (There&#8217;s a clear discussion of the formula here). [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Cosine similarity: Each book is a vector (in the physics sense, not the computer science sense) starting from the origin and with a direction defined by its 30,682 features. The smaller the angle between two vectors, the more similar the texts. (There&#8217;s a clear discussion of the formula here). [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Machine learning literary genres from 19th century seafaring, horror and western novels &#124; Sub-Sub Algorithm</title>
		<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/#comment-737770</link>
		<dc:creator>Machine learning literary genres from 19th century seafaring, horror and western novels &#124; Sub-Sub Algorithm</dc:creator>
		<pubDate>Sat, 07 Dec 2013 10:21:21 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1199#comment-737770</guid>
		<description><![CDATA[[...] Cosine similarity: Each book is a vector (in the physics sense, not the computer science sense) starting from the origin and with a direction defined by its 30,682 features. The smaller the angle between two vectors, the more similar the texts. (There&#8217;s a clear discussion of the formula here). [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Cosine similarity: Each book is a vector (in the physics sense, not the computer science sense) starting from the origin and with a direction defined by its 30,682 features. The smaller the angle between two vectors, the more similar the texts. (There&#8217;s a clear discussion of the formula here). [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brendan O'Connor</title>
		<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/#comment-314504</link>
		<dc:creator>Brendan O'Connor</dc:creator>
		<pubDate>Mon, 01 Apr 2013 22:27:51 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1199#comment-314504</guid>
		<description><![CDATA[Hi Peter --

By &quot;invariant to shift in input&quot;, I mean, if you *add* to the input.  That is,
f(x, y) = f(x+a, y) for any scalar &#039;a&#039;.

By &quot;scale invariant&quot;, I mean, if you *multiply* the input by something.
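A quick numerical check of these two invariances, with made-up data (scale invariance as stated holds for positive multipliers; a negative one flips the sign):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=20)
y = rng.normal(size=20)

corr = lambda u, v: np.corrcoef(u, v)[0, 1]

# Shift invariance: corr(x, y) == corr(x + a, y) for any scalar a.
shift_ok = np.isclose(corr(x, y), corr(x + 3.7, y))
# Scale invariance: corr(x, y) == corr(b * x, y) for any b > 0.
scale_ok = np.isclose(corr(x, y), corr(2.5 * x, y))
```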

For (1 - corr), the problem is negative correlations. I think maximizing the squared correlation is the same thing as minimizing squared error... that&#039;s why it&#039;s called R^2, the explained variance ratio.

I don&#039;t understand your question about OLSCoef and have not seen the papers you&#039;re talking about.]]></description>
		<content:encoded><![CDATA[<p>Hi Peter &#8211;</p>
<p>By &#8220;invariant to shift in input&#8221;, I mean, if you *add* to the input.  That is,<br />
f(x, y) = f(x+a, y) for any scalar &#8216;a&#8217;.</p>
<p>By &#8220;scale invariant&#8221;, I mean, if you *multiply* the input by something.</p>
<p>For (1 - corr), the problem is negative correlations. I think maximizing the squared correlation is the same thing as minimizing squared error... that&#8217;s why it&#8217;s called R^2, the explained variance ratio.</p>
<p>I don&#8217;t understand your question about OLSCoef and have not seen the papers you&#8217;re talking about.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter</title>
		<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/#comment-312261</link>
		<dc:creator>Peter</dc:creator>
		<pubDate>Fri, 29 Mar 2013 03:24:12 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1199#comment-312261</guid>
		<description><![CDATA[Useful info:

I have a few questions (I am pretty new to this field). You say correlation is invariant to shifts.

I guess you just mean that if the x-axis is not 1 2 3 4 but 10 20 30 or 30 20 10, then it doesn&#039;t change anything.

But you don&#039;t mean that if I shift the signal I will get the same correlation, right?

For example: [1 2 1 2 1] and [1 2 1 2 1], corr = 1;
but if I cyclically shift, giving [1 2 1 2 1] and [2 1 2 1 2], corr = -1;
and if I instead shift by padding zeros, giving [1 2 1 2 1 0] and [0 1 2 1 2 1], corr = -0.0588.
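For reference, all three of these values can be reproduced with a short NumPy check:

```python
import numpy as np

corr = lambda u, v: np.corrcoef(u, v)[0, 1]

c1 = corr([1, 2, 1, 2, 1], [1, 2, 1, 2, 1])        # 1.0
c2 = corr([1, 2, 1, 2, 1], [2, 1, 2, 1, 2])        # -1.0
c3 = corr([1, 2, 1, 2, 1, 0], [0, 1, 2, 1, 2, 1])  # -6/102, about -0.0588
```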

Please elaborate on that.

Also, could we say that the correlation distance (1 - correlation) can be considered a norm_1 or norm_2 distance somehow? For example, when we want to minimize squared errors we usually use Euclidean distance, but could Pearson&#039;s correlation also be used?

And last, can OLSCoef(x,y) be considered scale invariant? It is closely related to cosine similarity, which is not scale invariant (Pearson&#039;s correlation is, right?). See &quot;Patterns of Temporal Variation in Online Media&quot; and &quot;Fast time-series searching with scaling and shifting&quot;. That confuses me, but maybe I am missing something.]]></description>
		<content:encoded><![CDATA[<p>Useful info:</p>
<p>I have a few questions (I am pretty new to this field). You say correlation is invariant to shifts.</p>
<p>I guess you just mean that if the x-axis is not 1 2 3 4 but 10 20 30 or 30 20 10, then it doesn&#8217;t change anything.</p>
<p>But you don&#8217;t mean that if I shift the signal I will get the same correlation, right?</p>
<p>For example: [1 2 1 2 1] and [1 2 1 2 1], corr = 1;<br />
but if I cyclically shift, giving [1 2 1 2 1] and [2 1 2 1 2], corr = -1;<br />
and if I instead shift by padding zeros, giving [1 2 1 2 1 0] and [0 1 2 1 2 1], corr = -0.0588.</p>
<p>Please elaborate on that.</p>
<p>Also, could we say that the correlation distance (1 - correlation) can be considered a norm_1 or norm_2 distance somehow? For example, when we want to minimize squared errors we usually use Euclidean distance, but could Pearson&#8217;s correlation also be used?</p>
<p>And last, can OLSCoef(x,y) be considered scale invariant? It is closely related to cosine similarity, which is not scale invariant (Pearson&#8217;s correlation is, right?). See &#8220;Patterns of Temporal Variation in Online Media&#8221; and &#8220;Fast time-series searching with scaling and shifting&#8221;. That confuses me, but maybe I am missing something.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Correlation picture &#124; AI and Social Science &#8211; Brendan O&#039;Connor</title>
		<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/#comment-304948</link>
		<dc:creator>Correlation picture &#124; AI and Social Science &#8211; Brendan O&#039;Connor</dc:creator>
		<pubDate>Mon, 18 Mar 2013 17:31:28 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1199#comment-304948</guid>
		<description><![CDATA[[...] related to the post on cosine similarity, correlation and OLS. Anyway, I was just struck by the following diagram. It almost has a pop-art [...]]]></description>
		<content:encoded><![CDATA[<p>[...] related to the the post on cosine similarity, correlation and OLS. Anyway, I was just struck by the following diagram. It almost has a pop-art [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brendano</title>
		<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/#comment-304941</link>
		<dc:creator>brendano</dc:creator>
		<pubDate>Mon, 18 Mar 2013 17:17:49 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1199#comment-304941</guid>
		<description><![CDATA[Great tip -- I remember seeing that once but totally forgot about it.

Here&#039;s a link, http://data.psych.udel.edu/laurenceau/PSYC861Regression%20Spring%202012/READINGS/rodgers-nicewander-1988-r-13-ways.pdf]]></description>
		<content:encoded><![CDATA[<p>Great tip &#8212; I remember seeing that once but totally forgot about it.</p>
<p>Here&#8217;s a link, <a href="http://data.psych.udel.edu/laurenceau/PSYC861Regression%20Spring%202012/READINGS/rodgers-nicewander-1988-r-13-ways.pdf" rel="nofollow">http://data.psych.udel.edu/laurenceau/PSYC861Regression%20Spring%202012/READINGS/rodgers-nicewander-1988-r-13-ways.pdf</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Moore</title>
		<link>https://brenocon.com/blog/2012/03/cosine-similarity-pearson-correlation-and-ols-coefficients/#comment-304935</link>
		<dc:creator>Paul Moore</dc:creator>
		<pubDate>Mon, 18 Mar 2013 17:14:00 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1199#comment-304935</guid>
		<description><![CDATA[A very helpful discussion - thanks.

Have you seen &#039;Thirteen Ways to Look at the Correlation Coefficient&#039; by Joseph Lee Rodgers and W. Alan Nicewander, The American Statistician, Vol. 42, No. 1 (Feb. 1988), pp. 59-66? It covers a related discussion.]]></description>
		<content:encoded><![CDATA[<p>A very helpful discussion &#8211; thanks.</p>
<p>Have you seen &#8216;Thirteen Ways to Look at the Correlation Coefficient&#8217; by Joseph Lee Rodgers and W. Alan Nicewander, The American Statistician, Vol. 42, No. 1 (Feb. 1988), pp. 59-66? It covers a related discussion.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

