<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Probabilistic interpretation of the B3 coreference resolution metric</title>
	<atom:link href="http://brenocon.com/blog/2013/08/probabilistic-interpretation-of-the-b3-coreference-resolution-metric/feed/" rel="self" type="application/rss+xml" />
	<link>http://brenocon.com/blog/2013/08/probabilistic-interpretation-of-the-b3-coreference-resolution-metric/</link>
	<description>cognition, language, social systems; statistics, visualization, computation</description>
	<lastBuildDate>Tue, 25 Nov 2025 13:11:20 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://brenocon.com/blog/2013/08/probabilistic-interpretation-of-the-b3-coreference-resolution-metric/#comment-819592</link>
		<dc:creator>Bob Carpenter</dc:creator>
		<pubDate>Tue, 24 Dec 2013 03:17:48 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=2183#comment-819592</guid>
		<description><![CDATA[Pairwise metrics are like micro-F measures (equally weighted by instance); B3 is like a macro-F measure (equally weighted by category).

I like the pairwise metrics for evaluation because they&#039;re interpretable as estimates of future performance --- what&#039;s the probability that you recover a link between two mentions? But that may not be what matters in an application. If I&#039;m clustering news items to display in Google News, I probably have a different cost/benefit tradeoff for linking/overlinking than I do with electronic health records. And this is only a problem for clustering/coref, not for linkage to a database of entities; for the latter, I think individual scores make sense.

The quadratic nature of the pairwise metrics is important because it tells you how breaking 200 mentions down into subgroups gets scored. With 200 mentions of a single entity there are 200 × 199 = 39,800 ordered links, and if I recover two clusters of 100, I recover 19,800 of them. One cluster of 150 with two more clusters of size 25 is better, scoring 23,550. That&#039;s not much better than 150 with 50 singletons, which scores 22,350.
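
These counts can be checked by summing n × (n - 1) over cluster sizes. A minimal sketch, assuming links are ordered mention pairs (the 39,800 total equals 200 × 199) and that all 200 mentions form one gold entity, so every link a candidate clustering recovers is correct. The helper name `ordered_links` is hypothetical, chosen for illustration.

```python
def ordered_links(cluster_sizes):
    """Count ordered (i, j) pairs with i != j that fall in the same cluster."""
    return sum(n * (n - 1) for n in cluster_sizes)


# Assumption: the gold standard puts all 200 mentions in one entity.
gold_links = ordered_links([200])  # 200 * 199 = 39,800

# Candidate clusterings from the comment. Each predicted cluster is a
# subset of the single gold cluster, so every recovered link is correct.
candidates = {
    "two clusters of 100": [100, 100],
    "150 plus two clusters of 25": [150, 25, 25],
    "150 plus 50 singletons": [150] + [1] * 50,
}

for name, sizes in candidates.items():
    recovered = ordered_links(sizes)
    recall = recovered / gold_links
    print(f"{name}: {recovered} of {gold_links} links, pairwise recall {recall:.3f}")
```

Because a cluster of size n contributes n × (n - 1) links, the large cluster dominates the score, and singletons contribute nothing.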

Probabilistically, precision = TP/(TP + FP) is tricky because the denominator depends on how many positive results the classifier returns (i.e., TP + FP). Recall (aka sensitivity) = TP / (TP + FN) is different --- the denominator depends only on the number of positive instances in the test data. So if you do a system evaluation, all systems have the same denominator for recall, but they vary in denominators for precision. It&#039;s easier to work probabilistically with specificity = TN / (TN + FP), which is like recall for negative cases. Along with prevalence = (TP + FN) / (TP + FN + TN + FP), sensitivity and specificity let you predict precision: precision = sensitivity × prevalence / (sensitivity × prevalence + (1 - specificity) × (1 - prevalence)).]]></description>
		<content:encoded><![CDATA[<p>Pairwise metrics are like micro-F measures (equally weighted by instance); B3 is like a macro-F measure (equally weighted by category).</p>
<p>I like the pairwise metrics for evaluation because they&#8217;re interpretable as estimates of future performance &#8212; what&#8217;s the probability that you recover a link between two mentions? But that may not be what matters in an application. If I&#8217;m clustering news items to display in Google News, I probably have a different cost/benefit tradeoff for linking/overlinking than I do with electronic health records. And this is only a problem for clustering/coref, not for linkage to a database of entities; for the latter, I think individual scores make sense.</p>
<p>The quadratic nature of the pairwise metrics is important because it tells you how breaking 200 mentions down into subgroups gets scored. With 200 mentions of a single entity there are 200 × 199 = 39,800 ordered links, and if I recover two clusters of 100, I recover 19,800 of them. One cluster of 150 with two more clusters of size 25 is better, scoring 23,550. That&#8217;s not much better than 150 with 50 singletons, which scores 22,350.</p>
<p>Probabilistically, precision = TP/(TP + FP) is tricky because the denominator depends on how many positive results the classifier returns (i.e., TP + FP). Recall (aka sensitivity) = TP / (TP + FN) is different &#8212; the denominator depends only on the number of positive instances in the test data. So if you do a system evaluation, all systems have the same denominator for recall, but they vary in denominators for precision. It&#8217;s easier to work probabilistically with specificity = TN / (TN + FP), which is like recall for negative cases. Along with prevalence = (TP + FN) / (TP + FN + TN + FP), sensitivity and specificity let you predict precision: precision = sensitivity × prevalence / (sensitivity × prevalence + (1 - specificity) × (1 - prevalence)).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Kummerfeld</title>
		<link>http://brenocon.com/blog/2013/08/probabilistic-interpretation-of-the-b3-coreference-resolution-metric/#comment-426998</link>
		<dc:creator>Jonathan Kummerfeld</dc:creator>
		<pubDate>Sun, 01 Sep 2013 03:28:35 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=2183#comment-426998</guid>
		<description><![CDATA[Interesting post! One quick note: the changes to the scorer are on the 2012 shared task site:

http://conll.cemantix.org/2012/software.html

Hopefully we&#039;ll be seeing a new version with the other metrics fixed soon too.]]></description>
		<content:encoded><![CDATA[<p>Interesting post! One quick note: the changes to the scorer are on the 2012 shared task site:</p>
<p><a href="http://conll.cemantix.org/2012/software.html" rel="nofollow">http://conll.cemantix.org/2012/software.html</a></p>
<p>Hopefully we&#8217;ll be seeing a new version with the other metrics fixed soon too.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
