<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: p-values, CDF&#8217;s, NLP etc.</title>
	<atom:link href="https://brenocon.com/blog/2012/07/p-values-cdfs-nlp-etc/feed/" rel="self" type="application/rss+xml" />
	<link>https://brenocon.com/blog/2012/07/p-values-cdfs-nlp-etc/</link>
	<description>cognition, language, social systems; statistics, visualization, computation</description>
	<lastBuildDate>Tue, 25 Nov 2025 13:11:20 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: comment pirater un compte facebook gratuitement</title>
		<link>https://brenocon.com/blog/2012/07/p-values-cdfs-nlp-etc/#comment-1595256</link>
		<dc:creator>comment pirater un compte facebook gratuitement</dc:creator>
		<pubDate>Fri, 06 Jun 2014 10:11:20 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1639#comment-1595256</guid>
		<description><![CDATA[Steep, light chimney with regard to hammer except are 
hack a facebook account for free youtube.
slap near length, hack a facebook account free online.]]></description>
		<content:encoded><![CDATA[<p>Steep, light chimney with regard to hammer except are<br />
hack a facebook account for free youtube.<br />
slap near length, hack a facebook account free online.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jannette</title>
		<link>https://brenocon.com/blog/2012/07/p-values-cdfs-nlp-etc/#comment-1532253</link>
		<dc:creator>Jannette</dc:creator>
		<pubDate>Tue, 20 May 2014 06:18:36 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1639#comment-1532253</guid>
		<description><![CDATA[Hello! Do you use Twitter? I&#039;d like to follow you if that would 
be okay. I&#039;m absolutely enjoying your blog 
and look forward to new updates.]]></description>
		<content:encoded><![CDATA[<p>Hello! Do you use Twitter? I&#8217;d like to follow you if that would<br />
be okay. I&#8217;m absolutely enjoying your blog<br />
and look forward to new updates.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Taylor Berg-Kirkpatrick</title>
		<link>https://brenocon.com/blog/2012/07/p-values-cdfs-nlp-etc/#comment-173449</link>
		<dc:creator>Taylor Berg-Kirkpatrick</dc:creator>
		<pubDate>Tue, 14 Aug 2012 23:06:58 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1639#comment-173449</guid>
		<description><![CDATA[Lots of good points here. However, I&#039;d like to help fix a confusion: &quot;Every dataset has its own null hypothesis cdf.&quot; This isn&#039;t true. Every dataset AND PAIR OF SYSTEMS has its own null hypothesis cdf. It is true that each point on our plot comes from a cdf... that&#039;s just the definition of p-value. But every point on our plot comes from a DIFFERENT cdf because each point corresponds to a different pair of systems.

I like your example where the complications of non-parametric tests are stripped away. Let me be specific in the context of this example. Because each system (and each pair of systems) is different, every point has a different Var[d(one unit)]. Thus, the normal-cdf used to compute the p-value will be different for every point on the plot. Therefore the &quot;curve-shaped trends&quot; (I&#039;ll admit this bigram was overused in the paper) we see are not the result of a basic statistical fact as you claim. Instead, they tell us that in the region we care about (i.e. near 0.05) the effects of system variation are dominated by the effects of test set size, and as a result we CAN loosely treat these plots as though they arise from a single cdf. This may not be particularly surprising, but it&#039;s also not obvious a priori. By the way, we do find thresholds that loosely imply significance. This is contrary to what you wrote in your blog. Not sure if that was a typo.

Anyway, I do think your points about complicated metrics are interesting. Something that you may already know: BLEU is asymptotically normal, if you ignore the single discontinuity in the derivative of the brevity penalty. You can prove this with Slutsky&#039;s theorem and the delta method. So perhaps a parametric test is just fine for BLEU when the test set is large.]]></description>
		<content:encoded><![CDATA[<p>Lots of good points here. However, I&#8217;d like to help fix a confusion: &#8220;Every dataset has its own null hypothesis cdf.&#8221; This isn&#8217;t true. Every dataset AND PAIR OF SYSTEMS has its own null hypothesis cdf. It is true that each point on our plot comes from a cdf&#8230; that&#8217;s just the definition of p-value. But every point on our plot comes from a DIFFERENT cdf because each point corresponds to a different pair of systems.</p>
<p>I like your example where the complications of non-parametric tests are stripped away. Let me be specific in the context of this example. Because each system (and each pair of systems) is different, every point has a different Var[d(one unit)]. Thus, the normal-cdf used to compute the p-value will be different for every point on the plot. Therefore the &#8220;curve-shaped trends&#8221; (I&#8217;ll admit this bigram was overused in the paper) we see are not the result of a basic statistical fact as you claim. Instead, they tell us that in the region we care about (i.e. near 0.05) the effects of system variation are dominated by the effects of test set size, and as a result we CAN loosely treat these plots as though they arise from a single cdf. This may not be particularly surprising, but it&#8217;s also not obvious a priori. By the way, we do find thresholds that loosely imply significance. This is contrary to what you wrote in your blog. Not sure if that was a typo.</p>
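<p>To make the variance point concrete, here is a minimal sketch, assuming a paired z-test under a normal approximation. The function name and all numbers are hypothetical illustrations, not values from the paper. It shows that the same observed gain on the same test set size can map to very different p-values once Var[d(one unit)] differs between system pairs.</p>

```python
import math


def two_sided_p_value(mean_diff, unit_variance, n_units):
    """Two-sided p-value for a zero-mean null, using a normal approximation.

    mean_diff     -- observed average per-unit difference between two systems
    unit_variance -- Var[d(one unit)] for this particular pair of systems
    n_units       -- number of test units (e.g. sentences)
    """
    standard_error = math.sqrt(unit_variance / n_units)
    z = abs(mean_diff) / standard_error
    # Two-sided tail probability of a standard normal at z.
    return math.erfc(z / math.sqrt(2))


# Same observed gain and same test set size for both pairs of systems;
# only the per-unit variance differs (hypothetical values).
n_units = 2000
mean_diff = 0.01
p_low_var = two_sided_p_value(mean_diff, unit_variance=0.02, n_units=n_units)
p_high_var = two_sided_p_value(mean_diff, unit_variance=0.10, n_units=n_units)

print(f"low-variance pair:  p = {p_low_var:.4f}")
print(f"high-variance pair: p = {p_high_var:.4f}")
```

<p>With these assumed numbers the low-variance pair falls below 0.05 and the high-variance pair does not, even though the gain and the test set size are identical. Each pair is being evaluated against its own normal cdf.</p>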
<p>Anyway, I do think your points about complicated metrics are interesting. Something that you may already know: BLEU is asymptotically normal, if you ignore the single discontinuity in the derivative of the brevity penalty. You can prove this with Slutsky&#8217;s theorem and the delta method. So perhaps a parametric test is just fine for BLEU when the test set is large.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

