Re: [Fis] probability versus power laws

From: Malcolm Forster <mforster@wisc.edu>
Date: Thu 08 Jul 2004 - 21:47:23 CEST

Dear FIS discussants,

I have an informational question about these two references:

(1) Ferrer i Cancho, R. and R. V. Sole. 2003. Least effort and the origins of
scaling in human language. PNAS 100(3): 788-791.
www.pnas.org/cgi/doi/10.1073/pnas.0335980100

(2) Ulanowicz, R. E. and W. F. Wolff. 1991. Ecosystem flow networks: Loaded
dice? Math. Biosci. 103: 45-68.

(1) is about explaining Zipf's law, while (2) proposes that the connection
weights in real ecological networks are distributed according to a stable
non-Gaussian distribution, such as the Cauchy distribution (aka the
Lorentzian distribution, aka the Student-t distribution with one degree of
freedom).

(For a definition of the Cauchy distribution, see
http://mathworld.wolfram.com/CauchyDistribution.html )
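The identification of the Cauchy distribution with Student's t at one degree
of freedom is easy to check numerically. A minimal Python sketch comparing the
two textbook density formulas (the function names are illustrative only, not
taken from any library):

```python
import math

def cauchy_pdf(x: float, x0: float = 0.0, gamma: float = 1.0) -> float:
    """Density of the Cauchy (Lorentzian) distribution with location x0 and scale gamma."""
    return 1.0 / (math.pi * gamma * (1.0 + ((x - x0) / gamma) ** 2))

def student_t_pdf(x: float, nu: float) -> float:
    """Density of Student's t distribution with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1.0 + x * x / nu) ** (-(nu + 1) / 2)

# With nu = 1 the normalizing constant is Gamma(1) / (sqrt(pi) * Gamma(1/2)) = 1/pi,
# so the two densities coincide at every point.
for x in (-10.0, -1.0, 0.0, 0.5, 3.0, 100.0):
    assert math.isclose(cauchy_pdf(x), student_t_pdf(x, 1.0), rel_tol=1e-12)
print("Student-t with nu = 1 matches the standard Cauchy density")
```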

(1) and (2) are wonderfully written papers and I found them both fascinating
in their own right. But is there any connection between them?

I understand that stable non-Gaussian distributions are long-tailed in the
sense that they have infinite variances, and that they play the role of
attractors for suitably normalized sums of independent random variables whose
distributions have the same long-tail properties. The Gaussian distribution is
itself stable: it is the limiting case of the stable family (characteristic
exponent alpha = 2), and it is the attractor for a large class of
distributions that have finite variances, as is made precise in various
versions of the Central Limit Theorem.

(For a relatively short proof of the Central Limit Theorem, see
http://mathworld.wolfram.com/CentralLimitTheorem.html )
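For anyone who wants the stability property spelled out: in the symmetric,
zero-centered case, a stable law for a random variable X has a characteristic
function of the form below, with scale c > 0 and characteristic exponent alpha
(this is the standard textbook parameterization; the skewed and shifted cases
are omitted). Let X_1, ..., X_n be independent copies of X.

```latex
\varphi_X(t) = \mathbb{E}\left[e^{itX}\right] = e^{-|ct|^{\alpha}}, \qquad c > 0,\; 0 < \alpha \le 2

\varphi_{S_n}(t) = \left(\varphi_X(t)\right)^{n} = e^{-n|ct|^{\alpha}}
               = e^{-|n^{1/\alpha} c\, t|^{\alpha}}
\quad\Longrightarrow\quad S_n = X_1 + \dots + X_n \overset{d}{=} n^{1/\alpha} X
```

With alpha = 2 this is a Gaussian with variance 2c^2, and S_n/n has the same
law as n^(-1/2) X, the familiar Central Limit Theorem scaling. With alpha = 1
it is the Cauchy distribution with scale c, and S_n/n has exactly the same law
as a single X, so averaging does not concentrate it at all. For every
alpha < 2 the variance is infinite.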

In other words, stable non-Gaussian distributions (which have infinite
variances) violate the conditions of the Central Limit Theorem. They have
their own "central limit theorems", as was proven 50 years ago:

Gnedenko, B. V. and A. N. Kolmogorov. 1954. Limit Distributions for Sums of
Independent Random Variables. Reading, Mass.: Addison-Wesley.
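The failure of the ordinary Central Limit Theorem for the Cauchy case shows up
in a quick simulation. The Python sketch below draws standard Cauchy variates
by inverting the CDF and compares the interquartile range of sample means with
that of uniform(-1, 1) means; the seed, sample sizes, and the uniform
comparison distribution are arbitrary illustrative choices. The interquartile
range is used instead of the variance because the Cauchy variance does not
exist.

```python
import math
import random
import statistics

def cauchy_draw(rng: random.Random) -> float:
    """Standard Cauchy variate by inverse-CDF sampling: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def uniform_draw(rng: random.Random) -> float:
    """Uniform variate on (-1, 1): finite variance, so the ordinary CLT applies."""
    return rng.uniform(-1.0, 1.0)

def iqr(values: list) -> float:
    """Interquartile range; unlike the variance it exists for every distribution."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def spread_of_means(draw, n: int, reps: int, rng: random.Random) -> float:
    """IQR of `reps` independent sample means, each computed over `n` draws."""
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return iqr(means)

rng = random.Random(2004)  # arbitrary seed
for n in (1, 10, 100, 1000):
    c = spread_of_means(cauchy_draw, n, 500, rng)
    u = spread_of_means(uniform_draw, n, 500, rng)
    print(f"n = {n:4d}   IQR of Cauchy means = {c:6.3f}   IQR of uniform means = {u:6.4f}")
```

One would expect the Cauchy column to hover near 2 (the interquartile range of
a standard Cauchy, whose quartiles are -1 and +1) for every n, while the
uniform column shrinks roughly like 1/sqrt(n).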

The only connection I know of between power laws and stable non-Gaussian
distributions (and the distributions in their basins of attraction) is that
their tails are characterized, in the limit, by a power law of a certain
kind. (I was aware that Pareto distributions have such properties, but I was
unaware that the log-normal distribution is included in the same class.)

Natural language texts have been found to obey Zipf's law, which says that
when you rank the words in a text from the highest frequency to the lowest,
the frequency of any word is inversely proportional to its rank. E.g., the
top-ranked word has a frequency proportional to 1/1, the second-ranked word
has a frequency proportional to 1/2, and so on. I suppose it can be viewed
as a probability function. But then the random variable (word rank) is not
like the random variables in stable non-Gaussian distributions: those random
variables have an unbounded domain that extends to plus infinity, and the
unboundedness of the domain is necessary for the asymptotic properties of
the tails to be well defined. If the number of distinct words is n, then the
rank of any word is less than or equal to n, so the domain of the variable
is bounded. Therefore, Zipf's "probability function" has no asymptotic
properties!
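The bounded-domain point can be made concrete. In the minimal Python sketch
below (exact rational arithmetic; n = 300 is an illustrative vocabulary size),
the probabilities p(r) = (1/r)/H_n, where H_n = 1 + 1/2 + ... + 1/n, reproduce
the 1/1, 1/2, ... proportions exactly and simply stop at rank n. Note also that
H_n grows without bound (roughly like ln n), so with exponent exactly 1 the
normalizing constant exists only because n is finite.

```python
from fractions import Fraction

def zipf_pmf(n: int) -> list:
    """Zipf probabilities p(r) = (1/r) / H_n for ranks r = 1..n, where H_n is
    the n-th harmonic number (the normalizing constant)."""
    harmonic = sum(Fraction(1, r) for r in range(1, n + 1))
    return [Fraction(1, r) / harmonic for r in range(1, n + 1)]

n = 300                                  # illustrative vocabulary size
p = zipf_pmf(n)
assert sum(p) == 1                       # a proper probability function on ranks 1..n
assert p[0] / p[1] == 2                  # rank 1 is twice as frequent as rank 2
assert all(r * p[r - 1] == p[0] for r in range(1, n + 1))  # rank x frequency is constant
print(f"p(1) = {float(p[0]):.4f}, p({n}) = {float(p[-1]):.6f}; no ranks beyond {n}")
```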

In other words, I don't see any deep connection between Zipf's function and
long-tailed or power-law distributions such as the Cauchy distribution. (I
seem to remember someone else asking similar questions, but I haven't seen
any answers.)

Here is my only positive suggestion: perhaps Zipf's law can be viewed as a
*sampling* distribution taken from a long-tailed distribution? A finite
sample always has a finite sample variance, even when it is drawn from a
distribution with infinite variance. So, maybe there is a connection? Does
anyone know?
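A toy version of the sampling idea, assuming a hypothetical long-tailed source
on the unbounded domain {1, 2, 3, ...} with P(K >= k) = k^(-alpha); with
alpha = 1.5 the mean is finite but the variance is infinite. The Python sketch
only shows the mechanics: any finite sample has finitely many distinct values,
hence a bounded largest rank, and a finite sample variance. It makes no claim
that the resulting rank-frequency curve follows Zipf's 1/r.

```python
import math
import random
import statistics
from collections import Counter

def discrete_pareto_draw(rng: random.Random, alpha: float = 1.5) -> int:
    """Hypothetical long-tailed source on the unbounded domain {1, 2, 3, ...}.
    K = floor(U ** (-1/alpha)) with U uniform on (0, 1], so P(K >= k) = k ** (-alpha).
    For alpha = 1.5 the mean is finite but the variance is infinite."""
    u = 1.0 - rng.random()                 # uniform on (0, 1]; avoids 0 ** negative
    return math.floor(u ** (-1.0 / alpha))

rng = random.Random(7)                     # arbitrary seed
sample = [discrete_pareto_draw(rng) for _ in range(10_000)]

# Rank the distinct observed values by how often they occur (rank 1 = most frequent).
rank_freq = sorted(Counter(sample).values(), reverse=True)

print("largest rank in the sample:", len(rank_freq))      # finite, at most len(sample)
print("largest observed value:", max(sample))
print("sample variance:", statistics.variance(sample))    # finite for any finite sample
print("five highest frequencies:", rank_freq[:5])
```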

Malcolm Forster

Professor of Philosophy
University of Wisconsin-Madison
http://philosophy.wisc.edu/forster

_______________________________________________
fis mailing list
fis@listas.unizar.es
http://webmail.unizar.es/mailman/listinfo/fis
Received on Thu Jul 8 21:49:08 2004
