[Fis] Data, observations and distributions / was: Re: Probabilistic Entropy

From: Michel Petitjean <ptitjean@itodys.jussieu.fr>
Date: Mon 19 Apr 2004 - 12:58:15 CEST

To: <fis@listas.unizar.es>

Dear Loet,

I see what you mean when you write:
> The data are
> measurement results which can be considered as probability
> distributions. Thus, the expected information content of these
> distributions and the meaning which these are given in (highly codified)
> discourses provide the basis of science.

But I do not agree. Does any probabilist confuse the empirical
distribution defined from a sample with the values themselves?
(Are there probabilists among FISers?)
Would you confuse a continuous distribution (e.g. a Gaussian) with its
observations? Surely not. The same holds for a Poisson distribution
(infinite discrete). The confusion arises in the finite discrete case.
In probability theory, the distribution associated with a random
variable X exists regardless of whether observations are made. For a
finite discrete r.v. taking equiprobable values, P(X=xi)=1/N, we still
have no observations unless we perform the "experiment"; and even after
drawing a sample of size N from X, we are not guaranteed to observe
each value xi exactly once (maybe x1 appears twice, maybe x2 is not
observed, ...). The empirical distribution based on data is a special
case, in which the N observed values (I assume the usual Euclidean
case) are used to define a probability law whose distribution assigns
probability 1/N to each value. Even here, we cannot say that the N data
"are" a sample of the empirical law, even though observing such a
sample has a certain probability of occurring. The data are measured
first; then the empirical distribution is built.
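A minimal Python sketch of the construction just described, assuming the data are given as hashable tuples (the sample points are hypothetical, and the function names are illustrative only). The empirical law gives mass 1/N to each observation, so a point observed k times gets k/N. The probability that an i.i.d. sample of size N drawn from that law reproduces the data as a multiset is a multinomial probability, which reduces to N!/N^N when all N values are distinct: this is the "certain probability" of observing such a sample, and it is usually well below 1.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod

def empirical_distribution(data):
    """Empirical law: each of the N observations gets mass 1/N,
    so a point observed k times gets mass k/N."""
    n = len(data)
    counts = Counter(data)
    return {x: Fraction(k, n) for x, k in counts.items()}

def prob_resample_equals_data(data):
    """Probability that an i.i.d. sample of size N drawn from the empirical
    law coincides with the data as a multiset (multinomial probability)."""
    n = len(data)
    counts = Counter(data)
    coef = Fraction(factorial(n), prod(factorial(k) for k in counts.values()))
    return coef * prod(Fraction(k, n) ** k for k in counts.values())

data = [(0.1, 2.0), (1.5, -0.3), (0.7, 0.7)]  # hypothetical points in R^2
print(empirical_distribution(data))
print(prob_resample_equals_data(data))  # 3!/3**3 = 2/9
```

Exact fractions are used so that the probabilities are not blurred by floating-point rounding; for three distinct points the resampling probability is 2/9.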
In general, distributions exist independently of any experiment (in
the probabilistic sense). Once the experiment is done, we have
observations, which together deserve the name "sample" under certain
conditions.
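To make the earlier equiprobable example concrete, here is a small Python sketch (function names are hypothetical) that compares the exact probability n!/n^n of seeing every value xi exactly once in a sample of size n, drawn i.i.d. from P(X=xi)=1/n, with a Monte Carlo estimate obtained by actually "performing the experiment" many times. The values xi are stood in for by integer labels.

```python
import math
import random

def prob_each_value_once(n: int) -> float:
    """Exact probability that an i.i.d. sample of size n from the uniform law
    P(X = x_i) = 1/n contains every x_i exactly once: n! / n**n."""
    return math.factorial(n) / n**n

def simulate(n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability by repeating the
    experiment 'draw a sample of size n' many times."""
    rng = random.Random(seed)
    values = range(n)  # integer labels standing in for x_1, ..., x_n
    hits = 0
    for _ in range(trials):
        sample = [rng.choice(values) for _ in range(n)]
        if len(set(sample)) == n:  # n draws, n distinct labels: each seen once
            hits += 1
    return hits / trials

for n in (2, 3, 5, 10):
    print(n, prob_each_value_once(n), simulate(n, 20_000))
```

The exact probability drops quickly: 1/2 for n = 2, but 10!/10^10, about 0.00036, for n = 10, so a sample of size N almost never looks like the equiprobable law it was drawn from.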
Returning now to science, suppose we have N data xi (e.g. points in
R^d). They are not observations of any probability law: building a
r.v. X with a distribution such that P(X=xi)=1/N is just one step in
assigning a mathematical model to the physical phenomenon from which
the N data were measured. Most of the time, the empirical distribution
has little interest in itself: the modeller will check whether the N
data, treated as if they were a sample from some parent population,
tell him something about the parent distribution (maybe Gaussian, maybe
anything else) and its associated parameters.
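As one illustration of that modelling step, the sketch below assumes, purely for the example, that the modeller chooses a Gaussian parent law and estimates its parameters by maximum likelihood; the measurements and function names are hypothetical. The data are only treated "as if" they were an i.i.d. sample, and the fitted (mu, sigma) belong to the model, not to the data themselves.

```python
import math
import statistics

def fit_gaussian_mle(data):
    """Maximum-likelihood estimates (mu, sigma) of a Gaussian parent law,
    treating the data as if they were an i.i.d. sample from it."""
    mu = statistics.fmean(data)
    # The MLE of the variance divides by N (population variance), not N - 1.
    sigma = math.sqrt(statistics.pvariance(data, mu))
    return mu, sigma

def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of the data under N(mu, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in data
    )

data = [9.8, 10.1, 10.4, 9.9, 10.3]  # hypothetical measurements
mu, sigma = fit_gaussian_mle(data)
print(round(mu, 3), round(sigma, 3))  # 10.1 0.228
```

Another parent family (or a nonparametric model) could be fitted to the same data just as well; the choice of family is exactly the part that lives in the modeller's head.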
Information is attached to a distribution, with or without any
probabilistic or physical experiment. Measurements are just physical
data. The relations between data and distributions exist in the mind of
the modeller. I would say that the probabilistic entropy H, when it
exists, is just a parameter of a distribution, like the mean, the
median or the extreme values.
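The point that H is a parameter of the distribution, like the mean, can be seen by computing both directly from the law, without drawing a single observation. A minimal Python sketch follows; the function names are illustrative, the discrete entropy is Shannon's H = -sum p log p, and for a Gaussian the standard differential entropy 0.5 ln(2 pi e sigma^2) (in nats) is used.

```python
import math

def shannon_entropy(probs, base=2.0):
    """H(X) = -sum p log p, computed from the law alone (no observations)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def mean(values, probs):
    """Mean of a finite discrete law, also computed from the law alone."""
    return sum(x * p for x, p in zip(values, probs))

def gaussian_differential_entropy(sigma):
    """Differential entropy (nats) of N(mu, sigma^2): 0.5 * ln(2*pi*e*sigma^2).
    Like the variance, it does not depend on mu."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)

N = 8
values = list(range(1, N + 1))
probs = [1 / N] * N
print(mean(values, probs))        # 4.5
print(shannon_entropy(probs))     # close to log2(8) = 3 bits
print(gaussian_differential_entropy(1.0))
```

Both numbers are properties of the specified law; a sample would only give estimates of them.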

Michel Petitjean Email: petitjean@itodys.jussieu.fr
Editor-in-Chief of Entropy entropy@mdpi.org
ITODYS (CNRS, UMR 7086) ptitjean@ccr.jussieu.fr
1 rue Guy de la Brosse Phone: +33 (0)1 44 27 48 57
75005 Paris, France. FAX : +33 (0)1 44 27 68 14
http://www.mdpi.net http://www.mdpi.org
http://petitjeanmichel.free.fr/itoweb.petitjean.html
http://petitjeanmichel.free.fr/itoweb.petitjean.freeware.html
Received on Mon Apr 19 13:01:09 2004
