Distribution Statistics
Alexander Liss
Analysis of random values is supported with histograms, which show the distribution of a random value. Analysis of stochastic processes, especially comparative analysis of stochastic processes with multiple parameters, could be supported with a similar statistic, a Distribution Statistics, which is defined here.
A Distribution Statistics shows the distribution of one parameter of the stochastic process in relation to another parameter, or it shows the distribution of a parameter over time.
These statistics are useful, for example, in the analysis of trading.
A Distribution Statistics allows the presentation of various uneven distributions in a compact and visual form.
When events have two characteristics, A and B, and there is a set of N events and hence a set of pairs
(A_1, B_1), ..., (A_N, B_N),
a histogram of “A at B” is created as follows.
First, an interval [z_0, z_1] is selected which contains all the values B_1, ..., B_N, that is,
z_0 <= B_j <= z_1 for every j = 1, ..., N.
Second, it is divided into sub-intervals using points b_i:
z_0 = b_0 < b_1 < ... < b_n = z_1
For each of the n intervals [b_{i-1}, b_i), with i from 1 to n, one adds up the values A_j of those events whose characteristic B_j belongs to the interval [b_{i-1}, b_i). The result is a_i.
Note that the very last interval has to be inclusive: [b_{n-1}, b_n].
Now, for each interval [b_{i-1}, b_i), there is a number a_i, where i = 1, ..., n. We denote this set:
(b_0) a_1 (b_1) a_2 (b_2) ... (b_{n-1}) a_n (b_n)
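The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names distribution_sums and compact_form, the use of floating-point sums, and the choice to reject values of B outside [b_0, b_n] are not part of the definition.

```python
from bisect import bisect_right


def distribution_sums(a_values, b_values, edges):
    """Sum A_j over the buckets [b_{i-1}, b_i); the last bucket is closed.

    edges is the increasing list b_0 < b_1 < ... < b_n (hypothetical name).
    """
    n = len(edges) - 1
    sums = [0.0] * n
    for a, b in zip(a_values, b_values):
        if not edges[0] <= b <= edges[-1]:
            # Assumption: values outside [b_0, b_n] are treated as an error.
            raise ValueError(f"B value {b} lies outside the selected interval")
        # bisect_right returns the 1-based bucket number for half-open buckets;
        # b == b_n would give n + 1, so it is clamped into the last bucket.
        i = min(bisect_right(edges, b), n)
        sums[i - 1] += a
    return sums


def compact_form(edges, values):
    """Render the notation (b_0)a_1(b_1)a_2 ... a_n(b_n) as a string."""
    parts = [f"({edges[0]:g})"]
    for edge, value in zip(edges[1:], values):
        parts.append(f"{value:g}({edge:g})")
    return "".join(parts)


a = [5, 3, 2, 7, 1]
b = [0, 10, 12, 29, 30]
edges = [0, 10, 20, 30]
print(compact_form(edges, distribution_sums(a, b, edges)))  # (0)5(10)5(20)8(30)
```

In this example the event with B = 10 falls into the second bucket, because buckets are closed on the left, while the event with B = 30 falls into the last bucket, which is closed on both sides.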
Instead of adding up the values A_j of events whose characteristic B_j belongs to the interval [b_{i-1}, b_i), one could average them. This creates a different type of statistic, with averages over “buckets”.
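A minimal sketch of the averaging variant follows. The function name distribution_averages is hypothetical, and returning None for a bucket that contains no events is an assumption made here, since the text does not say how empty buckets are to be treated.

```python
from bisect import bisect_right


def distribution_averages(a_values, b_values, edges):
    """Average A_j over the buckets [b_{i-1}, b_i); the last bucket is closed."""
    n = len(edges) - 1
    sums = [0.0] * n
    counts = [0] * n
    for a, b in zip(a_values, b_values):
        if not edges[0] <= b <= edges[-1]:
            # Assumption: values outside [b_0, b_n] are treated as an error.
            raise ValueError(f"B value {b} lies outside the selected interval")
        # 0-based bucket index; b == b_n is clamped into the last bucket.
        i = min(bisect_right(edges, b), n) - 1
        sums[i] += a
        counts[i] += 1
    # Assumption: an empty bucket has no average and is reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


a = [5, 3, 2, 7, 1]
b = [0, 10, 12, 29, 30]
print(distribution_averages(a, b, [0, 10, 20, 30]))  # [5.0, 2.5, 4.0]
```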
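When there are several statistics a_1, ..., a_k for the same set of intervals [b_{i-1}, b_i), for example several statistics accumulated over one-minute time intervals, they are presented as a series, where a_{m,i} is the value of statistic a_m on the i-th interval:
(b_0) a_{1,1}, a_{2,1}, ..., a_{k,1} (b_1) a_{1,2}, a_{2,2}, ..., a_{k,2} ... (b_{n-1}) a_{1,n}, a_{2,n}, ..., a_{k,n} (b_n)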
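The series notation can be produced mechanically from k lists of per-interval values. The sketch below assumes that each statistic is given as a list with one value per interval; the function name compact_series is hypothetical.

```python
def compact_series(edges, statistics):
    """Render k statistics over the same n intervals as one series.

    statistics[m][i] is the value of statistic m + 1 on interval i + 1.
    """
    n = len(edges) - 1
    if any(len(s) != n for s in statistics):
        raise ValueError("each statistic needs exactly one value per interval")
    parts = [f"({edges[0]:g})"]
    for i in range(n):
        # All k values for interval i, followed by the interval's right end point.
        values = ",".join(f"{s[i]:g}" for s in statistics)
        parts.append(f"{values}({edges[i + 1]:g})")
    return "".join(parts)


sums = [5, 8]
averages = [2.5, 4]
print(compact_series([0, 10, 20], [sums, averages]))  # (0)5,2.5(10)8,4(20)
```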
When the characteristic B is the time of an event, the same procedure defines a Distribution Statistics over time.
When t_i is the end of the i-th time interval, starting with t_1 = “start time” + “time interval” and ending with t_n = “end time” (so that t_0 = “start time”), one presents a set of statistics a_1, ..., a_k in a compact form:
(t_0) a_{1,1}, a_{2,1}, ..., a_{k,1} (t_1) a_{1,2}, a_{2,2}, ..., a_{k,2} ... (t_{n-1}) a_{1,n}, a_{2,n}, ..., a_{k,n} (t_n)
To normalize the presentation, instead of the values t_i, the values
z_i = t_i - “start time”
are used:
(0) a_{1,1}, a_{2,1}, ..., a_{k,1} (z_1) a_{1,2}, a_{2,2}, ..., a_{k,2} ... (z_{n-1}) a_{1,n}, a_{2,n}, ..., a_{k,n} (z_n)
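As an illustration, the normalized end points z_i can be generated from the start time, the end time, and the length of the time interval. The sketch assumes that times are plain numbers (for example, seconds since midnight) and that the time interval divides the whole span evenly; the function name normalized_edges is hypothetical.

```python
def normalized_edges(start_time, end_time, interval):
    """Return [z_0, z_1, ..., z_n] with z_i = t_i - start_time."""
    span = end_time - start_time
    n = round(span / interval)
    # Assumption: the span is a whole number of intervals, so that t_n = end_time.
    if n < 1 or abs(n * interval - span) > 1e-9:
        raise ValueError("the time interval must divide the span evenly")
    return [i * interval for i in range(n + 1)]


# Three one-minute intervals starting at 09:30:00, expressed in seconds.
print(normalized_edges(34200, 34380, 60))  # [0, 60, 120, 180]
```

Event times shifted by the same start time can then be bucketed against these end points in the same way as any other characteristic B.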