Python module#

Summary#

enstat.scalar([dtype])

Ensemble average of a scalar.

enstat.static(compute_variance, shape, ...)

Ensemble average of an nd-array (of the same size for all samples).

enstat.dynamic1d(compute_variance, size, ...)

Ensemble average of a 1d-array (which grows depending on the size of the samples).

enstat.histogram(bin_edges[, right, bound_error])

Histogram.

enstat.binned(bin_edges[, right, ...])

Ensemble average after binning.

scalar#

class enstat.scalar(dtype=<class 'float'>)#

Bases: object

Ensemble average of a scalar. Example:

import enstat

average = enstat.scalar()
average += 1.0
average += 2.0
average += 3.0
print(average.mean())  # 2.0

Add samples to it using scalar.add_sample(), or simply average += datum. The mean, variance, and standard deviation can be obtained at any time. They are derived from the following members:

  • scalar.first: Sum of the first statistical moment.

  • scalar.second: Sum of the second statistical moment.

  • scalar.norm: Number of samples.

To restore data: use scalar.restore(). In short: restored = enstat.scalar.restore(**dict(average)).
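The relation between these members and the derived statistics can be sketched in plain Python (a sketch only, not enstat's actual implementation; the member names follow the documentation, and the unbiased variance normalisation shown is an assumption):

```python
# Sketch of the running sums behind enstat.scalar: "first" is the running
# sum of samples, "second" the running sum of squares, "norm" the number
# of samples (names mirror the documented members).
samples = [1.0, 2.0, 3.0]

first = sum(samples)                  # sum of the first statistical moment
second = sum(x * x for x in samples)  # sum of the second statistical moment
norm = len(samples)                   # number of samples

mean = first / norm
variance = (second - first * first / norm) / (norm - 1)  # unbiased estimate

print(mean)      # 2.0
print(variance)  # 1.0
```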

Parameters:

dtype – The type to use for the sum of the first (and second) statistical moment. Tip: Python’s int is unbounded, but e.g. np.int64 is not.

add_sample(datum: float | ArrayLike)#

Add a sample. Internally changes the sums of the first and second statistical moments and normalisation.

Parameters:

datum – Sample.

mean() float#

Current mean. Samples can be added afterwards without any problems.

Returns:

Mean.

classmethod restore(first: float = 0, second: float = 0, norm: float = 0)#

Restore previous data.

Parameters:
  • first (float) – Sum of the first moment.

  • second (float) – Sum of the second moment.

  • norm (int) – Number of samples.

std() float#

Current standard deviation. Samples can be added afterwards without any problems.

Returns:

Standard deviation.

variance() float#

Current variance. Samples can be added afterwards without any problems.

Returns:

Variance.

static#

class enstat.static(compute_variance: bool = True, shape: tuple[int] | None = None, dtype: DTypeLike = <class 'numpy.float64'>)#

Bases: object

Ensemble average of an nd-array (of the same size for all samples). Example:

import enstat
import numpy as np

data = np.random.random(35 * 50).reshape(35, 50)

average = enstat.static()
for datum in data:
    average += datum

print(average.mean())  # approximately [0.5, 0.5, ...]

Add samples to it using static.add_sample(), or simply average += datum. The mean, variance, and standard deviation can be obtained at any time. They are derived from the following members:

  • static.first: Sum of the first statistical moment.

  • static.second: Sum of the second statistical moment.

  • static.norm: Number of samples.

To restore data: use static.restore(). In short: restored = enstat.static.restore(**dict(average)).

For convenience, the shape, size, and dtype of the data are available as properties.

Parameters:
  • compute_variance – If set to False, the second moment is not computed (making things slightly faster). In that case, the variance and standard deviation are not available.

  • shape – The shape of the data. If not specified, it is determined from the first sample.

  • dtype – The type of the data. If not specified, it is determined from the first sample.

add_point(datum: float | int, index: int)#

Add a single point. Note that:

ensemble.add_point(datum, index)

Is equivalent to:

data = np.empty(ensemble.shape)
mask = np.ones(ensemble.shape, dtype=bool)
data[index] = datum
mask[index] = False
ensemble.add_sample(data, mask)

(but faster).

add_sample(data: ArrayLike, mask: ArrayLike | None = None)#

Add a sample. Internally changes the sums of the first and second statistical moments and normalisation.

Parameters:
  • data – The sample.

  • mask – Mask entries (boolean array).
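How a mask interacts with the running sums can be illustrated with numpy (a sketch of the documented behaviour, not enstat's actual implementation; the assumption is that True entries in the mask are ignored, so each entry keeps its own sample count):

```python
import numpy as np

# Sketch: masked entries (True) do not contribute to the running sum
# ("first") nor to the per-entry number of samples ("norm").
first = np.zeros(3)
norm = np.zeros(3, dtype=int)

data = np.array([1.0, 2.0, 3.0])
mask = np.array([False, True, False])  # True = ignore this entry

keep = ~mask
first[keep] += data[keep]
norm[keep] += 1

print(first)  # [1. 0. 3.]
print(norm)   # [1 0 1]
```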

property dtype#

The type of the data.

mean(min_norm: int = 1) ArrayLike#

Current mean. Samples can be added afterwards without any problems.

Parameters:

min_norm – Minimum number of samples required to output a value.

Returns:

Mean.

ravel() scalar#

Return as scalar: all entries are summed.

Returns:

Ensemble average.
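What ravel() computes can be sketched with numpy (an assumption about the mechanics, not enstat's actual code: the per-entry sums are simply summed, so the result averages over both the ensemble and the entries):

```python
import numpy as np

# Sketch: collapsing per-entry running sums into a single scalar average.
first = np.array([[1.0, 2.0], [3.0, 4.0]])  # per-entry sum of first moments
norm = np.array([[2, 2], [2, 2]])           # per-entry number of samples

scalar_first = first.sum()
scalar_norm = norm.sum()
print(scalar_first / scalar_norm)  # 1.25
```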

classmethod restore(first: ArrayLike | None = None, second: ArrayLike | None = None, norm: ArrayLike | None = None)#

Restore previous data.

Parameters:
  • first – Continued computation: Sum of the first moment.

  • second – Continued computation: Sum of the second moment.

  • norm – Continued computation: Number of samples (integer).

property shape#

The shape of the data.

property size#

The size of the data.

squash(n: int | list[int])#

Squash the data to a smaller size by summing over blocks of size n. For example, suppose that:

>>> avg.norm
[[2, 2, 3, 1, 1],
 [2, 2, 1, 1, 3],
 [1, 1, 2, 2, 1],
 [2, 1, 2, 2, 2]]

Then calling:

>>> avg.squash(2)
>>> avg.norm
[[8, 6, 4],
 [5, 8, 3]]
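The block-summing in this example can be reproduced with plain numpy (a sketch of the mechanics, not enstat's implementation):

```python
import numpy as np

# Sum non-overlapping n-by-n blocks; trailing blocks may be smaller.
norm = np.array([[2, 2, 3, 1, 1],
                 [2, 2, 1, 1, 3],
                 [1, 1, 2, 2, 1],
                 [2, 1, 2, 2, 2]])
n = 2

rows, cols = norm.shape
out = np.zeros(((rows + n - 1) // n, (cols + n - 1) // n), dtype=norm.dtype)
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        out[i, j] = norm[n * i : n * (i + 1), n * j : n * (j + 1)].sum()

print(out)
```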

std(min_norm: int = 2) ArrayLike#

Current standard deviation. Samples can be added afterwards without any problems.

Parameters:

min_norm – Minimum number of samples required to output a value.

Returns:

Standard deviation.

variance(min_norm: int = 2) ArrayLike#

Current variance. Samples can be added afterwards without any problems.

Parameters:

min_norm – Minimum number of samples required to output a value.

Returns:

Variance.

dynamic1d#

class enstat.dynamic1d(compute_variance: bool = True, size: int | None = None, dtype: DTypeLike = <class 'numpy.float64'>)#

Bases: static

Ensemble average of a 1d-array (which grows depending on the size of the samples). Add samples to it using dynamic1d.add_sample(), or simply average += datum. The mean, variance, and standard deviation can be obtained at any time, as can the sums of the first and second statistical moments and the number of samples.

To restore data: use dynamic1d.restore(). In short: restored = enstat.dynamic1d.restore(**dict(average)).
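How a growing 1d average can work is sketched below with plain numpy (an assumption about the mechanics, not enstat's actual code: the running sums grow to the longest sample seen so far, and each entry keeps its own sample count):

```python
import numpy as np

# Sketch: accumulate 1d samples of different lengths by growing the
# running sum ("first") and per-entry sample count ("norm") as needed.
first = np.zeros(0)
norm = np.zeros(0, dtype=int)

for sample in [np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])]:
    if sample.size > first.size:
        grow = sample.size - first.size
        first = np.concatenate([first, np.zeros(grow)])
        norm = np.concatenate([norm, np.zeros(grow, dtype=int)])
    first[: sample.size] += sample
    norm[: sample.size] += 1

print(first / np.maximum(norm, 1))  # [2. 3. 5.]
```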

Parameters:
  • compute_variance – If set to False, the second moment is not computed. In that case, the variance and standard deviation are not available.

  • size – The initial size of the data. If not specified, it is determined from the first sample.

  • dtype – The type of the data. If not specified, it is determined from the first sample.

add_point(datum: float | int, index: int)#

Add a single point. Note that:

ensemble.add_point(datum, index)

Is equivalent to:

data = np.empty(ensemble.shape)
mask = np.ones(ensemble.shape, dtype=bool)
data[index] = datum
mask[index] = False
ensemble.add_sample(data, mask)

(but faster).

add_sample(data: ArrayLike)#

Add a sample. Internally changes the sums of the first and second statistical moments and normalisation.

Parameters:

data – The sample.

property dtype#

The type of the data.

mean(min_norm: int = 1) ArrayLike#

Current mean. Samples can be added afterwards without any problems.

Parameters:

min_norm – Minimum number of samples required to output a value.

Returns:

Mean.

ravel() scalar#

Return as scalar: all entries are summed.

Returns:

Ensemble average.

classmethod restore(first: ArrayLike | None = None, second: ArrayLike | None = None, norm: ArrayLike | None = None)#

Restore previous data.

Parameters:
  • first – Continued computation: Sum of the first moment.

  • second – Continued computation: Sum of the second moment.

  • norm – Continued computation: Number of samples (integer).

property shape#

The shape of the data.

property size#

The size of the data.

squash(n: int | list[int])#

Squash the data to a smaller size by summing over blocks of size n. For example, suppose that:

>>> avg.norm
[[2, 2, 3, 1, 1],
 [2, 2, 1, 1, 3],
 [1, 1, 2, 2, 1],
 [2, 1, 2, 2, 2]]

Then calling:

>>> avg.squash(2)
>>> avg.norm
[[8, 6, 4],
 [5, 8, 3]]

std(min_norm: int = 2) ArrayLike#

Current standard deviation. Samples can be added afterwards without any problems.

Parameters:

min_norm – Minimum number of samples required to output a value.

Returns:

Standard deviation.

variance(min_norm: int = 2) ArrayLike#

Current variance. Samples can be added afterwards without any problems.

Parameters:

min_norm – Minimum number of samples required to output a value.

Returns:

Variance.

histogram#

class enstat.histogram(bin_edges: ArrayLike, right: bool = False, bound_error: str = 'raise')#

Bases: object

Histogram. Example single dataset:

data = [0, 0, 0, 1, 1, 2]
bin_edges = [-0.5, 0.5, 1.5, 2.5]
hist = enstat.histogram.from_data(data, bin_edges)
print(hist.count)

Example ensemble:

data = np.random.random(35 * 50).reshape(35, 50)
bin_edges = np.linspace(0, 1, 11)
hist = enstat.histogram(bin_edges)

for datum in data:
    hist += datum

print(hist.count)

One can add samples to it using histogram.add_sample(), or simply hist += datum.

Members:

  • histogram.count: The number of samples in each bin.

  • histogram.bin_edges: See option bin_edges.

  • histogram.x: Midpoint of each bin.

  • histogram.p: Probability density of each bin.

  • histogram.right: See option right.

  • histogram.bound_error: See option bound_error.

  • histogram.count_left: Number of samples that fall below the leftmost bin.

  • histogram.count_right: Number of samples that fall above the rightmost bin.

Parameters:
  • bin_edges – The bin-edges.

  • right – Whether the bin includes the right edge (instead of the left edge); see numpy.digitize.

  • bound_error

    What to do if a sample falls out of the bin range:

    • "raise": raise an error

    • "ignore": ignore the data that are out of range

    • "norm": change the normalisation of the density

add_sample(data: ArrayLike)#

Add a sample to the histogram. You can also use the + operator.

as_integer()#

Merge bins not encompassing an integer with the preceding bin. For example: a bin with edges [1.1, 1.9] is removed, but [0.9, 1.1] is not removed.

property density: ArrayLike#

The probability density function at the bin.

classmethod from_data(data: ArrayLike, bins: int | None = None, mode: str = 'equal', min_count: int | None = None, min_width: float | None = None, integer: bool = False, bin_edges: ArrayLike | None = None, bound_error: str = 'raise')#

Construct a histogram from data.

Parameters:
  • data – Data (flattened).

  • bins – Number of bins.

  • mode

    Mode with which to compute the bin-edges.

    • 'equal': each bin has equal width.

    • 'log': logarithmic spacing.

    • 'uniform': uniform number of data-points per bin.

    • 'voronoi': each bin is the region between two adjacent data-points.

  • min_count – Minimum number of data-points per bin.

  • min_width – Minimum width of a bin.

  • integer – If True, bins not encompassing an integer are removed (e.g. a bin with edges [1.1, 1.9] is removed, but [0.9, 1.1] is not removed).

  • bin_edges – Specify the bin-edges (overrides bins and mode).

  • bound_error

    What to do if a sample falls out of the bin range:

    • "raise": raise an error

    • "ignore": ignore the data that are out of range

    • "norm": change the normalisation of the density

Returns:

The histogram object.
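For instance, mode='log' plausibly places the bin-edges logarithmically between the data extremes (an assumption about the mode, sketched with numpy; from_data itself may differ in details):

```python
import numpy as np

# Hypothetical sketch of mode='log': bin-edges spaced evenly in log10
# between the smallest and largest data point.
data = np.array([1.0, 10.0, 100.0, 1000.0])
bins = 3

bin_edges = np.logspace(np.log10(data.min()), np.log10(data.max()), bins + 1)
print(bin_edges)  # edges at 1, 10, 100, 1000
```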

interp(bin_edges: ArrayLike)#

Interpolate the histogram to a new set of bin-edges.

Parameters:

bin_edges – The new bin-edges.

lstrip(min_count: int = 0)#

Strip the histogram of empty bins to the left.

Parameters:

min_count – The minimum count for a bin to be considered non-empty.

merge_left(index: ArrayLike)#

Merge the bins to the left of index into index.

Parameters:

index – The indices of the bin to merge into.

merge_right(index: ArrayLike)#

Merge the bins to the right of index into index.

Parameters:

index – The indices of the bin to merge into.

property p: ArrayLike#

The probability density function at the bin.

property plot: tuple[ArrayLike, ArrayLike]#

Alias for (x, density).

classmethod restore(bin_edges: ArrayLike, count: ArrayLike, count_left: int = 0, count_right: int = 0, bound_left: float | None = None, bound_right: float | None = None, bound_error: str = 'raise', right: bool = False)#

Restore from a previous result:

hist = enstat.histogram...
state = dict(hist)
restored = enstat.histogram.restore(**state)

Parameters:
  • bin_edges – The bin-edges.

  • count – The count.

  • count_left – Number of items below the left bound.

  • count_right – Number of items above the right bound.

  • bound_left – The minimum value below the left bound.

  • bound_right – The maximum value above the right bound.

  • bound_error – What to do if a sample falls out of the bin range.

  • right – Whether the bin includes the right edge (instead of the left edge); see numpy.digitize.

rstrip(min_count: int = 0)#

Strip the histogram of empty bins to the right.

Parameters:

min_count – The minimum count for a bin to be considered non-empty.

squash(n: int)#

Squash the histogram by combining n sequential bins into one (the last bin may be smaller).

Parameters:

n – Number of bins to group.
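Squashing by n can be sketched with plain numpy (an assumption about the mechanics, not enstat's implementation: counts of n consecutive bins are summed and the interior edges dropped; the last new bin may combine fewer than n original bins):

```python
import numpy as np

# Combine n=2 sequential bins into one; the trailing bin is smaller.
count = np.array([1, 2, 3, 4, 5])
bin_edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
n = 2

new_count = np.array([count[i : i + n].sum() for i in range(0, count.size, n)])
new_edges = bin_edges[::n]
if new_edges[-1] != bin_edges[-1]:
    new_edges = np.append(new_edges, bin_edges[-1])

print(new_count)  # [3 7 5]
print(new_edges)  # [0. 2. 4. 5.]
```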

strip(min_count: int = 0)#

Strip the histogram of empty bins to the left and the right.

Parameters:

min_count – The minimum count for a bin to be considered non-empty.

property x: ArrayLike#

The bin centers.

binned#

class enstat.binned(bin_edges: ArrayLike, right: bool = False, bound_error: str = 'raise', names: list[str] = [])#

Bases: object

Ensemble average after binning. Example:

import numpy as np
import enstat

x = np.array([0.5, 1.5, 2.5])
y = np.array([1, 2, 3])
bin_edges = np.array([0, 1, 2, 3])
binned = enstat.binned.from_data(x, y, bin_edges=bin_edges)
print(binned[0].mean())

Parameters:
  • bin_edges – The bin-edges.

  • right – Whether the bin includes the right edge (instead of the left edge); see numpy.digitize.

  • bound_error

    What to do if a sample falls out of the bin range:

    • "raise": raise an error

    • "ignore": ignore the data that are out of range

  • names – The names of the variables to store.

add_sample(*args: ArrayLike, **kwargs: ArrayLike)#

Add a sample. If you use only one variable, you can also use the + operator.

Parameters:

args – Different variables of data to add. The binning is done on the first argument and applied to all other arguments.

classmethod from_data(*args: ArrayLike, names: list[str] = [], **kwargs)#

Construct from data.

Parameters:
  • args – Different variables of data to add. The binning is done on the first argument and applied to all other arguments.

  • kwargs – Automatic binning settings, see histogram.from_data().

Returns:

The binned object.
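The binning-then-averaging can be sketched with plain numpy (an illustration under the assumption that y-values are grouped by the bin of their x-value, as in the class example above):

```python
import numpy as np

# Bin y by the bin of its x-value, then average per bin.
x = np.array([0.5, 1.5, 2.5])
y = np.array([1.0, 2.0, 3.0])
bin_edges = np.array([0.0, 1.0, 2.0, 3.0])

index = np.digitize(x, bin_edges) - 1  # bin index of each x
means = [float(y[index == i].mean()) for i in range(bin_edges.size - 1)]
print(means)  # [1.0, 2.0, 3.0]
```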