Python module#
Summary#
- scalar: Ensemble average of a scalar.
- static: Ensemble average of an nd-array (of the same size for all samples).
- dynamic1d: Ensemble average of a 1d-array (which grows with the size of the samples).
- histogram: Histogram.
- binned: Ensemble average after binning.
scalar#
- class enstat.scalar(dtype=<class 'float'>)#
Bases:
object
Ensemble average of a scalar. Example:

```python
import enstat

average = enstat.scalar()
average += 1.0
average += 2.0
average += 3.0
print(average.mean())  # 2.0
```
Add samples to it using scalar.add_sample(), or simply average += datum. The mean, variance, and standard deviation can be obtained at any time. They are derived from the following members:
- scalar.first: Sum of the first statistical moment.
- scalar.second: Sum of the second statistical moment.
- scalar.norm: Number of samples.
To restore data, use scalar.restore(). In short: restored = enstat.scalar.restore(**dict(average)).
- Parameters:
  dtype – The type to use for the sum of the first (and second) statistical moment. Tip: Python's int is unbounded, but e.g. np.int64 is not.
- add_sample(datum: float | ArrayLike)#
Add a sample. Internally changes the sums of the first and second statistical moments and normalisation.
- Parameters:
datum – Sample.
- mean() float #
Current mean. Samples can be added afterwards without any problems.
- Returns:
Mean.
- classmethod restore(first: float = 0, second: float = 0, norm: float = 0)#
Restore previous data.
- Parameters:
first (float) – Sum of the first moment.
second (float) – Sum of the second moment.
norm (int) – Number of samples.
- std() float #
Current standard deviation. Samples can be added afterwards without any problems.
- Returns:
Standard deviation.
- variance() float #
Current variance. Samples can be added afterwards without any problems.
- Returns:
Variance.
static#
- class enstat.static(compute_variance: bool = True, shape: tuple[int] | None = None, dtype: numpy.typing.DTypeLike = numpy.float64)#
Bases:
object
Ensemble average of an nd-array (of the same size for all samples). Example:

```python
import enstat
import numpy as np

data = np.random.random(35 * 50).reshape(35, 50)

average = enstat.static()
for datum in data:
    average += datum

print(average.mean())  # approximately [0.5, 0.5, ...]
```
Add samples to it using static.add_sample(), or simply average += datum. The mean, variance, and standard deviation can be obtained at any time. They are derived from the following members:
- static.first: Sum of the first statistical moment.
- static.second: Sum of the second statistical moment.
- static.norm: Number of samples.
To restore data, use static.restore(). In short: restored = enstat.static.restore(**dict(average)).
For convenience, the following members are available:
- static.shape: Shape of the data.
- static.size: Size of the data (= prod(shape)).
- Parameters:
  compute_variance – If set False, no second moment will be computed (making things slightly faster). In that case, the variance and standard deviation will not be available.
  shape – The shape of the data. If not specified, it is determined from the first sample.
  dtype – The type of the data. If not specified, it is determined from the first sample.
- add_point(datum: float | int, index: int)#
Add a single point. Note that:

```python
ensemble.add_point(datum, index)
```

is equivalent to (but faster than):

```python
data = np.empty(ensemble.shape)
mask = np.ones(ensemble.shape, dtype=bool)
data[index] = datum
mask[index] = False
ensemble.add_sample(data, mask)
```
- add_sample(data: ArrayLike, mask: ArrayLike | None = None)#
Add a sample. Internally changes the sums of the first and second statistical moments and normalisation.
- Parameters:
data – The sample.
mask – Mask entries (boolean array; True entries are excluded from the average).
- property dtype#
The type of the data.
- mean(min_norm: int = 1) ArrayLike #
Current mean. Samples can be added afterwards without any problems.
- Parameters:
min_norm – Minimum number of samples to consider as value output.
- Returns:
Mean.
- classmethod restore(first: ArrayLike | None = None, second: ArrayLike | None = None, norm: ArrayLike | None = None)#
Restore previous data.
- Parameters:
first – Continued computation: Sum of the first moment.
second – Continued computation: Sum of the second moment.
norm – Continued computation: Number of samples (integer).
- property shape#
The shape of the data.
- property size#
The size of the data.
- squash(n: int | list[int])#
Squash the data to a smaller size by summing over blocks of size n. For example, suppose that:

```python
>>> avg.norm
[[2, 2, 3, 1, 1],
 [2, 2, 1, 1, 3],
 [1, 1, 2, 2, 1],
 [2, 1, 2, 2, 2]]
```

Then calling:

```python
>>> avg.squash(2)
>>> avg.norm
[[8, 6, 4],
 [5, 8, 3]]
```
- std(min_norm: int = 2) ArrayLike #
Current standard deviation. Samples can be added afterwards without any problems.
- Parameters:
min_norm – Minimum number of samples to consider as value output.
- Returns:
Standard deviation.
- variance(min_norm: int = 2) ArrayLike #
Current variance. Samples can be added afterwards without any problems.
- Parameters:
min_norm – Minimum number of samples to consider as value output.
- Returns:
Variance.
dynamic1d#
- class enstat.dynamic1d(compute_variance: bool = True, size: int | None = None, dtype: numpy.typing.DTypeLike = numpy.float64)#
Bases:
static
Ensemble average of a 1d-array (which grows with the size of the samples). Add samples to it using dynamic1d.add_sample(), or simply average += datum. The mean, variance, and standard deviation can be obtained at any time, as can the sums of the first and second statistical moments and the number of samples.
To restore data, use dynamic1d.restore(). In short: restored = enstat.dynamic1d.restore(**dict(average)).
- Parameters:
  compute_variance – If set False, no second moment will be computed. In that case, the variance and standard deviation will not be available.
  size – The initial size of the data. If not specified, it is determined from the first sample.
  dtype – The type of the data. If not specified, it is determined from the first sample.
- add_point(datum: float | int, index: int)#
Add a single point. Note that:

```python
ensemble.add_point(datum, index)
```

is equivalent to (but faster than):

```python
data = np.empty(ensemble.shape)
mask = np.ones(ensemble.shape, dtype=bool)
data[index] = datum
mask[index] = False
ensemble.add_sample(data, mask)
```
- add_sample(data: ArrayLike)#
Add a sample. Internally changes the sums of the first and second statistical moments and normalisation.
- Parameters:
data – The sample.
- property dtype#
The type of the data.
- mean(min_norm: int = 1) ArrayLike #
Current mean. Samples can be added afterwards without any problems.
- Parameters:
min_norm – Minimum number of samples to consider as value output.
- Returns:
Mean.
- classmethod restore(first: ArrayLike | None = None, second: ArrayLike | None = None, norm: ArrayLike | None = None)#
Restore previous data.
- Parameters:
first – Continued computation: Sum of the first moment.
second – Continued computation: Sum of the second moment.
norm – Continued computation: Number of samples (integer).
- property shape#
The shape of the data.
- property size#
The size of the data.
- squash(n: int | list[int])#
Squash the data to a smaller size by summing over blocks of size n. For example, suppose that:

```python
>>> avg.norm
[[2, 2, 3, 1, 1],
 [2, 2, 1, 1, 3],
 [1, 1, 2, 2, 1],
 [2, 1, 2, 2, 2]]
```

Then calling:

```python
>>> avg.squash(2)
>>> avg.norm
[[8, 6, 4],
 [5, 8, 3]]
```
- std(min_norm: int = 2) ArrayLike #
Current standard deviation. Samples can be added afterwards without any problems.
- Parameters:
min_norm – Minimum number of samples to consider as value output.
- Returns:
Standard deviation.
- variance(min_norm: int = 2) ArrayLike #
Current variance. Samples can be added afterwards without any problems.
- Parameters:
min_norm – Minimum number of samples to consider as value output.
- Returns:
Variance.
histogram#
- class enstat.histogram(bin_edges: ArrayLike, right: bool = False, bound_error: str = 'raise')#
Bases:
object
Histogram. Example, single dataset:

```python
import enstat

data = [0, 0, 0, 1, 1, 2]
bin_edges = [-0.5, 0.5, 1.5, 2.5]
hist = enstat.histogram.from_data(data, bin_edges=bin_edges)
print(hist.count)
```

Example, ensemble:

```python
import enstat
import numpy as np

data = np.random.random(35 * 50).reshape(35, 50)
bin_edges = np.linspace(0, 1, 11)

hist = enstat.histogram(bin_edges)
for datum in data:
    hist += datum

print(hist.count)
```
One can add samples to it using histogram.add_sample(), or simply hist += datum.
Members:
- histogram.count: The number of samples in each bin.
- histogram.bin_edges: See option bin_edges.
- histogram.x: Midpoint of each bin.
- histogram.p: Probability density of each bin.
- histogram.right: See option right.
- histogram.bound_error: See option bound_error.
- histogram.count_left: Number of samples that fall below the leftmost bin.
- histogram.count_right: Number of samples that fall above the rightmost bin.
- Parameters:
  bin_edges – The bin-edges.
  right – Whether the bin includes the right edge (or the left edge), see numpy.digitize.
  bound_error – What to do if a sample falls outside the bin range:
    - "raise": raise an error.
    - "ignore": ignore the data that are out of range.
    - "norm": change the normalisation of the density.
- add_sample(data: ArrayLike)#
Add a sample to the histogram. You can also use the + operator.
- as_integer()#
Merge bins not encompassing an integer with the preceding bin. For example: a bin with edges [1.1, 1.9] is removed, but [0.9, 1.1] is not.
- property density: ArrayLike#
The probability density function at the bin.
- classmethod from_data(data: ArrayLike, bins: int | None = None, mode: str = 'equal', min_count: int | None = None, min_width: float | None = None, integer: bool = False, bin_edges: ArrayLike | None = None, bound_error: str = 'raise')#
Construct a histogram from data.
- Parameters:
data – Data (flattened).
bins – Number of bins.
mode – Mode with which to compute the bin-edges:
  - "equal": each bin has equal width.
  - "log": logarithmic spacing.
  - "uniform": uniform number of data-points per bin.
  - "voronoi": each bin is the region between two adjacent data-points.
min_count – Minimum number of data-points per bin.
min_width – Minimum width of a bin.
integer – If True, bins not encompassing an integer are removed (e.g. a bin with edges [1.1, 1.9] is removed, but [0.9, 1.1] is not).
bin_edges – Specify the bin-edges (overrides bins and mode).
bound_error – What to do if a sample falls outside the bin range:
  - "raise": raise an error.
  - "ignore": ignore the data that are out of range.
  - "norm": change the normalisation of the density.
- Returns:
The histogram object.
- interp(bin_edges: ArrayLike)#
Interpolate the histogram to a new set of bin-edges.
- Parameters:
bin_edges – The new bin-edges.
- lstrip(min_count: int = 0)#
Strip the histogram of empty bins to the left.
- Parameters:
min_count – The minimum count for a bin to be considered non-empty.
- merge_left(index: ArrayLike)#
Merge the bins to the left of index into index.
- Parameters:
  index – The indices of the bin to merge into.
- merge_right(index: ArrayLike)#
Merge the bins to the right of index into index.
- Parameters:
  index – The indices of the bin to merge into.
- property p: ArrayLike#
The probability density function at the bin.
- property plot: tuple[ArrayLike, ArrayLike]#
Alias for (x, density).
- classmethod restore(bin_edges: ArrayLike, count: ArrayLike, count_left: int = 0, count_right: int = 0, bound_left: float | None = None, bound_right: float | None = None, bound_error: str = 'raise', right: bool = False)#
Restore from a previous result:

```python
hist = enstat.histogram(...)  # some existing histogram
state = dict(hist)
restored = enstat.histogram.restore(**state)
```
- Parameters:
bin_edges – The bin-edges.
count – The count.
count_left – Number of items below the left bound.
count_right – Number of items above the right bound.
bound_left – The minimum value below the left bound.
bound_right – The maximum value above the right bound.
bound_error – What to do if a sample falls out of the bin range.
right – Whether the bin includes the right edge (or the left edge), see numpy.digitize.
- rstrip(min_count: int = 0)#
Strip the histogram of empty bins to the right.
- Parameters:
min_count – The minimum count for a bin to be considered non-empty.
- squash(n: int)#
Squash the histogram by combining n sequential bins into one (the last bin may be smaller).
- Parameters:
n – Number of bins to group.
- strip(min_count: int = 0)#
Strip the histogram of empty bins to the left and the right.
- Parameters:
min_count – The minimum count for a bin to be considered non-empty.
- property x: ArrayLike#
The bin centers.
binned#
- class enstat.binned(bin_edges: ArrayLike, right: bool = False, bound_error: str = 'raise', names: list[str] = [])#
Bases:
object
Ensemble average after binning. Example:

```python
import numpy as np
import enstat

x = np.array([0.5, 1.5, 2.5])
y = np.array([1, 2, 3])
bin_edges = np.array([0, 1, 2, 3])

binned = enstat.binned.from_data(x, y, bin_edges=bin_edges)
print(binned[0].mean())
```
- Parameters:
bin_edges – The bin-edges.
right – Whether the bin includes the right edge (or the left edge), see numpy.digitize.
bound_error – What to do if a sample falls outside the bin range:
  - "raise": raise an error.
  - "ignore": ignore the data that are out of range.
names – The names of the variables to store.
- add_sample(*args: ArrayLike, **kwargs: ArrayLike)#
Add a sample. If you use only one variable, you can also use the + operator.
- Parameters:
  args – Different variables of data to add. The binning is done on the first argument and applied to all other arguments.
- classmethod from_data(*args: ArrayLike, names: list[str] = [], **kwargs)#
Construct from data.
- Parameters:
args – Different variables of data to add. The binning is done on the first argument and applied to all other arguments.
kwargs – Automatic binning settings, see histogram.from_data().
- Returns:
The binned object.