Histogram¶

Base histogram¶

The base histogram class from which histograms of a specific dimension inherit.

class heppy.basehistogram(binedges, contents, areas=False, name='', uncorr_variations={}, corr_variations={}, attributes={}, plot_attributes={})¶

Base class for one-dimensional and two-dimensional histograms that keep track of their various uncertainty contributions and arbitrary attributes (useful for labeling and plotting).

Parameters:

binedges (numpy.array, or tuple of numpy.array) – bin edges, including uppermost. For 1D histograms, a numpy.array. For 2D histograms, a tuple of two numpy.array (in the x and y direction, respectively).
contents (numpy.array) – the “bin contents”, which are either bin areas (= what ROOT calls “bin contents”) or bin heights (= bin areas / bin sizes). See also argument areas.
areas (bool) – if True, interpret given contents as bin areas, else as bin heights
name (str) – a name for the histogram. This is only separate from the other attributes because it is so commonly used and is automatically created for histograms produced by mathematically combining two histograms. E.g. dividing two histograms with names 'foo' and 'bar' will return a histogram with name 'foo / bar'.
uncorr_variations (dict) – dictionary of variations that are uncorrelated between bins (e.g. statistical uncertainty). Keys are variation names, values are np.array objects of the same dimension as the nominal contents.
corr_variations (dict) – dictionary of variations that are fully correlated between bins (e.g. systematic uncertainty). Keys are variation names, values are np.array objects of the same dimension as the nominal contents.
attributes (dict) – dictionary of completely arbitrary attributes that the user can provide/change/access. E.g. information about the data sample that produced the histogram.
plot_attributes (dict) – dictionary of completely arbitrary attributes that the user can provide/change/access. This one is intended for information on how to visualise/plot the histogram. It is especially useful when working with heppy.make_figure, which will assume that all the plot_attributes correspond to keyword arguments that are understood by Matplotlib’s plot() and/or fill_between() functions.
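
The relation between bin areas and bin heights described for `contents` and `areas` can be sketched without heppy. The helper name `areas_to_heights` is hypothetical and not part of the library; it only illustrates the arithmetic for a one-dimensional binning.

```python
def areas_to_heights(binedges, areas):
    """Convert bin areas to bin heights (area divided by bin width)."""
    widths = [hi - lo for lo, hi in zip(binedges[:-1], binedges[1:])]
    return [area / width for area, width in zip(areas, widths)]


binedges = [0.0, 1.0, 3.0]   # two bins with widths 1 and 2
areas = [10.0, 22.0]         # what ROOT calls "bin contents"
heights = areas_to_heights(binedges, areas)
print(heights)
```

The second bin is twice as wide as the first, so its height is half its area.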

extract_variation_histogram(variation, **kwargs)¶

Get a new histogram object that has a given variation as nominal. Useful e.g. for studying a particular systematic variation.

Parameters:

variation (`str`) – name of the variation
**kwargs – get passed on to the constructor of the new histogram, e.g. useful to set a `name` for the new histogram

Returns:	new `heppy.histogram` that has the given `variation` as nominal

Raises:

KeyError – if the variation is not found in either the uncorrelated or the correlated variations
RuntimeError – if the variation is found in both the uncorrelated and the correlated variations

binsizes¶

Bin sizes.

For a one-dimensional histogram, returns an array of dimension (N, 1), where N is the number of bins. The elements represent the width of each bin.

For a two-dimensional histogram, returns an array of dimension (N, M), where N is the number of bins along the first axis (“x-axis”) and M is the number of bins along the second axis (“y-axis”). The elements represent the area of each bin.
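
As a rough illustration of the two-dimensional case, the bin sizes are the products of the x- and y-widths. This is a plain-Python sketch with a hypothetical helper `bin_sizes_2d`, not heppy's implementation.

```python
def bin_sizes_2d(xedges, yedges):
    """Return an N x M nested list of bin areas (x-width times y-width)."""
    xwidths = [hi - lo for lo, hi in zip(xedges[:-1], xedges[1:])]
    ywidths = [hi - lo for lo, hi in zip(yedges[:-1], yedges[1:])]
    return [[wx * wy for wy in ywidths] for wx in xwidths]


# 3 bins along x, 2 bins along y
sizes = bin_sizes_2d([-7.0, 0.0, 5.0, 50.0], [-1.0, 0.0, 1.0])
print(sizes)
```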

heights¶: Bin heights, equal to bin areas divided by the corresponding bin sizes

set_heights(heights)¶: Set bin heights to an array of the same dimension as the current areas or to a scalar

integral(variations=None, **kwargs)¶

Calculate the integral of the histogram.

Parameters:

variations (`list` of `str` or `str`) – if given, a tuple of the nominal integral and its upper and lower variation is calculated. This argument is passed to `histogram1d.net_variations()` and should be a list of considered variation names or the string `'all'`.
**kwargs – additional keyword arguments that get passed to `histogram1d.net_variations()`

Returns:	the integral (as well as upper and lower variation if `variations` is given)
Return type:	`float`, or if `variations` are given, `tuple` of nominal as well as upper and lower variation
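
For a one-dimensional histogram, the nominal integral is the sum of bin heights times bin widths. The sketch below uses a hypothetical stand-alone function and does not reproduce how heppy combines variations into the upper and lower values.

```python
def integral(binedges, heights):
    """Sum of height times width over all bins, i.e. the sum of bin areas."""
    widths = [hi - lo for lo, hi in zip(binedges[:-1], binedges[1:])]
    return sum(h * w for h, w in zip(heights, widths))


binedges = [0.0, 1.0, 3.0]
nominal = integral(binedges, [10.0, 11.0])   # 10*1 + 11*2
varied = integral(binedges, [13.0, 11.5])    # same sum for one varied set of heights
print(nominal, varied)
```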

net_variations(variations='all', subtract_nominal=False, relative=False)¶

Return upper and lower net heights variation of the histogram as a tuple.

Parameters:

variations – sequence of considered variation names, or the string 'all'
subtract_nominal (bool) – if True, return the differences with respect to the nominal heights
relative (bool) – if True, divide by the nominal heights

CAUTION: this method cannot yet deal with systematic uncertainties for which the up- and down-shift lie on the same side of the nominal. This is because the variations are fundamentally treated independently of each other, so there is no sense of the up- and down-shift being related to the same underlying uncertainty source.
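
The caution above can be illustrated with a single bin. The sketch assumes that shifts above the nominal are summed in quadrature into the upper net variation and shifts below into the lower one, with every variation treated independently. That combination rule is an assumption made for illustration and is not taken from heppy's source; `net_variation` is a hypothetical name.

```python
import math


def net_variation(nominal, variations):
    """Combine the variations of one bin into (upper, lower) net values.

    ASSUMPTION: upward and downward shifts are each summed in quadrature,
    and no variation knows about its up/down partner.
    """
    up2 = sum((v - nominal) ** 2 for v in variations.values() if v > nominal)
    down2 = sum((v - nominal) ** 2 for v in variations.values() if v < nominal)
    return nominal + math.sqrt(up2), nominal - math.sqrt(down2)


# Up- and down-shifts on opposite sides of the nominal
upper, lower = net_variation(10.0, {'a__up': 13.0, 'a__down': 6.0, 'b__up': 14.0})

# Both shifts of source 'a' above the nominal: both count towards the upper
# variation, and the lower variation stays at the nominal value.
same_side_upper, same_side_lower = net_variation(10.0, {'a__up': 13.0, 'a__down': 14.0})
print(upper, lower, same_side_upper, same_side_lower)
```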

errorbars(variations='all')¶: Returns upper and lower error bars, defined as the absolute net variations (taking into account the given variations) with the nominal values subtracted.

__add__(other)¶

Add another histogram or a scalar to this histogram.

Returns the result of the addition as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated. Uncorrelated variations are treated as uncorrelated between the two histograms.
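
A minimal sketch of what this treatment means for a single bin, written in terms of shifts (variation minus nominal): shifts of correlated variations with the same name add linearly, while uncorrelated shifts add in quadrature. This is standard first-order error propagation shown with a hypothetical helper, not a copy of heppy's code.

```python
import math


def add_shifts(corr_a, corr_b, uncorr_a, uncorr_b):
    """Propagate per-bin shifts through the sum a + b."""
    # Same-named correlated shifts move together, so they add linearly.
    # A name present in only one histogram leaves the other at nominal.
    corr = {name: corr_a.get(name, 0.0) + corr_b.get(name, 0.0)
            for name in sorted(set(corr_a) | set(corr_b))}
    # Uncorrelated shifts are independent, so their magnitudes add in quadrature.
    uncorr = {name: math.hypot(uncorr_a.get(name, 0.0), uncorr_b.get(name, 0.0))
              for name in sorted(set(uncorr_a) | set(uncorr_b))}
    return corr, uncorr


corr, uncorr = add_shifts(
    {'lumi__up': 0.5}, {'lumi__up': 1.0, 'jes__up': 0.2},
    {'stat__up': 3.0}, {'stat__up': 4.0},
)
print(corr, uncorr)
```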

__sub__(other)¶

Subtract another histogram or a scalar from this histogram.

Returns the result of the subtraction as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated. Uncorrelated variations are treated as uncorrelated between the two histograms.

__mul__(other)¶

Multiply another histogram or a scalar binwise with this histogram.

Returns the result of the binwise multiplication as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated. Uncorrelated variations are treated as uncorrelated between the two histograms.

__truediv__(other)¶

Divide by another histogram or a scalar binwise.

Returns the result of the binwise division as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated.

CAUTION: Uncorrelated variations are treated as uncorrelated between the two histograms. If the uncorrelated variations represent statistical uncertainties, this means that the division treats the two histograms as statistically uncorrelated.
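
The practical consequence for a ratio can be sketched for one bin. Under first-order propagation, uncorrelated relative uncertainties add in quadrature, whereas a fully correlated variation is obtained by dividing the varied values and may cancel. The function name and the numbers are illustrative assumptions only.

```python
import math


def divide_bin(a, b, stat_a, stat_b, a_varied, b_varied):
    """Ratio of one bin with an uncorrelated and a correlated variation."""
    ratio = a / b
    # Uncorrelated: relative uncertainties add in quadrature.
    stat = ratio * math.hypot(stat_a / a, stat_b / b)
    # Fully correlated: numerator and denominator shift coherently.
    ratio_varied = a_varied / b_varied
    return ratio, stat, ratio_varied


# A 10 % statistical uncertainty on each input, and a +10 % correlated shift
ratio, stat, ratio_varied = divide_bin(100.0, 50.0, 10.0, 5.0, 110.0, 55.0)
print(ratio, stat, ratio_varied)
```

Here the correlated +10 % shift cancels in the ratio, while the statistical uncertainty grows to about 14 %. If the two histograms share events, treating them as statistically uncorrelated overestimates or underestimates the true uncertainty.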

One-dimensional histogram¶

A histogram class with bins along one axis.

class heppy.histogram1d(*args, **kwargs)¶

Heppy one-dimensional histogram. This has functionality for rebinning, getting various representations for plotting (curve, points, errorbars, errorbands), as well as performing mathematical operations (these have only been implemented for one-dimensional histograms so far).

nbins¶

Returns the number of bins in the histogram.

Returns:	number of bins in the histogram
Return type:	`int`

binwidths¶: Bin widths is an alias for bin sizes in the case of a one-dimensional histogram

bin_index(x)¶

Returns the index of the bin that contains the given x-value.

Lower bin edges are included in a bin, upper bin edges are excluded (same as in the ROOT convention).

Parameters:	x (`float`) – x-value
Returns:	index of bin that contains the x-value
Return type:	`int`
Raises:	ValueError – if x-value lies outside of the outer bin edges of the histogram

Example:

>>> import heppy as hp
>>> h = hp.histogram1d([0., 1., 2.], [10., 11.])
>>> h.bin_index(0.5)
0
>>> h.bin_index(0.)
0
>>> h.bin_index(1.0)
1
>>> h.bin_index(2.0)
ValueError: Cannot find index of bin containing x = 2.0, which is outside of histogram x-boundaries [0.0, 2.0)
>>> h.bin_index(-1.0)
ValueError: Cannot find index of bin containing x = -1.0, which is outside of histogram x-boundaries [0.0, 2.0)
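
The half-open bin convention can be reproduced with the standard library. The sketch below uses `bisect` and a hypothetical free function `bin_index`; it mirrors the behaviour documented above but is not heppy's implementation.

```python
import bisect


def bin_index(binedges, x):
    """Index of the bin containing x; lower edges inclusive, upper exclusive."""
    if not binedges[0] <= x < binedges[-1]:
        raise ValueError(
            'x = {} is outside of [{}, {})'.format(x, binedges[0], binedges[-1]))
    # bisect_right counts the edges that are <= x; subtracting one gives the bin.
    return bisect.bisect_right(binedges, x) - 1


edges = [0.0, 1.0, 2.0]
print(bin_index(edges, 0.5), bin_index(edges, 0.0), bin_index(edges, 1.0))
```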

curve(variation='')¶: Curve representation of the histogram. If `variation` is given, return the curve for the variation of that name; otherwise, return the nominal curve.
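
One common way to build a curve representation is a step outline in which every bin contributes its lower and upper edge at the same height. Whether `curve()` returns exactly this layout is an assumption; `step_curve` is a hypothetical helper for illustration.

```python
def step_curve(binedges, heights):
    """Return x and y lists tracing the histogram outline as a step function."""
    xs, ys = [], []
    for lo, hi, height in zip(binedges[:-1], binedges[1:], heights):
        xs.extend([lo, hi])          # both edges of the bin ...
        ys.extend([height, height])  # ... at the bin's height
    return xs, ys


xs, ys = step_curve([0.0, 1.0, 3.0], [10.0, 11.0])
print(xs, ys)
```

Lists of this shape can be passed directly to Matplotlib's `plot()` or `fill_between()`.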

points(variation='', shift=0.0, abs_shift=False)¶: Point representation of the histogram. If `shift` is given, the x-coordinates of the midpoints are shifted by this absolute x-value (if `abs_shift=True`) or by this fraction of the corresponding bin’s width (if `abs_shift=False`).
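
The two shift modes can be sketched as follows. The helper `shifted_midpoints` is hypothetical and only computes the x-coordinates; it is meant to show the difference between a relative and an absolute shift.

```python
def shifted_midpoints(binedges, shift=0.0, abs_shift=False):
    """Bin midpoints, shifted by an absolute value or a fraction of the bin width."""
    points = []
    for lo, hi in zip(binedges[:-1], binedges[1:]):
        offset = shift if abs_shift else shift * (hi - lo)
        points.append(0.5 * (lo + hi) + offset)
    return points


edges = [0.0, 1.0, 3.0]
relative = shifted_midpoints(edges, shift=0.25)                  # 25 % of each width
absolute = shifted_midpoints(edges, shift=0.25, abs_shift=True)  # 0.25 in x units
print(relative, absolute)
```

A relative shift keeps points visually inside their bins when several histograms with unequal bin widths are drawn on top of each other.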

errorband(*args, **kwargs)¶: Essentially the same as the errorbars method, but in curve representation. `*args` and `**kwargs` get passed on to self.net_variations().

rebin(newedges)¶

Rebin to the bin edges given in `newedges`. Each element of `newedges` should correspond to an existing bin edge, i.e. only existing bins are merged.

CAUTION: currently ASSUMES that each uncorrelated variation only has shifts in one direction of the nominal (i.e. it is either higher or lower everywhere)!
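
Merging existing bins amounts to summing their areas. The sketch below covers only the nominal areas and uses a hypothetical function `rebin_areas`; the treatment of variations in heppy is not reproduced here.

```python
def rebin_areas(oldedges, areas, newedges):
    """Sum the areas of the old bins that fall into each new bin."""
    for edge in newedges:
        if edge not in oldedges:
            raise ValueError('new edge {} is not an existing bin edge'.format(edge))
    old_bins = list(zip(oldedges[:-1], oldedges[1:], areas))
    return [
        sum(area for lo, hi, area in old_bins if new_lo <= lo and hi <= new_hi)
        for new_lo, new_hi in zip(newedges[:-1], newedges[1:])
    ]


oldedges = [0.0, 1.0, 2.0, 4.0]
new_areas = rebin_areas(oldedges, [10.0, 20.0, 30.0], [0.0, 2.0, 4.0])
new_heights = [area / 2.0 for area in new_areas]   # both new bins have width 2
print(new_areas, new_heights)
```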

merge_bins(xmin, xmax)¶: Merge the bins falling into the given x-range into one bin

squash_highest_bin(squash_above, new_xmax)¶: Merge all bins from `squash_above` upwards and set the highest bin edge to `new_xmax`.

height(bin_index)¶

Returns the height of the given bin index with uncertainties.

Returns:	height of the indexed bin including its variations
Return type:	`heppy.value`

Usage example:

>>> import heppy as hp
>>> h = hp.histogram1d([0., 1., 3.], [10., 11.], corr_variations={'systematic__up' : [13., 11.5]})
>>> v = h.height(1)
>>> v.nominal
11.0
>>> v.corr_variations['systematic__up']
11.5

iterheights()¶

Generates iterator over heights.

Returns:	bin heights including their variations
Return type:	`heppy.value`

Usage example:

>>> import heppy as hp
>>> h = hp.histogram1d([0., 1., 3.], [10., 11.], corr_variations={'systematic__up' : [13., 11.5]})
>>> for height in h.iterheights(): print(height.nominal, height.corr_variations['systematic__up'])
10.0 13.0
11.0 11.5

iterbins()¶

Generates iterator over bins, yielding bin edges and heights.

Returns:	bin edges and bin height with variations
Return type:	`tuple` of the following: `tuple` of two `float`, and one `heppy.value`

Usage example:

>>> import heppy as hp
>>> h = hp.histogram1d([0., 1., 3.], [10., 11.], corr_variations={'systematic__up' : [13., 11.5]})
>>> for binedges, height in h.iterbins(): print(binedges, height.nominal)
(0.0, 1.0) 10.0
(1.0, 3.0) 11.0
>>> for binedges, height in h.iterbins(): print(binedges, height.nominal, height.corr_variations['systematic__up'])
(0.0, 1.0) 10.0 13.0
(1.0, 3.0) 11.0 11.5

to_yoda(identifier, metadata={})¶

Returns the histogram in YODA output format as a string.

See the websites of YODA and its main user Rivet for more information.

Parameters:

identifier (`str`) – in-file identifier for the histogram, e.g. `'/REF/ATLAS_2017_I1614149/d16-x01-y02'`
metadata (`dict`) – optional dictionary of metadata. E.g. for Rivet use, one could have `metadata = {'IsRef' : 1, 'Path' : '/REF/ATLAS_2017_I1614149/d16-x01-y02', 'Title' : 'doi:10.17182/hepdata.80041.v2/t16'}`

Returns:	histogram formatted as YODA input string
Return type:	`str`

to_rivet(identifier, metadata={})¶

Returns the histogram in YODA output format as a string.

See the websites of YODA and its main user Rivet for more information.

Parameters:

identifier (`str`) – in-file identifier for the histogram, e.g. `'/REF/ATLAS_2017_I1614149/d16-x01-y02'`
metadata (`dict`) – optional dictionary of metadata. E.g. for Rivet use, one could have `metadata = {'IsRef' : 1, 'Path' : '/REF/ATLAS_2017_I1614149/d16-x01-y02', 'Title' : 'doi:10.17182/hepdata.80041.v2/t16'}`

Returns:	histogram formatted as YODA input string
Return type:	`str`

__add__(other)¶

Add another histogram or a scalar to this histogram.

Returns the result of the addition as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated. Uncorrelated variations are treated as uncorrelated between the two histograms.

__mul__(other)¶

Multiply another histogram or a scalar binwise with this histogram.

Returns the result of the binwise multiplication as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated. Uncorrelated variations are treated as uncorrelated between the two histograms.

__sub__(other)¶

Subtract another histogram or a scalar from this histogram.

Returns the result of the subtraction as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated. Uncorrelated variations are treated as uncorrelated between the two histograms.

__truediv__(other)¶

Divide by another histogram or a scalar binwise.

Returns the result of the binwise division as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated.

CAUTION: Uncorrelated variations are treated as uncorrelated between the two histograms. If the uncorrelated variations represent statistical uncertainties, this means that the division treats the two histograms as statistically uncorrelated.

Two-dimensional histogram¶

A histogram class with bins along two axes.

class heppy.histogram2d(*args, **kwargs)¶

Heppy two-dimensional histogram. This currently has much more limited functionality than the 1D histogram class, although most (if not all) of the 1D class’s mathematical operations should probably also work for the 2D histogram (at least with minor modifications).

Note: only independent binnings of the two axes are supported (i.e. y-bins don’t depend on x-bins and vice versa).

nbins¶

Returns:	tuple of number of bins along x- and y-axis

bin_index_x(x)¶

Returns the index of the x-axis bin that contains the given x-value.

Lower bin edges are included in a bin, upper bin edges are excluded (same as in the ROOT convention).

Parameters:	x (`float`) – x-value
Returns:	index of x-axis bin that contains the x-value
Return type:	`int`
Raises:	ValueError – if x-value lies outside of the outer bin edges of the histogram

bin_index_y(y)¶

Returns the index of the y-axis bin that contains the given y-value.

Lower bin edges are included in a bin, upper bin edges are excluded (same as in the ROOT convention).

Parameters:	y (`float`) – y-value
Returns:	index of y-axis bin that contains the y-value
Return type:	`int`
Raises:	ValueError – if y-value lies outside of the outer bin edges of the histogram

points()¶

Point representation of 2D histogram.

This involves flattening/ravelling the histogram bin midpoints and heights to one-dimensional arrays. The flattening is done in row-major, C-style order, with the y-axis index changing fastest and the x-axis index changing slowest.

Returns:	`tuple` of x-axis bin midpoints, y-axis bin midpoints, and heights
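
The flattening order can be made concrete with plain nested loops: the outer loop runs over x-bins (slowest) and the inner loop over y-bins (fastest). `flatten_points` is a hypothetical helper that illustrates the ordering and is not heppy's code.

```python
def flatten_points(xedges, yedges, heights):
    """Flatten midpoints and heights in row-major order (y index fastest)."""
    xmid = [0.5 * (lo + hi) for lo, hi in zip(xedges[:-1], xedges[1:])]
    ymid = [0.5 * (lo + hi) for lo, hi in zip(yedges[:-1], yedges[1:])]
    xs, ys, hs = [], [], []
    for i, x in enumerate(xmid):        # x index changes slowest
        for j, y in enumerate(ymid):    # y index changes fastest
            xs.append(x)
            ys.append(y)
            hs.append(heights[i][j])
    return xs, ys, hs


heights = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]]   # 3 x-bins, 2 y-bins
xs, ys, hs = flatten_points([-7.0, 0.0, 5.0, 50.0], [-1.0, 0.0, 1.0], heights)
print(xs, ys, hs)
```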

rebin(newedges)¶

Rebin 2D histogram. Correlated and uncorrelated variations will be recalculated to match the new bin edges.

CAUTION: currently ASSUMES that each uncorrelated variation only has shifts in one direction of the nominal (i.e. it is either higher or lower everywhere)!

Parameters:	newedges (`tuple` of two `numpy.array`) – new bin edges. Each new bin edge should correspond to an existing bin edge, i.e. only existing bins are merged
Raises:	`ValueError` if newedges is not of the correct type

as_1d(name='')¶

Return a copied one-dimensional reinterpretation of this histogram. This only works if the histogram only has one bin in one of its dimensions. This dimension will then be ignored.

Parameters:	name (`str`) – name for the reinterpreted histogram

project(axis, name='')¶

Project histogram to one axis by integrating over the other. Correlated and uncorrelated uncertainties are computed for the resulting one-dimensional histogram.

Parameters:

axis (`'x'` or `'y'`) – which axis to project onto, i.e. the axis that is kept
name (`str`) – name for the projection histogram

Returns:	`heppy.histogram1d` representing the projection
Raises:	`ValueError` if invalid axis identifier is given
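
For the nominal heights, projecting means integrating over the axis that is dropped, i.e. summing each height times the bin width along that axis. The sketch below is a hypothetical stand-alone function; it assumes the result is expressed as heights of the remaining one-dimensional histogram and does not cover uncertainties.

```python
def project_heights(xedges, yedges, heights, axis):
    """Integrate 2D bin heights over the axis that is not kept."""
    xw = [hi - lo for lo, hi in zip(xedges[:-1], xedges[1:])]
    yw = [hi - lo for lo, hi in zip(yedges[:-1], yedges[1:])]
    if axis == 'x':   # keep x, integrate over y
        return [sum(h * w for h, w in zip(row, yw)) for row in heights]
    if axis == 'y':   # keep y, integrate over x
        return [sum(heights[i][j] * xw[i] for i in range(len(xw)))
                for j in range(len(yw))]
    raise ValueError("axis must be 'x' or 'y'")


xedges, yedges = [-7.0, 0.0, 5.0, 50.0], [-1.0, 0.0, 1.0]
heights = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]]
onto_x = project_heights(xedges, yedges, heights, 'x')
onto_y = project_heights(xedges, yedges, heights, 'y')
print(onto_x, onto_y)
```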

slice(axis, bin_index, name='')¶

Returns 1D histogram of the distribution along one axis in a given bin of the other axis.

Parameters:

axis (`'x'` or `'y'`) – axis along which the slicing is done, i.e. the axis that is kept
bin_index (`int`) – index of the bin on the axis that is not kept
name (`str`) – name for the slice histogram

Returns:	1D histogram of the slice
Return type:	`heppy.histogram1d`

height(bin_index_x, bin_index_y)¶

Returns the height of the given bin indices with uncertainties.

Parameters:

bin_index_x (`int`) – bin index along x-axis
bin_index_y (`int`) – bin index along y-axis

Returns:	height of the indexed bin including its variations
Return type:	`heppy.value`

iterheights(faster='y')¶

Generates iterator over heights.

Parameters:	faster (`str`; `'x'` or `'y'`) – controls the iteration order by specifying along which axis the bin index changes faster
Returns:	bin heights including their variations
Return type:	`heppy.value`

iterbins()¶

Generates iterator over bins, yielding bin edges and heights.

Returns:	x-axis bin edges, y-axis bin edges, and bin height with variations
Return type:	`tuple` of the following: `tuple` of two `float`, `tuple` of two `float`, and one `heppy.value`

Usage example:

>>> import heppy as hp
>>> import numpy as np
>>> heights = np.array([             # bin heights
...     [1., 5.],
...     [2., 6.],
...     [3., 7.],
...     ])
>>> x = np.array([-7., 0., 5., 50.]) # bin edges in x
>>> y = np.array([-1., 0., 1.])      # bin edges in y
>>> h = hp.histogram2d((x, y), heights)
>>> for binedges_x, binedges_y, height in h.iterbins(): print(binedges_x, binedges_y, height.nominal)
(-7.0, 0.0) (-1.0, 0.0) 1.0
(-7.0, 0.0) (0.0, 1.0) 5.0
(0.0, 5.0) (-1.0, 0.0) 2.0
(0.0, 5.0) (0.0, 1.0) 6.0
(5.0, 50.0) (-1.0, 0.0) 3.0
(5.0, 50.0) (0.0, 1.0) 7.0

__add__(other)¶

Add another histogram or a scalar to this histogram.

Returns the result of the addition as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated. Uncorrelated variations are treated as uncorrelated between the two histograms.

__mul__(other)¶

Multiply another histogram or a scalar binwise with this histogram.

Returns the result of the binwise multiplication as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated. Uncorrelated variations are treated as uncorrelated between the two histograms.

__sub__(other)¶

Subtract another histogram or a scalar from this histogram.

Returns the result of the subtraction as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated. Uncorrelated variations are treated as uncorrelated between the two histograms.

__truediv__(other)¶

Divide by another histogram or a scalar binwise.

Returns the result of the binwise division as a histogram.

Correlated variations are treated as fully correlated among the two histograms if they have the same name, otherwise they are treated as uncorrelated.

CAUTION: Uncorrelated variations are treated as uncorrelated between the two histograms. If the uncorrelated variations represent statistical uncertainties, this means that the division treats the two histograms as statistically uncorrelated.

Free functions¶

Free functions related to histograms.

heppy.histdiv(a, b, corr=None, ignore_denominator_uncertainty=False)¶

Sophisticated division of two histograms

Parameters:

a (heppy.basehistogram) – numerator histogram
b (heppy.basehistogram) – denominator histogram
corr – information on how a and b are correlated — NOT YET IMPLEMENTED, do not use
ignore_denominator_uncertainty (bool) – switch to ignore the variations of the denominator histogram. If True, divide all variations of the numerator histogram by the nominal denominator histogram.

NOTE: the returned ratio histogram’s bin heights are not given “per bin size”, but take the role that the areas have for histograms that do not represent a ratio.

Returns:	ratio histogram a/b with variations treated as specified
Raises:	NotImplementedError – if `corr` is not `None` (remains to be implemented)
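
The effect of `ignore_denominator_uncertainty=True` can be sketched in a few lines: every variation of the numerator is divided binwise by the nominal denominator, and the denominator's own variations play no role. The function below is a hypothetical illustration operating on plain lists, not the heppy implementation.

```python
def divide_ignoring_denominator(num_nominal, num_variations, den_nominal):
    """Divide nominal and all numerator variations by the nominal denominator."""
    ratio = [n / d for n, d in zip(num_nominal, den_nominal)]
    variations = {
        name: [v / d for v, d in zip(values, den_nominal)]
        for name, values in num_variations.items()
    }
    return ratio, variations


ratio, variations = divide_ignoring_denominator(
    [10.0, 12.0],                  # nominal numerator
    {'stat__up': [15.0, 14.0]},    # numerator variation
    [5.0, 4.0],                    # nominal denominator
)
print(ratio, variations)
```

This mode is typically useful for ratio panels in which the denominator's uncertainty is drawn separately, e.g. as a band around unity.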

heppy.from_file(infilename, key)¶

Read histogram written out by heppy (using heppy.basehistogram.to_file).

Parameters:

infilename (`str`) – name of the file that the histogram should be read from
key (`str`) – name/key of the histogram inside the input file

Returns:	`heppy.histogram1d` or `heppy.histogram2d`