Overview

sacc is a schema for storing summary statistic data, metadata, and covariances for the Dark Energy Science Collaboration (DESC).

A sacc file can contain all the observational information required to make theoretical predictions for the mean of a measured quantity, and to calculate its likelihood.

Currently, sacc files can be saved in FITS format, but the schema is designed to make it easy to change this if needed; the focus is the structure of the data in memory, rather than the file format.

Basic Structure

A sacc.Sacc object can contain:

  • a series of DataPoint objects

  • a series of Tracer objects

  • a single Covariance object

  • additional metadata

Creating Sacc objects

A typical workflow for creating new sacc files is:

  • instantiate an empty Sacc object with s = sacc.Sacc()

  • add any tracer objects that will be used, with s.add_tracer(type_name, tracer_name, ...)

  • one by one, add data points to it in whatever order you prefer with s.add_data_point(data_type, tracers, value, ...)

  • when finished, add a covariance whose rows and columns follow the same order as the data points, with s.add_covariance(C)

  • save to file using s.save_fits(filename)

Reading Sacc objects

To use an existing sacc file, for example in an MCMC or for plotting:

  • load the sacc data into memory with s = sacc.Sacc.load_fits(filename)

  • find what data types are in the file with dts = s.get_data_types()

  • for each data type, find what tracer combinations (e.g. tomographic bin pairs) are available with tracer_sets = s.get_tracer_combinations(dt)

  • for each tracer combination, get the mean values with data = s.get_mean(dt, tracers) and, for example, the window functions with windows = s.get_tag("window", data_type=dt, tracers=tracers); other binning information can be retrieved from tags in the same way

You can also select subsets of the data and covariance with various API methods.

Data Types

Every data point in Sacc has a data type, a string that identifies the type of measurement it refers to.

There are a number of predefined type strings that you can see like this:

import sacc
print(sacc.standard_types)

If your data corresponds to one of these types, it is better to use the predefined name. Otherwise, you can make your own. There is a standard format for these strings:

{sources}_{properties}_{statistic}[_{subtype}]

where the last item, subtype, is optional. If there are multiple sources or properties (as in, for example, cross-correlation measurements), they are joined into a single camelCase word.

You can create a type string in the correct format using the command sacc.build_data_type_name:

import sacc
# the astrophysical sources involved.
# We use 'cm21' since '21cm' starts with a number which is not allowed in variable names.
sources = ['quasars', 'cm21']
# the properties of these two sources we are measuring.  If they were the same
# property for the two sources we would not repeat it
properties = ['density', 'Intensity']
# The statistic: Fourier-space C_ell values
statistic = 'cl'
# No further specification is needed here - everything is scalar.
subtype = None
data_type = sacc.build_data_type_name(sources, properties, statistic, subtype)
print(data_type)
# prints 'quasarsCm21_densityIntensity_cl'
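
To make the naming convention concrete, here is a small pure-Python sketch that splits a type string back into its parts. The helper names split_camel_case and parse_type_name are hypothetical and are not part of sacc; the sketch also assumes that each camelCase word starts with at most one capital letter followed by lower-case letters or digits.

```python
import re


def split_camel_case(joined):
    """Split 'quasarsCm21' into ['quasars', 'cm21'] (illustrative helper)."""
    # Each word is an optional capital followed by non-capital characters.
    return [word.lower() for word in re.findall(r"[A-Z]?[^A-Z]+", joined)]


def parse_type_name(name):
    """Split '{sources}_{properties}_{statistic}[_{subtype}]' into parts."""
    parts = name.split("_", 3)
    if len(parts) < 3:
        raise ValueError(f"not in the standard format: {name!r}")
    sources, properties, statistic = parts[:3]
    subtype = parts[3] if len(parts) == 4 else None
    return {
        "sources": split_camel_case(sources),
        "properties": split_camel_case(properties),
        "statistic": statistic,
        "subtype": subtype,
    }


cross = parse_type_name("quasarsCm21_densityIntensity_cl")
auto = parse_type_name("galaxy_shear_cl_ee")
```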

Data Points

Each DataPoint object contains:

  • a data type string

  • a series of strings listing which tracers apply to it

  • a value of the data point

  • a dictionary of tags, for example describing binning information
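
As a mental model, the four items above can be pictured as a small record. The class below is a toy stand-in written for illustration only; ToyDataPoint and select are hypothetical names, not sacc's own DataPoint class or selection API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ToyDataPoint:
    """Illustrative stand-in only; not sacc's DataPoint class."""

    data_type: str                 # e.g. "galaxy_shear_cl_ee"
    tracers: Tuple[str, ...]       # names of the tracers this point uses
    value: float                   # the measured value
    tags: Dict[str, Any] = field(default_factory=dict)  # e.g. binning info


def select(points: List[ToyDataPoint], data_type: str, tracers) -> List[ToyDataPoint]:
    """Return the points matching a data type and tracer combination."""
    wanted = tuple(tracers)
    return [p for p in points if p.data_type == data_type and p.tracers == wanted]


points = [
    ToyDataPoint("galaxy_shear_cl_ee", ("source_0", "source_0"), 2.5e-9, {"ell": 100.0}),
    ToyDataPoint("galaxy_shear_cl_ee", ("source_0", "source_1"), 1.5e-9, {"ell": 100.0}),
]
matches = select(points, "galaxy_shear_cl_ee", ("source_0", "source_1"))
```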

Tracers & Windows

Different types of tracer each have their own subclass. For example, an NZTracer describes a tomographic bin of sources with a given redshift histogram n(z).

Window functions are stored as a tag on data points, and are represented by Window subclass instances.

Covariance

A single covariance applies to the whole data file. It can be specified as a full matrix, a block-diagonal matrix, or a diagonal matrix.

See the API documentation for full details of what is available now.
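
To illustrate what a block form of the covariance means, the pure-Python sketch below expands a list of square blocks into the equivalent full matrix. It assumes each block covers a consecutive run of data points, in the order in which they were added; expand_blocks is a hypothetical helper and the numbers are made up.

```python
def expand_blocks(blocks):
    """Place square blocks along the diagonal of an otherwise zero matrix."""
    n = sum(len(block) for block in blocks)
    full = [[0.0] * n for _ in range(n)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, element in enumerate(row):
                full[offset + i][offset + j] = element
        # The next block starts where this one ends.
        offset += len(block)
    return full


# Two correlated data points, followed by one independent data point.
blocks = [
    [[1.0, 0.5],
     [0.5, 2.0]],
    [[3.0]],
]
full = expand_blocks(blocks)
```

Entries outside the blocks are zero, so data points in different blocks are treated as uncorrelated.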