Schemas

class deker.schemas.ArraySchema(*, dtype, dimensions, fill_value=None, attributes=None)

Bases: SelfLoggerMixin, BaseArraysSchema

Array schema - a common schema for all the Arrays in Collection.

It describes the structure of the collection arrays.

Parameters

dimensions (Union[List[BaseDimensionSchema], Tuple[BaseDimensionSchema, ...]]) – an ordered sequence of DimensionSchemas and/or TimeDimensionSchemas
dtype (Type[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]) – an object, representing final data type of every array, e.g. int or `numpy.float32`
attributes (Optional[Union[List[AttributeSchema], Tuple[AttributeSchema, ...]]]) –
an optional sequence of AttributeSchema. If the start_value parameter of a TimeDimensionSchema refers to an attribute name, attributes must contain at least the schema of that attribute, e.g.:
```
import datetime

from deker.schemas import AttributeSchema

AttributeSchema(
    name="forecast_dt",
    dtype=datetime.datetime,
    primary=False  # or True
)
```
fill_value (Optional[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]) – an optional value for filling empty cells. If None, the default value for the given dtype is used. numpy.nan can be used only with floating-point NumPy dtypes.
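
The requirement that ties a TimeDimensionSchema's start_value to the attributes list can be illustrated without deker itself. The following is a minimal sketch, not deker's own validation: `AttrStub`, `TimeDimStub` and `missing_time_attributes` are hypothetical stand-ins that only mirror the documented parameter names, and the `$` prefix is the reference convention described for TimeDimensionSchema.

```python
import datetime
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass(frozen=True)
class AttrStub:
    """Hypothetical stand-in for AttributeSchema (same parameter names)."""
    name: str
    dtype: type
    primary: bool


@dataclass(frozen=True)
class TimeDimStub:
    """Hypothetical stand-in for TimeDimensionSchema."""
    name: str
    size: int
    start_value: Union[datetime.datetime, str]
    step: datetime.timedelta


def missing_time_attributes(
    dimensions: Sequence[object], attributes: Sequence[AttrStub]
) -> List[str]:
    """Return referenced attribute names that lack a datetime-typed schema."""
    typed_datetime = {a.name for a in attributes if a.dtype is datetime.datetime}
    missing = []
    for dim in dimensions:
        start = getattr(dim, "start_value", None)
        # Only "$name" strings are references; other values are literal start times.
        if isinstance(start, str) and start.startswith("$"):
            name = start[1:]
            if name not in typed_datetime:
                missing.append(name)
    return missing


time_dim = TimeDimStub(
    name="forecast_time",
    size=24,
    start_value="$forecast_dt",
    step=datetime.timedelta(hours=1),
)
forecast_attr = AttrStub(name="forecast_dt", dtype=datetime.datetime, primary=False)

print(missing_time_attributes([time_dim], [forecast_attr]))  # []
print(missing_time_attributes([time_dim], []))  # ['forecast_dt']
```

An empty result means every referenced attribute has a schema; any returned name points at an AttributeSchema that would still have to be added.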

attributes: Optional[Union[List[AttributeSchema], Tuple[AttributeSchema, ...]]]

fill_value: Optional[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]

class deker.schemas.AttributeSchema(name, dtype, primary)

Bases: SelfLoggerMixin, BaseAttributeSchema

Schema of an attribute.

Describes requirements for the primary or custom attribute of Array or VArray.

Parameters

name (str) – attribute name
dtype (Type[Union[int, float, complex, str, tuple, datetime]]) – attribute data type
primary (bool) – boolean flag for setting attribute as a key (True) or custom (False)

property as_dict: dict: Serialize Attribute schema as dict.

class deker.schemas.DimensionSchema(name, size, labels=None, scale=None)

Bases: SelfLoggerMixin, BaseDimensionSchema

Schema of a Dimension for the majority of series except time.

For time series use TimeDimensionSchema.

Parameters

name (str) – unique dimension name
size (int) – number of cells in the dimension
labels (Optional[Union[Tuple[Union[str, int, float], ...], List[Union[str, int, float]]]]) –
An ordered sequence of unique cell names, or a mapping of unique cell names to their positions (indexes) along the dimension. The length of this sequence or mapping must equal the dimension size.

May be useful if some data from different sources is grouped in one array, e.g.:
```
DimensionSchema(
    name="weather_data",
    size=3,
    labels=["temperature", "pressure", "humidity"]
)
```
which means that pressure data can always be found in this dimension at index 1 (not 0 or -1).

scale (Optional[Union[Scale, dict]]) –

optional parameter; describes a regular scale for the dimension axis. For example, we can describe dimensions for the Earth’s latitude and longitude in degrees:

```
dims = [
    DimensionSchema(name="y", size=721),
    DimensionSchema(name="x", size=1440),
]
```

Such a description is valid, but it says nothing about the grid itself. We can provide that information with a scale:

```
dims = [
    DimensionSchema(
        name="y",
        size=721,
        scale=Scale(start_value=90, step=-0.25, name="lat")
    ),
    DimensionSchema(
        name="x",
        size=1440,
        scale=Scale(start_value=-180, step=0.25, name="lon")
    ),
]
```

This extra information permits fancy indexing by lat/lon coordinates in degrees:

```
EarthGridArray[1, 1] == EarthGridArray[89.75, -179.75]
```
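
The equivalence above follows from simple arithmetic on a regular scale: the position of a coordinate is its offset from start_value divided by step. The sketch below is a plain-Python illustration of that arithmetic, not deker's implementation; `scale_index` is a hypothetical helper, and rejecting values that fall between grid nodes is an assumption made here for simplicity.

```python
def scale_index(value: float, start_value: float, step: float, size: int) -> int:
    """Map a coordinate on a regular scale to its integer position."""
    position = (value - start_value) / step
    index = round(position)
    # Assumption: only exact grid nodes are accepted in this sketch.
    if abs(position - index) > 1e-9:
        raise IndexError(f"{value} is not a node of the scale")
    if not 0 <= index < size:
        raise IndexError(f"{value} is outside the scale")
    return index


# Same scales as in the "y" (lat) and "x" (lon) dimensions above.
lat_index = scale_index(89.75, start_value=90, step=-0.25, size=721)
lon_index = scale_index(-179.75, start_value=-180, step=0.25, size=1440)
print(lat_index, lon_index)  # 1 1
```

With a negative step the latitude scale runs from 90 down to -90, so the last of the 721 cells (index 720) corresponds to -90 degrees.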

Note

The scale and labels parameters enable fancy indexing. If you would rather not calculate index positions, you may slice by labels instead.
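
Label-based indexing amounts to a lookup from label to position. A minimal sketch of that lookup, using the weather_data labels from the example above, could look as follows; `label_positions` is a hypothetical helper and not part of deker, and it only enforces the two documented constraints (unique labels, as many labels as cells).

```python
from typing import Dict, Sequence, Union

Label = Union[str, int, float]


def label_positions(labels: Sequence[Label], size: int) -> Dict[Label, int]:
    """Build a label -> index lookup, enforcing the documented constraints."""
    if len(labels) != size:
        raise ValueError("number of labels must equal the dimension size")
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique")
    return {label: index for index, label in enumerate(labels)}


positions = label_positions(["temperature", "pressure", "humidity"], size=3)
print(positions["pressure"])  # 1
```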

Attention

Pass either scale or labels to the constructor, or neither of them, but never both.

property as_dict: dict: Serialize DimensionSchema into dictionary.

labels: Optional[Labels]

scale: Optional[Union[Scale, dict]]

enum deker.schemas.SchemaTypeEnum(value)

Bases: Enum

Mapping of schema types to strings.

Valid values are as follows:

varray = <SchemaTypeEnum.varray: <class 'deker.schemas.VArraySchema'>>

array = <SchemaTypeEnum.array: <class 'deker.schemas.ArraySchema'>>

class deker.schemas.TimeDimensionSchema(name, size, start_value, step)

Bases: SelfLoggerMixin, BaseDimensionSchema

Dimension schema for time series.

Describes how data is distributed over time.

Parameters

name (str) – dimension name
size (int) – number of cells in the dimension
start_value (Union[datetime, str]) – time of the dimension’s zero-point.
step (timedelta) – a common time step for all the Arrays or VArrays, set with datetime.timedelta or its dictionary mapping.

Note

To set a common start date and time for all the arrays, use datetime.datetime with an explicit timezone (tzinfo) or a string produced by datetime.isoformat().

To set an individual start date and time for each array, pass the name of an attribute from the attributes list. Such a reference must start with $, e.g. start_value="$my_attr_name".

In this case, the schema of that attribute (typed datetime.datetime) must be provided as an AttributeSchema, and you must pass a datetime.datetime with an explicit timezone to the corresponding attribute when creating an Array or VArray.
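
The three accepted forms of start_value (a timezone-aware datetime, an ISO 8601 string, or a `$`-prefixed attribute reference) can be told apart with a few checks. The sketch below uses only the standard library and is an illustration rather than deker's code: `resolve_start` and `cell_time` are hypothetical helpers, and the attribute name `forecast_dt` is just an example.

```python
import datetime
from typing import Mapping, Union


def resolve_start(
    start_value: Union[datetime.datetime, str],
    attributes: Mapping[str, datetime.datetime],
) -> datetime.datetime:
    """Turn any documented start_value form into a timezone-aware datetime."""
    if isinstance(start_value, str):
        if start_value.startswith("$"):
            # Reference to a per-array attribute, e.g. "$forecast_dt".
            start = attributes[start_value[1:]]
        else:
            # ISO 8601 string, as produced by datetime.isoformat().
            start = datetime.datetime.fromisoformat(start_value)
    else:
        start = start_value
    if start.tzinfo is None:
        raise ValueError("start_value must carry an explicit timezone")
    return start


def cell_time(
    start: datetime.datetime, step: datetime.timedelta, index: int
) -> datetime.datetime:
    """Timestamp of the cell at the given index of a time dimension."""
    return start + index * step


utc = datetime.timezone.utc
common = datetime.datetime(2023, 1, 1, tzinfo=utc)
step = datetime.timedelta(hours=3)

# Common start for all arrays, passed as an ISO string.
same_start = resolve_start(common.isoformat(), {})

# Individual start, taken from the attribute values of one array.
per_array = resolve_start(
    "$forecast_dt",
    {"forecast_dt": datetime.datetime(2023, 6, 1, 12, tzinfo=utc)},
)

print(cell_time(same_start, step, 2).isoformat())  # 2023-01-01T06:00:00+00:00
```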

property as_dict: dict: Serialize TimeDimensionSchema into dictionary.

start_value: Union[datetime, str]

step: timedelta

class deker.schemas.VArraySchema(*, dtype, dimensions, vgrid=None, arrays_shape=None, fill_value=None, attributes=None)

Bases: SelfLoggerMixin, BaseArraysSchema

VArray schema - a common schema for all VArrays in Collection.

Virtual array is an “array of arrays”, or an “image of pixels”.

If we consider a VArray as an image, it is split by a virtual grid into tiles, and each tile is an ordinary Array.

This schema describes the structure of the collection's virtual arrays and how they are split into Arrays. ArraySchema is automatically constructed from VArraySchema.

Parameters

dtype (Type[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]) – an object, representing final data type of every array, e.g. int or numpy.float32
dimensions (Union[List[BaseDimensionSchema], Tuple[BaseDimensionSchema, ...]]) – an ordered sequence of DimensionSchemas and/or TimeDimensionSchemas
attributes (Optional[Union[List[AttributeSchema], Tuple[AttributeSchema, ...]]]) –
an optional sequence of AttributeSchema. If the start_value parameter of a TimeDimensionSchema refers to an attribute name, attributes must contain at least the schema of that attribute, e.g.:
```
import datetime

from deker.schemas import AttributeSchema

AttributeSchema(
    name="forecast_dt",
    dtype=datetime.datetime,
    primary=False  # or True
)
```
vgrid (Optional[Union[List[int], Tuple[int, ...]]]) –
an ordered sequence of positive integers; used for splitting a VArray into ordinary Arrays.

Each VArray dimension size must be divisible by the corresponding vgrid integer without remainder; the quotients form the shape of each ordinary Array. If a dimension does not need to be split, its vgrid value must be 1.
arrays_shape (Optional[Union[List[int], Tuple[int, ...]]]) –
an ordered sequence of positive integers; used for setting the shape of the ordinary Arrays underlying a VArray.

Each integer in the sequence is the number of cells in the corresponding dimension of every ordinary Array. Each VArray dimension size must be divisible by the corresponding integer without remainder; the quotients form the VArray's vgrid.
fill_value (Optional[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]) – an optional value for filling empty cells. If None, the default value for the given dtype is used. numpy.nan can be used only with floating-point NumPy dtypes.
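
The vgrid and arrays_shape descriptions above are two views of the same division: dividing the VArray shape by one yields the other. The following plain-Python sketch shows that relation. It is not deker's implementation; `split_shape` is a hypothetical helper, and the VArray shape (24, 720, 1440) is chosen here only for illustration.

```python
from typing import Sequence, Tuple


def split_shape(shape: Sequence[int], divisors: Sequence[int]) -> Tuple[int, ...]:
    """Divide each dimension size by its divisor, refusing remainders."""
    if len(shape) != len(divisors):
        raise ValueError("one divisor is required per dimension")
    result = []
    for size, divisor in zip(shape, divisors):
        if divisor < 1 or size % divisor:
            raise ValueError(f"{size} is not divisible by {divisor}")
        result.append(size // divisor)
    return tuple(result)


varray_shape = (24, 720, 1440)

# Given a vgrid, derive the shape of each underlying Array.
arrays_shape = split_shape(varray_shape, (1, 4, 8))
print(arrays_shape)  # (24, 180, 180)

# Given arrays_shape, the same division recovers the vgrid.
vgrid = split_shape(varray_shape, arrays_shape)
print(vgrid)  # (1, 4, 8)
```

A vgrid value of 1 leaves the first dimension unsplit, so each Array keeps all 24 cells along it.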

arrays_shape: Optional[Union[List[int], Tuple[int, ...]]]

property as_dict: dict: Serialize as dict.

attributes: Optional[Union[List[AttributeSchema], Tuple[AttributeSchema, ...]]]

fill_value: Optional[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]

vgrid: Optional[Union[List[int], Tuple[int, ...]]]