Schemas
- class deker.schemas.ArraySchema(*, dtype, dimensions, fill_value=None, attributes=None)
Bases: SelfLoggerMixin, BaseArraysSchema
Array schema - a common schema for all the Arrays in Collection. It describes the structure of the collection arrays.
- Parameters
dimensions (Union[List[BaseDimensionSchema], Tuple[BaseDimensionSchema, ...]]) – an ordered sequence of DimensionSchemas and/or TimeDimensionSchemas
dtype (Type[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]) – an object representing the final data type of every array, e.g. int or numpy.float32
attributes (Optional[Union[List[AttributeSchema], Tuple[AttributeSchema, ...]]]) –
an optional sequence of AttributeSchema. If there is a TimeDimensionSchema whose start_value parameter refers to some attribute name, attributes must contain at least that attribute schema, e.g.:
    AttributeSchema(
        name="forecast_dt",
        dtype=datetime.datetime,
        primary=False,  # or True
    )
fill_value (Optional[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]) – an optional value for filling in empty cells. If None, the default value for each dtype will be used. Numpy nan can be used only for floating numpy dtypes.
- attributes: Optional[Union[List[AttributeSchema], Tuple[AttributeSchema, ...]]]
- fill_value: Optional[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]
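The restriction on nan as a fill_value follows from the fact that NaN exists only in floating-point representations. The stand-alone sketch below is not Deker code; the helper name can_hold_nan is hypothetical and uses only built-in Python types to show why an integer dtype cannot accept nan.

```python
import math


def can_hold_nan(python_type):
    """Return True if `python_type` is able to represent NaN."""
    try:
        value = python_type(math.nan)
    except (ValueError, TypeError):
        # e.g. int(math.nan) raises ValueError: integers have no NaN
        return False
    # NaN is the only value that compares unequal to itself
    return value != value


print(can_hold_nan(float))
print(can_hold_nan(int))
```

The same reasoning applies to fixed-width integer dtypes: a fill value for them has to be an ordinary number.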
- class deker.schemas.AttributeSchema(name, dtype, primary)
Bases: SelfLoggerMixin, BaseAttributeSchema
Schema of an attribute.
Describes requirements for the primary or custom attribute of Array or VArray.
- Parameters
name (str) – attribute name
dtype (Type[Union[int, float, complex, str, tuple, datetime]]) – attribute data type
primary (bool) – boolean flag for setting the attribute as a key (True) or custom (False)
- property as_dict: dict
Serialize Attribute schema as dict.
- class deker.schemas.DimensionSchema(name, size, labels=None, scale=None)
Bases: SelfLoggerMixin, BaseDimensionSchema
Schema of a Dimension for the majority of series except time. For time series use TimeDimensionSchema.
- Parameters
name (str) – dimension unique name
size (int) – dimension cells quantity
labels (Optional[Union[Tuple[Union[str, int, float], ...], List[Union[str, int, float]]]]) –
Represents an ordered sequence of unique cell names or a mapping of unique cell names to their position (index) in the dimension row. The size of such a sequence or mapping shall be equal to the dimension size.
May be useful if some data from different sources is grouped in one array, e.g.:
    DimensionSchema(
        name="weather_data",
        size=3,
        labels=["temperature", "pressure", "humidity"],
    )
which means that pressure data can always be found in such a dimension at index 1 (not 0, nor -1).
scale (Optional[Union[Scale, dict]]) –
optional parameter; represents a regular scale description for the dimension axis. For example, we describe dimensions for the Earth’s latitude and longitude degrees:
    dims = [
        DimensionSchema(name="y", size=721),
        DimensionSchema(name="x", size=1440),
    ]
Such a description is valid, but it’s not quite sufficient. We can provide information for the grid:
    dims = [
        DimensionSchema(
            name="y",
            size=721,
            scale=Scale(start_value=90, step=-0.25, name="lat"),
        ),
        DimensionSchema(
            name="x",
            size=1440,
            scale=Scale(start_value=-180, step=0.25, name="lon"),
        ),
    ]
This extra information lets us provide fancy indexing by lat/lon coordinates in degrees:
    EarthGridArray[1, 1] == EarthGridArray[89.75, -179.75]
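The arithmetic behind such scale-based indexing can be sketched in plain Python. This is only an illustration of how a regular scale maps a coordinate to a cell position, not Deker's implementation; the function name coordinate_to_index and its tolerance are assumptions.

```python
def coordinate_to_index(value, start_value, step, size):
    """Map a coordinate on a regular scale to its integer cell index."""
    position = (value - start_value) / step
    index = round(position)
    # Reject coordinates that fall between two scale points
    if abs(position - index) > 1e-9:
        raise ValueError(f"{value} does not lie on the scale")
    # Reject coordinates beyond the dimension bounds
    if not 0 <= index < size:
        raise IndexError(f"{value} is outside the dimension")
    return index


lat_index = coordinate_to_index(89.75, start_value=90, step=-0.25, size=721)
lon_index = coordinate_to_index(-179.75, start_value=-180, step=0.25, size=1440)
print(lat_index, lon_index)  # 1 1
```

With the grid above, latitude 89.75 and longitude -179.75 both resolve to index 1, which matches the equality shown in the example.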
Note
Parameters scale and labels provide a possibility of fancy indexing. If you are bored with calculating index positions, you may slice by labels or scale values instead.
Attention
Either the scale or the labels parameter, or neither of them, shall be passed to the constructor. Not both of them.
- property as_dict: dict
Serialize DimensionSchema into dictionary.
- labels: Optional[Labels]
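The labels rules described above (unique names, as many labels as the dimension has cells) amount to a simple lookup table. The sketch below is a stand-alone illustration under that reading; build_label_index is a hypothetical helper, not part of Deker.

```python
def build_label_index(labels, size):
    """Validate labels against the dimension size and map each to its index."""
    if len(labels) != size:
        raise ValueError("number of labels must equal the dimension size")
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique")
    return {label: position for position, label in enumerate(labels)}


index = build_label_index(["temperature", "pressure", "humidity"], size=3)
print(index["pressure"])  # 1
```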
- enum deker.schemas.SchemaTypeEnum(value)
Bases: Enum
Mapping of schema types to strings.
Valid values are as follows:
- varray = <SchemaTypeEnum.varray: <class 'deker.schemas.VArraySchema'>>
- array = <SchemaTypeEnum.array: <class 'deker.schemas.ArraySchema'>>
- class deker.schemas.TimeDimensionSchema(name, size, start_value, step)
Bases: SelfLoggerMixin, BaseDimensionSchema
Dimension schema for time series.
Describes data distribution within some time.
- Parameters
name (str) – dimension name
size (int) – dimension cells quantity
start_value (Union[datetime, str]) – time of the dimension’s zero-point.
step (timedelta) – Set a common time step for all the Arrays or VArrays with datetime.timedelta or its dictionary mapping.
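How start_value, step and size together define a regular time axis can be sketched with the standard library alone. The helpers index_to_time and time_to_index are hypothetical names used for illustration; they are not Deker functions.

```python
from datetime import datetime, timedelta, timezone


def index_to_time(index, start_value, step):
    """Return the timestamp of the cell at `index`."""
    return start_value + index * step


def time_to_index(moment, start_value, step, size):
    """Return the cell index of `moment`, which must lie exactly on the axis."""
    index, remainder = divmod(moment - start_value, step)
    if remainder:
        raise ValueError("moment does not fall on a time step")
    if not 0 <= index < size:
        raise IndexError("moment is outside the dimension")
    return index


start = datetime(2023, 1, 1, tzinfo=timezone.utc)  # timezone is explicit
step = timedelta(hours=3)

noon = index_to_time(4, start, step)
print(noon.isoformat())  # 2023-01-01T12:00:00+00:00
print(time_to_index(noon, start, step, size=8))  # 4
```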
Note
For setting a common start date and time for all the arrays use datetime.datetime with explicit timezone (tzinfo) or a string of datetime.isoformat().
For setting an individual start date and time for each array pass a name of the attribute in the attributes list. Such a reference shall start with $, e.g. start_value="$my_attr_name".
In this case the schema of such an attribute (typed datetime.datetime) must be provided by AttributeSchema, and you shall pass a datetime.datetime with explicit timezone to the corresponding attribute on Array or VArray creation.
- property as_dict: dict
Serialize TimeDimensionSchema into dictionary.
- start_value: Union[datetime, str]
- step: timedelta
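The rule that a $-prefixed start_value must point at a declared attribute can be expressed as a small consistency check. This is a sketch of the rule as stated in the note above, not Deker's own validation; unresolved_start_references is a hypothetical helper.

```python
def unresolved_start_references(start_values, attribute_names):
    """Return attribute names referenced via "$name" that are not declared."""
    references = [
        value[1:]  # drop the leading "$"
        for value in start_values
        if isinstance(value, str) and value.startswith("$")
    ]
    return [name for name in references if name not in attribute_names]


# One dimension refers to an attribute, another uses a fixed ISO string
starts = ["$forecast_dt", "2023-01-01T00:00:00+00:00"]
print(unresolved_start_references(starts, ["forecast_dt"]))  # []
print(unresolved_start_references(starts, []))  # ['forecast_dt']
```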
- class deker.schemas.VArraySchema(*, dtype, dimensions, vgrid=None, arrays_shape=None, fill_value=None, attributes=None)
Bases: SelfLoggerMixin, BaseArraysSchema
VArray schema - a common schema for all VArrays in Collection.
Virtual array is an “array of arrays”, or an “image of pixels”.
If we consider a VArray as an image, it is split by a virtual grid into tiles. In this case each tile is an ordinary array.
This schema describes the structure of the collection virtual arrays and how they are split into arrays. ArraySchema is automatically constructed from VArraySchema.
- Parameters
dtype (Type[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]) – an object representing the final data type of every array, e.g. int or numpy.float32
dimensions (Union[List[BaseDimensionSchema], Tuple[BaseDimensionSchema, ...]]) – an ordered sequence of DimensionSchemas and/or TimeDimensionSchemas;
attributes (Optional[Union[List[AttributeSchema], Tuple[AttributeSchema, ...]]]) –
an optional sequence of AttributeSchema. If there is a TimeDimensionSchema whose start_value parameter refers to some attribute name, attributes must contain at least that attribute schema, e.g.:
    AttributeSchema(
        name="forecast_dt",
        dtype=datetime.datetime,
        primary=False,  # or True
    )
vgrid (Optional[Union[List[int], Tuple[int, ...]]]) –
an ordered sequence of positive integers; used for splitting a VArray into ordinary Arrays.
Each VArray dimension “size” shall be divisible by the corresponding integer without remainder; the quotients form the Array's shape. If there is no need to split a dimension, its vgrid positional integer shall be 1.
arrays_shape (Optional[Union[List[int], Tuple[int, ...]]]) –
an ordered sequence of positive integers; used for setting the shape of the ordinary Arrays lying under a VArray.
Each integer in the sequence represents the total quantity of cells in the corresponding dimension of each ordinary Array. Each VArray dimension “size” shall be divisible by the corresponding integer without remainder; the quotients form the VArray's vgrid.
fill_value (Optional[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]) – an optional value for filling in empty cells. If None, the default value for each dtype will be used. Numpy nan can be used only for floating numpy dtypes.
- arrays_shape: Optional[Union[List[int], Tuple[int, ...]]]
- property as_dict: dict
Serialize as dict.
- attributes: Optional[Union[List[AttributeSchema], Tuple[AttributeSchema, ...]]]
- fill_value: Optional[Union[int, float, complex, int8, int16, int32, int64, longlong, uint64, uint8, uint16, uint32, ulonglong, float16, complex128, float32, clongdouble, float64, longdouble, complex64]]
- vgrid: Optional[Union[List[int], Tuple[int, ...]]]
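The vgrid and arrays_shape parameters describe the same split from two sides: dividing the VArray shape by one yields the other. The sketch below illustrates that relationship with plain tuples; divide_shape and the sample shape are assumptions made for the example, not Deker code.

```python
def divide_shape(shape, divisors):
    """Divide each dimension size by its divisor, refusing any remainder."""
    if len(shape) != len(divisors):
        raise ValueError("one divisor is required per dimension")
    result = []
    for size, divisor in zip(shape, divisors):
        if divisor <= 0 or size % divisor:
            raise ValueError(f"{size} is not evenly divisible by {divisor}")
        result.append(size // divisor)
    return tuple(result)


varray_shape = (720, 1440)

# vgrid -> shape of each underlying Array
arrays_shape = divide_shape(varray_shape, (3, 4))
print(arrays_shape)  # (240, 360)

# arrays_shape -> vgrid (the same division in the other direction)
print(divide_shape(varray_shape, arrays_shape))  # (3, 4)
```

A vgrid entry of 1 leaves the corresponding dimension unsplit, so the Array keeps the full VArray size along it.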