Cubes - Python OLAP Framework Architecture
What is inside the Cubes Python OLAP Framework? Here is a brief overview of the core modules, their purpose and functionality.
The lightweight framework Cubes is composed of four public modules:
- model - Description of data (metadata): dimensions, hierarchies, attributes, labels, localizations.
- browser - Aggregation browsing, slicing-and-dicing, drill-down.
- backends - Actual aggregation implementation and utility functions.
- server - WSGI HTTP server for Cubes
Model
Logical model describes the data from user’s or analyst’s perspective: data how they are being measured, aggregated and reported. Model is independent of physical implementation of data. This physical independence makes it easier to focus on data instead on ways of how to get the data in understandable form.
Cubes model is described by:
- model object (doc)
- list of cubes
- dimensions of cubes (they are shared with all cubes within model) (doc) (doc)
- hierarchies (doc) and hierarchy levels (doc) of dimensions (such as category-subcategory, country-region-city)
- optional mappings from logical model to the physical model (doc)
- optional join specifications for star schemas, used by the SQL denormalizing backend (doc)
There is a utility function provided for loading the model from a JSON file: load_model
.
The model module object are capable of being localized (see Model Localization for more information). The cubes provides localization at the metadata level (the model) and functionality to have localization at the data level.
See also: Model Documentation
Browser
Core of the Cubes analytics functionality is the aggregation browser. The browser
module contains utility classes and functions for the browser to work.
The module components are:
- Cell – specification of the portion of the cube to be explored, sliced or drilled down. Each cell is specified by a set of cuts. A cell without any cuts represents whole cube.
- Cut – definition where the cell is going to be sliced through single dimension. There are three types of cuts: point, range and set.
The types of cuts:
- Point Cut – Defines one single point on a dimension where the cube is going to be sliced. The point might be at any level of hierarchy. The point is specified by "path". Examples of point cut:
[2010]
for year level of Date dimension,[2010,1,7]
for full date point. - Range Cut – Defines two points (dimension paths) on a sortable dimension between whose the cell is going to be sliced from cube.
- Set Cut – Defines list of multiple points (dimension paths) which are going to be included in the sliced cell.
Example of point cut effect:
The module provides couple utility functions:
path_from_string
- construct a dimension path (point) from a stringstring_from_path
- get a string representation of a dimension path (point)string_from_cuts
andcuts_from_string
are for conversion between string and list of cuts. (Currently only list of point cuts are supported in the string representation)
The aggregation browser can:
- aggregate a cell (
aggregate(cell)
) - drill-down through multiple dimensions and aggregate (
aggregate(cell, drilldown="date")
) - get all detailed facts within the cell (
facts(cell)
) - get single fact (
fact(id)
)
There is convenience function report(cell, report)
that can be implemented by backend in more efficient way to get multiple aggregation queries in single call.
More about aggregated browsing can be found in the Cubes documentation.
Backends
Actual aggregation is provided by the backends. The backend should implement aggregation browser interface.
Cubes comes with built-in ROLAP backend which uses SQL database through SQLAlchemy. The backend has two major components:
- aggregation browser that works on single denormalized view or a table
- SQL denormalizer helper class that converts star schema into a denormalized view or table (kind of materialisation).
There was an attempt to write a Mongo DB backend, but it does not work any more, it is included in the sources only as reminder, that there should be a mongo backend sometime in the future.
Anyone can write a backend. If you are interested, drop me a line.
Server
Cubes comes with Slicer - a WSGI HTTP OLAP server with API for most of the cubes framework functionality. The server is based on the Werkzeug framework.
Intended use of the slicer is basically as follows:
- application prepares the cell to be aggregated, drilled, listed... The cell might be whole cube.
- HTTP request is sent to the server
- the server uses appropriate aggregation browser backend (note that currently there is only one: SQL denormalized) to compute the request
- Slicer returns a JSON reply to the application
For more information, please refer to the Cubes Slicer server documentation.
One more thing...
There are plenty things to get improved, of course. Current focus is not on performance, but on achieving simple usability.
The Cubes sources can be found on Github: https://github.com/stiivi/cubes . There is also a IRC channel #databrewery on irc.freenode.net (I try to be there during late evening CET). Issues can be reported on the github project page.
If you have any questions, suggestions, recommendations, just let me know.