2012-02-21 by Stefan Urbanek

Cubes - Python OLAP Framework Architecture

What is inside the Cubes Python OLAP Framework? Here is a brief overview of the core modules, their purpose and functionality.

The lightweight framework Cubes is composed of four public modules:

model - Description of data (metadata): dimensions, hierarchies, attributes, labels, localizations.
browser - Aggregation browsing, slicing-and-dicing, drill-down.
backends - Actual aggregation implementation and utility functions.
server - WSGI HTTP server for Cubes

Model

Logical model describes the data from user’s or analyst’s perspective: data how they are being measured, aggregated and reported. Model is independent of physical implementation of data. This physical independence makes it easier to focus on data instead on ways of how to get the data in understandable form.

Cubes model is described by:

model object (doc)
list of cubes
dimensions of cubes (they are shared with all cubes within model) (doc) (doc)
hierarchies (doc) and hierarchy levels (doc) of dimensions (such as category-subcategory, country-region-city)
optional mappings from logical model to the physical model (doc)
optional join specifications for star schemas, used by the SQL denormalizing backend (doc)

There is a utility function provided for loading the model from a JSON file: load_model.

The model module object are capable of being localized (see Model Localization for more information). The cubes provides localization at the metadata level (the model) and functionality to have localization at the data level.

Browser

Core of the Cubes analytics functionality is the aggregation browser. The browser module contains utility classes and functions for the browser to work.

The module components are:

Cell – specification of the portion of the cube to be explored, sliced or drilled down. Each cell is specified by a set of cuts. A cell without any cuts represents whole cube.
Cut – definition where the cell is going to be sliced through single dimension. There are three types of cuts: point, range and set.

The types of cuts:

Point Cut – Defines one single point on a dimension where the cube is going to be sliced. The point might be at any level of hierarchy. The point is specified by "path". Examples of point cut: [2010] for year level of Date dimension, [2010,1,7] for full date point.
Range Cut – Defines two points (dimension paths) on a sortable dimension between whose the cell is going to be sliced from cube.
Set Cut – Defines list of multiple points (dimension paths) which are going to be included in the sliced cell.

Example of point cut effect:

The module provides couple utility functions:

path_from_string - construct a dimension path (point) from a string
string_from_path - get a string representation of a dimension path (point)
string_from_cuts and cuts_from_string are for conversion between string and list of cuts. (Currently only list of point cuts are supported in the string representation)

The aggregation browser can:

aggregate a cell (aggregate(cell))
drill-down through multiple dimensions and aggregate (aggregate(cell, drilldown="date"))
get all detailed facts within the cell (facts(cell))
get single fact (fact(id))

There is convenience function report(cell, report) that can be implemented by backend in more efficient way to get multiple aggregation queries in single call.

More about aggregated browsing can be found in the Cubes documentation.

Backends

Actual aggregation is provided by the backends. The backend should implement aggregation browser interface.

Cubes comes with built-in ROLAP backend which uses SQL database through SQLAlchemy. The backend has two major components:

aggregation browser that works on single denormalized view or a table
SQL denormalizer helper class that converts star schema into a denormalized view or table (kind of materialisation).

There was an attempt to write a Mongo DB backend, but it does not work any more, it is included in the sources only as reminder, that there should be a mongo backend sometime in the future.

Anyone can write a backend. If you are interested, drop me a line.

Server

Cubes comes with Slicer - a WSGI HTTP OLAP server with API for most of the cubes framework functionality. The server is based on the Werkzeug framework.

Intended use of the slicer is basically as follows:

application prepares the cell to be aggregated, drilled, listed... The cell might be whole cube.
HTTP request is sent to the server
the server uses appropriate aggregation browser backend (note that currently there is only one: SQL denormalized) to compute the request
Slicer returns a JSON reply to the application

For more information, please refer to the Cubes Slicer server documentation.

One more thing...

There are plenty things to get improved, of course. Current focus is not on performance, but on achieving simple usability.

The Cubes sources can be found on Github: https://github.com/stiivi/cubes . There is also a IRC channel #databrewery on irc.freenode.net (I try to be there during late evening CET). Issues can be reported on the github project page.

If you have any questions, suggestions, recommendations, just let me know.

HackerNews Thread