2012-05-12 by Stefan Urbanek

Star Browser, Part 3: Aggregations and Cell Details

Last time I was talking about joins and denormalisation in the Star Browser. This is the last part about the star browser where I will describe the aggregation and what has changed, compared to the old browser.

The Star Browser is new aggregation browser in for the Cubes – lightweight Python OLAP Framework. Next version v0.9 will be released next week.

Aggregation

sum is not the only aggregation. The new browser allows to have other aggregate functions as well, such as min, max.

You can specify the aggregations for each measure separately:

{
    "name": "amount",
    "aggregations": ["sum", "min", "max"]
}

The resulting aggregated attribute name will be constructed from the measure name and aggregation suffix, for example the mentioned amount will have three aggregates in the result: amount_sum, amount_min and amount_max.

Source code reference: see StarQueryBuilder.aggregations_for_measure

Aggregation Result

Result of aggregation is a structure containing: summary - summary for the aggregated cell, drilldown - drill down cells, if was desired, and total_cell_count - total cells in the drill down, regardless of pagination.

Cell Details

When we are browsing the cube, the cell provides current browsing context. For aggregations and selections to happen, only keys and some other internal attributes are necessary. Those can not be presented to the user though. For example we have geography path (country, region) as ['sk', 'ba'], however we want to display to the user Slovakia for the country and Bratislava for the region. We need to fetch those values from the data store. Cell details is basically a human readable description of the current cell.

For applications where it is possible to store state between aggregation calls, we can use values from previous aggregations or value listings. Problem is with web applications - sometimes it is not desirable or possible to store whole browsing context with all details. This is exact the situation where fetching cell details explicitly might come handy.

Note: The Original browser added cut information in the summary, which was ok when only point cuts were used. In other situations the result was undefined and mostly erroneous.

The cell details are now provided separately by method AggregationBrowser.cell_details(cell) which has Slicer HTTP equivalent /details or {"query":"detail", ...} in /report request. The result is a list of

For point cuts, the detail is a list of dictionaries for each level. For example our previously mentioned path ['sk', 'ba'] would have details described as:

[
    {
        "geography.country_code": "sk",
        "geography.country_name": "Slovakia",
        "geography.something_more": "..."
        "_key": "sk",
        "_label": "Slovakia"
    },
    {
        "geography.region_code": "ba",
        "geography.region_name": "Bratislava",
        "geography.something_even_more": "...",
        "_key": "ba",
        "_label": "Bratislava"
    }
]

You might have noticed the two redundant keys: _key and _label - those contain values of a level key attribute and level label attribute respectively. It is there to simplify the use of the details in presentation layer, such as templates. Take for example doing only one-dimensional browsing and compare presentation of "breadcrumbs":

labels = [detail["_label"] for detail in cut_details]

Which is equivalent to:

levels = dimension.hierarchy.levels()
labels = []
for i, detail in enumerate(cut_details):
    labels.append(detail[level[i].label_attribute.full_name()])

Note that this might change a bit: either full detail will be returned or just key and label, depending on an option argument (not yet decided).

Pre-aggregation

The Star Browser is being created with SQL pre-aggregation in mind. This is not possible in the old browser, as it is not flexible enough. It is planned to be integrated when all basic features are finished.

Proposed access from user's perspective will be through configuration options: use_preaggregation, preaggregation_prefix, preaggregation_schema and a method for cube pre-aggregation will be available through the slicer tool.

Summary

The new browser has better internal structure resulting in increased flexibility for future extensions. It fixes not so good architectural decisions of the old browser.

New and fixed features:

direct star/snowflake schema browsing
improved mappings - more transparent and understandable process
ability to explicitly specify database schemas
multiple aggregations

The new backend sources are here and the mapper is here.

To do

To be done in the near future:

DDL generator for denormalized schema, corresponding logical schema and physical schema
explicit list of attributes to be selected (instead of all)
selection of aggregations per-request (now all specified in model are used)

Links

See also Cubes at github, Cubes Documentation, Mailing List and Submit issues. Also there is an IRC channel #databrewery on irc.freenode.net