Data Brewery

analytical data streams & online analytical processing Python frameworks

    by Stefan Urbanek
  • Cubes 0.9.1: Ranges, denormalization and query cell

    The new minor release of Cubes – light-weight Python OLAP framework – brings range cuts, denormalization with the slicer tool and cells in /report query, together with fixes and important changes.

    See the second part of this post for the full list.

    Range Cuts

    Range cuts were implemented in the SQL Star Browser. They are used as follows:

    Python:

    cut = RangeCut("date", [2010], [2012,5,10])
    cut_hi = RangeCut("date", None, [2012,5,10])
    cut_low = RangeCut("date", [2010], None)
    

    To specify a range in slicer server where keys are sortable:

        http://localhost:5000/aggregate?cut=date:2004-2005
        http://localhost:5000/aggregate?cut=date:2004,2-2005,5,1
    

    Open ranges:

        http://localhost:5000/aggregate?cut=date:2010-
        http://localhost:5000/aggregate?cut=date:2004,1,1-
        http://localhost:5000/aggregate?cut=date:-2005,5,10
        http://localhost:5000/aggregate?cut=date:-2012,5
    

    Denormalization with slicer Tool

    Now it is possible to denormalize tour data with the slicer tool. You do not have to denormalize using python script. Data are denormalized in a way how denormalized browser would expect them to be. You can tune the process using command line switches, if you do not like the defaults.

    Denormalize all cubes in the model:

    $ slicer denormalize slicer.ini
    

    Denormalize only one cube::

    $ slicer denormalize -c contracts slicer.ini
    

    Create materialized denormalized view with indexes::

    $ slicer denormalize --materialize --index slicer.ini
    

    Example slicer.ini:

    [workspace]
    denormalized_view_prefix = mft_
    denormalized_view_schema = denorm_views
    
    # This switch is used by the browser:
    use_denormalization = yes
    

    For more information see Cubes slicer tool documentation

    Cells in Report

    Use cell to specify all cuts (type can be range, point or set):

    {
        "cell": [
            {
                "dimension": "date",
                "type": "range",
                "from": [2010,9],
                "to": [2011,9]
            }
        ],
        "queries": {
            "report": {
                "query": "aggregate",
                "drilldown": {"date":"year"}
            }
        }
    }
    

    For more information see the slicer server documentation.

    New Features

    • cut_from_string(): added parsing of range and set cuts from string; introduced requirement for key format: Keys should now have format “alphanumeric character or underscore” if they are going to be converted to strings (for example when using slicer HTTP server)
    • cut_from_dict(): create a cut (of appropriate class) from a dictionary description
    • Dimension.attribute(name): get attribute instance from name
    • added exceptions: CubesError, ModelInconsistencyError, NoSuchDimensionError, NoSuchAttributeError, ArgumentError, MappingError, WorkspaceError and BrowserError

    StarBrowser:

    • implemented RangeCut conditions

    Slicer Server:

    • /report JSON now accepts cell with full cell description as dictionary, overrides URL parameters

    Slicer tool:

    • denormalize option for (bulk) denormalization of cubes (see the the slicer documentation for more information)

    Changes

    • important: all /report JSON requests should now have queries wrapped in the key queries. This was originally intended way of use, but was not correctly implemented. A descriptive error message is returned from the server if the key queries is not present. Despite being rather a bug-fix, it is listed here as it requires your attention for possible change of your code.
    • warn when no backend is specified during slicer context creation

    Fixes

    • Better handling of missing optional packages, also fixes #57 (now works without slqalchemy and without werkzeug as expected)
    • see change above about /report and queries
    • push more errors as JSON responses to the requestor, instead of just failing with an exception

    Links

    Sources can be found on github. Read the documentation.

    Join the Google Group for discussion, problem solving and announcements.

    Submit issues and suggestions on github

    IRC channel #databrewery on irc.freenode.net

    If you have any questions, comments, requests, do not hesitate to ask.

  • Tags: announcement release cubes olap by Stefan Urbanek
  • Star Browser, Part 3: Aggregations and Cell Details

    Last time I was talking about joins and denormalisation in the Star Browser. This is the last part about the star browser where I will describe the aggregation and what has changed, compared to the old browser.

    The Star Browser is new aggregation browser in for the Cubes – lightweight Python OLAP Framework. Next version v0.9 will be released next week.

    Aggregation

    sum is not the only aggregation. The new browser allows to have other aggregate functions as well, such as min, max.

    You can specify the aggregations for each measure separately:

    {
        "name": "amount",
        "aggregations": ["sum", "min", "max"]
    }
    

    The resulting aggregated attribute name will be constructed from the measure name and aggregation suffix, for example the mentioned amount will have three aggregates in the result: amount_sum, amount_min and amount_max.

    Source code reference: see StarQueryBuilder.aggregations_for_measure

    Aggregation Result

    Result of aggregation is a structure containing: summary - summary for the aggregated cell, drilldown - drill down cells, if was desired, and total_cell_count - total cells in the drill down, regardless of pagination.

    Cell Details

    When we are browsing the cube, the cell provides current browsing context. For aggregations and selections to happen, only keys and some other internal attributes are necessary. Those can not be presented to the user though. For example we have geography path (country, region) as ['sk', 'ba'], however we want to display to the user Slovakia for the country and Bratislava for the region. We need to fetch those values from the data store. Cell details is basically a human readable description of the current cell.

    For applications where it is possible to store state between aggregation calls, we can use values from previous aggregations or value listings. Problem is with web applications - sometimes it is not desirable or possible to store whole browsing context with all details. This is exact the situation where fetching cell details explicitly might come handy.

    Note: The Original browser added cut information in the summary, which was ok when only point cuts were used. In other situations the result was undefined and mostly erroneous.

    The cell details are now provided separately by method AggregationBrowser.cell_details(cell) which has Slicer HTTP equivalent /details or {"query":"detail", ...} in /report request. The result is a list of

    For point cuts, the detail is a list of dictionaries for each level. For example our previously mentioned path ['sk', 'ba'] would have details described as:

    [
        {
            "geography.country_code": "sk",
            "geography.country_name": "Slovakia",
            "geography.something_more": "..."
            "_key": "sk",
            "_label": "Slovakia"
        },
        {
            "geography.region_code": "ba",
            "geography.region_name": "Bratislava",
            "geography.something_even_more": "...",
            "_key": "ba",
            "_label": "Bratislava"
        }
    ]
    

    You might have noticed the two redundant keys: _key and _label - those contain values of a level key attribute and level label attribute respectively. It is there to simplify the use of the details in presentation layer, such as templates. Take for example doing only one-dimensional browsing and compare presentation of “breadcrumbs”:

    labels = [detail["_label"] for detail in cut_details]
    

    Which is equivalent to:

    levels = dimension.hierarchy.levels()
    labels = []
    for i, detail in enumerate(cut_details):
        labels.append(detail[level[i].label_attribute.full_name()])
    

    Note that this might change a bit: either full detail will be returned or just key and label, depending on an option argument (not yet decided).

    Pre-aggregation

    The Star Browser is being created with SQL pre-aggregation in mind. This is not possible in the old browser, as it is not flexible enough. It is planned to be integrated when all basic features are finished.

    Proposed access from user’s perspective will be through configuration options: use_preaggregation, preaggregation_prefix, preaggregation_schema and a method for cube pre-aggregation will be available through the slicer tool.

    Summary

    The new browser has better internal structure resulting in increased flexibility for future extensions. It fixes not so good architectural decisions of the old browser.

    New and fixed features:

    • direct star/snowflake schema browsing
    • improved mappings - more transparent and understandable process
    • ability to explicitly specify database schemas
    • multiple aggregations

    The new backend sources are here and the mapper is here.

    To do

    To be done in the near future:

    • DDL generator for denormalized schema, corresponding logical schema and physical schema
    • explicit list of attributes to be selected (instead of all)
    • selection of aggregations per-request (now all specified in model are used)

    Links

    See also Cubes at github, Cubes Documentation, Mailing List and Submit issues. Also there is an IRC channel #databrewery on irc.freenode.net

  • Tags: cubes olap by Stefan Urbanek
  • Star Browser, Part 2: Joins and Denormalization

    Last time I was talking about how logical attributes are mapped to the physical table columns in the Star Browser. Today I will describe how joins are formed and how denormalization is going to be used.

    The Star Browser is new aggregation browser in for the Cubes – lightweight Python OLAP Framework.

    Star, Snowflake, Master and Detail

    Star browser supports a star:

    … and snowflake database schema:

    The browser should know how to construct the star/snowflake and that is why you have to specify the joins of the schema. The join specification is very simple:

    "joins" = [
        { "master": "fact_sales.product_id", "detail": "dim_product.id" }
    ]
    

    Joins support only single-column keys, therefore you might have to create surrogate keys for your dimensions.

    As in mappings, if you have specific needs for explicitly mentioning database schema or any other reason where table.column reference is not enough, you might write:

    "joins" = [
        { 
            "master": "fact_sales.product_id",
            "detail": {
                "schema": "sales",
                "table": "dim_products",
                "column": "id"
            }
    ]
    

    What if you need to join same table twice? For example, you have list of organizations and you want to use it as both: supplier and service consumer. It can be done by specifying alias in the joins:

    "joins" = [
        {
            "master": "contracts.supplier_id", 
            "detail": "organisations.id",
            "alias": "suppliers"
        },
        {
            "master": "contracts.consumer_id", 
            "detail": "organisations.id",
            "alias": "consumers"
        }
    ]
    

    In the mappings you refer to the table by alias specified in the joins, not by real table name:

    "mappings": {
        "supplier.name": "suppliers.org_name",
        "consumer.name": "consumers.org_name"
    }
    

    Relevant Joins and Denormalization

    The new mapper joins only tables that are relevant for given query. That is, if you are browsing by only one dimension, say product, then only product dimension table is joined.

    Joins are slow, expensive and the denormalization can be helpful:

    The old browser is based purely on the denormalized view. Despite having a performance gain, it has several disadvantages. From the join/performance perspective the major one is, that the denormalization is required and it is not possible to browse data in a database that was “read-only”. This requirements was also one unnecessary step for beginners, which can be considered as usability problem.

    Current implementation of the Mapper and StarBrowser allows denormalization to be integrated in a way, that it might be used based on needs and situation:

    It is not yet there and this is what needs to be done:

    • function for denormalization - similar to the old one: will take cube and view name and will create denormalized view (or a table)
    • make mapper accept the view and ignore joins

    Goal is not just to slap denormalization in, but to make it a configurable alternative to default star browsing. From user’s perspective, the workflow will be:

    1. browse star/snowflake until need for denormalization arises
    2. configure denormalization and create denormalized view
    3. browse the denormalized view

    The proposed options are: use_denormalization, denormalized_view_prefix, denormalized_view_schema.

    The Star Browser is half-ready for the denormalization, just few changes are needed in the mapper and maybe query builder. These changes have to be compatible with another, not-yet-included feature: SQL pre-aggregation.

    Conclusion

    The new way of joining is very similar to the old one, but has much more cleaner code and is separated from mappings. Also it is more transparent. New feature is the ability to specify a database schema. Planned feature to be integrated is automatic join detection based on foreign keys.

    In the next post (the last post in this series) about the new StarBrowser, I am going to explain aggregation improvements and changes.

    Links

    Relevant source code is this one (github).

    See also Cubes at github, Cubes Documentation, Mailing List and Submit issues. Also there is an IRC channel #databrewery on irc.freenode.net

  • Tags: cubes olap by Stefan Urbanek
  • Cubes 0.8 Released

    Another minor release of Cubes - Light Weight Python OLAP framework is out. Main change is that backend is no longer hard-wired in the Slicer server and can be selected through configuration file.

    There were lots of documentation changes, for example the reference was separated from the rest of docs. Hello World! example was added.

    The news, changes and fixes are:

    New Features

    • Started writing StarBrowser - another SQL aggregation browser with different approach (see code/docs)

    Slicer Server:

    • added configuration option modules under [server] to load additional modules
    • added ability to specify backend module
    • backend configuration is in [backend] by default, for SQL it stays in [db]
    • added server config option for default prettyprint value (useful for demontration purposes)

    Documentation:

    • Changed license to MIT + small addition. Please refer to the LICENSE file.
    • Updated documentation - added missing parts, made reference more readable, moved class and function reference docs from descriptive part to reference (API) part.
    • added backend documentation
    • Added “Hello World!” example

    Changed Features

    • removed default SQL backend from the server
    • moved worskpace creation into the backend module

    Fixes

    • Fixed create_view to handle not materialized properly (thanks to deytao)
    • Slicer tool header now contains #!/usr/bin/env python

    Links

    If you have any questions, comments, requests, do not hesitate to ask.

  • Tags: cubes announcement olap by Stefan Urbanek
  • Cubes – Python OLAP Framework Architecture

    What is inside the Cubes Python OLAP Framework? Here is a brief overview of the core modules, their purpose and functionality.

    The lightweight framework Cubes is composed of four public modules:

    • model - Description of data (metadata): dimensions, hierarchies, attributes, labels, localizations.
    • browser - Aggregation browsing, slicing-and-dicing, drill-down.
    • backends - Actual aggregation implementation and utility functions.
    • server - WSGI HTTP server for Cubes

    Model

    Logical model describes the data from user’s or analyst’s perspective: data how they are being measured, aggregated and reported. Model is independent of physical implementation of data. This physical independence makes it easier to focus on data instead on ways of how to get the data in understandable form.

    Cubes model is described by:

    • model object (doc)
    • list of cubes
    • dimensions of cubes (they are shared with all cubes within model) (doc) (doc)
    • hierarchies (doc) and hierarchy levels (doc) of dimensions (such as category-subcategory, country-region-city)
    • optional mappings from logical model to the physical model (doc)
    • optional join specifications for star schemas, used by the SQL denormalizing backend (doc)

    There is a utility function provided for loading the model from a JSON file: load_model.

    The model module object are capable of being localized (see Model Localization for more information). The cubes provides localization at the metadata level (the model) and functionality to have localization at the data level.

    See also: Model Documentation

    Browser

    Core of the Cubes analytics functionality is the aggregation browser. The browser module contains utility classes and functions for the browser to work.

    The module components are:

    • Cell – specification of the portion of the cube to be explored, sliced or drilled down. Each cell is specified by a set of cuts. A cell without any cuts represents whole cube.
    • Cut – definition where the cell is going to be sliced through single dimension. There are three types of cuts: point, range and set.

    The types of cuts:

    • Point Cut – Defines one single point on a dimension where the cube is going to be sliced. The point might be at any level of hierarchy. The point is specified by “path”. Examples of point cut: [2010] for year level of Date dimension, [2010,1,7] for full date point.
    • Range Cut – Defines two points (dimension paths) on a sortable dimension between whose the cell is going to be sliced from cube.
    • Set Cut – Defines list of multiple points (dimension paths) which are going to be included in the sliced cell.

    Example of point cut effect:

    The module provides couple utility functions:

    • path_from_string - construct a dimension path (point) from a string
    • string_from_path - get a string representation of a dimension path (point)
    • string_from_cuts and cuts_from_string are for conversion between string and list of cuts. (Currently only list of point cuts are supported in the string representation)

    The aggregation browser can:

    • aggregate a cell (aggregate(cell))
    • drill-down through multiple dimensions and aggregate (aggregate(cell, drilldown="date"))
    • get all detailed facts within the cell (facts(cell))
    • get single fact (fact(id))

    There is convenience function report(cell, report) that can be implemented by backend in more efficient way to get multiple aggregation queries in single call.

    More about aggregated browsing can be found in the Cubes documentation.

    Backends

    Actual aggregation is provided by the backends. The backend should implement aggregation browser interface.

    Cubes comes with built-in ROLAP backend which uses SQL database through SQLAlchemy. The backend has two major components:

    • aggregation browser that works on single denormalized view or a table
    • SQL denormalizer helper class that converts star schema into a denormalized view or table (kind of materialisation).

    There was an attempt to write a Mongo DB backend, but it does not work any more, it is included in the sources only as reminder, that there should be a mongo backend sometime in the future.

    Anyone can write a backend. If you are interested, drop me a line.

    Server

    Cubes comes with Slicer - a WSGI HTTP OLAP server with API for most of the cubes framework functionality. The server is based on the Werkzeug framework.

    Intended use of the slicer is basically as follows:

    • application prepares the cell to be aggregated, drilled, listed… The cell might be whole cube.
    • HTTP request is sent to the server
    • the server uses appropriate aggregation browser backend (note that currently there is only one: SQL denormalized) to compute the request
    • Slicer returns a JSON reply to the application

    For more information, please refer to the Cubes Slicer server documentation.

    One more thing…

    There are plenty things to get improved, of course. Current focus is not on performance, but on achieving simple usability.

    The Cubes sources can be found on Github: https://github.com/stiivi/cubes . There is also a IRC channel #databrewery on irc.freenode.net (I try to be there during late evening CET). Issues can be reported on the github project page.

    If you have any questions, suggestions, recommendations, just let me know.

    HackerNews Thread

  • Tags: cubes olap by Stefan Urbanek
  • Cubes 0.7.1 released

    I am glad to announce new minor release of Cubes - Light Weight Python OLAP framework for multidimensional data aggregation and browsing. The news, changes and fixes are:

    New Features

    • New method: Dimension.attribute_reference: returns full reference to an attribute
    • str(cut) will now return constructed string representation of a cut as it can be used by Slicer

    Slicer server:

    • added /locales to slicer
    • added locales key in /model request
    • added Access-Control-Allow-Origin for JS/jQuery

    Changes

    • Allow dimensions in cube to be a list, noy only a dictionary (internally it is ordered dictionary)
    • Allow cubes in model to be a list, noy only a dictionary (internally it is ordered dictionary)

    Slicer server:

    • slicer does not require default cube to be specified: if no cube is in the request then try default from config or get first from model

    Fixes

    • Slicer not serves right localization regardless of what localization was used first after server was launched (changed model localization copy to be deepcopy (as it should be))
    • Fixes some remnants that used old Cell.foo based browsing to Browser.foo(cell, …) only browsing
    • fixed model localization issues; once localized, original locale was not available
    • Do not try to add locale if not specified. Fixes #11: https://github.com/Stiivi/cubes/issues/11

    Tutorials

    Added tutorials in tutorials/ with models in tutorials/models/ and data in tutorials/data/:

    • Tutorial 1:
      • how to build a model programatically
      • how to create a model with flat dimensions
      • how to aggregate whole cube
      • how to drill-down and aggregate through a dimension
    • Tutorial 2:
      • how to create and use a model file
      • mappings
    • Tutorial 3:
      • how hierarhies work
      • drill-down through a hierarchy
    • Tutorial 4 (not blogged about it yet):
      • how to launch slicer server

    Links

    If you have any questions, comments, requests, do not hesitate to ask.

  • Tags: cubes olap announcement release by Stefan Urbanek
  • How-to: hierarchies, levels and drilling-down

    In this Cubes OLAP how-to we are going to learn:

    • how to create a hierarchical dimension
    • how to do drill-down through a hierarchy
    • detailed level description

    In the previous tutorial we learned how to use model descriptions in a JSON file and how to do physical to logical mappings.

    Data used are similar as in the second tutorial, manually modified IBRD Balance Sheet taken from The World Bank. Difference between second tutorial and this one is added two columns: category code and sub-category code. They are simple letter codes for the categories and subcategories.

    Hierarchy

    Some dimensions can have multiple levels forming a hierarchy. For example dates have year, month, day; geography has country, region, city; product might have category, subcategory and the product.

    Note: Cubes supports multiple hierarchies, for example for date you might have year-month-day or year-quarter-month-day. Most dimensions will have one hierarchy, thought.

    In our example we have the item dimension with three levels of hierarchy: category, subcategory and line item:

    The levels are defined in the model:

    "levels": [
        {
            "name":"category",
            "label":"Category",
            "attributes": ["category"]
        },
        {
            "name":"subcategory",
            "label":"Sub-category",
            "attributes": ["subcategory"]
        },
        {
            "name":"line_item",
            "label":"Line Item",
            "attributes": ["line_item"]
        }
    ]
    

    You can see a slight difference between this model description and the previous one: we didn’t just specify level names and didn’t let cubes to fill-in the defaults. Here we used explicit description of each level. name is level identifier, label is human-readable label of the level that can be used in end-user applications and attributes is list of attributes that belong to the level. The first attribute, if not specified otherwise, is the key attribute of the level.

    Other level description attributes are key and label_attribute. The key specifies attribute name which contains key for the level. Key is an id number, code or anything that uniquely identifies the dimension level. label_attribute is name of an attribute that contains human-readable value that can be displayed in user-interface elements such as tables or charts.

    Preparation

    In this how-to we are going to skip all off-topic code, such as data initialization. The full example can be found in the tutorial sources with suffix 03.

    In short we need:

    • data in a database
    • logical model (see model_03.json) prepared with appropriate mappings
    • denormalized view for aggregated browsing (for current simple SQL browser implementation)

    Drill-down

    Drill-down is an action that will provide more details about data. Drilling down through a dimension hierarchy will expand next level of the dimension. It can be compared to browsing through your directory structure.

    We create a function that will recursively traverse a dimension hierarchy and will print-out aggregations (count of records in this example) at the actual browsed location.

    Attributes

    • cell - cube cell to drill-down
    • dimension - dimension to be traversed through all levels
    • path - current path of the dimension

    Path is list of dimension points (keys) at each level. It is like file-system path.

    def drill_down(cell, dimension, path = []):
    

    Get dimension’s default hierarchy. Cubes supports multiple hierarchies, for example for date you might have year-month-day or year-quarter-month-day. Most dimensions will have one hierarchy, thought.

        hierarchy = dimension.default_hierarchy
    

    Base path is path to the most detailed element, to the leaf of a tree, to the fact. Can we go deeper in the hierarchy?

        if hierarchy.path_is_base(path):
            return
    

    Get the next level in the hierarchy. levels_for_path returns list of levels according to provided path. When drilldown is set to True then one more level is returned.

        levels = hierarchy.levels_for_path(path,drilldown=True)
        current_level = levels[-1]
    

    We need to know name of the level key attribute which contains a path component. If the model does not explicitly specify key attribute for the level, then first attribute will be used:

        level_key = dimension.attribute_reference(current_level.key)
    

    For prettier display, we get name of attribute which contains label to be displayed for the current level. If there is no label attribute, then key attribute is used.

        level_label = dimension.attribute_reference(current_level.label_attribute)
    

    We do the aggregation of the cell… Think of ls $CELL command in commandline, where $CELL is a directory name. In this function we can think of $CELL to be same as current working directory (pwd)

        result = browser.aggregate(cell, drilldown=[dimension])
    
        for record in result.drilldown:
            print "%s%s: %d" % (indent, record[level_label], record["record_count"])
            ...
    

    And now the drill-down magic. First, construct new path by key attribute value appended to the current path:

            drill_path = path[:] + [record[level_key]]
    

    Then get a new cell slice for current path:

            drill_down_cell = cell.slice(dimension, drill_path)
    

    And do recursive drill-down:

            drill_down(drill_down_cell, dimension, drill_path)
    

    The function looks like this:

    Working function example 03 can be found in the tutorial sources.

    Get the full cube (or any part of the cube you like):

    cell = browser.full_cube()
    

    And do the drill-down through the item dimension:

    drill_down(cell, cube.dimension("item"))
    

    The output should look like this:

    a: 32
        da: 8
            Borrowings: 2
            Client operations: 2
            Investments: 2
            Other: 2
        dfb: 4
            Currencies subject to restriction: 2
            Unrestricted currencies: 2
        i: 2
            Trading: 2
        lo: 2
            Net loans outstanding: 2
        nn: 2
            Nonnegotiable, nonintrest-bearing demand obligations on account of subscribed capital: 2
        oa: 6
            Assets under retirement benefit plans: 2
            Miscellaneous: 2
            Premises and equipment (net): 2
    

    Note that because we have changed our source data, we see level codes instead of level names. We will fix that later. Now focus on the drill-down.

    See that nice hierarchy tree?

    Now if you slice the cell through year 2010 and do the exact same drill-down:

        cell = cell.slice("year", [2010])
        drill_down(cell, cube.dimension("item"))
    

    you will get similar tree, but only for year 2010 (obviously).

    Level Labels and Details

    Codes and ids are good for machines and programmers, they are short, might follow some scheme, easy to handle in scripts. Report users have no much use of them, as they look cryptic and have no meaning for the first sight.

    Our source data contains two columns for category and for subcategory: column with code and column with label for user interfaces. Both columns belong to the same dimension and to the same level. The key column is used by the analytical system to refer to the dimension point and the label is just decoration.

    Levels can have any number of detail attributes. The detail attributes have no analytical meaning and are just ignored during aggregations. If you want to do analysis based on an attribute, make it a separate dimension instead.

    So now we fix our model by specifying detail attributes for the levels:

    The model description is:

    "levels": [
            {
                "name":"category",
                "label":"Category",
                "label_attribute": "category_label",
                "attributes": ["category", "category_label"]
            },
            {
                "name":"subcategory",
                "label":"Sub-category",
                "label_attribute": "subcategory_label",
                "attributes": ["subcategory", "subcategory_label"]
            },
            {
                "name":"line_item",
                "label":"Line Item",
                "attributes": ["line_item"]
            }
        ]
    }
    

    Note the label_attribute keys. They specify which attribute contains label to be displayed. Key attribute is by-default the first attribute in the list. If one wants to use some other attribute it can be specified in key_attribute.

    Because we added two new attributes, we have to add mappings for them:

    "mappings": { "item.line_item": "line_item",
                  "item.subcategory": "subcategory",
                  "item.subcategory_label": "subcategory_label",
                  "item.category": "category",
                  "item.category_label": "category_label" 
                 }
    

    In the example tutorial, which can be found in the Cubes sources under tutorial/ directory, change the model file from model/model_03.json to model/model_03-labels.json and run the code again. Or fix the file as specified above.

    Now the result will be:

    Assets: 32
        Derivative Assets: 8
            Borrowings: 2
            Client operations: 2
            Investments: 2
            Other: 2
        Due from Banks: 4
            Currencies subject to restriction: 2
            Unrestricted currencies: 2
        Investments: 2
            Trading: 2
        Loans Outstanding: 2
            Net loans outstanding: 2
        Nonnegotiable: 2
            Nonnegotiable, nonintrest-bearing demand obligations on account of subscribed capital: 2
        Other Assets: 6
            Assets under retirement benefit plans: 2
            Miscellaneous: 2
            Premises and equipment (net): 2
    

    Implicit hierarchy

    Try to remove the last level line_item from the model file and see what happens. Code still works, but displays only two levels. What does that mean? If metadata - logical model - is used properly in an application, then application can handle most of the model changes without any application modifications. That is, if you add new level or remove a level, there is no need to change your reporting application.

    Summary

    • hierarchies can have multiple levels
    • a hierarchy level is identifier by a key attribute
    • a hierarchy level can have multiple detail attributes and there is one special detail attribute: label attribute used for display in user interfaces

    Next: slicing and dicing or slicer server, not sure yet.

    If you have any questions, suggestions, comments, let me know.

  • Tags: cubes tutorial olap howto by Stefan Urbanek
  • Book: Star Schema – The Complete Reference

    Well written book - very understandable even for a beginner, despite being focused on more advanced specialists. Explains multi-dimensional database design: star schemas, snowflakes, fact tables, dimensions, aggregated data browsing and more.

  • Tags: cubes reading olap by Stefan Urbanek
  • Cubes 0.7 released

    I am happy to announce another release of Cubes - Python OLAP framework for multidimensional data aggregation and browsing.

    This release, besides some new features, renames Cuboid to more appropriate Cell. This introduces backward python API incompatibility.

    Main source repository has changed to Github https://github.com/Stiivi/cubes

    Changes

    • Class ‘Cuboid’ was renamed to more correct ‘Cell’. ‘Cuboid’ is a part of cube with subset of dimensions.
    • all APIs with ‘cuboid’ in their name/arguments were renamed to use ‘cell’ instead
    • Changed initialization of model classes: Model, Cube, Dimension, Hierarchy, Level to be more “pythony”: instead of using initialization dictionary, each attribute is listed as parameter, rest is handled from variable list of key word arguments
    • Improved handling of flat and detail-less dimensions (dimensions represented just by one attribute which is also a key)

    Model Initialization Defaults:

    • If no levels are specified during initialization, then dimension name is considered flat, with single attribute.
    • If no hierarchy is specified and levels are specified, then default hierarchy will be created from order of levels
    • If no levels are specified, then one level is created, with name default and dimension will be considered flat

    Note: This initialization defaults might be moved into a separate utility function/class that will populate incomplete model (see Issue #8 )

    New features

    Slicer server:

    • changed to handle multiple cubes within model: you have to specify a cube for /aggregate, /facts,… in form: /cube//
    • reflect change in configuration: removed view, added view_prefix and view_suffix, the cube view name will be constructed by concatenating view prefix + cube name + view suffix
    • in aggregate drill-down: explicit dimension can be specified with drilldown=dimension:level, such as: date:month

    This change is considered final and therefore we can mark it is as API version 1.

    Links:

    If you have any questions, comments, requests, do not hesitate to ask.

  • Tags: announcement release cubes olap by Stefan Urbanek
  • Cubes 0.6 released

    New version of Cubes - Python OLAP framework and server - was released.

    Cubes is a framework for:

    Notable changes:

    • added ‘details’ to cube metadata - attributes that might contain fact details which are not relevant to aggregation, but might be interesting when displaying facts (such as contract name or notes)
    • added ordering of facts in aggregation browser

    SQL

    • SQL denormalizer can now, by request, automatically add indexes to level key columns
    • one detail table can be used more than once in SQL denomralizer (such as an organisation for both - supplier and requestor), added key alias to joins in model description, see joins documentation for more information.

    Slicer server

    • added log a and log_level configuration options (under [server])
    • added format= parameter to /facts, accepts json and csv
    • added fields= parameter to /facts - comma separated list of returned fields in CSV (see API)
    • limit number of facts returned in JSON (configurable by json_record_limit in [server] section), CSV can return whole dataset and will do it iteratively (we do not want to consume all of our memory, do we?)

    Also many bugs were fixed, including localization in fact(s) retrieval and pagination. Sharing of single SQLAlchemy engine and model within server thread was added for performance reasons.

    Enjoy.

  • Tags: announcement cubes olap