2012-10-05 by Stefan Urbanek

Cubes 0.10 Released

After a while, here is an update to Cubes, the lightweight Python OLAP framework for multidimensional modeling. It includes some of the changes mentioned in the EuroPython talk, such as table_rows and cross_table.

I recommend looking at the updated examples in the GitHub repository. The Flask example is now a "real" example instead of a "sandbox" example, and it shows how to generate a simple table for browsing a dimension hierarchy.

There is also a more complex example with a star-like schema dataset in the cubes-examples GitHub repository. Follow the instructions in the README files to get it running.

There are some backward-incompatible changes in this release. Until 1.0, the "point" releases will contain changes of this kind, as the framework is still evolving. You can find more information below.

Quick Summary

The way a model is constructed has changed. The designated methods are create_model() and load_model().
A dimension definition can have a "template". For example:

    {
      "name": "contract_date",
      "template": "date"
    }

added table_rows() and cross_table() to the aggregation result for more convenient table creation. table_rows() takes care of providing the appropriate dimension key and label for the browsed level.
added simple_model(cube_name, dimension_names, measures)
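
To picture what a dimension template does, here is a minimal sketch in plain Python. It is not the Cubes implementation: `resolve_template` and the contents of the `date` template are made up for illustration, and the sketch assumes that a property set on the new dimension simply replaces the template's property of the same name.

```python
def resolve_template(dimension, templates):
    """Return a dimension description with template properties filled in.

    Hypothetical helper, not part of Cubes: properties given in
    `dimension` replace the template's properties of the same name.
    """
    template_name = dimension.get("template")
    if template_name is None:
        return dict(dimension)
    resolved = dict(templates[template_name])  # start from a copy of the template
    resolved.update(dimension)                 # the new dimension wins on conflicts
    del resolved["template"]                   # the reference itself is not a property
    return resolved


# Illustrative template only; the real "date" dimension lives in your model.
templates = {
    "date": {"name": "date", "levels": ["year", "month", "day"]},
}

contract_date = resolve_template(
    {"name": "contract_date", "template": "date"}, templates
)
print(contract_date)
```

The new dimension keeps its own name and inherits everything else, which is the behaviour the "template" key is meant to give you in a model file.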

Incompatibilities: use create_model() instead of Model(**dict). If you were using just load_model(), you are fine.

New Features

To address issue #8 create_model(dict) was added as replacement for Model(**dict). Model() from now on will expect correctly constructed model objects. create_model() will be able to handle various simplifications and defaults during the construction process.
added info attribute to all model objects. It can be used to store custom, application or front-end specific information
preliminary implementation of cross_table() (interface might be changed)
AggregationResult.table_rows() - new method that iterates through drill-down rows and returns a tuple with key, label, path, and rest of the fields.
dimension in model description can specify another template dimension – all properties from the template will be inherited in the new dimension. All dimension properties specified in the new dimension completely override the template specification
added point_cut_for_dimension
added simple_model(cube_name, dimensions, measures) – creates a single-cube model with flat dimensions from a list of dimension names and measures from a list of measure names. For example:

model = simple_model("contracts", ["year","contractor", "type"], ["amount"])
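
The table_rows() item in the list above is easier to picture with a small example. The following is only a sketch in plain Python of what "a tuple with key, label, path, and rest of the fields" could look like. The `TableRow` tuple, the function signature and the field names such as `date.month_name` are assumptions for illustration, not the actual Cubes interface.

```python
from collections import namedtuple

# Hypothetical row shape mirroring the description above: key, label, path
# and the remaining fields of the drilled-down cell.
TableRow = namedtuple("TableRow", ["key", "label", "path", "record"])


def table_rows(cells, key_field, label_field, parent_path=()):
    """Yield one TableRow per drill-down cell (illustrative, not Cubes code)."""
    for cell in cells:
        key = cell[key_field]
        yield TableRow(
            key=key,
            label=cell[label_field],
            path=list(parent_path) + [key],  # path of the row within the hierarchy
            record=cell,
        )


# Made-up drill-down cells for a "date" dimension browsed at the month level.
cells = [
    {"date.month": 1, "date.month_name": "January", "amount_sum": 120},
    {"date.month": 2, "date.month_name": "February", "amount_sum": 80},
]

for row in table_rows(cells, "date.month", "date.month_name", parent_path=[2012]):
    print(row.path, row.label, row.record["amount_sum"])
```

A front-end can then use the label for display and the path for building the link to the next drill-down level.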

Slicer Server:

/cell – return cell details (replaces /details)

Changes

creation of a model from a dictionary through Model(dict) is deprecated, use create_model(dict) instead. All initialization code will be moved there. Deprecation warnings were added. The old functionality is retained for the time being. (important)
Replaced Attribute.full_name() with Attribute.ref()
Removed Dimension.attribute_reference() as same can be achieved with dim(attr).ref()
AggregationResult.drilldown renamed to AggregationResult.cells (important)

Planned Changes:

str(Attribute) will return ref() instead of attribute name as it is more useful

Fixes

order of dimensions is now preserved in the Model

Links

Sources can be found on github. Read the documentation.

Join the Google Group for discussion, problem solving and announcements.

Submit issues and suggestions on github

IRC channel #databrewery on irc.freenode.net

If you have any questions, comments, requests, do not hesitate to ask.

2012-05-29 by Stefan Urbanek

Cubes 0.9.1: Ranges, denormalization and query cell

The new minor release of Cubes – light-weight Python OLAP framework – brings range cuts, denormalization with the slicer tool and cells in /report query, together with fixes and important changes.

See the second part of this post for the full list.

Range Cuts

Range cuts were implemented in the SQL Star Browser. They are used as follows:

Python:

from cubes import RangeCut

cut = RangeCut("date", [2010], [2012, 5, 10])     # both bounds given
cut_hi = RangeCut("date", None, [2012, 5, 10])    # no lower bound
cut_low = RangeCut("date", [2010], None)          # no upper bound

To specify a range in slicer server where keys are sortable:

    http://localhost:5000/aggregate?cut=date:2004-2005
    http://localhost:5000/aggregate?cut=date:2004,2-2005,5,1

Open ranges:

    http://localhost:5000/aggregate?cut=date:2010-
    http://localhost:5000/aggregate?cut=date:2004,1,1-
    http://localhost:5000/aggregate?cut=date:-2005,5,10
    http://localhost:5000/aggregate?cut=date:-2012,5
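
To make the URL syntax above more concrete, here is a rough sketch of how such a range string can be taken apart. It is a simplified stand-in written for this post, not the parser used by the slicer server. The helper names are made up, and it assumes that keys contain no dash, colon or comma.

```python
def parse_path(text):
    """Convert '2004,2' to [2004, 2]; an empty string means an open end."""
    if not text:
        return None
    return [int(part) if part.isdigit() else part for part in text.split(",")]


def parse_range_cut(cut):
    """Split 'dimension:from-to' into (dimension, from_path, to_path)."""
    dimension, colon, spec = cut.partition(":")
    low, dash, high = spec.partition("-")
    if not colon or not dash:
        raise ValueError("not a range cut: %r" % cut)
    return dimension, parse_path(low), parse_path(high)


for cut in ["date:2004-2005", "date:2004,2-2005,5,1", "date:2010-", "date:-2012,5"]:
    print(cut, "->", parse_range_cut(cut))
```

The open ranges end up with None on the missing side, which matches how the Python RangeCut examples express an open bound.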

Denormalization with slicer Tool

Now it is possible to denormalize your data with the slicer tool. You do not have to denormalize using a Python script. Data are denormalized the way the denormalized browser expects them to be. If you do not like the defaults, you can tune the process using command-line switches.

Denormalize all cubes in the model:

$ slicer denormalize slicer.ini

Denormalize only one cube:

$ slicer denormalize -c contracts slicer.ini

Create a materialized denormalized view with indexes:

$ slicer denormalize --materialize --index slicer.ini

Example slicer.ini:

[workspace]
denormalized_view_prefix = mft_
denormalized_view_schema = denorm_views

# This switch is used by the browser:
use_denormalization = yes

For more information see Cubes slicer tool documentation
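
If you want to check what is in your slicer.ini before running the tool, the file can be read with Python's standard configparser module. The snippet below is only a sketch of that idea, not what slicer does internally. In particular, composing the view name as schema + prefix + cube name is my assumption for the example.

```python
import configparser

SLICER_INI = """
[workspace]
denormalized_view_prefix = mft_
denormalized_view_schema = denorm_views

# This switch is used by the browser:
use_denormalization = yes
"""

config = configparser.ConfigParser()
config.read_string(SLICER_INI)

workspace = config["workspace"]
prefix = workspace.get("denormalized_view_prefix")
schema = workspace.get("denormalized_view_schema")
use_denormalization = workspace.getboolean("use_denormalization")  # "yes" -> True

# Hypothetical naming rule: prefix + cube name, qualified by the schema.
view_name = "%s.%s%s" % (schema, prefix, "contracts")
print(view_name, use_denormalization)
```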

Cells in Report

Use cell to specify all cuts (type can be range, point or set):

{
    "cell": [
        {
            "dimension": "date",
            "type": "range",
            "from": [2010,9],
            "to": [2011,9]
        }
    ],
    "queries": {
        "report": {
            "query": "aggregate",
            "drilldown": {"date":"year"}
        }
    }
}

For more information see the slicer server documentation.
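
The request body above can also be assembled in Python instead of being written by hand. This is a small sketch using only the standard json module. `build_report_request` is a made-up helper, not part of Cubes, and sending the body to the server is left out.

```python
import json


def build_report_request(cell, queries):
    """Return a JSON body for /report with queries under the "queries" key."""
    if not queries:
        raise ValueError("at least one query is required")
    return json.dumps({"cell": cell, "queries": queries}, indent=4)


date_range = {
    "dimension": "date",
    "type": "range",
    "from": [2010, 9],
    "to": [2011, 9],
}
body = build_report_request(
    cell=[date_range],
    queries={"report": {"query": "aggregate", "drilldown": {"date": "year"}}},
)
print(body)
```

Keeping the queries in one dictionary makes it hard to forget the "queries" wrapper that the server now requires.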

New Features

cut_from_string(): added parsing of range and set cuts from string; introduced requirement for key format: Keys should now have format "alphanumeric character or underscore" if they are going to be converted to strings (for example when using slicer HTTP server)
cut_from_dict(): create a cut (of appropriate class) from a dictionary description
Dimension.attribute(name): get attribute instance from name
added exceptions: CubesError, ModelInconsistencyError, NoSuchDimensionError, NoSuchAttributeError, ArgumentError, MappingError, WorkspaceError and BrowserError

StarBrowser:

implemented RangeCut conditions

Slicer Server:

/report JSON now accepts cell with full cell description as dictionary, overrides URL parameters

Slicer tool:

denormalize option for (bulk) denormalization of cubes (see the slicer documentation for more information)

Changes

important: all /report JSON requests should now have queries wrapped in the key queries. This was the originally intended way of use, but it was not correctly implemented. A descriptive error message is returned from the server if the key queries is not present. Although this is rather a bug fix, it is listed here because it might require a change in your code.
warn when no backend is specified during slicer context creation

Fixes

Better handling of missing optional packages, also fixes #57 (now works without sqlalchemy and without werkzeug as expected)
see change above about /report and queries
push more errors as JSON responses to the requestor, instead of just failing with an exception

Links

Sources can be found on github. Read the documentation.

Join the Google Group for discussion, problem solving and announcements.

Submit issues and suggestions on github

IRC channel #databrewery on irc.freenode.net

If you have any questions, comments, requests, do not hesitate to ask.

2012-05-14 by Stefan Urbanek

Cubes 0.9 Released

The new version of Cubes – light-weight Python OLAP framework – brings the new StarBrowser, which we discussed in previous blog posts:

mappings, see also documentation
joins and denormalization
aggregations and new features, see also documentation

The new SQL backend is written from scratch. It is much cleaner, more transparent, configurable and open for future extensions. It also allows direct browsing of a star/snowflake schema without denormalization, therefore you can use Cubes on top of a read-only database. See DenormalizedMapper and SnowflakeMapper for more information.

Just to name a few new features: multiple aggregated computations (min, max,...), cell details, optional/configurable denormalization.

Important Changes

Summary of most important changes that might affect your code:

Slicer: Change all your slicer.ini configuration files to have a [workspace] section instead of the old [db] or [backend]. A deprecation warning is issued; the old names will still work if not changed.

Model: Change dimensions in the model to be an array instead of a dictionary. The same applies to cubes. Old style: "dimensions": { "date": ... }, new style: "dimensions": [ { "name": "date", ... } ]. The old style will work if not changed, just be prepared.

Python: Use Dimension.hierarchy() instead of Dimension.default_hierarchy.
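
For the model change above, converting an old dictionary-style "dimensions" entry into the new list style is mechanical. Here is a minimal sketch of such a conversion. `dimensions_as_list` is a hypothetical helper written for this post (Cubes does not ship it), and the dimension contents are made up.

```python
def dimensions_as_list(dimensions):
    """Convert an old-style {"name": {...}} mapping to the new list form.

    Hypothetical migration helper, not part of Cubes. A list is returned
    unchanged, so it is safe to call on an already updated model.
    """
    if isinstance(dimensions, list):
        return dimensions
    converted = []
    for name, description in dimensions.items():
        item = {"name": name}      # the name becomes an explicit property
        item.update(description)
        converted.append(item)
    return converted


old_style = {"date": {"levels": ["year", "month"]}, "type": {"label": "Type"}}
print(dimensions_as_list(old_style))
```

The same function can be applied to the "cubes" entry of the model, since it changes in the same way.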

New Features

slicer_context() - new method that holds all relevant information from configuration. It can be reused when creating tools that work in a connected database environment
added Hierarchy.all_attributes() and .key_attributes()
Cell.rollup_dim() - rolls up a single dimension to a specified level. This might later replace the Cell.rollup() method
Cell.drilldown() - drills down the cell
create_workspace(backend,model, **options) - new top-level method for creating a workspace by specifying backend name. Easier to create browsers (from possible browser pool) programmatically. The backend name might be full module name path or relative to the cubes.backends, for example sql.star for new or sql.browser for old SQL browser.
get_backend() - get backend by name
AggregationBrowser.cell_details(): New method returning values of attributes representing the cell. Preliminary implementation, return value might change.
AggregationBrowser.cut_details(): New method returning values of attributes representing a single cut. Preliminary implementation, return value might change.
Dimension.validate() now checks whether there are duplicate attributes
Cube.validate() now checks whether there are duplicate measures or details
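
The duplicate checks mentioned in the last two items boil down to counting names. As an illustration only (this is not the code behind validate(), and `duplicate_names` is a made-up helper), such a check might look like this:

```python
from collections import Counter


def duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


attributes = ["year", "month", "month_name", "month"]
print(duplicate_names(attributes))
```

An empty list means there is nothing to report; anything else can go into the list of validation issues.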

SQL backend:

new StarBrowser implemented:
- StarBrowser supports snowflakes or denormalization (optional)
- for snowflake browsing no write permission is required (does not have to be denormalized)
new DenormalizedMapper for mapping logical model to denormalized view
new SnowflakeMapper for mapping logical model to a snowflake schema
ddl_for_model() - get schema DDL as string for model
join finder and attribute mapper are now just Mapper - class responsible for finding appropriate joins and doing logical-to-physical mappings
coalesce_attribute() - new method for coalescing multiple ways of describing a physical attribute (just attribute or table+schema+attribute)
dimension argument was removed from all methods working with attributes (the dimension is now required attribute property)
added create_denormalized_view() with options: materialize, create_index, keys_only

Slicer tool/server:

slicer ddl - generate schema DDL from model
slicer test - test configuration and model against database and report list of issues, if any
Backend options are now in [workspace], removed configurability of custom backend section. Warnings are issued when the old section names [db] and [backend] are used
server responds to /details which is a result of AggregationBrowser.cell_details()

Examples:

added simple Flask based web example - dimension aggregation browser

Changes

in Model: dimension and cube dictionary specification during model initialization is deprecated, a list should be used (with explicitly mentioned attribute "name") -- important
important: Now all attribute references in the model (dimension attributes, measures, ...) are required to be instances of Attribute() and the attribute knows its dimension
removed hierarchy argument from Dimension.all_attributes() and .key_attributes()
renamed builder to denormalizer
Dimension.default_hierarchy is now deprecated in favor of Dimension.hierarchy() which now accepts no arguments or argument None - returning the default hierarchy in those two cases
metadata are now reused for each browser within one workspace - speed improvement.

Fixes

Slicer version should be same version as Cubes: Original intention was to have separate server, therefore it had its own versioning. Now there is no reason for separate version, moreover it can introduce confusion.
Proper use of database schema in the Mapper

Links

Sources can be found on github. Read the documentation.

Join the Google Group for discussion, problem solving and announcements.

Submit issues and suggestions on github

IRC channel #databrewery on irc.freenode.net

If you have any questions, comments, requests, do not hesitate to ask.

2012-04-04 by Stefan Urbanek

Brewery 0.8 Released

I'm glad to announce a new release of Brewery – a stream-based data auditing and analysis framework for Python.

There are quite a few updates, to mention the notable ones:

new brewery runner with commands run and graph
new nodes: pretty printer node (for your terminal pleasure), generator function node
many CSV updates and fixes

Added several simple how-to examples, such as: aggregation of remote CSV, basic audit of a CSV, how to use a generator function. Feedback and questions are welcome. I'll help you.

Note that there are a couple of changes that break compatibility; however, affected code can be updated very easily. I apologize for the inconvenience, but until 1.0 the changes might happen more frequently. On the other hand, I will try to make them as painless as possible.

Full listing of news, changes and fixes is below.

Version 0.8

News

Changed license to MIT
Created new brewery runner commands: 'run' and 'graph':
- 'brewery run stream.json' will execute the stream
- 'brewery graph stream.json' will generate graphviz data
Nodes: Added pretty printer node - textual output as a formatted table
Nodes: Added source node for a generator function
Nodes: added analytical type to derive field node
Preliminary implementation of data probes (just a concept, the API is not fully decided yet)
CSV: added empty_as_null option to read empty strings as Null values
Nodes can be configured with node.configure(dictionary, protected). If 'protected' is True, then protected attributes (specified in node info) can not be set with this method.
added node identifier to the node reference doc
added create_logger
added experimental retype feature (works for CSV only at the moment)
Mongo Backend - better handling of record iteration
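
To show what the empty_as_null option in the list above is about, here is a small stand-alone sketch using Python's csv module. It is not the Brewery CSV source. `read_rows` is a made-up function that only demonstrates the idea of mapping empty strings to None while reading.

```python
import csv
import io


def read_rows(handle, empty_as_null=False):
    """Yield CSV rows as lists, optionally mapping empty strings to None."""
    for row in csv.reader(handle):
        if empty_as_null:
            row = [value if value != "" else None for value in row]
        yield row


data = io.StringIO("id,name,amount\n1,,10\n2,Acme,\n")
rows = list(read_rows(data, empty_as_null=True))
print(rows)
```

Without the option the missing values would stay as empty strings, which are easy to confuse with real (empty) text values later in the stream.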

Changes

CSV: resource is now explicitly named argument in CSV*Node
CSV: convert fields according to field storage type (instead of all-strings)
Removed fields getter/setter (now implementation is totally up to stream subclass)
AggregateNode: rename aggregates to measures, added measures as public node attribute
moved errors to brewery.common
removed field_name(), now str(field) should be used
use named logger 'brewery' instead of the global one
better debug-log labels for nodes (node type identifier + python object ID)

WARNING: Compatibility break:

deprecated __node_info__, use plain node_info instead
Stream.update() now takes nodes and connections as two separate arguments

Fixes

added SQLSourceNode, added option to keep fields instead of dropping them in FieldMap and FieldMapNode (patch by laurentvasseur @ bitbucket)
better traceback handling on node failure (now actually the traceback is displayed)
return list of field names as string representation of FieldList
CSV: fixed output of zero numeric value in CSV (was empty string)

Cubes 0.7.1 released

I am glad to announce a new minor release of Cubes - the lightweight Python OLAP framework for multidimensional data aggregation and browsing. The news, changes and fixes are:

New Features

New method: Dimension.attribute_reference: returns full reference to an attribute
str(cut) will now return constructed string representation of a cut as it can be used by Slicer

Slicer server:

added /locales to slicer
added locales key in /model request
added Access-Control-Allow-Origin for JS/jQuery

Changes

Allow dimensions in cube to be a list, not only a dictionary (internally it is an ordered dictionary)
Allow cubes in model to be a list, not only a dictionary (internally it is an ordered dictionary)

Slicer server:

slicer does not require default cube to be specified: if no cube is in the request then try default from config or get first from model
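
The lookup order described above (request, then configuration, then the first cube in the model) can be written down in a few lines. This is just a sketch of the rule, not the slicer server code. The function name and its arguments are made up.

```python
def resolve_cube_name(requested, config_default, model_cubes):
    """Pick the cube to serve: request first, then config, then the model."""
    if requested:
        return requested
    if config_default:
        return config_default
    if model_cubes:
        return model_cubes[0]
    raise LookupError("the model contains no cubes")


# No cube in the request and no default in the config: first cube is used.
print(resolve_cube_name(None, None, ["contracts", "invoices"]))
```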

Fixes

Slicer now serves the right localization regardless of what localization was used first after the server was launched (changed model localization copy to be deepcopy, as it should be)
Fixes some remnants that used old Cell.foo based browsing to Browser.foo(cell, ...) only browsing
fixed model localization issues; once localized, original locale was not available
Do not try to add locale if not specified. Fixes #11: https://github.com/Stiivi/cubes/issues/11
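
The localization fixes above come down to the difference between a shallow and a deep copy. The following toy example, with a plain dictionary standing in for the model (the real model objects are more complex), shows why a shallow copy lets a translation leak back into the original:

```python
import copy

english = {"cube": {"label": "Contracts"}}

# A shallow copy shares the nested dictionaries with the original.
slovak = copy.copy(english)
slovak["cube"]["label"] = "Zmluvy"
shallow_result = english["cube"]["label"]   # the original label was overwritten

english = {"cube": {"label": "Contracts"}}

# A deep copy duplicates nested dictionaries, so the original stays intact.
slovak = copy.deepcopy(english)
slovak["cube"]["label"] = "Zmluvy"
deep_result = english["cube"]["label"]

print(shallow_result, deep_result)
```

With the deep copy, the original locale stays available no matter which localization was requested first.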

Tutorials

Added tutorials in tutorials/ with models in tutorials/models/ and data in tutorials/data/:

Tutorial 1:
- how to build a model programmatically
- how to create a model with flat dimensions
- how to aggregate whole cube
- how to drill-down and aggregate through a dimension
Tutorial 2:
- how to create and use a model file
- mappings
Tutorial 3:
- how hierarchies work
- drill-down through a hierarchy
Tutorial 4 (not blogged about it yet):
- how to launch slicer server
