2014-09-02 by Stefan Urbanek

Cubes 1.0 Released - Pluggable Data Warehouse

Finally it is here: Cubes 1.0. Many of you are already using it from GitHub or PyPI; it just had not been officially released, so here we go.

Cubes now brings a lightweight way to create a concept-oriented, pluggable data warehouse from multiple sources.

Summary:

  1. Analytical Workspace and Model Providers
  2. Model Objects Redesign
  3. HTTP API changes
  4. New Backends
  5. New SQL Backend Features
  6. Authentication and Authorization

Detailed list of changes.

The changes are major and backward incompatible, but necessary for the future direction of Cubes.

Analytical Workspace

The biggest change is the Workspace – a pluggable data warehouse. You are no longer limited to one model, one type of data store (database) and one set of cubes. The new Workspace is a framework-level controller object that manages models (model sources), cubes and data stores. More features will be added to the workspace in the future.

  • Multiple models per workspace/server instead of only one
  • Multiple backends per workspace/server instead of only one
  • Multiple data stores per workspace/server instead of only one

Models can now be generated or converted on-the-fly from another service with the new concept of Model Providers.

See also: Workspace, Providers

Model Objects Redesign

A notable change is the addition of a new object: the measure aggregate. Cubes now distinguishes between measures and aggregates. A measure represents a numerical fact property; an aggregate represents an aggregated value (an aggregate function applied to a property, or a value provided natively by the backend). This new approach to aggregates makes development of backends and clients much easier. There is no need to construct and guess aggregate measures or to split the names from the functions. Backends receive concrete objects with sufficient information to perform the aggregation (either by applying a function or by fetching an already computed value).

Now you can name the "record_count" as you like or you might not have it at all, if you do not like it.
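The distinction can be illustrated with a small cube description written as a Python dictionary. This is only a sketch: the cube, the names and the exact keys are my assumptions for illustration and should be checked against the model documentation. It shows one measure, one aggregate derived from it by a function, and a record count under a custom name.

```python
import json

# Hypothetical cube description. The key names ("measures", "aggregates",
# "measure", "function") are assumptions, not quoted from the post.
cube = {
    "name": "sales",
    "measures": [{"name": "amount"}],
    "aggregates": [
        # An aggregate computed by applying a function to a measure.
        {"name": "amount_sum", "measure": "amount", "function": "sum"},
        # The record count under a custom name; it could also be left out.
        {"name": "number_of_sales", "function": "count"},
    ],
}

def aggregates_for(cube, measure):
    """Return names of the aggregates that are derived from `measure`."""
    return [a["name"] for a in cube["aggregates"] if a.get("measure") == measure]

def to_json(cube):
    """Serialize the description the way a model file might store it."""
    return json.dumps(cube, indent=2)
```

A client can list the aggregates and ask for them by name, with no need to guess a name such as "amount_sum" from a measure and a function.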

More info about model can be found in the model documentation.

Other model changes:

  • cardinality – metadata that helps the front-end determine which kind of UI item to use, or that might restrict high-cardinality queries
  • dimension linking – cubes can specify how the dimensions are going to be linked: which hierarchies are relevant to the cube, what the cardinality of the dimension is in the context of the cube, and more
  • roles – dimensions and levels can have roles: metadata that might make dimensions or levels be handled in a special way. Currently only the time role is implemented.

HTTP API Changes

The server end-points have changed. The concept of a global model was dropped; now there is just a list of cubes. The front-end should approach the server in two steps:

  1. Get list of cubes with /cubes – only basic information, no structure metadata
  2. Get full cube model with /cube/NAME/model
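The two steps can be sketched with the standard library alone. The server address and the helper names are placeholders of mine, not part of Cubes; only the /cubes and /cube/NAME/model paths come from the list above.

```python
from urllib.parse import quote

# Hypothetical server address, used only for illustration.
BASE = "http://localhost:5000"

def cube_list_url(base):
    """Step 1: URL of the list of cubes (basic information only)."""
    return base.rstrip("/") + "/cubes"

def cube_model_url(base, name):
    """Step 2: URL of the full model of a single cube."""
    return base.rstrip("/") + "/cube/" + quote(name, safe="") + "/model"
```

A front-end would fetch the first URL once, let the user pick a cube, and request the second URL only for the cube it is about to display.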

Other changes:

  • Many actions now accept a format= parameter, which can be json, csv or json_lines (new-line separated JSON).
  • Cuts for a date dimension accept named relative time references such as cut=date:90daysago-today
  • Dimension path elements can contain special characters if they are escaped by a backslash, such as cut=city:Nové\ Mesto
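A minimal sketch of how a client might assemble these parameters follows. The set of characters treated as special is an assumption: the post only shows an escaped space, and the rest are the separators that appear in its cut examples. The function names are made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

# Assumption: characters that need a backslash inside a path element.
SPECIAL = set(" :,-;@\\")

def escape_element(value):
    """Prefix every special character of one path element with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in str(value))

def point_cut(dimension, path):
    """Build a cut string from a dimension name and a list of path elements."""
    return dimension + ":" + ",".join(escape_element(p) for p in path)

def query_string(cut, fmt="json"):
    """URL-encode the cut= and format= parameters."""
    return urlencode({"cut": cut, "format": fmt})
```

For example, point_cut("city", ["Nové Mesto"]) produces the escaped form shown above, and query_string("date:90daysago-today", "csv") requests CSV output for a relative date range. Note that the backslash escaping happens before the usual URL encoding.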

More info

Backends

New backends:

  • MongoDB (thanks to Robin Thomas)
  • full implementation of the Slicer backend
  • Mixpanel
  • Google Analytics

With model providers you can easily create a backend for any other service which serves cube-like data and plug it into your data warehouse.

SQL Features

A notable addition to the SQL backend is outer joins (finally!). Three join types are now available: match (inner), master (left outer) and detail (right outer).

More info about the SQL features.
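To show what the three join types mean for the rows that come back, here is a small sketch using SQLite as a stand-in database. The tables, columns and data are invented, and the statements are hand-written to illustrate join semantics; they are not the SQL that Cubes generates.

```python
import sqlite3

# One fact row (11) points to a missing dimension row, and one dimension
# row ('b') has no facts.
SCHEMA = """
    CREATE TABLE dim (id INTEGER, name TEXT);
    CREATE TABLE fact (id INTEGER, dim_id INTEGER);
    INSERT INTO dim VALUES (1, 'a'), (2, 'b');
    INSERT INTO fact VALUES (10, 1), (11, 3);
"""

JOIN_SQL = {
    # match: only rows present on both sides
    "match": "SELECT f.id, d.name FROM fact f JOIN dim d ON f.dim_id = d.id",
    # master: keep every fact row, even without a dimension row
    "master": "SELECT f.id, d.name FROM fact f "
              "LEFT OUTER JOIN dim d ON f.dim_id = d.id",
    # detail: keep every dimension row; a right outer join is written here
    # as a left outer join with the tables swapped
    "detail": "SELECT f.id, d.name FROM dim d "
              "LEFT OUTER JOIN fact f ON f.dim_id = d.id",
}

def rows_for(method):
    """Run the join for one method against a fresh in-memory database."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return con.execute(JOIN_SQL[method]).fetchall()
    finally:
        con.close()
```

With this data, match drops both unmatched rows, master keeps fact 11 with an empty dimension name, and detail keeps 'b' with no fact attached.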

Non-additive

Provisional semi-additive time dimension support was added. An aggregate can specify that it is non-additive through the time dimension (such as account amount snapshots) and the generated query will handle the situation by choosing the latest entry used.

The initial metadata infrastructure is in place. A more flexible implementation that will include other dimensions and element selection functions will be introduced in future releases.
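One way to read the snapshot case, sketched in plain Python rather than SQL: amounts are summed across accounts, but along time only the latest snapshot of each account counts. The record layout and function name are illustrative assumptions, and the query Cubes actually generates may differ.

```python
def latest_snapshot_total(snapshots):
    """Sum amounts over accounts, using only each account's latest snapshot.

    `snapshots` is an iterable of (account, date, amount) tuples, where
    dates compare in chronological order (for example ISO date strings).
    """
    latest = {}
    for account, date, amount in snapshots:
        # Keep the entry with the greatest date seen so far per account.
        if account not in latest or date > latest[account][0]:
            latest[account] = (date, amount)
    return sum(amount for _, amount in latest.values())

SNAPSHOTS = [
    ("a", "2014-01-31", 100),
    ("a", "2014-02-28", 120),
    ("b", "2014-01-31", 50),
]
```

A plain sum over SNAPSHOTS would count account "a" twice. Taking the latest entry per account avoids that.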

Credit goes to Robin Thomas for this feature.

Authentication and Authorization

Authentication is handled at the server level and authorization at the workspace level. The interface is extensible, so you can implement any method you like. The built-in methods are pretty simple:

  • permissive authentication methods: pass-parameter – just pass the api_key parameter in the URL; or Basic HTTP proxy – uses the username and ignores the password (there is also one for testing purposes called "adminadmin" ...)
  • authorization: JSON configuration for roles (inheritable) and rights

The authorization has two parts: access to the cube and a restriction cell for the cube.
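Role inheritance can be pictured with a small sketch. The role names, the dictionary layout and the key names below are hypothetical and chosen only for illustration; the real configuration format is described in the authorization documentation. The sketch covers only the first part (access to cubes), not cell restrictions.

```python
# Hypothetical roles: "analyst" inherits everything "viewer" may see.
ROLES = {
    "viewer": {"allowed_cubes": ["sales"]},
    "analyst": {"roles": ["viewer"], "allowed_cubes": ["churn"]},
}

def allowed_cubes(role, roles=ROLES, _seen=None):
    """Collect cubes a role may access, including those it inherits."""
    seen = set() if _seen is None else _seen
    if role in seen:
        # Guard against circular inheritance.
        return set()
    seen.add(role)
    spec = roles[role]
    result = set(spec.get("allowed_cubes", []))
    for parent in spec.get("roles", []):
        result |= allowed_cubes(parent, roles, seen)
    return result
```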

More info about authorization

Creating an auth extension

Visualizer

Cubes comes with a built-in Visualizer – a web app for visualizing cubes data over time series. Main features: drill-down, filtering, many chart options, connects to any cubes server. The application was developed by Robin Thomas and Ryan Berlew.

Instructions

About the Release

This release is a milestone for the Cubes interface: the model metadata structure and the slicer API. They are very unlikely to change; they may be slightly adjusted while maintaining backward compatibility, or at least accompanied by some kind of conversion tools.

Things that might change, due to planned refactoring:

  • Python interface – mostly Workspace and model compilation
  • Localization – definition of model localization
  • Extensions interface - which methods the extensions should implement and how
  • Configuration – slight changes in the slicer.ini sections

The above changes will be stabilized around v1.1 or v1.2 release.

To sum it up: it is safe to build applications on top of the Slicer/HTTP interface and it is safe to generate models to be used by cubes.

Credits

Many thanks to Robin Thomas and Ryan Berlew for major code contributions and for the Visualizer. Credit also goes to Jose Juan Montes, Tomas Levine and Byron Yi.

Links

Read the detailed list of changes.

Important note: The cubes repository has moved to the Data Brewery github organization group (read more).

Most recent sources can be found on github.

Questions, comments and suggestions can be posted to the Cubes Google Group for discussion, problem solving and announcements.

Submit issues and suggestions on github

IRC channel #databrewery on irc.freenode.net

2014-02-25 by Stefan Urbanek

Welcome Robin and Thanks to Squarespace

Before the upcoming 1.0 release, I would like to introduce Cubes core developer Robin Thomas. Robin is an experienced data warehouse engineer with profound knowledge of OLAP and multidimensional modeling. Robin and his team did a great job and contributed many new features and concepts.

We have quite a lot of new features and ideas thanks to Robin. Just to name a few:

  • new, completely rewritten Mongo backend
  • authorization and authentication
  • non-additive time dimension
  • statistical functions

and many more.

Thanks and credit also go to: Brad Willard, Mathew Thomas, Ryan Berlew, Andrew Bartholomew and Emily Wagner.

In addition, I would like to thank Squarespace for their kindness and for contributing their additions back to the community as open-source.

2013-02-20 by Stefan Urbanek

Cubes 0.10.2 Released - Even More Hierarchies, Formatters and Docs

After a few months and gloomy winter nights, here is a humble update of the Cubes lightweight analytical framework. No major feature additions or changes this time, just some usability tweaks and fixes.

Documentation was updated to contain relational database patterns for the SQL backend. See the schemas, models and illustrations in the official documentation.

Cross-referencing between documentation parts was also improved through see-also sections, so that relevant information is at hand.

Thanks and credit for support and patches go to:

  • Jose Juan Montes (@jjmontesl)
  • Andrew Zeneski
  • Reinier Reisy Quevedo Batista (@rquevedo)

Summary

  • many improvements in handling multiple hierarchies
  • more support of multiple hierarchies in the slicer server, either as a parameter or with the syntax dimension@hierarchy:
      • dimension values: GET /dimension/date?hierarchy=dqmy
      • cut: get first quarter of 2012 ?cut=date@dqmy:2012,1
      • drill-down on hierarchy with week on implicit (next) level: ?drilldown=date@ywd
      • drill-down on hierarchy with week with explicitly specified week level: ?drilldown=date@ywd:week
  • order and order attribute can now be specified for a Level
  • optional safe column aliases (see docs for more info) for databases that have non-standard requirements for column labels even when quoted
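The dimension@hierarchy syntax from the list above can be taken apart with a few lines of Python. This is a simplified sketch written for illustration, not the parser used by the slicer server: it ignores escaping, ranges and set cuts.

```python
def parse_reference(text):
    """Split 'dimension@hierarchy:a,b' into (dimension, hierarchy, parts).

    The hierarchy is None when no '@' is given, and parts is an empty
    list when there is nothing after the colon.
    """
    dim_part, _, rest = text.partition(":")
    dimension, _, hierarchy = dim_part.partition("@")
    parts = rest.split(",") if rest else []
    return dimension, hierarchy or None, parts
```

For a cut such as date@dqmy:2012,1 the parts are the path elements; for a drill-down such as date@ywd:week the single part is the level name.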

New Features

  • added order to Level object - can be asc, desc or None for unspecified order (will be ignored)
  • added order_attribute to Level object - specifies attribute to be used for ordering according to order. If not specified, then first attribute is going to be used.
  • added hierarchy argument to AggregationResult.table_rows()
  • str(cube) returns cube name, useful in functions that can accept both cube name and cube object
  • added cross table formatter and its HTML variant
  • GET /dimension accepts hierarchy parameter
  • added create_workspace_from_config() to simplify workspace creation directly from slicer.ini file (this method might be slightly changed in the future)
  • to_dict() method of model objects now has a flag create_label which provides label attribute derived from the object's name, if label is missing
  • Issue #95: Allow charset to be specified in Content-Type header
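The fallback rule for order and order_attribute described in the list above can be sketched with plain dictionaries standing in for level members. The function and the data layout are illustrative assumptions, not the Level implementation.

```python
def sort_members(members, attributes, order=None, order_attribute=None):
    """Sort level members according to `order` ('asc', 'desc' or None).

    When `order` is None the members are returned in their original order.
    When `order_attribute` is not given, the first attribute is used.
    """
    if order is None:
        return list(members)
    key = order_attribute or attributes[0]
    return sorted(members, key=lambda m: m[key], reverse=(order == "desc"))

MONTHS = [
    {"month": 2, "month_name": "February"},
    {"month": 1, "month_name": "January"},
]
```

Here sort_members(MONTHS, ["month", "month_name"], order="asc") sorts by the month number, because no order_attribute was specified.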

SQL:

  • added option to SQL workspace/browser safe_labels to use safe column labels for databases that do not support characters like . in column names even when quoted (advanced feature, does not work with denormalization)
  • browser accepts include_cell_count and include_summary arguments to optionally disable/enable inclusion of respective results in the aggregation result object
  • added implicit ordering by levels to aggregate and dimension values methods (for list of facts it is not yet decided how this should work)
  • Issue #97: partially implemented sort_key, available in aggregate() and values() methods

Server:

  • added comma separator for order= parameter
  • reflected multiple search backend support in slicer server

Other:

  • added vim syntax highlighting goodie

Changes

  • AggregationResult.cross_table is deprecated, use cross table formatter instead
  • load_model() loads and applies translations
  • slicer server uses new localization methods (removed localization code from slicer)
  • workspace context provides proper list of locales and new key 'translations'
  • added base class Workspace which backends should subclass; backends should use workspace.localized_model(locale)
  • create_model() accepts list of translations

Fixes

  • browser.set_locale() now correctly changes browser's locale
  • Issue #97: Dimension values call cartesians when cutting by a different dimension
  • Issue #99: Dimension "template" does not copy hierarchies

Links

Sources can be found on github. Read the documentation.

Join the Google Group for discussion, problem solving and announcements.

Submit issues and suggestions on github

IRC channel #databrewery on irc.freenode.net

If you have any questions, comments, requests, do not hesitate to ask.

2012-12-09 by Stefan Urbanek

Cubes 0.10.1 Released - Multiple Hierarchies

Quick Summary:

  • multiple hierarchies:
      • Python: cut = PointCut("date", [2010,15], hierarchy='ywd') (docs)
      • Server: GET /aggregate?cut=date@ywd:2010,15 (see docs - look for aggregate documentation)
      • Server drilldown: GET /aggregate?drilldown=date@ywd:week
  • added result formatters (experimental! API might change)
  • added pre-aggregations (experimental!)

New Features

  • added support for multiple hierarchies
  • added dimension_schema option to star browser – use this when all dimensions are grouped in a schema separate from the fact table
  • added HierarchyError - used for example when drilling down deeper than possible within that hierarchy
  • added result formatters: simple_html_table, simple_data_table, text_table
  • added create_formatter(formatter_type, options ...)
  • AggregationResult.levels is a new dictionary containing levels that the result was drilled down to. Keys are dimension names, values are levels.
  • AggregationResult.table_rows() output has a new variable is_base to denote whether the row is base or not in regard to table_rows dimension.
  • added create_server(config_path) to simplify wsgi script

  • added aggregates: avg, stddev and variance (works only in databases that support those aggregations, such as PostgreSQL)
  • added preliminary implementation of pre-aggregation to the SQL workspace:
      • create_conformed_rollup()
      • create_conformed_rollups()
      • create_cube_aggregate()

Server:

  • multiple drilldowns can be specified in a single argument: drilldown=date,product
  • there can be multiple cut arguments that will be combined into a single cell
  • added requests: GET /cubes and GET /cube/NAME/dimensions
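A client-side sketch of a query that uses both conventions, repeated cut arguments and a comma-separated drilldown, might look as follows. The helper name is made up, and the cut values are examples only.

```python
from urllib.parse import urlencode, parse_qs

def aggregate_query(cuts=(), drilldowns=()):
    """Build a query string with one cut= per cut and a single drilldown=."""
    params = [("cut", cut) for cut in cuts]
    if drilldowns:
        # All drill-down dimensions go into one comma-separated argument.
        params.append(("drilldown", ",".join(drilldowns)))
    return urlencode(params)
```

Calling aggregate_query(["date:2010", "product:books"], ["date", "product"]) yields two cut parameters and one drilldown parameter, URL-encoded and ready to append to an aggregate request.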

Changes

  • Important: Changed string representation of a set cut: now using semicolon ';' as a separator instead of a plus symbol '+'
  • aggregation browser subclasses should now fill result's levels variable with coalesce_drilldown() output for requested drill-down levels.
  • Moved coalesce_drilldown() from star browser to cubes.browser module to be reusable by other browsers. Method might be renamed in the future.
  • if there is only one level (default) in a dimension, it will have same label as the owning dimension
  • hierarchy definition errors now raise ModelError instead of generic exception

Fixes

  • order of joins is preserved
  • fixed ordering bug
  • fixed bug in generating conditions from range cuts
  • AggregationResult.table_rows now works when there is no point cut
  • get correct reference in table_rows – now works when simple denormalized table is used
  • raise model exception when a table is missing due to missing join
  • search in slicer updated for latest changes
  • fixed bug that prevented using cells with attributes in aliased joined tables

Links

Sources can be found on github. Read the documentation.

Join the Google Group for discussion, problem solving and announcements.

Submit issues and suggestions on github

IRC channel #databrewery on irc.freenode.net

If you have any questions, comments, requests, do not hesitate to ask.
