Cubes 1.0 Released - Pluggable Data Warehouse
Finally it is here: Cubes 1.0. Many of you are already using it from GitHub or from PyPI; it just had not been officially released, so here we go.
Cubes now brings a light-weight way to create a concept-oriented, pluggable data warehouse from multiple sources.
Summary:
- Analytical Workspace and Model Providers
- Model Objects Redesign
- HTTP API changes
- New Backends
- New SQL Backend Features
- Authentication and Authorization
The changes are major and backward incompatible, but necessary for the future direction of Cubes.
Analytical Workspace
The biggest change is the Workspace – a pluggable data warehouse. You are no longer limited to one model, one type of data store (database) and one set of cubes. The Workspace is now a framework-level controller object that manages models (model sources), cubes and data stores. More features will be added to the workspace in the future.
- Multiple models per workspace/server instead of only one
- Multiple backends per workspace/server instead of only one
- Multiple data stores per workspace/server instead of only one
Models can now be generated or converted on-the-fly from another service with the new concept of Model Providers.
See also: Workspace, Providers
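The idea of one controller owning several models and several data stores can be shown with a toy class. This is a hypothetical stand-in written for illustration (ToyWorkspace, register_store, import_model and store_for are invented names, not the cubes.Workspace API). In the real library this wiring comes from configuration such as slicer.ini:

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorkspace:
    """Hypothetical stand-in for a workspace; not the real cubes.Workspace API."""

    stores: dict = field(default_factory=dict)  # store name -> connection settings
    cubes: dict = field(default_factory=dict)   # cube name -> (model name, store name)

    def register_store(self, name, settings):
        self.stores[name] = settings

    def import_model(self, model_name, cube_names, store):
        """Register every cube of a model and bind it to one of the data stores."""
        if store not in self.stores:
            raise KeyError(f"unknown store: {store}")
        for cube in cube_names:
            self.cubes[cube] = (model_name, store)

    def store_for(self, cube):
        """Look up the data store that serves a given cube."""
        _, store = self.cubes[cube]
        return self.stores[store]


# Two models backed by two different kinds of store in one workspace.
ws = ToyWorkspace()
ws.register_store("warehouse", "sqlite:///sales.db")
ws.register_store("events", {"backend": "mixpanel"})
ws.import_model("sales_model", ["sales", "returns"], store="warehouse")
ws.import_model("web_model", ["visits"], store="events")
print(ws.store_for("visits"))  # {'backend': 'mixpanel'}
```

The point is that the cube name alone is enough for the caller. The workspace decides which model and which store are behind it.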
Model Objects Redesign
A notable change is the addition of a new object: the measure aggregate. Cubes now distinguishes between measures and aggregates. A measure represents a numerical fact property. An aggregate represents an aggregated value: an aggregate function applied to a property, or a value provided natively by the backend. This approach makes developing backends and clients much easier, because there is no need to construct and guess aggregate measures or to split measure names from function names. Backends receive concrete objects with sufficient information to perform the aggregation, either by applying a function or by fetching an already computed value.
Now you can name the "record_count" aggregate as you like, or not have it at all if you do not need it.
More info about the model can be found in the model documentation.
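The split between measures and aggregates can be sketched as a cube description plus a tiny evaluator. The field names ("measures", "aggregates", "function", "measure") follow the spirit of the Cubes 1.0 model and should be checked against the model documentation. The evaluator is a hypothetical illustration, not how any backend is implemented:

```python
# Cube description in the spirit of the Cubes 1.0 model (field names assumed).
SALES_CUBE = {
    "name": "sales",
    "measures": [{"name": "amount"}],  # a numerical fact property
    "aggregates": [                    # functions applied to properties
        {"name": "amount_sum", "function": "sum", "measure": "amount"},
        {"name": "amount_max", "function": "max", "measure": "amount"},
        # The record count can be named freely, or omitted entirely.
        {"name": "rows", "function": "count"},
    ],
}

FUNCTIONS = {"sum": sum, "max": max, "min": min}


def aggregate(cube, facts):
    """Evaluate each declared aggregate over a list of fact dicts."""
    result = {}
    for agg in cube["aggregates"]:
        if agg["function"] == "count":
            result[agg["name"]] = len(facts)
        else:
            values = [fact[agg["measure"]] for fact in facts]
            result[agg["name"]] = FUNCTIONS[agg["function"]](values)
    return result


facts = [{"amount": 10}, {"amount": 25}, {"amount": 5}]
print(aggregate(SALES_CUBE, facts))
# {'amount_sum': 40, 'amount_max': 25, 'rows': 3}
```

Each aggregate already states its function and source measure. The evaluator never has to parse names like "amount_sum" to find out what to compute.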
Other model changes:
- cardinality – metadata that helps the front-end determine which kind of UI item to use, or that might be used to restrict high-cardinality queries
- dimension linking – cubes can specify how the dimensions are linked: which hierarchies are relevant to the cube, what the cardinality of the dimension is in the context of the cube, and more.
- roles – dimensions and levels can have roles: metadata that might cause dimensions or levels to be handled in a special way. Currently only the "time" role is implemented.
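A front-end might use the cardinality and role metadata roughly as follows. This is a hypothetical sketch: the cardinality labels ("tiny", "low", "medium", "high") and the widget choices are assumptions made for illustration:

```python
# Hypothetical front-end logic; the cardinality labels "tiny", "low",
# "medium" and "high" are assumed for illustration.
WIDGETS = {
    "tiny": "radio buttons",
    "low": "drop-down list",
    "medium": "searchable list",
    "high": "free-text search",
}


def widget_for(dimension):
    """Pick a filter widget from the dimension's cardinality metadata."""
    return WIDGETS.get(dimension.get("cardinality"), "searchable list")


def check_drilldown(dimension, cut_dimensions):
    """Reject drilling into a high-cardinality dimension that has no cut on it."""
    if dimension.get("cardinality") == "high" and dimension["name"] not in cut_dimensions:
        raise ValueError(f"drill-down on {dimension['name']!r} requires a cut")


def is_time(dimension):
    """Dimensions with the "time" role can get date pickers, relative cuts, etc."""
    return dimension.get("role") == "time"


date_dim = {"name": "date", "role": "time", "cardinality": "medium"}
customer = {"name": "customer", "cardinality": "high"}
print(widget_for(customer), is_time(date_dim))  # free-text search True
```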
HTTP API Changes
The server end-points have changed. The concept of a global model was dropped; now there is just a list of cubes. The front-end should approach the server in two steps:
- Get the list of cubes with /cubes – only basic information, no structure metadata
- Get the full cube model with /cube/NAME/model
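A client following the two-step discovery could look like this standard-library sketch. The server address is an assumption. The code also assumes that /cubes returns a JSON list of objects with a "name" key. The fetch function is injectable, so the logic can be exercised without a running server:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed server address; adjust to wherever your slicer server runs.
BASE_URL = "http://localhost:5000"


def fetch_json(url):
    """GET a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def load_cubes(base_url=BASE_URL, fetch=fetch_json):
    """Step 1: list the cubes. Step 2: fetch each cube's full model."""
    cubes = fetch(f"{base_url}/cubes")
    return {
        cube["name"]: fetch(f"{base_url}/cube/{quote(cube['name'])}/model")
        for cube in cubes
    }
```

With a server running, load_cubes() returns a dictionary that maps each cube name to its full model. A front-end that only needs a cube picker could stop after the first request.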
Other changes:
- Many actions now accept a format= parameter, which can be json, csv or json_lines (new-line separated JSON).
- Cuts for date dimensions accept named relative time references such as cut=date:90daysago-today
- Dimension path elements can contain special characters if they are escaped by a backslash, such as cut=city:Nové\ Mesto
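Cut strings with escaping, and json_lines responses, can be handled on the client side as sketched below. The exact set of special characters and the "|" separator between several cuts are assumptions based on the examples above. Check them against the server documentation:

```python
import json
from urllib.parse import urlencode

# Assumed special characters inside a path element: the backslash itself,
# "," (path separator), "-" (range), "|" (cut separator), ":" (dimension
# separator) and the space escaped in the "Nové\ Mesto" example.
SPECIAL = "\\,-|: "


def escape_element(element):
    """Prefix each special character in a path element with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in str(element))


def path(elements):
    return ",".join(escape_element(e) for e in elements)


def point_cut(dimension, elements):
    return f"{dimension}:{path(elements)}"


def range_cut(dimension, start, end):
    # Relative references such as "90daysago" or "today" pass through unchanged.
    return f"{dimension}:{path(start)}-{path(end)}"


def query_string(cuts, fmt="json"):
    """fmt is one of "json", "csv" or "json_lines"."""
    return urlencode({"cut": "|".join(cuts), "format": fmt})


def parse_json_lines(text):
    """Decode a json_lines response: one JSON document per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


print(point_cut("city", ["Nové Mesto"]))            # city:Nové\ Mesto
print(range_cut("date", ["90daysago"], ["today"]))  # date:90daysago-today
```

Escaping each element separately keeps the structural separators (",", "-", "|") meaningful while any user-supplied value stays literal.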
Backends
New backends:
- MongoDB (thanks to Robin Thomas)
- full implementation of the Slicer backend
- Mixpanel
- Google Analytics
With model providers you can easily create a backend for any other service which serves cube-like data and plug it into your data warehouse.
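A model provider turns another service's metadata into cube models on the fly. The class below is a hypothetical sketch for an imaginary event-tracking service, not the provider interface Cubes defines. The fetch_schema callable and the generated model layout are assumptions:

```python
class EventModelProvider:
    """Hypothetical model provider that derives cube models from another
    service's metadata instead of reading a static model file."""

    def __init__(self, fetch_schema):
        # fetch_schema() is an assumed callable returning
        # {event name: [property names]} from the external service.
        self.fetch_schema = fetch_schema

    def cube_names(self):
        return sorted(self.fetch_schema())

    def cube(self, name):
        """Build a cube description on the fly: one cube per event type."""
        properties = self.fetch_schema()[name]
        dimensions = [{"name": prop} for prop in properties]
        dimensions.append({"name": "time", "role": "time"})
        return {
            "name": name,
            "dimensions": dimensions,
            "aggregates": [{"name": "event_count", "function": "count"}],
        }


provider = EventModelProvider(lambda: {"signup": ["country", "browser"], "purchase": ["country"]})
print(provider.cube_names())  # ['purchase', 'signup']
```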
SQL Features
A notable addition to the SQL backend is outer joins (finally!). Three types of joins are now supported: match (inner), master (left outer) and detail (right outer).
More info about the SQL features.
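The practical difference between the join types is visible on a tiny schema in SQLite. This sketch maps the three names to plain SQL as described above. It only runs match and master, because RIGHT OUTER JOIN needs SQLite 3.39.0 or newer:

```python
import sqlite3

# Join types from the text mapped to SQL: match = inner, master = left outer,
# detail = right outer (the latter requires SQLite >= 3.39.0).
JOIN_SQL = {"match": "INNER JOIN", "master": "LEFT OUTER JOIN", "detail": "RIGHT OUTER JOIN"}


def product_sales(conn, method):
    """Sum sales per product name using the chosen join type."""
    sql = (
        "SELECT product.name, SUM(sales.amount) FROM sales "
        f"{JOIN_SQL[method]} product ON sales.product_id = product.id "
        "GROUP BY product.name ORDER BY product.name"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (product_id INTEGER, amount INTEGER);
    INSERT INTO product VALUES (1, 'apples'), (2, 'pears');
    INSERT INTO sales VALUES (1, 10), (1, 5), (3, 7);  -- product 3 is unknown
""")

print(product_sales(conn, "match"))   # [('apples', 15)]
print(product_sales(conn, "master"))  # [(None, 7), ('apples', 15)]
```

With an inner join, the fact row pointing at the unknown product silently disappears from the totals. The master (left outer) join keeps it under a NULL key, so the totals still add up.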
Non-additive
Provisional support for semi-additive aggregates along the time dimension was added. An aggregate can specify that it is non-additive through the time dimension (such as account amount snapshots), and the generated query handles the situation by choosing the latest entry.
The initial metadata infrastructure is in place. A more flexible implementation that includes other dimensions and element selection functions will be introduced in future releases.
Credit goes to Robin Thomas for this feature.
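The following illustrates why snapshot amounts must not be summed over time. It uses one simplified interpretation: take each account's latest snapshot, then sum across accounts. It is a plain-Python illustration of the idea, not the query the SQL backend generates:

```python
# Balance snapshots: (account, ISO date, amount). Summing amounts over time
# would double count, so along time we take the latest snapshot instead.
snapshots = [
    ("A", "2014-01-31", 100),
    ("A", "2014-02-28", 120),
    ("B", "2014-01-31", 50),
    ("B", "2014-02-28", 40),
    ("C", "2014-01-31", 70),
]


def latest_per_account(rows, until):
    """Return each account's most recent amount on or before `until`."""
    latest = {}
    for account, date, amount in rows:
        if date <= until and (account not in latest or date > latest[account][0]):
            latest[account] = (date, amount)
    return {account: amount for account, (_, amount) in latest.items()}


def total_balance(rows, until):
    # Additive across accounts, non-additive across time.
    return sum(latest_per_account(rows, until).values())


print(total_balance(snapshots, "2014-02-28"))  # 230, not the naive sum 380
```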
Authentication and Authorization
Authentication happens at the server level and authorization at the workspace level. The interface is extensible, so you can implement any method you like. The built-in methods are pretty simple:
- permissive authentication methods: pass-parameter – just pass the api_key parameter in the URL; or Basic HTTP proxy – uses the username and ignores the password (there is one for testing purposes called "adminadmin" ...)
- authorization: JSON configuration for roles (inheritable) and rights.
The authorization has two parts: access to the cube and a restriction cell for the cube.
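Inheritable roles with the two-part answer (cube access plus a restriction cell) can be sketched as follows. The configuration keys (roles, allow_cubes, deny_cubes, cell_restrictions) are hypothetical names chosen for illustration, not the exact schema of the built-in authorizer:

```python
# Hypothetical rights configuration in the spirit of the JSON authorizer;
# key names are illustrative, not the exact Cubes schema.
CONFIG = {
    "roles": {
        "viewer": {"allow_cubes": ["sales"]},
        "analyst": {"roles": ["viewer"], "allow_cubes": ["budget"]},
        "regional": {"roles": ["analyst"], "cell_restrictions": {"sales": "region:emea"}},
    },
    "rights": {
        "jane": {"roles": ["regional"], "deny_cubes": ["budget"]},
    },
}


def resolve(entry, roles, seen=()):
    """Merge an entry with every role it inherits from, depth first."""
    allow, deny, cells = set(), set(), {}
    for role in entry.get("roles", []):
        if role in seen:
            continue  # guard against inheritance cycles
        a, d, c = resolve(roles[role], roles, seen + (role,))
        allow |= a
        deny |= d
        cells.update(c)
    # The entry's own settings override what it inherited.
    allow |= set(entry.get("allow_cubes", []))
    deny |= set(entry.get("deny_cubes", []))
    cells.update(entry.get("cell_restrictions", {}))
    return allow, deny, cells


def authorize(user, cube, config=CONFIG):
    """Return (access allowed, restriction cut or None) for a user and cube."""
    allow, deny, cells = resolve(config["rights"].get(user, {}), config["roles"])
    allowed = cube in allow and cube not in deny
    return allowed, (cells.get(cube) if allowed else None)


print(authorize("jane", "sales"))  # (True, 'region:emea')
```

Here "jane" inherits access to "sales" through three levels of roles, but every query on it is restricted to the EMEA cell. Her explicit deny removes the "budget" cube granted by the analyst role.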
Visualizer
Cubes comes with a built-in Visualizer – a web app for visualizing cube data over time series. Main features: drill-down, filtering, many chart options, and it connects to any Cubes server. The application was developed by Robin Thomas and Ryan Berlew.
About the Release
This release is a milestone for the Cubes interfaces: the model metadata structure and the slicer API. They are very unlikely to change. At most they may be slightly adjusted while maintaining backward compatibility, or at least with some kind of conversion tools.
Things that might change, due to planned refactoring:
- Python interface – mostly Workspace and model compilation
- Localization – definition of model localization
- Extensions interface - which methods the extensions should implement and how
- Configuration – slight changes in the slicer.ini sections
The above changes will be stabilized around v1.1 or v1.2 release.
To sum it up: it is safe to build applications on top of the Slicer/HTTP interface and it is safe to generate models to be used by cubes.
Credits
Many thanks to Robin Thomas and Ryan Berlew for major code contributions and for the Visualizer. Credit also goes to Jose Juan Montes, Tomas Levine and Byron Yi.
Links
Read the detailed list of changes.
Important note: The cubes repository has moved to the Data Brewery GitHub organization (read more).
Most recent sources can be found on GitHub.
Questions, comments and suggestions can be posted to the Cubes Google Group, which is used for discussion, problem solving and announcements.
Submit issues and suggestions on GitHub.
IRC channel #databrewery on irc.freenode.net