2014-09-02 by Stefan Urbanek
Finally it is here: Cubes 1.0. Many of you are already using it from
GitHub or from PyPI; it just has not
been officially released, so here we go.
Cubes now brings a lightweight way to create a concept-oriented, pluggable data
warehouse from multiple sources.
Summary:
- Analytical Workspace and Model Providers
- Model Objects Redesign
- HTTP API changes
- New Backends
- New SQL Backend Features
- Authentication and Authorization
Detailed list of changes.
The changes are major and backward incompatible, but necessary for the future
direction of Cubes.
Analytical Workspace
The biggest change is the Workspace – a pluggable data warehouse. You are no
longer limited to one model, one type of data store (database) and one set
of cubes. The new Workspace is a framework-level controller object that
manages models (model sources), cubes and data stores. More features will be
added to the workspace in the future.
- Multiple models per workspace/server instead of only one
- Multiple backends per workspace/server instead of only one
- Multiple data stores per workspace/server instead of only one
Models can now be generated or converted on-the-fly from another service with
the new concept of Model Providers.
See also:
Workspace,
Providers
Model Objects Redesign
A notable change is the addition of a new object: Measure Aggregate. Cubes now
distinguishes between measures and aggregates. A measure represents a
numerical fact property; an aggregate represents an aggregated value (an
aggregate function applied to a property, or a value provided natively by the
backend). This new approach to aggregates makes development of backends and
clients much easier. There is no need to construct and guess aggregate measures
or to split the names from the functions. Backends receive concrete objects with
sufficient information to perform the aggregation (either by applying a function
or by fetching an already computed value).
Now you can name the "record_count" as you like or you might not have it at
all, if you do not like it.
More info about the model can be found in the
model documentation.
Other model changes:
- cardinality – metadata that helps a front-end determine which kind of UI
item to use, or that might restrict high-cardinality queries
- dimension linking – cubes can specify how the dimensions are going to be
linked: which hierarchies are relevant to the cube, what the
cardinality of a dimension is in the context of the cube, and more
- roles – dimensions and levels can have roles: metadata that might make
dimensions or levels be handled in a special way. Currently only the time role
is implemented.
HTTP API Changes
The server end-points have changed. The concept of a global model was dropped;
now there is just a list of cubes. The front-end should approach the server in
two steps:
- Get the list of cubes with /cubes – only basic information, no structure
metadata
- Get the full cube model with /cube/NAME/model
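The two steps can be sketched as plain URL construction. The server address below is a hypothetical placeholder and the helper names are made up for this illustration; only the /cubes and /cube/NAME/model paths come from the text above:

```python
from urllib.parse import quote

# Hypothetical server address; adjust to wherever your slicer server runs.
BASE_URL = "http://localhost:5000"


def cube_list_url(base_url=BASE_URL):
    """URL of the cube list (basic information only)."""
    return base_url.rstrip("/") + "/cubes"


def cube_model_url(cube_name, base_url=BASE_URL):
    """URL of the full model of a single cube."""
    return "%s/cube/%s/model" % (base_url.rstrip("/"), quote(cube_name, safe=""))


list_url = cube_list_url()
model_url = cube_model_url("sales")
print(list_url)
print(model_url)
```

A front-end would typically fetch the first URL once to populate a cube picker and request the second one only for the cube the user actually opens.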
Other changes:
- Many actions now accept a format= parameter, which can be json, csv
or json_lines (new-line separated JSON).
- Cuts for a date dimension accept named relative time references such as
cut=date:90daysago-today
- Dimension path elements can contain special characters if they are escaped
by a backslash, such as cut=city:Nové\ Mesto
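A minimal sketch of the backslash escaping when building a cut on the client side. The set of characters treated as special here is an assumption (the server documentation is the authority), and both helper functions are hypothetical:

```python
import re
from urllib.parse import urlencode

# Assumed set of characters that need escaping inside a path element:
# backslash, space, colon, semicolon, comma, pipe and hyphen.
SPECIAL = re.compile(r"([\\ :;,|\-])")


def escape_path_element(value):
    """Prefix every special character in `value` with a backslash."""
    return SPECIAL.sub(r"\\\1", str(value))


def point_cut(dimension, path):
    """Build a `dimension:element,element` cut string."""
    elements = ",".join(escape_path_element(item) for item in path)
    return "%s:%s" % (dimension, elements)


cut = point_cut("city", ["Nové Mesto"])
# urlencode() takes care of the URL-level quoting of the whole value.
query = urlencode({"cut": cut, "format": "csv"})
print(cut)
print(query)
```

Note that there are two layers here: the backslash escaping belongs to the cut syntax, while percent-encoding belongs to the URL and is applied on top of it.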
More info
Backends
New backends:
- MongoDB (thanks to Robin Thomas)
- full implementation of the Slicer backend
- Mixpanel
- Google Analytics
With model providers you can easily create a backend for any other service
which serves cube-like data and plug it into your data warehouse.
SQL Features
A notable addition to the SQL backend is outer joins (finally!). The backend now
supports three types of joins: match (inner), master (left outer) and
detail (right outer).
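The semantics of the three join methods can be illustrated without a database. The sketch below is a plain-Python stand-in, not the SQL backend's implementation; the join function and the sample tables are made up for this example:

```python
def join(master_rows, detail_rows, master_key, detail_key, method="match"):
    """Join two lists of dicts, mimicking the three join methods.

    match  - inner join: only rows with a counterpart on both sides
    master - left outer join: all master rows are kept
    detail - right outer join: all detail rows are kept
    """
    if method not in ("match", "master", "detail"):
        raise ValueError("Unknown join method: %s" % method)

    details_by_key = {}
    for detail in detail_rows:
        details_by_key.setdefault(detail[detail_key], []).append(detail)

    result = []
    matched_keys = set()
    for master in master_rows:
        key = master[master_key]
        matches = details_by_key.get(key, [])
        if matches:
            matched_keys.add(key)
            for detail in matches:
                result.append((master, detail))
        elif method == "master":
            # Keep the master row even though it has no detail.
            result.append((master, None))

    if method == "detail":
        # Keep detail rows that no master row referred to.
        for key, details in details_by_key.items():
            if key not in matched_keys:
                for detail in details:
                    result.append((None, detail))
    return result


sales = [{"product_id": 1, "amount": 10}, {"product_id": 3, "amount": 7}]
products = [{"id": 1, "name": "tea"}, {"id": 2, "name": "coffee"}]

inner = join(sales, products, "product_id", "id", method="match")
left = join(sales, products, "product_id", "id", method="master")
right = join(sales, products, "product_id", "id", method="detail")
print(len(inner), len(left), len(right))
```

Here the fact-like table plays the master role and the dimension-like table the detail role, which is an assumption about typical usage rather than something the text above prescribes.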
More info about the SQL
features.
Non-additive
Provisional semi-additive time dimension support was added. An aggregate can
specify that it is non-additive through the time dimension (such as account
amount snapshots) and the generated query will handle the situation by
choosing the latest entry used.
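Why a balance snapshot is non-additive through time can be shown with a few rows. This is a plain-Python sketch of the idea (latest entry per account, then sum across accounts), not the query the backend generates; the data and function names are invented:

```python
# Account balance snapshots: (account, date, balance).
snapshots = [
    ("A", "2014-01-31", 100),
    ("A", "2014-02-28", 120),
    ("B", "2014-01-31", 50),
    ("B", "2014-02-28", 40),
]


def naive_total(rows):
    """Summing over time counts every account once per snapshot."""
    return sum(balance for _, _, balance in rows)


def latest_total(rows):
    """Take the latest snapshot per account, then sum across accounts."""
    latest = {}
    for account, date, balance in rows:
        # ISO formatted dates compare correctly as strings.
        if account not in latest or date > latest[account][0]:
            latest[account] = (date, balance)
    return sum(balance for _, balance in latest.values())


naive = naive_total(snapshots)
correct = latest_total(snapshots)
print(naive, correct)
```

The naive sum adds January and February balances of the same account together, which is meaningless; summing across accounts within one point in time is still fine, hence "semi-additive".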
The initial metadata infrastructure is in place. A more flexible implementation
that will include other dimensions and element selection functions will be
introduced in future releases.
Credit goes to Robin Thomas for this feature.
Authentication and Authorization
Authentication is handled at the server level and authorization at the workspace
level. The interface is extensible, so you can implement any method you like.
The built-in methods are pretty simple:
- permissive authentication methods: pass-parameter – just pass an api_key
parameter in the URL; or Basic HTTP proxy – uses the username and ignores the
password (there is one for testing purposes called "adminadmin" ...)
- authorization: JSON configuration for roles (inheritable) and rights.
The authorization has two parts: access to the cube and a restriction cell for a cube.
More info about authorization
Creating an auth extension
Visualizer
Cubes comes with a built-in Visualizer – a web app for visualizing cubes data
over time series. Main features: drill-down, filtering, many chart options,
connects to any cubes server. The application was developed by Robin Thomas
and Ryan Berlew.
Instructions
About the Release
This release is a milestone in the Cubes interface: the model metadata structure
and the slicer API. They are very unlikely to change; they may be slightly
adjusted while maintaining backward compatibility, or at least with some kind of
conversion tools provided.
Things that might change, due to planned refactoring:
- Python interface – mostly Workspace and model compilation
- Localization – definition of model localization
- Extensions interface - which methods the extensions should implement and how
- Configuration – slight changes in the slicer.ini sections
The above changes will be stabilized around the v1.1 or v1.2 release.
To sum it up: it is safe to build applications on top of the Slicer/HTTP
interface and it is safe to generate models to be used by cubes.
Credits
Many thanks to Robin Thomas and Ryan Berlew for major code contributions and
for the Visualizer. Credit also goes to Jose Juan Montes, Tomas Levine and
Byron Yi.
Links
Read the detailed list of changes.
Important note: The cubes repository has moved to the Data
Brewery github organization group (read
more).
Most recent sources can be found on github.
Questions, comments and suggestions can be posted to the
Cubes Google Group for discussion, problem solving and announcements.
Submit issues and suggestions
on github
IRC channel #databrewery on irc.freenode.net
2013-08-02 by Stefan Urbanek
Expressions is a lightweight arithmetic expression parser for creating simple
arithmetic expression compilers.
The goal is to provide a minimal and understandable interface for handling
arithmetic expressions of the same grammar but slightly different dialects
(see below). The framework will stay lightweight and it is unlikely that it
will provide any more complex grammatical constructs.
The parser is hand-written to avoid any dependencies. The only requirement is
Python 3.
Source: github.com/Stiivi/expressions
Features
The expression is expected to be an infix expression that might contain:
- numbers and strings (literals)
- variables
- binary and unary operators
- function calls with variable number of arguments
The compiler is then used to build an object as a result of the compilation of
each of the tokens.
Dialects
The grammar of the expression is fixed. Slight differences can be specified
using a dialect structure which contains:
- list of operators, their precedence and associativity
- case sensitivity (currently used only for keyword-based operators)
Planned options of a dialect that will be included in future releases:
- string quoting characters (currently single ' and double " quotes)
- identifier quoting characters (currently unsupported)
- identifier characters (currently _ and alpha-numeric characters)
- decimal separator (currently .)
- function argument list separator (currently comma ,)
Use
The intended use is embedding customized expression evaluation into an
application.
Example uses:
- Variable checking compiler with access control to variables.
- Unified expression language where various other backends are possible.
- Compiler for custom object structures, such as for frameworks providing
a functional-programming-like interface.
How-to
Write a custom compiler class and implement these methods:
- compile_literal, taking a number or a string object
- compile_variable, taking a variable name
- compile_operator, taking a binary operator and two operands
- compile_unary, taking a unary operator and one operand
- compile_function, taking a function name and a list of arguments
Every method receives a compilation context, which is a custom object passed to
the compiler in the compile(expression, context) call.
The following compiler re-compiles an expression back into its original form,
with optional access restriction to just the variables specified as the
compilation context:

    # Import path assumed here: both names are expected to come from the
    # expressions package.
    from expressions import Compiler, ExpressionError

    class AllowingCompiler(Compiler):
        def compile_literal(self, context, literal):
            return repr(literal)

        def compile_variable(self, context, variable):
            """Returns the variable if it is allowed in the `context`"""
            if context and variable not in context:
                raise ExpressionError("Variable %s is not allowed" % variable)
            return variable

        def compile_operator(self, context, operator, op1, op2):
            return "(%s %s %s)" % (op1, operator, op2)

        def compile_function(self, context, function, args):
            # Arguments are already compiled strings; join them with commas.
            arglist = ", ".join(args)
            return "%s(%s)" % (function, arglist)

Create a compiler instance and try to get the result:

    compiler = AllowingCompiler()
    result = compiler.compile("a + b", context=["a", "b"])
    a = 1
    b = 1
    print(eval(result))

The output would be 2, as expected. The following will fail, because c is not
among the variables allowed by the context:

    result = compiler.compile("a + c", context=["a", "b"])

For more examples, such as building a SQLAlchemy structure
from an expression, see the examples folder.
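The compile_* hook pattern can also be tried without installing anything. The sketch below swaps in Python's own ast module as the parser, so it is a stand-in for the Expressions parser and not its implementation; SketchCompiler and VariableCollector are hypothetical names, and the hook signatures simply follow the method list in the how-to. It needs Python 3.8 or later:

```python
import ast

OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
UNARY_OPERATORS = {ast.USub: "-", ast.UAdd: "+"}


class SketchCompiler:
    """Dispatches parsed nodes to compile_* hooks (stand-in for the library)."""

    def compile(self, expression, context=None):
        tree = ast.parse(expression, mode="eval")
        return self._visit(tree.body, context)

    def _visit(self, node, context):
        if isinstance(node, ast.Constant):
            return self.compile_literal(context, node.value)
        if isinstance(node, ast.Name):
            return self.compile_variable(context, node.id)
        if isinstance(node, ast.BinOp):
            return self.compile_operator(
                context,
                OPERATORS[type(node.op)],
                self._visit(node.left, context),
                self._visit(node.right, context),
            )
        if isinstance(node, ast.UnaryOp):
            return self.compile_unary(
                context,
                UNARY_OPERATORS[type(node.op)],
                self._visit(node.operand, context),
            )
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            args = [self._visit(arg, context) for arg in node.args]
            return self.compile_function(context, node.func.id, args)
        raise SyntaxError("Unsupported construct: %s" % type(node).__name__)


class VariableCollector(SketchCompiler):
    """Collects the names of variables used by an expression."""

    def compile_literal(self, context, literal):
        return set()

    def compile_variable(self, context, variable):
        return {variable}

    def compile_operator(self, context, operator, op1, op2):
        return op1 | op2

    def compile_unary(self, context, operator, operand):
        return operand

    def compile_function(self, context, function, args):
        return set().union(*args)


variables = VariableCollector().compile("-a + min(b, 2) * c")
print(sorted(variables))
```

The same skeleton yields a different "compiler" whenever the hooks return something else: strings for re-formatting, numbers for evaluation, or query objects for a database backend.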
Summary
Source: github.com/Stiivi/expressions
If you have any questions, comments, requests, do not hesitate to ask.
2013-06-22 by Stefan Urbanek
After a while of silence, here is the first release of Bubbles – a virtual data
objects framework.
Motto: Focus on the process, not the data technology
Here is a short presentation:
Introduction
I have started collecting functionality from various private data frameworks
into one, cleaning the APIs in the process.
Priorities of the framework are:
- understandability of the process
- auditability of the data being processed (frequent use of metadata)
- usability
- versatility
Working with data:
- keep data in their original form
- use native operations if possible
- performance provided by technology
- have options
Bubbles is performance agnostic at the low level of physical data
implementation. Performance should be assured by the data technology and
proper use of operations.
What bubbles is not?
- Numerical or statistical data tool. Use for example
Pandas instead.
- Datamining tool. It might provide data mining functionality in some sense,
but that will be provided by some other framework. For now, use Scikit Learn.
- All-purpose SQL abstraction framework. It provides operations on top of SQL,
but does not cover all the possibilities. Use
SQLAlchemy for special constructs.
Data Objects and Representations
A data object might have one or multiple representations. A SQL table might act
as a Python iterator or might be composed as a SQL statement. The more abstract
and flexible the representation, the better. If representations can be
composed or modified at the metadata level, that is much better than streaming
data all around the place.
Operations
The functionality of Bubbles is provided by operations. An operation takes one
or more objects as operands and a set of parameters that affect the operation.
There are multiple versions of an operation – one for each of the various object
representations. Which version is used is decided at runtime. For
example: if there is a SQL version and an iterator version and the operand is
SQL, then SQL statement composition will be used.
Implementing custom operations is easy through an @operation decorator.
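The idea of choosing an operation version by the operand's representation can be sketched in a few lines. Everything below is hypothetical and written only for this illustration: register_operation, DataObject and dispatch are not the Bubbles API, and the real @operation decorator may work differently:

```python
# Hypothetical registry: operation name -> {representation: implementation}.
_REGISTRY = {}


def register_operation(name, representation):
    """Register a function as the version of `name` for one representation."""
    def decorator(func):
        _REGISTRY.setdefault(name, {})[representation] = func
        return func
    return decorator


class DataObject:
    """A data object advertising its representations, most preferred first."""

    def __init__(self, representations, payload):
        self.representations = representations
        self.payload = payload


def dispatch(name, obj, **params):
    """Run the first version that matches one of the object's representations."""
    versions = _REGISTRY[name]
    for representation in obj.representations:
        if representation in versions:
            return versions[representation](obj, **params)
    raise LookupError("No version of %r for %r" % (name, obj.representations))


@register_operation("filter_by_value", "sql")
def _filter_sql(obj, field, value):
    # Compose a new statement instead of touching any rows. String
    # interpolation is for illustration only; real code should use bound
    # parameters.
    statement = "SELECT * FROM (%s) AS s WHERE %s = '%s'" % (obj.payload, field, value)
    return DataObject(["sql"], statement)


@register_operation("filter_by_value", "rows")
def _filter_rows(obj, field, value):
    # Stream through the rows in Python.
    rows = [row for row in obj.payload if row[field] == value]
    return DataObject(["rows"], rows)


table = DataObject(["sql"], "SELECT * FROM sales")
listing = DataObject(["rows"], [{"city": "Brno"}, {"city": "Nitra"}])

sql_result = dispatch("filter_by_value", table, field="city", value="Brno")
rows_result = dispatch("filter_by_value", listing, field="city", value="Brno")
print(sql_result.payload)
print(rows_result.payload)
```

The caller uses one operation name in both cases; the SQL operand is filtered by statement composition and never leaves the database, while the row list is filtered in Python.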
I will be talking about them in detail in one of the upcoming blog posts.
Here is a list:
Bubbles (Brewery2) - Operations by Stefan Urbanek
Epilogue
Bubbles comes as a Python 3.3 framework. There is no plan to have a Python 2
back-port.
Bubbles will follow the 'provide mechanisms, not policies' rule as much as
possible. Even though some policies are introduced at the early stages of the
framework, such as operation dispatch or graph execution, they will be opened
later for custom replacement.
Version 0.2 is already planned and will contain:
- processing graph – connected nodes, like in the old Brewery
- more basic backends, at least Mongo and some APIs
- bubbles command line tool
Links
Sources can be found on github.
Read the documentation.
Join the Google Group for discussion, problem solving and announcements.
Submit issues and suggestions on github
IRC channel #databrewery on irc.freenode.net
If you have any questions, comments, requests, do not hesitate to ask.
2013-02-20 by Stefan Urbanek
After a few months and gloomy winter nights, here is a humble update of the
Cubes lightweight analytical framework. No major feature additions or
changes this time, except some usability tweaks and fixes.
The documentation was updated to contain relational database patterns for the
SQL backend. See the schemas, models and illustrations in the official
documentation.
There are also improvements in cross-referencing various parts of the
documentation through see-also sections, to keep relevant information at hand.
Thanks and credits for support and patches go to:
- Jose Juan Montes (@jjmontesl)
- Andrew Zeneski
- Reinier Reisy Quevedo Batista (@rquevedo)
Summary
- many improvements in handling multiple hierarchies
- more support of multiple hierarchies in the slicer server, either as a
parameter or with the syntax dimension@hierarchy:
  - dimension values: GET /dimension/date?hierarchy=dqmy
  - cut: get first quarter of 2012 ?cut=date@dqmy:2012,1
  - drill-down on hierarchy with week on implicit (next) level:
    ?drilldown=date@ywd
  - drill-down on hierarchy with week with explicitly specified week level:
    ?drilldown=date@ywd:week
- order and order attribute can now be specified for a Level
- optional safe column aliases (see docs for more info) for databases that
have non-standard requirements for column labels even when quoted
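The dimension@hierarchy syntax from the list above is easy to take apart and put back together. The two helpers below are a hypothetical sketch for client-side code, not functions from Cubes, and they deliberately ignore backslash escaping:

```python
def parse_reference(reference):
    """Split `dimension[@hierarchy][:path]` into its three parts.

    For a cut the part after the colon is a path (2012,1); for a drill-down
    it is a level name (week). Sketch only: escaped separators are not handled.
    """
    head, _, path_string = reference.partition(":")
    dimension, _, hierarchy = head.partition("@")
    path = path_string.split(",") if path_string else []
    return dimension, hierarchy or None, path


def format_reference(dimension, hierarchy=None, path=()):
    """Inverse of parse_reference()."""
    result = dimension
    if hierarchy:
        result += "@" + hierarchy
    if path:
        result += ":" + ",".join(str(item) for item in path)
    return result


cut = parse_reference("date@dqmy:2012,1")
drilldown = parse_reference("date@ywd:week")
plain = parse_reference("date")
print(cut, drilldown, plain)
```

When no hierarchy is given the second element is None, which a caller can interpret as "use the dimension's default hierarchy".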
New Features
- added order to Level object - can be asc, desc or None for unspecified
order (will be ignored)
- added order_attribute to Level object - specifies the attribute to be used
for ordering according to order. If not specified, then the first attribute is
going to be used.
- added hierarchy argument to AggregationResult.table_rows()
- str(cube) returns cube name, useful in functions that can accept both cube
name and cube object
- added cross table formatter and its HTML variant
- GET /dimension accepts hierarchy parameter
- added create_workspace_from_config() to simplify workspace creation
directly from slicer.ini file (this method might be slightly changed in the
future)
- to_dict() method of model objects now has a flag create_label which
provides label attribute derived from the object's name, if label is missing
- Issue #95: Allow charset to be specified in Content-Type header
SQL:
- added option safe_labels to SQL workspace/browser to use safe column
labels for databases that do not support characters like . in column names
even when quoted (advanced feature, does not work with denormalization)
- browser accepts include_cell_count and include_summary arguments to
optionally disable/enable inclusion of respective results in the aggregation
result object
- added implicit ordering by levels to aggregate and dimension values methods
(for list of facts it is not yet decided how this should work)
- Issue #97: partially implemented sort_key, available in aggregate() and
values() methods
Server:
- added comma separator for order= parameter
- reflected multiple search backend support in slicer server
Other:
- added vim syntax highlighting goodie
Changes
- AggregationResult.cross_table is deprecated, use cross table formatter
instead
- load_model() loads and applies translations
- slicer server uses new localization methods (removed localization code from
slicer)
- workspace context provides proper list of locales and new key 'translations'
- added base class Workspace which backends should subclass; backends should
use workspace.localized_model(locale)
- create_model() accepts list of translations
Fixes
- browser.set_locale() now correctly changes browser's locale
- Issue #97: Dimension values call cartesians when cutting by a different
dimension
- Issue #99: Dimension "template" does not copy hierarchies
Links
Sources can be found on github.
Read the documentation.
Join the Google Group for
discussion, problem solving and announcements.
Submit issues and suggestions on github
IRC channel #databrewery on irc.freenode.net
If you have any questions, comments, requests, do not hesitate to ask.
2012-12-09 by Stefan Urbanek
Quick Summary:
- multiple hierarchies:
  - Python: cut = PointCut("date", [2010,15], hierarchy='ywd') (docs)
  - Server: GET /aggregate?cut=date@ywd:2010,15 (see docs - look for aggregate
    documentation)
  - Server drilldown: GET /aggregate?drilldown=date@ywd:week
- added result formatters (experimental! API might change)
- added pre-aggregations (experimental!)
New Features
- added support for multiple hierarchies
- added dimension_schema option to star browser – use this when you have
all dimensions grouped in a different schema than the fact table
- added HierarchyError - used for example when drilling down deeper than
possible within that hierarchy
- added result formatters: simple_html_table, simple_data_table, text_table
- added create_formatter(formatter_type, options ...)
- AggregationResult.levels is a new dictionary containing levels that the
result was drilled down to. Keys are dimension names, values are levels.
- AggregationResult.table_rows() output has a new variable is_base to denote
whether the row is base or not in regard to table_rows dimension.
- added create_server(config_path) to simplify wsgi script
- added aggregates: avg, stddev and variance (works only in databases that
support those aggregations, such as PostgreSQL)
- added preliminary implementation of pre-aggregation to sql workspace:
  - create_conformed_rollup()
  - create_conformed_rollups()
  - create_cube_aggregate()
Server:
- multiple drilldowns can be specified in a single argument:
drilldown=date,product
- there can be multiple cut arguments that will be appended into a single cell
- added requests: GET /cubes and GET /cube/NAME/dimensions
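A short sketch of how a client might build such a request with the standard library. The dimension names and cut values are made up for the example; only the repeated cut parameter and the comma-separated drilldown come from the list above:

```python
from urllib.parse import parse_qs, urlencode

# A list of pairs keeps repeated parameter names, unlike a dict.
params = [
    ("cut", "date:2012"),
    ("cut", "product:electronics"),
    ("drilldown", "date,product"),
]

# Keep the separators readable instead of percent-encoding them.
query = urlencode(params, safe=":,")
url = "/aggregate?" + query
print(url)

# What the receiving side sees: both cuts, and one drilldown value to split.
received = parse_qs(query)
drilldowns = received["drilldown"][0].split(",")
print(received["cut"], drilldowns)
```

Passing a plain dict to urlencode() would silently keep only one cut, which is why the parameters are given as a list of pairs.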
Changes
- Important: Changed string representation of a set cut: now using
semicolon ';' as a separator instead of a plus symbol '+'
- aggregation browser subclasses should now fill result's levels variable
with coalesced_drilldown() output for requested drill-down levels.
- Moved coalesce_drilldown() from star browser to cubes.browser module to be
reusable by other browsers. Method might be renamed in the future.
- if there is only one level (default) in a dimension, it will have same label
as the owning dimension
- hierarchy definition errors now raise ModelError instead of generic
exception
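For the changed set cut representation, here is a minimal sketch of how such a string could be assembled. The set_cut helper is hypothetical, and the dimension:path layout with comma-separated path elements is assumed from the cut examples earlier in this post:

```python
def set_cut(dimension, paths):
    """Build a set cut string: paths joined with ';', path elements with ','."""
    formatted = [",".join(str(item) for item in path) for path in paths]
    return "%s:%s" % (dimension, ";".join(formatted))


new_style = set_cut("date", [[2010], [2011, 6]])
print(new_style)
```

Client code that still joins the paths with a plus sign has to be updated to use the semicolon.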
Fixes
- order of joins is preserved
- fixed ordering bug
- fixed bug in generating conditions from range cuts
- AggregationResult.table_rows now works when there is no point cut
- get correct reference in table_rows – now works when simple denormalized
table is used
- raise model exception when a table is missing due to missing join
- search in slicer updated for latest changes
- fixed bug that prevented using cells with attributes in aliased joined
tables
Links
Sources can be found on github.
Read the documentation.
Join the Google Group for discussion, problem solving and announcements.
Submit issues and suggestions on github
IRC channel #databrewery on irc.freenode.net
If you have any questions, comments, requests, do not hesitate to ask.