2014-02-25 by Stefan Urbanek
Before the upcoming 1.0 release, I would like to introduce Cubes core
developer Robin Thomas. Robin is an experienced data warehouse engineer
with profound knowledge of OLAP and multidimensional modeling. Robin and his
team did a great job and contributed many new features and concepts.
We have quite a lot of new features and ideas thanks to Robin. Just to name a few:
- new, completely rewritten Mongo backend
- authorization and authentication
- non-additive time dimension
- statistical functions
and many more.
Thanks and credit also go to Brad Willard, Mathew Thomas, Ryan Berlew,
Andrew Bartholomew and Emily Wagner.
In addition, I would like to thank Squarespace for
their kindness and for contributing their additions back to the community as
2014-02-20 by Stefan Urbanek
Here is a short presentation about the Cubes workspace changes:
The most recent Cubes sources can be found on GitHub.
Read the development documentation.
2013-08-02 by Stefan Urbanek
Expressions is a lightweight arithmetic expression parser for creating simple
arithmetic expression compilers.
The goal is to provide a minimal and understandable interface for handling
arithmetic expressions of the same grammar but slightly different dialects
(see below). The framework will stay lightweight and it is unlikely that it
will provide any more complex grammatical constructs.
The parser is hand-written to avoid any dependencies. The only requirement is
The expression is expected to be an infix expression that might contain:
- numbers and strings (literals)
- binary and unary operators
- function calls with variable number of arguments
The compiler is then used to build an object as a result of the compilation of
each of the tokens.
The grammar of the expression is fixed. Slight differences can be specified
using a dialect structure which contains:
- list of operators, their precedence and associativity
- case sensitivity (currently used only for keyword based operators)
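To make the role of a dialect more concrete, here is a minimal sketch of a precedence-climbing parser driven by an operator table. It is not the Expressions parser: the `DIALECT` layout and the `tokenize`, `parse` and `parse_atom` helpers are hypothetical names chosen for illustration. The point is only that precedence, associativity and keyword case sensitivity can live in data while the grammar stays fixed.

```python
import re

# Hypothetical dialect layout: operator -> (precedence, associativity).
# This is an illustration, not the structure used by the Expressions library.
DIALECT = {
    "operators": {
        "or": (1, "left"),
        "and": (2, "left"),
        "+": (3, "left"),
        "-": (3, "left"),
        "*": (4, "left"),
        "/": (4, "left"),
        "^": (5, "right"),
    },
    # Only affects keyword-based operators such as AND / OR.
    "case_sensitive": False,
}

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|\w+|[-+*/^()])")


def tokenize(text, dialect):
    """Split text into number, identifier, operator and parenthesis tokens."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("Unexpected character at position %d" % pos)
        token = match.group(1)
        # Normalize keyword operators when the dialect is case-insensitive.
        if not dialect["case_sensitive"] and token.lower() in dialect["operators"]:
            token = token.lower()
        tokens.append(token)
        pos = match.end()
    return tokens


def parse_atom(tokens, dialect):
    """Parse a literal, a variable or a parenthesized sub-expression."""
    token = tokens.pop(0)
    if token == "(":
        node = parse(tokens, dialect)
        tokens.pop(0)  # drop the closing parenthesis
        return node
    return token


def parse(tokens, dialect, min_precedence=1):
    """Precedence climbing; returns nested (operator, left, right) tuples."""
    operators = dialect["operators"]
    left = parse_atom(tokens, dialect)
    while tokens and tokens[0] in operators:
        precedence, associativity = operators[tokens[0]]
        if precedence < min_precedence:
            break
        operator = tokens.pop(0)
        # Left-associative operators must not absorb an equal-precedence
        # operator on their right side; right-associative ones may.
        next_min = precedence + 1 if associativity == "left" else precedence
        right = parse(tokens, dialect, next_min)
        left = (operator, left, right)
    return left


tree = parse(tokenize("a + b * c", DIALECT), DIALECT)
print(tree)  # ('+', 'a', ('*', 'b', 'c'))
```

Swapping the table, for example making `^` left-associative or setting `case_sensitive` to `True`, changes how the same text is grouped without touching the parsing functions.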
Planned options of a dialect that will be included in the future releases:
- string quoting characters (currently single ' and double ")
- identifier quoting characters (currently unsupported)
- identifier characters (currently _ and alphanumeric characters)
- decimal separator (currently
- function argument list separator (currently comma)
Intended use is embedding of customized expression evaluation into an
- Variable checking compiler with an access control to variables.
- Unified expression language where various other backends are possible.
- Compiler for custom object structures, such as for frameworks providing
a functional-programming-like interface.
Write a custom compiler class and implement the following methods:
- compile_literal taking a number or a string object
- compile_variable taking a variable name
- compile_operator taking a binary operator and two operands
- compile_unary taking a unary operator and one operand
- compile_function taking a function name and a list of arguments
Every method receives a compilation context, which is a custom object passed
to the compiler in the compile(expression, context) call.
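As a rough, self-contained illustration of how the five methods cooperate, the sketch below uses Python's standard `ast` module as a stand-in parser. It does not use the Expressions parser or its base class: `SketchCompiler` and `EvaluatingCompiler` are hypothetical names, and the supported operators are a small illustrative subset. Each method receives already-compiled operands, so the result is built bottom-up, and the context here is simply a dictionary of variable values.

```python
import ast

# Stand-in for the library's parser: Python's own ast module supplies the
# tree. The operator tables are an illustrative subset.
BINARY = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
UNARY = {ast.USub: "-", ast.UAdd: "+"}


class SketchCompiler:
    """Hypothetical base class: walks the tree and calls compile_* methods."""

    def compile(self, expression, context=None):
        tree = ast.parse(expression, mode="eval")
        return self._visit(tree.body, context)

    def _visit(self, node, context):
        if isinstance(node, ast.Constant):
            return self.compile_literal(context, node.value)
        if isinstance(node, ast.Name):
            return self.compile_variable(context, node.id)
        if isinstance(node, ast.BinOp):
            left = self._visit(node.left, context)
            right = self._visit(node.right, context)
            symbol = BINARY[type(node.op)]
            return self.compile_operator(context, symbol, left, right)
        if isinstance(node, ast.UnaryOp):
            operand = self._visit(node.operand, context)
            return self.compile_unary(context, UNARY[type(node.op)], operand)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            args = [self._visit(arg, context) for arg in node.args]
            return self.compile_function(context, node.func.id, args)
        raise ValueError("Unsupported expression element: %r" % node)


class EvaluatingCompiler(SketchCompiler):
    """Computes a value; the context is a dict of variable values."""

    FUNCTIONS = {"min": min, "max": max, "abs": abs}

    def compile_literal(self, context, literal):
        return literal

    def compile_variable(self, context, variable):
        return context[variable]

    def compile_operator(self, context, operator, op1, op2):
        if operator == "+":
            return op1 + op2
        if operator == "-":
            return op1 - op2
        if operator == "*":
            return op1 * op2
        return op1 / op2

    def compile_unary(self, context, operator, operand):
        return -operand if operator == "-" else operand

    def compile_function(self, context, function, args):
        return self.FUNCTIONS[function](*args)


result = EvaluatingCompiler().compile("-a + max(b, 2) * 3", {"a": 1, "b": 5})
print(result)  # 14
```

Replacing `EvaluatingCompiler` with a class that returns strings or query objects from the same five methods would give a different backend for the same expression text.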
The following compiler re-compiles an expression back into its original form,
with optional access restriction to just the variables specified as the
context:
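    # Assumption: Compiler and ExpressionError are importable from the package.
    from expressions import Compiler, ExpressionError

    class AllowingCompiler(Compiler):
        def compile_literal(self, context, literal):
            # Assumed body: emit the literal in its source form.
            return repr(literal)

        def compile_variable(self, context, variable):
            """Returns the variable if it is allowed in the `context`"""
            if context and variable not in context:
                raise ExpressionError("Variable %s is not allowed" % variable)
            return variable

        def compile_operator(self, context, operator, op1, op2):
            return "(%s %s %s)" % (op1, operator, op2)

        def compile_function(self, context, function, args):
            arglist = ", ".join(args)
            return "%s(%s)" % (function, arglist)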
Create a compiler instance and try to get the result:
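    compiler = AllowingCompiler()
    result = compiler.compile("a + b", context=["a", "b"])
    a = 1
    b = 1
    # The compiled result is a string with Python-compatible syntax,
    # so it can be evaluated directly.
    print(eval(result))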
The output would be 2 as expected. The following will fail, because c is not
among the allowed variables:
    result = compiler.compile("a + c", context=["a", "b"])
For more examples, such as building a SQLAlchemy structure
from an expression, see the examples folder.
If you have any questions, comments, requests, do not hesitate to ask.
2013-06-22 by Stefan Urbanek
After a while of silence, here is the first release of Bubbles – virtual data
Motto: Focus on the process, not the data technology
Here is a short presentation:
I have started collecting functionality from various private data frameworks
into one, cleaning the APIs in the process.
Priorities of the framework are:
- understandability of the process
- auditability of the data being processed (frequent use of metadata)
Working with data:
- keep data in their original form
- use native operations if possible
- performance provided by technology
- have options
Bubbles is performance agnostic at the low level of physical data
implementation. Performance should be assured by the data technology and
proper use of operations.
What Bubbles is not:
- Numerical or statistical data tool. Use for example
- Data mining tool. It might provide data mining functionality in some sense,
but that will be provided by some other framework. For now, use Scikit Learn.
- All-purpose SQL abstraction framework. It provides operations on top of SQL,
but does not cover all the possibilities. Use SQLAlchemy for special
constructs.
Data Objects and Representations
A data object might have one or multiple representations. A SQL table might
act as a Python iterator or might be composed as a SQL statement. The more
abstract and flexible the representation, the better. If representations can
be composed or modified at the metadata level, that is much better than
streaming data all around the place.
The functionality of Bubbles is provided by operations. An operation takes one
or more objects as operands and a set of parameters that affect the operation.
There can be multiple versions of an operation, one for each object
representation. Which version is used is decided at runtime. For example: if
there is a SQL version and an iterator version and the operand is SQL, then
SQL statement composition will be used.
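The dispatch idea can be sketched in a few lines of plain Python. This is not the Bubbles API: `DataObject`, `register_operation`, `dispatch` and the `filter_by_value` operation are hypothetical names used only to show how one operation name can map to several implementations, with the operand's representations deciding which one runs.

```python
class DataObject:
    """Hypothetical data object that lists its representations in order of
    preference, for example ["sql", "rows"]."""

    def __init__(self, representations, rows=None, statement=None):
        self.representations = representations
        self._rows = rows or []
        self.statement = statement

    def rows(self):
        return iter(self._rows)


# Registry: operation name -> list of (representation, implementation).
_operations = {}


def register_operation(name, representation):
    """Register one implementation of an operation for one representation."""
    def decorator(func):
        _operations.setdefault(name, []).append((representation, func))
        return func
    return decorator


def dispatch(name, obj, **params):
    """Run the variant matching the operand's most preferred representation."""
    for representation in obj.representations:
        for registered, func in _operations.get(name, []):
            if registered == representation:
                return func(obj, **params)
    raise LookupError("No version of %r for %r" % (name, obj.representations))


@register_operation("filter_by_value", "sql")
def _filter_sql(obj, field, value):
    # Compose a new statement; no rows are fetched here. String formatting
    # keeps the sketch short; real code should use bound parameters.
    statement = "SELECT * FROM (%s) AS s WHERE %s = %r" % (
        obj.statement, field, value)
    return DataObject(["sql"], statement=statement)


@register_operation("filter_by_value", "rows")
def _filter_rows(obj, field, value):
    # Iterator version: the data is streamed through Python.
    rows = [row for row in obj.rows() if row[field] == value]
    return DataObject(["rows"], rows=rows)


table = DataObject(["sql", "rows"], statement="SELECT * FROM sales")
listing = DataObject(["rows"], rows=[{"year": 2012}, {"year": 2013}])

composed = dispatch("filter_by_value", table, field="year", value=2013)
streamed = dispatch("filter_by_value", listing, field="year", value=2013)

print(composed.statement)
print(list(streamed.rows()))
```

The SQL operand yields a new composed statement without touching any rows, while the row-based operand is filtered in Python. The caller uses the same operation name in both cases.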
Implementing custom operations is easy through an
I will be talking about them in detail in one of the upcoming blog posts.
Here is a list:
Bubbles (Brewery2) - Operations by Stefan Urbanek
Bubbles comes as a Python 3.3 framework. There is no plan to have Python 2
Bubbles will follow the 'provide mechanisms, not policies' rule as much as
possible. Even though some policies are introduced at the early stages of the
framework, such as operation dispatch or graph execution, they will be opened
later for custom replacement.
Version 0.2 is already planned and will contain:
- processing graph – connected nodes, like in the old Brewery
- more basic backends, at least Mongo and some APIs
- bubbles command line tool
Sources can be found on github.
Read the documentation.
Join the Google Group for discussion, problem solving and announcements.
Submit issues and suggestions on github
IRC channel #databrewery on irc.freenode.net
If you have any questions, comments, requests, do not hesitate to ask.
2013-04-08 by Stefan Urbanek
Data Brewery home page was redesigned. I would like to thank Andrej Sykora who did a great job with the new look and migration of the old blog posts.
The main reason for the redesign was to provide more content for each project. Another was to design the site so that future projects can be added easily, with one subdomain for each project.
Important: Blog Moving
The Data Brewery blog is moving away from Tumblr. New blog posts will be generated as static pages using Pelican. The base URL will stay the same: blog.databrewery.org.
The old blog URLs are being redirected to the new URLs. There are still a few blog posts that need to be migrated, but we hope to have these finished soon.
If you are following the blog with a feed reader, here is a link to the new feed.