2013-08-02 by Stefan Urbanek

Introducing Expressions for Python

Expressions is a lightweight arithmetic expression parser for building simple expression compilers.

The goal is to provide a minimal and understandable interface for handling arithmetic expressions that share the same grammar but differ slightly in dialect (see below). The framework will stay lightweight and it is unlikely to ever provide more complex grammatical constructs.

The parser is hand-written to avoid any dependencies. The only requirement is Python 3.

Source: github.com/Stiivi/expressions

Features

The expression is expected to be an infix expression that might contain:

  • numbers and strings (literals)
  • variables
  • binary and unary operators
  • function calls with variable number of arguments

A custom compiler is then used to build a result object from the parsed tokens.

Dialects

The grammar of the expression is fixed. Slight differences can be specified using a dialect structure, which contains:

  • list of operators, their precedence and associativity
  • case sensitivity (currently used only for keyword based operators)

Dialect options planned for future releases:

  • string quoting characters (currently single ' and double " quotes)
  • identifier quoting characters (currently unsupported)
  • identifier characters (currently _ and alpha-numeric characters)
  • decimal separator (currently .)
  • function argument list separator (currently comma ,)

Use

The intended use is embedding customized expression evaluation in an application.

Example uses:

  • A variable-checking compiler with access control over variables.
  • A unified expression language on top of which various backends are possible.
  • A compiler into custom object structures, such as for frameworks providing a functional-programming-like interface.

How-to

Write a custom compiler class and implement methods:

  • compile_literal taking a number or a string object
  • compile_variable taking a variable name
  • compile_operator taking a binary operator and two operands
  • compile_unary taking a unary operator and one operand
  • compile_function taking a function name and list of arguments

Every method receives a compilation context, which is a custom object passed to the compiler in the compile(expression, context) call.

The following compiler re-compiles an expression back into its original form, optionally restricting access to only those variables listed in the compilation context:

# Assuming the package exposes Compiler and ExpressionError at the top level:
from expressions import Compiler, ExpressionError

class AllowingCompiler(Compiler):
    def compile_literal(self, context, literal):
        return repr(literal)

    def compile_variable(self, context, variable):
        """Returns the variable if it is allowed in the `context`"""

        if context and variable not in context:
            raise ExpressionError("Variable %s is not allowed" % variable)

        return variable

    def compile_operator(self, context, operator, op1, op2):
        return "(%s %s %s)" % (op1, operator, op2)

    def compile_function(self, context, function, args):
        arglist = ", ".join(args)
        return "%s(%s)" % (function, arglist)
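
The example above omits compile_unary from the method list; by analogy with compile_operator it could be added to the class like this (a sketch, not part of the original example):

    def compile_unary(self, context, operator, operand):
        return "(%s %s)" % (operator, operand)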

Create a compiler instance and try to get the result:

compiler = AllowingCompiler()

result = compiler.compile("a + b", context=["a", "b"])
a = 1
b = 1
print(eval(result))

The output would be 2, as expected. The following, however, will fail because c is not among the allowed variables:

result = compiler.compile("a + c", context=["a", "b"])
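
If you prefer to handle the restriction gracefully instead of letting the exception propagate, you can catch it (a minimal sketch):

try:
    result = compiler.compile("a + c", context=["a", "b"])
except ExpressionError as e:
    print("Compilation refused:", e)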

For more examples, such as building a SQLAlchemy structure from an expression, see the examples folder.

Summary

Source: github.com/Stiivi/expressions

If you have any questions, comments, requests, do not hesitate to ask.

2013-04-08 by Stefan Urbanek

Site Redesign, Leaving Tumblr

Data Brewery home page was redesigned. I would like to thank Andrej Sykora who did a great job with the new look and migration of the old blog posts.

Why?

The main reason for the redesign was to provide more content for each project. Another was to make it easy to add future projects – each project now has its own subdomain.

Important: Blog Moving

The Data Brewery blog is moving away from Tumblr. New blog posts will be generated as static pages using Pelican. The base URL will stay the same: blog.databrewery.org.

The old blog URLs are being redirected to the new URLs. There are still a few blog posts that need to be migrated, but we hope to have them finished soon.

If you are following the blog with a feed reader, here is a link to the new feed.

2012-06-12 by Stefan Urbanek

Cubes and Slicer are going to EuroPython 2012

Cubes is going to EuroPython 2012.

EDIT: Added "Need help?".

There are going to be two sessions. First there will be a talk introducing light-weight OLAP with Cubes and Slicer, on Friday at 9:45 in room Tagliatelle (add to calendar). Afterwards there will be a longer, more in-depth and hands-on training about Slicing and Dicing with Cubes, on Friday at 14:30 in room Pizza Napoli (add to calendar).

In the talk I will introduce the framework and explain the reasons for its existence. Then I will dig into the architecture and features, and briefly show examples of how to use it for slicing and dicing. Newbies are welcome.

The training will go into more details and the participants will learn:

  • how to prepare data for aggregated browsing - star and snowflake schemas
  • how to create a logical model, define cubes, dimensions and hierarchies
  • how to browse aggregated data and how to slice and dice cubes from within Python
  • how to create a WSGI OLAP server ("in 15 minutes" style) for aggregated data browsing and how to use it in your web application for providing (browsable) data to end-user reports
  • how to provide localized reporting

If time permits, we can look at the anatomy of the framework and see how to implement a backend for another kind of data store.

I will be focusing on the existing SQL (relational OLAP) backend.
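
To give a flavor of the Python part of the training (slicing and dicing a cube), here is a minimal sketch. It assumes the SQL star backend, a cube named contracts and a SQLite database; the exact option and method names may differ slightly between Cubes versions:

import cubes

# Load the logical model and create a workspace with the SQL star backend
# (the "url" option name and the SQLite URL are assumptions)
model = cubes.load_model("model.json")
workspace = cubes.create_workspace("sql.star", model, url="sqlite:///data.sqlite")

# Get an aggregation browser for the cube (depending on the version this
# may be workspace.browser_for_cube(cube) instead)
cube = model.cube("contracts")
browser = workspace.browser(cube)

# Aggregate the whole cube
result = browser.aggregate()
print(result.summary)

# Slice: restrict the cell to year 2010 of the date dimension and drill down
cell = cubes.Cell(cube, [cubes.PointCut("date", [2010])])
result = browser.aggregate(cell, drilldown=["date"])

for record in result.drilldown:
    print(record)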

Customized examples

You might use the training session (and not only the session) to solve your problem - just bring your own sample data, if you like.

Do you have any data that you would like to slice and dice? Have a database schema and do not know how to create a logical model? You can send me a data sample or a schema, so I can prepare examples based on the problem you are solving.

Please, do not send any confidential data or schemas under NDA.


Need help?

If you have any questions or would like help with your data (from data preparation, through data modeling, to slicing and dicing), you can grab me during the whole event. If you cannot find me, just tweet me: @Stiivi.

Participation

Anyone interested in participating in the project is welcome. Here are some features that are either outside the scope of my skills, where I would like to cooperate with someone more experienced, or for which I do not have the resources:

I am also very open to new feature suggestions and feature change requests. Just a little note: Cubes is meant to be small and simple, at least for now. There are plenty of complex and feature-rich solutions out there. If we can implement new, more complex features as non-critical, optional plug-ins, that would be great.

Links and Calendar Events

You can add the talks to your calendar by following the links:

2012-05-29 by Stefan Urbanek

Cubes 0.9.1: Ranges, denormalization and query cell

The new minor release of Cubes – the light-weight Python OLAP framework – brings range cuts, denormalization with the slicer tool and cells in the /report query, together with fixes and important changes.

See the second part of this post for the full list.

Range Cuts

Range cuts were implemented in the SQL Star Browser. They are used as follows:

Python:

# Range from path [2010] to path [2012, 5, 10] on the date dimension
cut = RangeCut("date", [2010], [2012,5,10])

# Open ranges: omit the lower or the upper bound by passing None
cut_hi = RangeCut("date", None, [2012,5,10])
cut_low = RangeCut("date", [2010], None)

To specify a range in the Slicer server, where keys are sortable:

    http://localhost:5000/aggregate?cut=date:2004-2005
    http://localhost:5000/aggregate?cut=date:2004,2-2005,5,1

Open ranges:

    http://localhost:5000/aggregate?cut=date:2010-
    http://localhost:5000/aggregate?cut=date:2004,1,1-
    http://localhost:5000/aggregate?cut=date:-2005,5,10
    http://localhost:5000/aggregate?cut=date:-2012,5

Denormalization with the slicer Tool

It is now possible to denormalize your data with the slicer tool; you no longer have to denormalize using a Python script. Data are denormalized in the way the denormalized browser expects them to be. You can tune the process using command-line switches if you do not like the defaults.

Denormalize all cubes in the model:

$ slicer denormalize slicer.ini

Denormalize only one cube:

$ slicer denormalize -c contracts slicer.ini

Create a materialized denormalized view with indexes:

$ slicer denormalize --materialize --index slicer.ini

Example slicer.ini:

[workspace]
denormalized_view_prefix = mft_
denormalized_view_schema = denorm_views

# This switch is used by the browser:
use_denormalization = yes

For more information see the Cubes slicer tool documentation.

Cells in Report

Use cell to specify all cuts (type can be range, point or set):

{
    "cell": [
        {
            "dimension": "date",
            "type": "range",
            "from": [2010,9],
            "to": [2011,9]
        }
    ],
    "queries": {
        "report": {
            "query": "aggregate",
            "drilldown": {"date":"year"}
        }
    }
}
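
As an illustration of sending such a report to a running Slicer server, here is a minimal sketch in Python; the host and port follow the examples above, everything else is an assumption about your setup:

import json
import urllib.request

report = {
    "cell": [
        {"dimension": "date", "type": "range", "from": [2010, 9], "to": [2011, 9]}
    ],
    "queries": {
        "report": {"query": "aggregate", "drilldown": {"date": "year"}}
    }
}

# POST the report description as JSON to the /report endpoint
request = urllib.request.Request(
    "http://localhost:5000/report",
    data=json.dumps(report).encode("utf-8"),
    headers={"Content-type": "application/json"}
)

with urllib.request.urlopen(request) as response:
    print(json.loads(response.read().decode("utf-8")))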

For more information see the slicer server documentation.

New Features

  • cut_from_string(): added parsing of range and set cuts from a string; introduced a requirement on key format: keys should now consist only of alphanumeric characters and underscores if they are going to be converted to strings (for example when using the slicer HTTP server)
  • cut_from_dict(): create a cut (of the appropriate class) from a dictionary description (see the sketch after this list)
  • Dimension.attribute(name): get attribute instance from name
  • added exceptions: CubesError, ModelInconsistencyError, NoSuchDimensionError, NoSuchAttributeError, ArgumentError, MappingError, WorkspaceError and BrowserError
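
A rough sketch of using the two helpers above; the import location and the exact argument form of cut_from_string are assumptions, and the dictionary keys follow the /report cell description shown earlier:

from cubes import cut_from_string, cut_from_dict  # import location assumed

# Parse a range cut from its string form, as used in slicer URLs
cut = cut_from_string("date:2010-2012,5,10")       # argument form assumed

# Build an equivalent cut from a dictionary description
cut = cut_from_dict({"dimension": "date", "type": "range",
                     "from": [2010], "to": [2012, 5, 10]})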

StarBrowser:

  • implemented RangeCut conditions

Slicer Server:

  • /report JSON now accepts cell with full cell description as dictionary, overrides URL parameters

Slicer tool:

  • denormalize option for (bulk) denormalization of cubes (see the slicer documentation for more information)

Changes

  • important: all /report JSON requests should now have their queries wrapped in the queries key. This was the originally intended use, but it was not correctly implemented. A descriptive error message is returned from the server if the queries key is not present. Despite being rather a bug fix, it is listed here as it requires your attention and a possible change to your code.
  • warn when no backend is specified during slicer context creation

Fixes

  • Better handling of missing optional packages; also fixes #57 (now works without sqlalchemy and without werkzeug, as expected)
  • see change above about /report and queries
  • push more errors as JSON responses to the requestor, instead of just failing with an exception

Links

Sources can be found on github. Read the documentation.

Join the Google Group for discussion, problem solving and announcements.

Submit issues and suggestions on github

IRC channel #databrewery on irc.freenode.net

If you have any questions, comments, requests, do not hesitate to ask.

2012-05-14 by Stefan Urbanek

Cubes 0.9 Released

The new version of Cubes – the light-weight Python OLAP framework – brings the new StarBrowser, which we discussed in previous blog posts:

The new SQL backend is written from scratch; it is much cleaner, more transparent, configurable and open for future extensions. It also allows direct browsing of a star/snowflake schema without denormalization, so you can use Cubes on top of a read-only database. See DenormalizedMapper and SnowflakeMapper for more information.

To name just a few new features: multiple aggregated computations (min, max, ...), cell details, and optional/configurable denormalization.

Important Changes

Summary of most important changes that might affect your code:

Slicer: Change all your slicer.ini configuration files to use a [workspace] section instead of the old [db] or [backend]. A deprecation warning is issued; the old sections will still work for now.
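
For illustration, a minimal before and after configuration; the url key is just an example and any other options you have stay as they are:

# Old (deprecated):
[db]
url = sqlite:///data.sqlite

# New:
[workspace]
url = sqlite:///data.sqlite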

Model: Change dimensions in the model to be an array instead of a dictionary. The same applies to cubes. Old style: "dimensions": { "date": ... }, new style: "dimensions": [ { "name": "date", ... } ]. The old style will still work if not changed, just be prepared.

Python: Use Dimension.hierarchy() instead of Dimension.default_hierarchy.
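
In code the change looks roughly like this; calling hierarchy() with no argument (or with None) returns the default hierarchy, as noted in the Changes section below:

# Old (deprecated):
hierarchy = dimension.default_hierarchy

# New:
hierarchy = dimension.hierarchy()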

New Features

  • slicer_context() - new method that holds all relevant information from the configuration; it can be reused when creating tools that work in a connected database environment
  • added Hierarchy.all_attributes() and .key_attributes()
  • Cell.rollup_dim() - rolls up a single dimension to a specified level; this might later replace the Cell.rollup() method
  • Cell.drilldown() - drills down the cell
  • create_workspace(backend, model, **options) - new top-level method for creating a workspace by specifying a backend name. It makes it easier to create browsers (from a possible browser pool) programmatically. The backend name might be a full module path or relative to cubes.backends, for example sql.star for the new SQL browser or sql.browser for the old one.
  • get_backend() - get backend by name

  • AggregationBrowser.cell_details(): New method returning values of attributes representing the cell. Preliminary implementation, return value might change.

  • AggregationBrowser.cut_details(): New method returning values of attributes representing a single cut. Preliminary implementation, return value might change.

  • Dimension.validate() now checks whether there are duplicate attributes

  • Cube.validate() now checks whether there are duplicate measures or details

SQL backend:

  • new StarBrowser implemented:
    • StarBrowser supports snowflakes or denormalization (optional)
    • for snowflake browsing no write permission is required (does not have to be denormalized)
  • new DenormalizedMapper for mapping logical model to denormalized view
  • new SnowflakeMapper for mapping logical model to a snowflake schema
  • ddl_for_model() - get schema DDL as string for model
  • the join finder and attribute mapper are now a single Mapper - a class responsible for finding appropriate joins and doing logical-to-physical mappings
  • coalesce_attribute() - new method for coalescing multiple ways of describing a physical attribute (just attribute or table+schema+attribute)
  • the dimension argument was removed from all methods working with attributes (the dimension is now a required attribute property)
  • added create_denormalized_view() with options: materialize, create_index, keys_only

Slicer tool/server:

  • slicer ddl - generate schema DDL from model
  • slicer test - test configuration and model against database and report list of issues, if any
  • Backend options are now in [workspace]; the configurability of a custom backend section was removed. Warnings are issued when the old section names [db] and [backend] are used
  • the server responds to /details, which returns the result of AggregationBrowser.cell_details()

Examples:

  • added a simple Flask-based web example - a dimension aggregation browser

Changes

  • in Model: dimension and cube specification as a dictionary during model initialization is deprecated; a list should be used (with the attribute "name" mentioned explicitly) -- important
  • important: Now all attribute references in the model (dimension attributes, measures, ...) are required to be instances of Attribute() and the attribute knows its dimension
  • removed hierarchy argument from Dimension.all_attributes() and .key_attributes()
  • renamed builder to denormalizer
  • Dimension.default_hierarchy is now deprecated in favor of Dimension.hierarchy(), which accepts no argument or None and returns the default hierarchy in those two cases
  • metadata are now reused for each browser within one workspace - speed improvement.

Fixes

  • Slicer version is now the same as the Cubes version: the original intention was to have a separate server, therefore it had its own versioning. Now there is no reason for a separate version; moreover, it can introduce confusion.
  • Proper use of database schema in the Mapper

Links

Sources can be found on github. Read the documentation.

Join the Google Group for discussion, problem solving and announcements.

Submit issues and suggestions on github

IRC channel #databrewery on irc.freenode.net

If you have any questions, comments, requests, do not hesitate to ask.
