2012-04-04 by Stefan Urbanek

Brewery 0.8 Released

I'm glad to announce new release of Brewery – stream based data auditing and analysis framework for Python.

There are quite a few updates, to mention the notable ones:

  • new brewery runner with commands run and graph
  • new nodes: pretty printer node (for your terminal pleasure), generator function node
  • many CSV updates and fixes

Added several simple how-to examples, such as: aggregation of remote CSV, basic audit of a CSV, how to use a generator function. Feedback and questions are welcome. I'll help you.

Note that there are couple changes that break compatibility, however they can be updated very easily. I apologize for the inconvenience, but until 1.0 the changes might happen more frequently. On the other hand, I will try to make them as painless as possible.

Full listing of news, changes and fixes is below.

Version 0.8

News

  • Changed license to MIT
  • Created new brewery runner commands: 'run' and 'graph':
    • 'brewery run stream.json' will execute the stream
    • 'brewery graph stream.json' will generate graphviz data
  • Nodes: Added pretty printer node - textual output as a formatted table
  • Nodes: Added source node for a generator function
  • Nodes: added analytical type to derive field node
  • Preliminary implementation of data probes (just concept, API not decided yet for 100%)
  • CSV: added empty_as_null option to read empty strings as Null values
  • Nodes can be configured with node.configure(dictionary, protected). If 'protected' is True, then protected attributes (specified in node info) can not be set with this method.

  • added node identifier to the node reference doc

  • added create_logger

  • added experimental retype feature (works for CSV only at the moment)

  • Mongo Backend - better handling of record iteration

Changes

  • CSV: resource is now explicitly named argument in CSV*Node
  • CSV: convert fields according to field storage type (instead of all-strings)
  • Removed fields getter/setter (now implementation is totally up to stream subclass)
  • AggregateNode: rename aggregates to measures, added measures as public node attribute
  • moved errors to brewery.common
  • removed field_name(), now str(field) should be used
  • use named blogger 'brewery' instead of the global one
  • better debug-log labels for nodes (node type identifier + python object ID)

WARNING: Compatibility break:

  • depreciate __node_info__ and use plain node_info instead
  • Stream.update() now takes nodes and connections as two separate arguments

Fixes

  • added SQLSourceNode, added option to keep ifelds instead of dropping them in FieldMap and FieldMapNode (patch by laurentvasseur @ bitbucket)
  • better traceback handling on node failure (now actually the traceback is displayed)
  • return list of field names as string representation of FieldList
  • CSV: fixed output of zero numeric value in CSV (was empty string)

Links

  • github sources: https://github.com/Stiivi/brewery
  • Documentation: http://packages.python.org/brewery/
  • Mailing List: http://groups.google.com/group/databrewery/
  • Submit issues here: https://github.com/Stiivi/brewery/issues
  • IRC channel: #databrewery on irc.freenode.net

If you have any questions, comments, requests, do not hesitate to ask.

2012-03-09 by Stefan Urbanek

Cubes 0.8 Released

Another minor release of Cubes - Light Weight Python OLAP framework is out. Main change is that backend is no longer hard-wired in the Slicer server and can be selected through configuration file.

There were lots of documentation changes, for example the reference was separated from the rest of docs. Hello World! example was added.

The news, changes and fixes are:

New Features

  • Started writing StarBrowser - another SQL aggregation browser with different approach (see code/docs)

Slicer Server:

  • added configuration option modules under [server] to load additional modules
  • added ability to specify backend module
  • backend configuration is in [backend] by default, for SQL it stays in [db]
  • added server config option for default prettyprint value (useful for demontration purposes)

Documentation:

  • Changed license to MIT + small addition. Please refer to the LICENSE file.
  • Updated documentation - added missing parts, made reference more readable, moved class and function reference docs from descriptive part to reference (API) part.
  • added backend documentation
  • Added "Hello World!" example

Changed Features

  • removed default SQL backend from the server
  • moved worskpace creation into the backend module

Fixes

  • Fixed create_view to handle not materialized properly (thanks to deytao)
  • Slicer tool header now contains #!/usr/bin/env python

Links

  • github sources: https://github.com/Stiivi/cubes
  • Documentation: http://packages.python.org/cubes/
  • Mailing List: http://groups.google.com/group/cubes-discuss
  • Submit issues here: https://github.com/Stiivi/cubes/issues
  • IRC channel: #databrewery on irc.freenode.net

If you have any questions, comments, requests, do not hesitate to ask.

2012-02-23 by Stefan Urbanek

Cubes goes MIT license with small addition for SaaS

Cubes - The Lightweight Python OLAP Framework is now licensed under the MIT license with small addition. The full license is as follows:

Copyright (c) 2011-2012 Stefan Urbanek, see AUTHORS for more details

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

  • The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

  • If your version of the Software supports interaction with it remotely through a computer network, the above copyright notice and this permission notice shall be accessible to all users.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

The addition says, that if you use it as part of software as a service (SaaS) you have to provide the copyright notice in an about, legal info, credits or some similar kind of page or info box. That's all.

May it be like that? :-)

Updated Cubes sources are here, as usual.

Enjoy.

2012-02-21 by Stefan Urbanek

Cubes - Python OLAP Framework Architecture

What is inside the Cubes Python OLAP Framework? Here is a brief overview of the core modules, their purpose and functionality.

The lightweight framework Cubes is composed of four public modules:

  • model - Description of data (metadata): dimensions, hierarchies, attributes, labels, localizations.
  • browser - Aggregation browsing, slicing-and-dicing, drill-down.
  • backends - Actual aggregation implementation and utility functions.
  • server - WSGI HTTP server for Cubes

Model

Logical model describes the data from user’s or analyst’s perspective: data how they are being measured, aggregated and reported. Model is independent of physical implementation of data. This physical independence makes it easier to focus on data instead on ways of how to get the data in understandable form.

Cubes model is described by:

  • model object (doc)
  • list of cubes
  • dimensions of cubes (they are shared with all cubes within model) (doc) (doc)
  • hierarchies (doc) and hierarchy levels (doc) of dimensions (such as category-subcategory, country-region-city)
  • optional mappings from logical model to the physical model (doc)
  • optional join specifications for star schemas, used by the SQL denormalizing backend (doc)

There is a utility function provided for loading the model from a JSON file: load_model.

The model module object are capable of being localized (see Model Localization for more information). The cubes provides localization at the metadata level (the model) and functionality to have localization at the data level.

See also: Model Documentation

Browser

Core of the Cubes analytics functionality is the aggregation browser. The browser module contains utility classes and functions for the browser to work.

The module components are:

  • Cell – specification of the portion of the cube to be explored, sliced or drilled down. Each cell is specified by a set of cuts. A cell without any cuts represents whole cube.
  • Cut – definition where the cell is going to be sliced through single dimension. There are three types of cuts: point, range and set.

The types of cuts:

  • Point Cut – Defines one single point on a dimension where the cube is going to be sliced. The point might be at any level of hierarchy. The point is specified by "path". Examples of point cut: [2010] for year level of Date dimension, [2010,1,7] for full date point.
  • Range Cut – Defines two points (dimension paths) on a sortable dimension between whose the cell is going to be sliced from cube.
  • Set Cut – Defines list of multiple points (dimension paths) which are going to be included in the sliced cell.

Example of point cut effect:

The module provides couple utility functions:

  • path_from_string - construct a dimension path (point) from a string
  • string_from_path - get a string representation of a dimension path (point)
  • string_from_cuts and cuts_from_string are for conversion between string and list of cuts. (Currently only list of point cuts are supported in the string representation)

The aggregation browser can:

  • aggregate a cell (aggregate(cell))
  • drill-down through multiple dimensions and aggregate (aggregate(cell, drilldown="date"))
  • get all detailed facts within the cell (facts(cell))
  • get single fact (fact(id))

There is convenience function report(cell, report) that can be implemented by backend in more efficient way to get multiple aggregation queries in single call.

More about aggregated browsing can be found in the Cubes documentation.

Backends

Actual aggregation is provided by the backends. The backend should implement aggregation browser interface.

Cubes comes with built-in ROLAP backend which uses SQL database through SQLAlchemy. The backend has two major components:

  • aggregation browser that works on single denormalized view or a table
  • SQL denormalizer helper class that converts star schema into a denormalized view or table (kind of materialisation).

There was an attempt to write a Mongo DB backend, but it does not work any more, it is included in the sources only as reminder, that there should be a mongo backend sometime in the future.

Anyone can write a backend. If you are interested, drop me a line.

Server

Cubes comes with Slicer - a WSGI HTTP OLAP server with API for most of the cubes framework functionality. The server is based on the Werkzeug framework.

Intended use of the slicer is basically as follows:

  • application prepares the cell to be aggregated, drilled, listed... The cell might be whole cube.
  • HTTP request is sent to the server
  • the server uses appropriate aggregation browser backend (note that currently there is only one: SQL denormalized) to compute the request
  • Slicer returns a JSON reply to the application

For more information, please refer to the Cubes Slicer server documentation.

One more thing...

There are plenty things to get improved, of course. Current focus is not on performance, but on achieving simple usability.

The Cubes sources can be found on Github: https://github.com/stiivi/cubes . There is also a IRC channel #databrewery on irc.freenode.net (I try to be there during late evening CET). Issues can be reported on the github project page.

If you have any questions, suggestions, recommendations, just let me know.

HackerNews Thread

2012-02-14 by Stefan Urbanek

IRC Channel for Brewery and Cubes

I've opened an IRC channel on irc.freenode.net for Data Brewery and Cubes: #databrewery. I will be there mostly during CET daytime/late evening time. Check it out.

Previous Page | Next Page