2013-06-22 by Stefan Urbanek

Introducing Bubbles – virtual data objects framework

After a period of silence, here is the first release of Bubbles, a virtual data objects framework.

Motto: Focus on the process, not the data technology

Here is a short presentation:

Bubbles – Virtual Data Objects from Stefan Urbanek

Introduction

I have started collecting functionality from various private data frameworks into one, cleaning the APIs in the process.

The priorities of the framework are:

  • understandability of the process
  • auditability of the data being processed (frequent use of metadata)
  • usability
  • versatility

Working with data:

  • keep data in their original form
  • use native operations if possible
  • performance provided by technology
  • have options

Bubbles is performance-agnostic at the low level of physical data implementation. Performance should be assured by the underlying data technology and by proper use of operations.

What Bubbles is not

  • A numerical or statistical data tool. Use, for example, Pandas instead.
  • A data mining tool. It might provide data mining functionality in some sense, but that will be provided by some other framework. For now, use Scikit Learn.
  • An all-purpose SQL abstraction framework. It provides operations on top of SQL, but does not cover all the possibilities. Use SQLAlchemy for special constructs.

Data Objects and Representations

A data object might have one or more representations. For example, a SQL table might act as a Python iterator or might be composed into a SQL statement. The more abstract and flexible the representation, the better. If representations can be composed or modified at the metadata level, that is much better than streaming data all over the place.
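To make the idea of multiple representations concrete, here is a minimal sketch. It assumes a hypothetical `SQLTableObject` class backed by Python's built-in `sqlite3` module; it is not the actual Bubbles API. The object offers a composable SQL representation and a row-iterator representation: `filter()` only wraps the SQL statement (a metadata-level change), and data is read only when `rows()` is iterated.

```python
import sqlite3


class SQLTableObject:
    """Hypothetical data object with two representations:
    "sql" (a composable SELECT statement) and "rows" (a Python iterator)."""

    def __init__(self, connection, statement):
        self.connection = connection
        self.statement = statement  # plain SQL text; nothing is executed yet

    def representations(self):
        # Most abstract (composable) representation first.
        return ["sql", "rows"]

    def filter(self, condition):
        # Compose at the metadata level: wrap the statement, move no data.
        # `condition` is assumed to be trusted SQL in this sketch.
        return SQLTableObject(
            self.connection,
            f"SELECT * FROM ({self.statement}) WHERE {condition}",
        )

    def rows(self):
        # Fall back to streaming: execute the statement and iterate rows.
        return iter(self.connection.execute(self.statement))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("south", 25), ("north", 5)],
)

sales = SQLTableObject(conn, "SELECT region, amount FROM sales")
big = sales.filter("amount > 8")
print(big.statement)        # still only SQL text
print(sorted(big.rows()))   # data is read here
```

Chaining further `filter()` calls keeps nesting the statement, so the database engine does all the work in one query instead of Python filtering streamed rows.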

Operations

The functionality of Bubbles is provided by operations. An operation takes one or more objects as operands and a set of parameters that affect the operation. There are multiple versions of each operation, one for each object representation. Which version is used is decided at runtime. For example, if there are SQL and iterator versions and the operand is a SQL object, then SQL statement composition will be used.

Implementing custom operations is easy through an @operation decorator.
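The runtime dispatch and the decorator described above can be sketched as follows. This is an illustration under stated assumptions, not Bubbles' actual implementation: the `operation(name, representation)` signature, the `call()` helper, the `_registry` dict and the `SQLObject`/`RowsObject` classes are all hypothetical. Each operation registers one version per representation, and at runtime the first representation offered by the operand that has a registered version is used.

```python
import itertools

# Hypothetical registry: operation name -> {representation: implementation}.
_registry = {}


def operation(name, representation):
    """Register the decorated function as the `representation` version
    of the operation `name` (illustrative, not the Bubbles signature)."""
    def decorator(func):
        _registry.setdefault(name, {})[representation] = func
        return func
    return decorator


def call(name, obj, *args, **kwargs):
    """Choose the implementation at runtime: the first representation
    offered by the operand that has a registered version is used."""
    versions = _registry.get(name, {})
    for rep in obj.representations():
        if rep in versions:
            return versions[rep](obj, *args, **kwargs)
    raise LookupError(f"no version of {name!r} for {obj.representations()}")


class SQLObject:
    """Hypothetical operand represented by a SQL statement."""
    def __init__(self, statement):
        self.statement = statement

    def representations(self):
        return ["sql"]


class RowsObject:
    """Hypothetical operand represented by an in-memory row iterator."""
    def __init__(self, rows):
        self._rows = list(rows)

    def representations(self):
        return ["rows"]

    def rows(self):
        return iter(self._rows)


@operation("limit", "sql")
def limit_sql(obj, count):
    # Compose a new statement; no data is read.
    return SQLObject(f"SELECT * FROM ({obj.statement}) LIMIT {int(count)}")


@operation("limit", "rows")
def limit_rows(obj, count):
    # Stream the first `count` rows through Python.
    return RowsObject(itertools.islice(obj.rows(), count))


print(call("limit", SQLObject("SELECT * FROM sales"), 2).statement)
print(list(call("limit", RowsObject([1, 2, 3]), 2).rows()))
```

With a SQL operand the statement is only composed; with a row-iterator operand the rows are streamed through Python. If an object lists its representations from most to least abstract, the SQL version wins whenever both are available.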

I will describe operations in detail in an upcoming blog post.

Here is a list of the operations:

Bubbles (Brewery2) - Operations by Stefan Urbanek

Epilogue

Bubbles is a Python 3.3 framework. There are no plans for a Python 2 back-port.

Bubbles will follow the 'provide mechanisms, not policies' rule as much as possible. Even though some policies, such as operation dispatch or graph execution, are introduced in the early stages of the framework, they will later be opened up for custom replacement.

Version 0.2 is already planned and will contain:

  • processing graph – connected nodes, like in the old Brewery
  • more basic backends, at least Mongo and some APIs
  • bubbles command line tool

Links

The sources can be found on GitHub. Read the documentation.

Join the Google Group for discussion, problem solving and announcements.

Submit issues and suggestions on GitHub.

IRC channel #databrewery on irc.freenode.net

If you have any questions, comments, or requests, do not hesitate to ask.