Wednesday, November 19, 2008

Refactoring script for CLAM networks

Now that the CLAM NetworkEditor has become such a convenient tool to define audio processing systems, we have started to use it massively instead of C++ code to integrate processings. Even so, we found that whenever we changed a configuration structure, a port/control name, or even a processing class name, the network XML files no longer loaded and had to be edited by hand. That problem led us to avoid such changes, which is not a sane option. 'Embrace change', the agile gurus say, and so we did.

We have developed a Python tool to support network refactoring. It can be used both as a Python module and as a command line tool to batch-modify CLAM network files using high-level operations such as:


  • Renaming a processing class name
  • Renaming a processing name
  • Renaming processing ports or controls
  • Renaming a processing configuration parameter
  • Removing/adding/reordering configuration parameters
  • Setting configuration parameters values

The script just does XPath navigation and DOM manipulation to figure out which pieces need to be changed. Each high-level command is just a couple of lines of Python.
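To give a feel for how small each command is, here is a minimal sketch (not the actual tool) of what a renameClass operation could look like in Python; the 'processing' element and 'type' attribute names are just assumptions about the network XML layout:

import xml.dom.minidom

def renameClass(networkFile, oldClass, newClass):
    dom = xml.dom.minidom.parse(networkFile)
    # Each processing is assumed to carry its class name in a 'type' attribute
    for processing in dom.getElementsByTagName("processing"):
        if processing.getAttribute("type") == oldClass:
            processing.setAttribute("type", newClass)
    open(networkFile, "w").write(dom.toxml())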

We are starting to use it to:

  • Adapt to changes in the C++ code
  • Change network configuration parameters in batch
  • Provide migration scripts for user's networks

About the last point, for the next release we plan to provide network migration scripts containing a command set such as:
  
ensureVersion 1.3
renameClass OutControlSender NewClassName
renameConnector AudioMixer inport "Input 0" "New Port Name 0"
renameConfig AudioMixer NumberOfInPorts NInputs
setConfigByType AudioMixer NInputs 14
upgrade 1.3.2


There are still some short-term TODOs:

  • Include the CLAM version in the network XML so that the ensureVersion and upgrade commands work.
  • Also integrate Qt Designer UI files into the refactorings
  • Add some other commands as they are needed

Happy CLAM networks refactoring!

Friday, November 14, 2008

Managing Audio Back2Back Tests

It has been a long time since the last post, and there is a lot to explain (GSoC results, the GSoC Mentor Summit, QtDevDays, the CLAM network refactoring script, typed controls...). But let's start by explaining some work we did on a back-to-back test system we recently deployed for CLAM and our 3D acoustics project at Barcelona Media.

Back-to-back testing background



You, extreme programmer, might want to have unit tests (white box testing) for every single line of code you write. But, sometimes, this is a hard thing to achieve. For example, canonical test cases for audio processing algorithms that exercise a single piece of code are very hard to find. You might also want to take control of a piece of untested code in order to refactor it without introducing new bugs. In all those cases back-to-back tests are your most powerful tool.

Back-to-back tests (B2B) are black box tests that compare the output of a reference version of an algorithm with the output of an evolved version, given the same set of inputs. When a back-to-back test fails, it means that something changed but normally it doesn't give you any more information than that. If the change was expected to alter the output, you must revalidate the new output again and make it the new reference. But if the alteration was not expected, you should either roll-back the change or fix it.

In back-to-back tests there is no truth to be asserted. You just rely on the fact that the last version was OK. If the b2b tests go red because of an expected change of behaviour but you don't validate the new results, you lose all control over later changes. So it is very important to keep them green, validating any new correct result as it appears. Because of that, B2B tests are very well suited to being combined with a continuous integration system such as TestFarm, which can point you to the guilty commit even if further commits have been made since.

CLAM's OfflinePlayer is very convenient for back-to-back testing of CLAM networks. It runs them off-line, taking input and output wave files on the command line. Automate the check by subtracting each output from a reference file and comparing the difference level against a threshold, and you have a back-to-back test.
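As an illustration, a minimal sketch of such a check in Python, assuming mono 16-bit PCM files (the actual audiob2b module may do it differently):

import wave, audioop

def differsFromExpected(resultFile, expectedFile, threshold=10):
    result = wave.open(resultFile, 'rb')
    expected = wave.open(expectedFile, 'rb')
    resultData = result.readframes(result.getnframes())
    expectedData = expected.readframes(expected.getnframes())
    if len(resultData) != len(expectedData): return True
    # Sample-wise difference and its maximum absolute value (width 2 = 16 bit)
    difference = audioop.sub(resultData, expectedData, 2)
    return audioop.max(difference, 2) > threshold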

But keeping the reference outputs up to date is still hard. So we have developed a Python module named audiob2b.py that makes defining and maintaining audio b2b tests very easy.

Defining a b2b test suite


A test suite is defined by a back-to-back data path and a list of test cases, each one consisting of a name, a command line and the set of outputs to be checked:

#!/usr/bin/env python
# back2back.py
import sys
from audiob2b import runBack2BackProgram

data_path = "b2b/mysuite"
back2BackTests = [
    ("testCase1",
        "OfflinePlayer mynetwork.clamnetwork b2b/mysuite/inputs/input1.wav -o output1.wav output2.wav"
        , [
        "output1.wav",
        "output2.wav",
        ]),
    # any other test cases here
]
runBack2BackProgram(data_path, sys.argv, back2BackTests)




Notice that this example uses OfflinePlayer but, since you write the full command line, you are not limited to it. Indeed, for the 3D acoustics algorithms we are testing other programs that also generate wave files.

Back-to-back work flow


When you run the test suite for the first time (./back2back.py without parameters) there are no reference files (expectations) and you will get a red. The current outputs will be copied into the data path like this:
b2b/mysuite/testCase1_output1_result.wav
b2b/mysuite/testCase1_output2_result.wav
...

After validating that the outputs are OK, you can accept a test case by issuing:
$ ./back2back.py --validate testCase1
The files will then be renamed to:
b2b/mysuite/testCase1_output1_expected.wav
b2b/mysuite/testCase1_output2_expected.wav
...

And the next time you run the tests, they will be green. At this point you can add and commit the 'expected' files to the data repository.

Whenever the output changes noticeably and you get a red, you will again have the '_result' files, plus some '_diff' files so that you can easily inspect the difference. All those files are cleaned up as soon as you validate them or you restore the old behaviour.
The main benefit is that managing the expectation files is almost fully automated, so it is much easier to keep the tests green.

Supporting architecture differences


Often the same algorithm produces slightly different values depending on the architecture you are running on, mostly because of different precision (i.e. 32 vs. 64 bits) or different implementations of the floating point functions.

Having back-to-back tests that flip depending on which platform you run them on is not desirable. The audiob2b module generates platform-dependent expectations when you validate them with the --arch flag. Platform-dependent expectations are used instead of the regular ones only when ones for the current platform exist.
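Just to illustrate the idea, such a lookup could work roughly like this; the file naming is purely hypothetical and not necessarily what audiob2b actually does:

import os, platform

def expectationFile(dataPath, testCase, outputName):
    # outputName is the output base name, e.g. 'output1'
    base = os.path.join(dataPath, testCase + "_" + outputName)
    archSpecific = base + "_expected_" + platform.machine() + ".wav"
    generic = base + "_expected.wav"
    # Prefer the platform-specific expectation when it exists
    return archSpecific if os.path.exists(archSpecific) else generic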

Future


The near-term future of the tool is simply to be used. We should extend the set of networks and processing modules in CLAM that are under back-to-back control, so I would like to invite other CLAM contributors to add more back2backs. Place your suite data in 'clam-test-data/b2b/'. We still have to decide where the suite definitions themselves should live. Maybe somewhere in CLAM/test, but that wouldn't be quite right because of dependencies on NetworkEditor and maybe on plugins.

Another feature that would extend the kinds of code we can control with back-to-back tests would be supporting file types other than wave files, such as plain text files or XML files (compared in some way smarter than plain text). Any ideas? Comments?

Tuesday, July 22, 2008

Exterminating signals and slots

I recently posted at the CLAM devel wiki some advice, conventions, traps and tips on programming with Qt within CLAM, addressed to the GSoC students. The most painful trap is gratuitous signal and slot usage. Signals and slots are a very powerful way of designing independent components that communicate with each other with low coupling. They are so powerful that it is very tempting for novices to use them everywhere, and that can turn out to be very harmful.

If a connected slot doesn't exist, you just get a console error at run-time; that is how soft the checks on slots are, and this is bad. Signal/slot resolution is also more expensive than a plain method call. But the main reason to avoid them is that they make the code very hard to follow. When you see a signal 'emit' you have to find where that signal is connected in order to know which objects and slots will be called. Multiply this process by the nearly 200 signal emissions in the vmqt library the Annotator uses and you'll understand why such components, although very smartly structured, are very hard to maintain and use.

You should emit signals only when you don't know who the receiver of an event is. If you can know the receiver, there is no use for signals. On the other hand, if you cannot access the emitter code, or you don't want to couple to it, then there is room for a 'connect'.

A bad smell telling you that you are overusing sigslots is having, in a class you are writing, a connection like this:

connect(this, SIGNAL(signal()), knownWidget, SLOT(slot()));

and later, maybe in a different method:
emit signal()

When that known, and later forgotten, object is the only one you connect to the slot, you might as well keep a reference to it and call the slot directly instead of emitting the signal:
knownWidget->slot()

which is faster, compile-time checked and much more traceable.

With that bad-smell locator at hand I decided to do a quick review of the vmqt module. vmqt is the successor of the many Visualization Modules we have had in CLAM, just another definitive VM rewrite. I have tried to use it several times, but it is very hard to get a global view of what it does because all the program flow is driven by signal connections. No joke: 180 signal emissions and a similar number of 'connect's in 30 classes.

After a first review, 110 signal emissions have been avoided just by giving the Renderers (the objects that represent drawing layers of the plot such as the lines, the playhead, the grid...) a pointer to the Plot2D. Having that pointer, Renderers can call the Plot2D slots (now regular functions) directly instead of emitting a signal whose only receiver is the plot.

70 'emit's are still wandering around. Some of them connect the plot with the wplot (a widget containing the plot and other elements such as the sliders and the rulers...) and are used to keep them synchronized. Those are likely to be removed in a similar way as for the renderers.

The rest are specific renderer signals that are connected and propagated by the specific wplot. Those are harder to remove and indeed they are very convenient to keep.

As the extermination goes on, one can better see how to really take advantage of the nice vmqt module structure and which aspects can be enhanced.

Wednesday, May 21, 2008

A configuration idiom for Python scripts

For our Python scripts we have used a fair number of configuration idioms and variations, such as importing a config.py file with the parameters as module-level globals, or importing the contents of a dictionary. A different one we have used lately is very useful and simple:


# This should be your program using the config

import os

class config :
    # Here we define the default parameters
    paramA = "value A"
    paramC = "value C"
    class category1 :
        paramC = "value sub C"

    # Here we load new values from files, if they exist
    for configFile in [
            "/usr/share/myprog/defaultconfig",
            "/etc/myprog.conf",
            "~/.myprog/config",
            "myconfig.conf",
            ] :
        if not os.access(configFile, os.R_OK) : continue
        execfile(configFile)

# That's the config params usage. Pretty!
print config.paramA
print config.paramB
print config.paramC
print config.category1.paramC
print "This is a template: %(paramC)s" % config.category1.__dict__


If you write in the config file this:

# myconfig.conf
paramA="new value A"
paramB="new value B"
category1.paramC="new subCvalue"
paramB+=" extension " + paramA

you'll get this

new value A
new value B extension new value A
value C
new subCvalue
This is a template: new subCvalue


The config file is read as if you were adding code at the same indentation level as the 'execfile' call.

Notice the advantages:

  • Your config file looks like plain variable assignments

  • You can use inner classes to build up categories

  • You can have a list of configuration locations with different precedence

  • You can include almost any Python code

  • You can do templating with the params by getting a dictionary like config.__dict__ or config.category1.__dict__

  • You can put config checking code after the loading.


Be careful in untrusted contexts:

  • Malicious config files can include almost any Python code

  • Config syntax errors crash the program (I guess that can be solved)

  • Config files may add new attributes, categories, or methods you didn't have


But if, like us, you are just managing your own utility scripts, this idiom is fantastic.

Tuesday, March 18, 2008

Command of the day: hotshot2calltree

Today, the command of the day is hotshot2calltree.

I really like using KCacheGrind to tune my C++ code. You can use it to navigate through the actual function calls of a given execution of your program and see, very graphically, where the time is spent.

Today I needed to optimize some Python code, which is not C++. No problem. Add this to your code:


import hotshot
prof = hotshot.Profile("profile.hotspot")
prof.runcall(myFunction)
prof.close()

And then, at the shell:

sudo apt-get install kcachegrind-converters
hotshot2calltree -o profile.callgrind profile.hotspot
kcachegrind profile.callgrind

And now you get a nice kcachegrind profile you can navigate on.

Wednesday, March 5, 2008

Preparing for GSoC 2008

GSoC 2008 is already here! We are preparing our submission for CLAM as an organization and I hope we are as lucky as last year. For GSoC 2007 we got 6 fervent students who pushed CLAM a big step forward. We still don't know whether we will be selected as an organization or not. We haven't even filled in the submission data. But it is time to set some things in motion. So, what to do now?

If you are an experienced CLAM developer, please consider becoming a mentor. The more mentors the more students we can cope with.

If you are a user, it is time to push your favourite feature into the GSoC project proposals.

If you are a student wanting to take part in the program, I advise you to get involved with the project from now on, as we will consider early involvement a big plus for eligibility.

If you are Xavi, Pau or myself, then you should be filling in the CLAM submission instead of blogging ;-)

I love summer.

Sunday, February 10, 2008

Command of the day: kig

Each task has its tool. With graphics and figures too. If I want to edit a photograph or apply some nice effects to an existing image, that's a task for Gimp. If I need some cute artwork for icons, banners, the web... I prefer vector drawing with my beloved Inkscape. When the drawings are not so artistic and I need somewhat formal, related figures and diagrams, I normally go for Dia, limited but correct. If I want to lay out a complex graph I leave the task to graphviz's dot and just declare the vertices. If I want to plot some program data output, Python plus matplotlib is the fastest way to process and render it; automating the execution-to-visualization process is very convenient. Some time ago I used gnuplot for that, but Python is more flexible regarding data formats. And often what you really need is to explore such data, where interactive visualization can be revealing with tools such as QtiPlot.

But my problem today was none of the above: drawing some geometric diagrams (angles, vectors, tangents...) to illustrate some trigonometric equations. I was thinking of QCad but Kig did the job perfectly. Kig is all about geometrical manipulation: intersections, angle transportation, angle measurement, linked elements, Python scripting...



I was going to integrate Kig PNG exporting into WiKo figure generation, but I found two show-stopper problems: the command line options for batch exporting don't seem to work, and there is no way to control the viewport other than resizing the window and controlling the zoom. While the concept of a limited-size canvas exists, I didn't find a way to resize it :-( Anyway, it is worth digging into such a tool, even writing patches to get batch exporting working.

Thursday, January 31, 2008

Impressions on Django (II): Development environment

In my previous post, I explained some basic ideas on how Django works. The post mostly covered the programming model, which is a clever implementation of the classical Model-View-Controller pattern. Not that new. But compared to other web development environments I have worked with (PHP, Zope/Plone or plain mod_python), Django is a clear improvement because it is both simple and pragmatic.

The programming model has a direct impact on how you design the product, but there are other factors that determine how easily a developer can work within a platform. In this post I discuss such factors: how Django affects the development workflow.

Editing and controlling source code

Compared to Zope, another Python web application environment I've been fighting against, Django is a clear step forward in keeping control of your work.

First of all, the developer has control of the source. Files are stored on the file system! It may seem an obvious statement, but if you have used Zope you will know why I am so happy about it: Zope forces you to store all the code in a large binary file which acts as a virtual filesystem. You cannot use a regular version control system such as Subversion to track your changes. ...and worse, it forces you to type code into web forms!! Argh!! Django ends that nightmare. Files are back on the file system and in your preferred editor.

Modularity

Still, Django applications are modular in the same sense as Zope's: you can combine several applications (modules) on a single site. For example, the administration interface is an application itself that can be used to edit any model of any installed application. You can disable or enable it as a whole or for specific models. There is also an authentication application that can be used to transversally control access to your application's features. Every other application can use the authentication application to obtain information about validated users and to limit access to certain features depending on the user profile. Session management and per-session storage, RSS feeds, SiteMap files, a data mining interface... they all come for free, provided as applications. Besides the included applications, a nice application repository is available.

Debug cycle and exploration

Debugging your code is also very straightforward. While developing with Django you run a development web server, not related to Apache, that you start from the console and that is not visible from the outside. This lets each developer have her own instance of the web site: no administration passwords are needed and reloading is easier. Every message you print goes to the console, so it is ideal for debugging by tracing, compared to plain mod_python, which runs inside the Apache server and forces you to monitor the Apache error log. Also, when some code fails with an exception, Django builds a very handy web page with a lot of information: the back-trace enriched with code context, parameter and local variable inspection, all the environment values, the request values... all of it available in a collapsible error web page. Lately I've been tempted to import failing Python code into Django just to get that amount of information.

Also, by running './manage.py shell' you get a Python shell executed under the same conditions your code will run in. Indeed, the 'manage.py' script has very useful subcommands to infer model definitions from existing databases, to see the database definitions generated from the models, to import and export test data fixtures...

The next post

So yes, Django is a mod_python with gears and a Zope without the fat. However, while I don't regret choosing Django, it is not that perfect. In my next post, the last one in this series of first impressions on Django, I will explain the problems we found while using the framework: the limitations we ran into and how it could be (and is being) improved.

Well, maybe not the next post. Some other topics have appeared while writing this series of posts and they are likely to require an entry.

Sunday, January 20, 2008

Impressions on Django (I)

During the last few months I've been involved in a project on (text) information retrieval. The project required developing a web interface over a Python core, and we chose Django as the framework. Another option was using plain mod_python as we did for EfficiencyGuardian. But some time ago, in the context of the SIMAC project, Xavier Oliver, who was responsible for implementing BOCA, the CLAM Annotator's collaborative back-end, used Django and was very enthusiastic about how quickly he got the whole web site working. So I wanted to give it a try for our current work. Here I am posting my first impressions as a newbie Django user. What does Django have to offer?

Database abstraction


The most important part of Django is a persistent object model which maps Python objects onto database entities, giving you a nice abstraction over the database layer. By defining such model classes, you get an object-oriented programming interface to query and change the related database tables, and even to navigate and join across relations as if they were regular connected objects.

I normally dislike overly transparent interfaces when they deal with efficiency-sensitive things such as database access. But Django states very clearly when and how such access is done, allowing you to control it while still using a high-level object-oriented idiom.
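As a quick illustration (hypothetical models, not the ones from our project; exact field arguments vary a bit between Django versions), a couple of model classes could look like this:

from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100)

class Document(models.Model):
    title = models.CharField(max_length=200)
    url = models.URLField()
    author = models.ForeignKey(Author)

# Querying and following the relation then reads like plain Python:
# Document.objects.filter(title__icontains="retrieval")[0].author.name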

Administration interface


You can enable an existing administration module for your site. It is a web interface to create, edit and remove your model objects. You can control the way they are edited by adding extra properties to the model class attributes.

Attributes have richer types than the real SQL types. Fields can be, for example, URLs, emails, telephone numbers or zip codes, and Django gives you custom validation and custom administration widgets for them for free.

The administration interface is also aware of table relations, providing, for instance, widgets for choosing related objects by foreign key and buttons to add new related objects.

Application logic


Often, as in our case, the administration interface is pretty close to what the final application will be. But direct manipulation of a model is not enough for most web applications. Normally you need some additional application logic: which functionalities are presented to the user, and what the user's dialogue with the system is to perform them.

Three elements are combined to build up such application logic in Django: URL mappings, views and page templates. URL mappings map regular expressions over the requested URL into calls to Python functions. Such Python functions are the views, which perform the needed actions on the system and construct an output web page. Views usually inject Python data into an HTML skeleton, the page template, to generate the response web page.
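A minimal sketch of the three pieces fitting together (hypothetical application and names; details vary between Django versions):

# urls.py -- maps a URL pattern to a view function
from django.conf.urls.defaults import patterns
urlpatterns = patterns('',
    (r'^documents/(\d+)/$', 'myapp.views.documentDetail'),
)

# views.py -- fetches the data and renders it through a page template
from django.shortcuts import render_to_response
from myapp.models import Document

def documentDetail(request, documentId):
    document = Document.objects.get(id=documentId)
    return render_to_response('document_detail.html', {'document': document})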

Django provides some convenience views to create, update, delete and list model objects, also supporting common features such as pagination, date-based browsing, validation and confirmation of destructive actions.

The next entry


Here I explained the basics of the Django execution model. In the next entry I'll write some impressions on Django as a development environment, compared to other environments I have used for web development such as Zope, mod_python and vanilla PHP.

Saturday, January 19, 2008

Qt 3 and Qt4 relicensed as GPL v3

I read in Thiago Macieira's blog that the Trolls have relicensed Qt3 and Qt4 as GPL v3.

I still haven't had time to analyze how this will affect CLAM. Relicensing CLAM has proven to be a hard political and bureaucratic problem, but a transition from GPL v2 to GPL v3 is easier since we have the 'or later' notice, so the door is open to that controversial upgrade. Definitely, the Trolls' move to v3 is something that will boost GPLv3 adoption.

A nice side consequence of this directly affects one CLAM application, SMSTools, which still uses Qt3 rather than Qt4 (although Zack Welch started a tentative Qt4 port). Until now, Qt3 didn't enjoy the same nice dual licensing Qt4 has. The former Qt3 licensing was a modified GPL that caused problems when using Qt3 on non-Unix platforms such as Windows. So with the Qt3 relicensing we can now freely distribute precompiled SMSTools binaries for Windows.

Update: I read in the official announcement that they didn't drop GPLv2; they just added a third license to the existing dual licensing, so you can use one of three licensing schemes: non-free, GPLv2 and GPLv3. Definitely they want to make developers' lives easier, not just by building excellent APIs.

Friday, January 18, 2008

BibTeX tooltips in WiKo's HTML output

I was busy yesterday providing better BibTeX support for the HTML output in WiKo: I was adding bibliography tooltips. They were originally a suggestion from Pau Arumi, and I agree with him that they will be very handy when reading an article in electronic format. The former HTML bibliography page is still available: just click instead of hovering.

To get it working, you should download the latest WiKo version, install the 'python-bibtex' Debian/Ubuntu package and append to your CSS style sheet the latest statements concerning the 'bibref' class from this CSS.

Then, just place any BibTeX files in the same folder as the wiki files and refer to a bibliography entry in the wiki files as '@cite:SomeBibtexId'. By 'WiKompiling' you'll get both the normal LaTeX bibliography and this HTML equivalent.

Take a look at how it finally feels in this chapter of my master thesis.