A common problem during our OOXML import is that there are several different OOXML dialects: OOXML transitional, OOXML strict and the not specified version written by MSO 2007. The MSO 2007 version is mostly identical to OOXML transitional with the small but nasty exception that they have some differences in the default values. Recently I got a document from a Collabora customer using MSO 2007 exhibiting some bugs related to that.
A few days ago I finally managed to bring support for handling the differences between the OOXML dialect written by MSO 2007 and the one in the specification to LibreOffice. This is an important step forward for our OOXML chart import as that code was written against the MSO 2007 version and more and more documents are generated by newer MSO versions. In recent years we have changed quite a few of the default values in the code to handle OOXML specification conforming documents correctly. Sadly this introduced a number of regressions for the handling of MSO 2007 documents.
With  and  we are now able to recognize files that have been created by MSO 2007 and are able to use different default values. Currently this is only used for the flag that decides if the chart title is deleted but more cases might be fixed in the future.
I had the privilege to attend GUADEC this year and speak about Libreoffice. I was really impressed by the conference and enjoyed the beautiful city of Strasbourg and the nice Gnome community.
My talk was about Resuing Libreoffice in your application and centered mainly around LibreOfficeKit, “The Document Liberation Project” and new features in Libreoffice 4.3.
The next conference that I will attend will be the LibreOffice conference in Bern, Switzerland where I will give presentations about OpenGL in Libreoffice, recent development in charts and automated testing.
The release of the next LibreOffice version is not that far away with a lot of cool new features. Additionally to the many nice features already mentioned on our Release Notes page I want to talk a bit about one of the new chart features that will be part of the 4.3 release.
What is property mapping and how to use it in a chart?
Property mapping is a way to map a property of a chart series, for now fill color and line color, onto a data range in a spreadsheet. Based on the value in the spreadsheet the property value is changed.
If this sounds familiar you are correct. Inside spreadsheets you have a similar feature called conditional formatting that allows formatting of a cell based on a spreadsheet value. Until now all the chart formatting was either fully automated based on default values in the LibreOffice code or hard formatting. The new “conditional formatting” for charts allows us to dynamically adapt the chart formatting based on the data in our spreadsheet.
A simple use case for this feature is to highlight special values in your chart. In older versions you would need to modify the formatting of the chart each time your data changed. With this new feature you just have an additional column where you calculate the color automatically based on the value of the point. In the screenshot below data series “col2” has a property mapping that formats the bar red if the value is larger than 3, otherwise green.
How do you add a property mapping to a chart?
Adding a property mapping is quite simple. In the chart wizard or in the data ranges dialog select the data series and add a property mapping based on the list shown after clicking on the “Add property mapping” button (The available mappings depend on the chart type). In the next step set the range property for the mapping as shown in the screenshot.
The feature is already working quite nicely in current daily builds however I’m aware of some open items that need improvement. The UX team asked for a few changes to the dialog and in my opinion there needs to be a way to prevent that empty cells are treated as 0 (black). I think it might be a better idea to use the series color in case we find an empty cell.
Additionally the concept is still a bit user unfriendly. The mapping is based on implementation defined property values and calculating the correct RGBA value needs some experience. A small step into a more user friendly handling is the addition of the COLOR spreadsheet function that takes 3 (RGB) or 4 (RGBA) parameters and returns the correct value.
Testing in a daily build is highly appreciated. Additionally I’m still looking for ways to extend property mapping to non-color properties but I’m missing a good concept. If you have an idea for a good mapping between values and properties please drop me a note.
I wrote a blog post last year reporting about our import crash testing with a python script and how we use these results to improve our quality. Since last year we have extended the script and use it regularly on a TDF server.
Export crash testing
The largest change to the script was the new support for export crash testing. Every document that we successfully import is now exported to a number of formats depending on the application that opens it. Similar to the import testing crashes are logged into a file and are made available together with the import crash testing logs.
File Format Validation testing
Based on the exported files we started to run validators against the exported files. Right now we use officeotron for validating exported OOXML files and ODF Validator for validating ODF files. The logs for each document are written to an own file and published on a TDF server. Additionally to prevent introducing validation errors we started recently to use the same validators in our build to validate the files generated by our automated tests. Building with –with-export-validation and two scripts similar to the ones found here a validation error in the exported files will generate a test failure.
Increased document pool
At the time of my last blog post we were using a bit less than 25000 documents for the import testing. Since then we increased that number to about 54000 documents in many more formats. Together with the export testing which generates about 120000 documents with about 90GB of generated files the tests need about 3 days to run.
The reports have been incomplete recently as we have been hit by a bug currently suspected to be in the kernel. Around the 10000th document the load of the server increases without doing any actual work. We are currently trying to determine if it is a single document that is responsible or if it is a combination of a more complex setup. It has been limited to the 18000 writer documents already.
As always I’m looking for people who either want to fix one of the issues or improve the script.
Recently it came to our attention that we can only handle OOXML transitional and the older Microsoft dialect of OOXML. A short analysis of the document showed that OOXML strict uses different namespaces and different relationship URLs.
After two days of hacking and a lot of help from Miklos Vajna, who fixed the docx import problems, we support now OOXML strict import in master and in the Libreoffice 4-2 branch.
Please test this feature with a daily build or with the upcoming Libreoffice 4.2.3 and report problems that you have with OOXML strict.
Posted in Libreoffice
The new cppunit release which is just a minor update brings mainly support for 64bit Windows builds and fixed some packaging bugs related to the Visual Studio project files. For Linux/Unix users the only change is that we report dlopen errors now correctly thanks to a LibreOffice patch.
Please report bugs and feature requests to the freedesktop bugzilla or the developer mailing list. More information including the MD5 hash can be found on the project page.
I have been working recently on finishing the work on a python script for Libreoffice that automatically imports documents and tests if we crash there. The plan is to run this script automatically against all our bugzilla documents on a regular basis.
I have been running similar tests for calc files(the TEST_BUG_FILES) case already before the 3.5 and the 4.0 releases and fixed these crashes with Eike and Kohei before the releases. However this work was done half manually inside of an “unit” test and as soon as it crashed I had to restart the test. As a result of this complex setup it took me between 4 and 6 days to import all 6000+ calc documents. Back then I already had the idea that this task could be automated and moved to a TDF server.
I already tried to convince someone at the Munich hackfest to write such a script as an Easy Hack but had to wait till December until Joren picked up the task. Based on convwatch.py he implemented the first version that has undergone several iterations now and can be found in the Libreoffice dev-tools repository. The script is still looking quite ugly as I have been only adding code and it still contains a large number of debug output for me but the current version should work fine against current Libreoffice master.
After several toolchain problems, one needs a libstdc++ created with at least Linux binutils 22.214.171.124.1 or newer, I finally published the results of the current test run at the Libreoffice developer mailing list. In my latest test run I had a collection of a bit more than 24500 documents with the file extensions cdr, doc, docx, fodg, fodp, fods, fodt, odg, odp, ods, odt, ppt, pptx, pub, rtf, vdx, vsd, wpd, xls, xlsx. While 60 crashes might sound like a lot one has to remember that many of these crashes will never be seen by users. The test is run with a dbgutil build which means that we enforce the exception specification, we switch the standard library to the gcc debug library which has additional assertions and some crashes are related to the special set up of the test. Nevertheless we are planning to fix all these crashes and use the script as part of our automatic testing.
And finally a special thanks to the amazing Libreoffice community who was incredibly supporting in realizing this crazy concept.