Wednesday, March 16, 2016

Tips for LibreOffice newbies

Li Haoyi has written an excellent blog post entitled "Diving Into Other People's Code" about diving into an unfamilar codebase (HN discussion here).

I think this is really very helpful for anyone who wants to look at the LibreOffice source for the first time. Many of the things he mentions are directly relatable to LibreOffice - in particular getting your dev environment setup is particularly relatable. 

Getting Started

To develop in LibreOffice, you really need some level of C++ skill, an ability to read other people's code, and be willing to learn and improve over time. However, you can still contribute Java and Python code to parts of the project (though in terms of unit tests, we really encourage converting Java unit tests to C++ tests eventually). 

Any newbies should start here:


and then have a look at:

https://wiki.documentfoundation.org/Development/How_to_build

One easy way of setting up your development environment is to use lode, which is a project that helps you setup a base environment to build LibreOffice:

https://wiki.documentfoundation.org/Development/lode

Once it is installed, check out how to use gerrit:


Be warned, depending on your setup this can take quite a bit of time and can be at times frustrating. We do our best to make things as easy as possible, but even building LibreOffice can take hours and hours to complete. 

Learn git

A bit of git experience is very helpful - luckily you only ever have to do a git clone once, but you should learn about branching, rebasing, pulling and pushing. I would strongly suggest using the logerrit scripts. I personally use the following to pull in changes onto my current working branch:

./g pull -r

I strongly recommend spending a few hours on the following graphical tutorial on git, Learn Git Branching. This is well worth doing regardless of whether you contribute to LibreOffice - knowing how git works to the degree that this shows you is a skill that is immensely valuable. This tutorial demystified a lot of git concepts for me, and helped me "get" how git does things. Eventually you will get stuck in some git mess, so knowing how to reset is very valuable. 

I also recommend learning how to use interactive rebase in git, which is done via:

git rebase -i

What this does is take your changes are change the "base" of the commit in question. You can move the commit up or down the version history, reword specific commit messages, "squash" commits into the parent commit (be very careful about doing this) or edit the commit itself. When you run git rebase -i it opens up your text editor - just read what it states in the editor to learn how to use it. 

If you do ever mess up a rebase, incidentally, you can use the git reflog to back out of your change. Check out the Atlassian reflog tutorial on this, could be a lifesaver. Another tutorial I used before I found the Atlassian one can be found here, try the Atlassian one first though. 

Hot tip for vim fans

If you are a vim fan, then I suggest the following setup for your .vimrc file:

filetype indent on
set tabstop=4
set shiftwidth=4
set expandtab
highlight ExtraWhitespace ctermbg=red guibg=red
match ExtraWhitespace /\s\+$/

augroup trailing_whitespace
  autocmd!
  autocmd BufWritePre *.c :%s/\s\+$//e
  autocmd BufWritePre *.h :%s/\s\+$//e
  autocmd BufWritePre *.cxx :%s/\s\+$//e
  autocmd BufWritePre *.hxx :%s/\s\+$//e
augroup END

We don't use tabs, we use 4 spaced-tabs. Some time ago I discovered a way of showing extra whitespace at the end of lines directly in vim, and then later after some discussion on the LibreOffice developer mailing list I discovered how to safely remove trailing whitespace on write

Start Hacking!

When you are modifying the LibreOffice codebase, you'll find that you'll spend more time reading and grokking the code than actually modifying it. I'd consider this normal, so if you find this is occurring don't fret - we all have to do this. The LibreOffice codebase is massive, I think from memory it has more lines of code than the Linux codebase and has been in continuous development since 1988. 

When I started on LibreOffice hacking, I found I was most interested in finding the code entry point. Frankly, I was only initially reading the code for pleasure (yes, I'm strange that way) and so I used http://opengrok.libreoffice.org to browse the code. 

However, many years before I had actually attempted to compile OpenOffice.org - and failed rather spectacularly as I ran out of disk space. Whilst this was occuring I did a lot of reading of OpenOffice.org developer manuals - I really recommend reading OpenOffice.org's Developer Guide. Don't be put off by the fact that it is hosted on the competing Apache OpenOffice site, it's really very high quality and whilst the LibreOffice codebase is more actively developed there is still a lot of really useful information on the Apache OpenOffice sites. 

Once you have read enough, have a look at our EasyHacks list. These are a list of things that need to be done in the LibreOffice codebase that are designed for newbies - and if you ever want to participate in Google Summer of Code you'll need to demonstrate that you've worked on a few bugs or submitted a few commits to deal with easy hacks. You'll also know that you've contributed to a codebase that literally millions of people use every day, and your name will be in the commit history. :-)

Join in with the development community

LibreOffice is a very friendly environment. We mostly go out of our way to help out newbies, as we want to encourage as many participants as possible. A such, you can join us on the #libreoffice-dev IRC channel on Freenode. We don't largely answer user questions (for that there is #libreoffice), but if you are getting stuck on something feel free to ask on the channel. 

The other way of joining in is to subscribe to the developer mailing list. You can either subscribe to the digest, or get every message directly - in which case I suggest setting up a filter or else you might get drowned out in emails. If you do get digest messages, please don't reply to the digest directly but copy the message you want to respond to into the email, and retain the original subject line, prefixed with "Re:". We suggest setting up the Reply-To header on your mail client, but it's good practice to ensure that the original mail receipients are cc'ed into your responses. 

(I'm also a chronic abuser, but please don't top-post. Don't be me.)

Commit messages

A final tip: commit messages are really important. I personally tend to write longer messages detailing the work that I've done, but that's just my own style. The most important thing about commit messages is they must be to the point and explain what you have done, not what the code used to do. If your code change is complicated or not necessarily easy to follow, then it's best to try to leave an explanation of what you are doing. The commit title should be clear enough that someone browsing the git log in cgit can understand the purpose of the commit. 

If your commit is easy to understand or minor, then only a commit message title is needed. If you need to explain things in more detail in the commit message body, as I said, you should state what you did. 

An example that shows both what you should and shouldn't do is where I fixed an issue where the PPI wasn't being exported to JPEGs correctly

Good things about my commit message
  • Note that as I fixed a bug, I prefixed the commit message short title with tdf#85761. There is a bot that reviews the git logs and if it finds this in the message title it automatically updates the bugzilla bug. You can only reference one bug however.
  • I summarised my change that we don't hardcode JPEGs to 96 DPI any more but use mapmode pref size to get the DPI, and moved the scaling function from the EPS filter code to the MapMode class. 
  • It explains what the problem actually is, as it was a bit difficult to understand the cause of the issue from the original bug report. 
Bad thing:
  • I still spent too much time explaining what the code originally did (thus violating my own recommendation I give in this blog post).
FWIW, it can still be useful in explaining what the code did, but I'd leave that to a note in the last paragraph - and I recommend you make it brief. 

Friday, January 22, 2016

Why I love hacking at LibreOffice

The LibreOffice codebase is, to be frank, messy. This isn't a criticism of previous developers - it's still an amazing product and an amazing feat of programming given the number of platforms it runs on. The StarView guys, and later OpenOffice.org development team, did a great job. For instance, I was reading up on the font mapping code and I often saw Herbert Duerr's name, and I've got nothing but respect for the work that he did and his dedication to the project.

Unfortunately, the messiness is the result of a codebase that was first created in 1988 and has been actively worked on and changed directions a number of times. So... what is it that I love about working on LibreOffice?

In a word: the people. I've never met any of my fellow hackers in person, only a few phone calls to Michael Meeks and listened in on some Collabora meetings when I did some work for them. Mostly I stick around the IRC channel and keep an eye on the mailing list.

On the IRC channel, however, I've noticed the good humour and tolerance of the developers towards those not as expert as themselves. For example, Stephan Bergman (sberg) is the go to guy in terms of anything C++ related - and the other day he helped me out with a tricky bit of code (well, I thought it was tricky) related to the equality operator. Tor Lillqvist, as another example, is a diamond in the rough IMO and has often pointed me in the right direction when I'm online. And I cannot forget to mention Norbert Thiebaud (shm_get on IRC) for his constant maintenance of Jenkins (a thankless task!) and gerrit. These are literally just some of the folks who I interact with all the time and just a small sample of the folks who work on the LibreOffice project.

Without the goodwill and encouragement of the folks who hack away daily on the LibreOffice project, I doubt I'd be able to work on the small part of the codebase that I have chosen to improve. In fact, there is an active outreach program currently being run lead by Jan Iverson (JanIV on IRC) and there are quite a few commits coming in from newbies of all stripes. This is leading to some real improvements in the code, for instance new developer Dipankar Niranjan did a whole series of commits to remove the "dog-tag" code from the VCL module (the "dog-tags" was a way of attempting to ensure that Windows weren't deleted at the wrong time, but was a total hack and has been solved by VclPtr) - this has recently allowed sberg to remove ImplSVEvent::maDelData. Seeing the code evolve is also highly enjoyable :-)

There are so many more examples of this it's just not possible for me to list them in this post. It's interesting that LibreOffice has made so much traction in the last year in terms of code quality - whilst the UI hasn't changed hugely (though that is going to change I suspect in the next year!) the incremental improvements have been increasing at a rate of knots and folks are beginning to notice the positive effects. One user, who will remain nameless but is an active follower of the LibreOffice development process, told me that he noticed that the 5.x release was very much more responsive and stable that the 4.x series... and amusingly commented that "it's remarkable how interesting it is to watch the development of such an intrinsically boring type of software".

So there you have it, the reason I love hacking at LibreOffice is the people who work on it! May it ever be this way.

Thursday, January 7, 2016

Restricting Doxygen indexing to particular LibreOffice modules

I have committed a code change to the LibreOffice tree to allow developers to specify what modules they want Doxygen to index. I have done this because indexing every module is quite time consuming, and you may only want to review a subset of modules (in my case I am interested primarily in the vcl module).

To index specific modules only, you now export the $INPUT_PROJECTS environment variable and run make docs. For example, if you want to index only the vcl and svtools modules, you would do the following:

chris@libreoffice-ia64:~/repos/libreoffice$ export INPUT_PROJECTS="vcl svtools"
chris@libreoffice-ia64:~/repos/libreoffice$ make docs
chris@libreoffice-ia64:~/repos/libreoffice$ unset INPUT_PROJECTS

You can(not) have a void namespace in C++...

Update: So this was discussed on IRC, and it turns out I've got this all entirely wrong. It's actually much simpler - it turns out you don't need a space before the scope operator thus you can legally have:

void::TheClass::TheFunction(int n1, int n2) {}

This is the same as:

void ::TheClass::TheFunction(int n1, int n2) {}

and:

void     ::TheClass      ::TheFunction(int n1, int n2) {}

--------

I ran doxygen over the vcl module in LibreOffice and I got a lot of warnings and errors. I am now making a concerted effort to fix these warnings because I actually find doxygen generates good info on the classes, and I rather like the collaboration and inheritance diagrams it produces.

However, a strange error that it produced was:

/home/chris/repos/libreoffice/vcl/source/control/combobox.cxx:1322 warning: no uniquely matching class member found for void::ComboBox::Impl::ImplUserDrawHandler(UserDrawEvent *pEvent)

"A void namespace in the LibreOffice codebase?" I thought. How could this be? How is this even possible?

Sure enough, the definition was void::ComboBox::Impl:ImplUserDrawHandler(UserDrawEvent*)!

In an effort to work out if this was what was intended (which I was very doubtful of) I tracked down the commit, which converted the old style IMPL_LINK macro to a boost signal. It was definitely not intended!

I then tracked down where the C++ standard is held and reviewed the C++14 draft (I can't afford to spend US$216 on the official ISO standard). Section 7.3 deals with namespaces:



Amazingly (to my mind) there is no restriction on the namespace identifier, thus it is entirely possible to accidentally create a void, int, float or any other reserved C++ keyword as a namespace!

I wonder if it might be worthwhile for compilers to spit out a warning if they see this? Perhaps we need to write a clang plugin to detect this sort of thing in LibreOffice. The only reason this hasn't had an impact is that it doesn't appear that anything is currently using user-defined draw signals on comboboxes.

I pushed commit 86338fbb94324ed to fix this.

Tuesday, January 5, 2016

Possible 15 year old regression

A few days ago, I filed bug 96826 ("Typewriter attribute not given enough weight when finding font based on attributes") against LibreOffice. Basically, I've been refactoring VCL font code and I was very interested to see what my predecessors had been doing before me. This meant that I actually read pretty much all of the commits I could find all the way back to the year 2000 for that file, which held much of the font code for VCL. 

One of the functions was originally in a class called ImplFontCache, which had a get() function that worked as a font mapper. It basically does a running total of a number of weighted values depending on the "strength" or "quality" of the font matching attribute. One of those attributes was to check if the font is a fixed-width font, then check to see if the font is a typewriter style font, giving a deliberately higher weighting to typewriter style fonts. 

Anyway, on the 29th June 2001 someone with the initials th (no idea who this was, nor is it really important) did some tweaks to the function in an attempt to improve font mapping in commit 925806de6. To ensure that the typewriter style was given a weighted value, they have multiplied the value by 2. However, they appear to have accidentally removed a zero from the weighting, thus reducing the weighting of the typewriter style fonts, not increased it!

As I was reading the code, I thought "whoops, that was unfortunate" and assumed I'd see the error picked up somewhere down the track and fixed. Now either the weighting has little effect on these fonts, or not many people have problems with typewriter fonts not being cleanly mapped by the underlying platform, but this has never been fixed

The code has since migrated its way into PhysicalFontCollection::FindFontFamilyByAttributes(), and it's still there!

    610         // test MONOSPACE+TYPEWRITER attributes
    611         if( nSearchType & ImplFontAttrs::Fixed )
    612         {
    613             if( nMatchType & ImplFontAttrs::Fixed )
    614                 nTestMatch += 1000000*2;
    615             // a typewriter attribute is even better
    616             if( ImplFontAttrs::None == ((nSearchType ^ nMatchType) & ImplFontAttrs::Typewriter) )
    617                 nTestMatch += 10000*2;
    618         }

As with any codebase with a 30 year pedigree, I've not committed a fix because I just don't know if this is something I want to touch. Hence I've logged the bug to see if anyone can shed any light on it. I'll probably follow up on the mailing list in a few weeks time to ping any more experienced developers, but until then this potential bug, which is in the middle of going through puberty, will remain.