fgaz

I am a student of computer science. I build things, and sometimes write here

Haskell Summer of Code 2017: Last Mile for `cabal new-build`: first and last status update

Table of contents
  1. What has been done
    1. #3638: new-run, new-test, new-bench, new-exec
    2. Data-files
    3. new-install ...almost
  2. How to try it
  3. What remains to do
    1. Finishing new-install
    2. The aforementioned garbage-collect
    3. Flags for package maintainers
  4. Difficulties I encountered
    1. Big codebase
    2. Lack of documentation
    3. Git
    4. Continuous Integration
    5. Project planning and organization
    6. Beware the paths!
  5. Good things
    1. Now I use new-build for all my Haskell projects!
    2. HSoC itself
    3. The Community
  6. Acknowledgments
  7. Upcoming posts

Time flies! The Haskell Summer of Code is over, and this is my first and last status update. Last in the HSoC, but not in the project, as you'll see.

My goal was to bring cabal-install's new-build to a usable state, to eventually replace the old commands.

What has been done

My original proposal was way too optimistic: I reached none of the many secondary and optional goals I had planned.

Fortunately they were, as I wrote, secondary. new-build was and is a work in progress, but now it's close to completion.

Here's a summary of what I did:

#3638: new-run, new-test, new-bench, new-exec

The most important goal of the project was to reach feature parity with old-build.

With these commands, new-build only lacks new-install.

Data-files

Data-files are additional files to be included in a package distribution (be it an sdist, a Debian package or the store itself). They are used when including the data in the executable itself (e.g. with a literal string, or even a picture using file-embed) is not feasible or not practical.

The path to the data-files is hardcoded at compile time (the --datadir flag), and by default it points into the store (where the majority of packages will reside). For this reason, when using an inplace-built package, datadir resolved to a nonexistent directory.

Fortunately, datadir can be overridden with an environment variable, which is now properly set by new-run.

If your project includes data-files, now you can use new-run and it should work without problems.
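As a rough illustration of the override mechanism described above: the generated `Paths_<pkg>` module checks an environment variable named `<pkg>_datadir` before falling back to the compiled-in path. The sketch below only mimics that lookup order in plain shell; the package name `myapp` and the default path are made up for the example.

```shell
# Hypothetical package "myapp"; the default path is a placeholder for
# whatever --datadir was baked in at compile time.
unset myapp_datadir
default_datadir="/nonexistent/store/myapp-0.1/share"

# Mimic the lookup order: use myapp_datadir if it is set, otherwise fall
# back to the compiled-in default.
resolve_datadir() {
  printf '%s\n' "${myapp_datadir-$default_datadir}"
}

resolve_datadir                                  # falls back to the default
( myapp_datadir="$PWD/data"; resolve_datadir )   # uses the override
```

Setting the variable by hand like this is what you had to do before; new-run now does it for you.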

new-install ...almost

This is the biggest command, because it actually handles four cases:

The first one, which allows the installation of programs from Hackage and is probably the most used one (e.g. cabal new-install pandoc), is almost complete (it needs some cleanup, but it works).

The difference from old-install is that executables will be installed in the store and symlinked to ~/.cabal/bin or equivalent.

This raises the problem of garbage collection: deleting the symlink leaves the executable and all of its dependencies in the store. In the future, a cabal garbage-collect command will track the symlinks and automatically clean the store.
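The symlink approach and the leftover problem can be reproduced by hand. This is only a simulation in a throwaway directory: the store layout, the `hello-1.0` package and its script are invented for the example and are not what cabal actually generates.

```shell
# Simulated layout in a temporary directory; the real store and bin
# locations differ, and the package below is made up.
root=$(mktemp -d)
store="$root/store"
bindir="$root/bin"
mkdir -p "$store/hello-1.0/bin" "$bindir"

# Stand-in for a built executable living in the store.
printf '#!/bin/sh\necho hello\n' > "$store/hello-1.0/bin/hello"
chmod +x "$store/hello-1.0/bin/hello"

# "Installing" only adds a symlink that points into the store.
ln -s "$store/hello-1.0/bin/hello" "$bindir/hello"
sh "$bindir/hello"

# Removing the symlink does not touch the store entry.
rm "$bindir/hello"
ls "$store"
```

After the `rm`, the `hello-1.0` directory is still listed: nothing refers to it any more, but nothing cleans it up either.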

How to try it

Cabal HEAD should always build and should be fairly stable. If you want to try the new features just run:

    cabal get -s cabal-install
    cd cabal-install*

and then build/install it with cabal install or cabal sandbox init + cabal install or cabal new-build.

What remains to do

90% of the work was done, now there's the other 90%.

The HSoC is now complete, but new-build cannot replace the old interface yet: a few essential features are missing, and, now that I'm more familiar with the cabal codebase, I plan to work on some of them even outside of the HSoC period.

[Image: a small mountain and a big mountain]
`new-install` is the one just to the right

Finishing new-install

One point done, three to go.

There is a design concept, but there are a lot of details to figure out, for the libraries in particular.

The aforementioned garbage-collect

The store can grow rapidly and reach several GB in size, so we need a way to clean it up without influencing any built or installed package.

Again, there is a design concept on the issue page.
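To give an idea of what "tracking the symlinks" could mean, here is a deliberately naive sketch, not the design from the issue page: it lists store entries that no symlink in the bin directory points into. It ignores dependencies between store packages, which a real garbage collector would have to follow, and every name in it is hypothetical.

```shell
# Simulated store with one referenced and one unreferenced entry.
root=$(mktemp -d)
store="$root/store"
bindir="$root/bin"
mkdir -p "$store/used-1.0/bin" "$store/orphan-2.0/bin" "$bindir"
touch "$store/used-1.0/bin/used" "$store/orphan-2.0/bin/orphan"
ln -s "$store/used-1.0/bin/used" "$bindir/used"

# Print the store entries that no symlink in $bindir points into.
unreferenced() {
  for entry in "$store"/*; do
    keep=no
    for link in "$bindir"/*; do
      [ -L "$link" ] || continue
      case "$(readlink "$link")" in
        "$entry"/*) keep=yes ;;
      esac
    done
    [ "$keep" = yes ] || basename "$entry"
  done
}

unreferenced
```

Only `orphan-2.0` is reported, so it is the only candidate for deletion; `used-1.0` survives because a symlink still points into it.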

Flags for package maintainers

Before replacing old-build, we need to be sure that package maintainers still have a way to control the location of binaries and data-files.

Difficulties I encountered

Big codebase

    > cloc cabal
    [...]
    Language          files      blank    comment       code
    --------------------------------------------------------
    Haskell            1273      24897      28127     120096

Cabal is big. Big project, big functions, big types, big everything.

My biggest Haskell project before HSoC was two orders of magnitude smaller, and this was my first encounter with a real-world project developed by a team over an extended period of time.

[Image: an Interstellar meme]
[ORGAN INTENSIFIES]

Haskell's strong types helped me a lot here. When I was working on new-bench, I just "followed the types", and everything clicked on the first try, as the legends say.

Lack of documentation

The cabal bus factor is somewhat low.

Some parts of the codebase are completely devoid of comments, and we lack an overview of the codebase, some document which describes how cabal works and where to find certain parts.

A few mysterious entities, known as cabal devs, are the precious holders of such arcane knowledge, and to learn from them one must prove oneself worthy of it by sheer pinging-perseverance on #hackage

...fortunately they are always happy to give some pointers to newcomers :) .

[Image: dcoutts as a sorcerer on cabal as a creature]
In this rare photograph we can see dcoutts (top) while he invokes a nix-style build on cabal (below)

Moreover, the documentation is improving. The undocumented parts are mostly old code, and there is an ongoing effort to cover that too.

And again, Haskell's types come to the rescue! The type always enhances and often replaces the documentation in a more expressive way.

Git

I now see that my git practices weren't the best...

[Image: a messy git history]

well, not at this level, but...

Credit: xkcd (CC BY-NC 2.5)

I had to learn git the right way.

Coding alone is different from doing it in a team. Until now I almost never had to deal with conflicts, and I almost always worked only on master.

In the cabal repo, lots of features get merged very often (well, more often than in a single-user repo anyway), so it's easy to get a conflict. The history has to be kept clean, so I finally had to learn how to rebase properly too.

This didn't always go well. In an attempt to rebase an old branch I accidentally created a convoluted merge graph, almost impossible to disentangle.

[Image: a very messy git history]
Welp.

Oh well, we learn from our errors.

I also had to exercise my multitasking skills by working on different branches. While the tests for one branch were running, I could write the docs for another command. Or two. The tests take a long time.

Which takes us to...

Continuous Integration

There are four hours' worth of tests, covering GHC 7.6 (cabal has a support window of five years) through 8.2. And there are the FTP (Foldable/Traversable Proposal) and the AMP (Applicative/Monad Proposal) in the middle.

[Image: two stick figures play with swords while CI runs]
Credit: xkcd (CC BY-NC 2.5)

But this is a plus! Apart from the many sword fights, the tests helped me catch a lot of bugs before even committing them. Now I use CI for many of my personal projects too.

Project planning and organization

I had never worked on a fixed project with a strict schedule, and Daniel helped me a lot here. If it weren't for him, I'd probably still be trying to fix mostly pointless bugs. Planning time is not wasted time!

Beware the paths!

One little thing: cabal now uses a slightly different path for built binaries.

My build command used the old one, and I wasted an hour or so trying to figure out why my debug statements weren't printing anything.

This happened ~~two~~ three times.

You only need to worry about this if you use both pre- and post-2.0 new-build in the same project though. While pre-2.0 puts the executables in dist-newstyle/build/package-id/build/exe/exe, post-2.0 puts them in ~~dist-newstyle/build/os/compiler/package-id/build/exe/exe~~ edit: this may change again before new-build becomes the default, so just look it up in the docs or in your dist* folder.
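Since the layout may change, one way to avoid hardcoding it is to search the dist-newstyle folder for the executable by name. The sketch below recreates both layouts mentioned above by hand in a temporary directory, purely for illustration: `myapp`, the OS and the compiler names are made up, and `find_exe` is a hypothetical helper, not a cabal feature.

```shell
# Throwaway project directory with both layouts from the text recreated
# by hand; "myapp", the OS and the compiler names are made up.
proj=$(mktemp -d)
old="$proj/dist-newstyle/build/myapp-0.1/build/myapp"
new="$proj/dist-newstyle/build/x86_64-linux/ghc-8.2.1/myapp-0.1/build/myapp"
mkdir -p "$old" "$new"
touch "$old/myapp" "$new/myapp"
chmod +x "$old/myapp" "$new/myapp"

# List every regular, user-executable file with the given name under
# the project's dist-newstyle directory.
find_exe() {
  find "$1/dist-newstyle" -type f -name "$2" -perm -u+x | sort
}

find_exe "$proj" myapp
```

If this prints more than one path, as it does here, a stale binary from the other layout is still lying around, which is exactly the situation that cost me those hours.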

Good things

Now I use new-build for all my Haskell projects!

Obviously I eat my own dog food now. And I like it.

In the coming weeks I'll write a post about it, specifically about how to integrate new-build and vim.

HSoC itself

The Haskell Summer of Code has been a wonderful experience, and I recommend it to every student who is reading this!

It taught me so much, and I loved being able to work on an open source project as widely used as cabal (well, in the Haskell ecosystem at least :-P ).

The Community

And last but absolutely not least, the #hackage community was always very helpful and friendly, offering constructive criticism and involving me in related projects.

Acknowledgments

I'd like to thank the Haskell community, always friendly and striving for knowledge; the organizers and sponsors, who made this HSoC possible; and most of all the #hackage folks: Daniel, my mentor, who helped me a lot when I was lost in the code or when I needed to plan, Duncan "dcoutts", Herbert "hvr", Mikhail "refold", Edward "ezyang", alanz, merijn, phadej, cocreature, and any others who helped me along the way.

Upcoming posts