Do symbols files make sense for C++?

I also posted this to debian-devel, which is where the discussion will be, but I wanted to post it on my journal as well so that others who may be interested will see it on Planet Debian. For those not familiar with Debian, this concerns the implications of a new feature for how Debian tracks shared library dependencies.

I'm currently working on the Policy modification to document (and recommend) use of symbols instead of shlibs, but I'd only personally used symbols with C libraries. Today I decided that I should try adding a symbols file to a C++ library, particularly if I'm going to recommend everyone do it. I tried this exercise with xml-security-c, which is, I think, a reasonably typical C++ library. Not the sort of core C++ library that would sit at the center of the distribution, but a random software package that's in Debian because other things use it.

The experience was rather interesting, and I ended up uploading the new version without a symbols file and continuing to just use shlibs. That's for the following reasons:

  1. The generated symbols file was HUGE. Hundreds of lines. This is a marked difference from the typical C symbols file, which is of quite manageable size. Some of that is that the library provides a lot of different classes, but some of it is that C++ just generates a lot of exported symbols. There's no way that I could do what I would do with a C library and understand those symbols, why they're there, and whether they are likely to have changed between revisions.

  2. Generating a reasonable symbols file was a pain. Generating an unreasonable symbols file that just contains all of the mangled symbols is largely mechanical and uninteresting, but that symbols file doesn't seem to me to convey useful information. So I did some scripting to translate the symbols back with c++filt, and add (c++) tags, and then try to understand what I was looking at and figure out whether I should sort the symbols list because the default sort is by mangled name, which is meaningless. This is a rather unappealing process. It's not particularly difficult, but it's very awkward and feels like it's missing vital tools.

  3. The resulting symbols file is incomprehensible to someone without strong knowledge of C++. It's full of opaque entries that don't make sense to the non-C++ programmer, which I suspect is a substantial number of people who package C++ libraries for Debian. I know enough C++ from school that I can evaluate security fixes, make simple patches, and review upstream changes, and I think that's all that should be needed to package things for Debian. But I'm deeply uncomfortable producing a symbols file on my own that contains entries for things that I know nothing about and cannot evaluate when they've last changed, like "non-virtual thunk to FooClass::~FooClass@Base".

  4. Once I had a symbols file that resulted in a successful build and that I could have uploaded, I started thinking about how I was going to maintain it. With a C program, I would change the symbols file versions when the underlying function implementation changes in a way that may not offer eqiuvalence, similar to bumping shlibs. I realized that I was going to have no idea when that happened, and the only way that I would maintain the symbols file would be to either trust upstream to maintain ABI equivalence and therefore only change the symbols file when upstream changes the SONAME, or not trust upstream to maintain ABI equivalence and therefore change all the versions with each new upstream release. That gives me exactly the same semantics as a shlibs file, so what's the point in having a symbols file?

  5. The exported symbols of the library contained many symbols that obviously weren't really from that library, but instead were artifacts of the C++ compilation process, things like instantiations of std::vector. Do those go into the symbols file? Do they change from architecture to architecture? If they disappear again, is that actually an ABI break? How do I know? It's all very mysterious, and while shlibs provides the same semantics as just ignoring this, at least I'm not then including in the package data, generated by me, things that I'm just blindly ignoring.

I came away from this experience thinking that I should revise the Policy amendment to say that symbols files are really for C libraries and for C++ libraries with either a tightly maintained symbol export list or maintained by a C++ expert, and that most C++ library maintainers should just not bother with this and use shlibs, bumping the shlibs version or not based on their impression of how good upstream is at maintaining ABI equivalence.

But that feels like a result contrary to what I had previously thought was the intended direction, so I wanted to ask the Debian development community as a whole: am I missing something? Are these symbols files actually useful? Am I missing some trick to make them useful?

Posted: 2012-01-25 23:04 — Why no comments?

Last spun 2013-07-01 from thread modified 2013-01-04