The One and the Many

X, fonts, and rpbar

Recently I fixed a text display issue in an X program I use called rpbar. Along the way I learned about X libraries, drawing text, and using fonts in X. I'll share what I learned by describing how I resolved the problem.

The problem

rpbar is a task bar. It is a horizontal bar with buttons, where each button represents a window you are running. The text on each button is the window's title. rpbar looks like this:

What rpbar looks like

I noticed that titles with non-ASCII characters would not display correctly.

Sometimes rpbar truncated titles starting at the non-ASCII character. For example, it showed a title with the trademark symbol ™ (U+2122) only up to where the ™ should be. Other times it showed characters but they were clearly corrupted, such as showing extra characters that were not present.

This looked to me like improper handling of character encoding. My system is set to use UTF-8 encoding, so I expected that rpbar was in some way mistranslating its input/output. I thought to trace through the program and make sure it correctly expected UTF-8 throughout. Specifically I wanted to see where the window titles were coming from, and examine their path through to showing on the screen.

Initial investigation

rpbar uses the Xlib library for its interaction with X. This is a C library interface to the X protocol. Its documentation is here (many functions also have man pages).

The X.Org developers deprecated this library for new development. It is still available, but they recommend transitioning away from it. They recommend using something like GTK+ or Qt instead, both of which are fairly heavyweight in comparison. There are also two other lower level libraries. One is XCB, the currently recommended replacement for Xlib, and the other is Xt, a toolkit layered on top of Xlib, which they have also deprecated.

I hoped to avoid adding a new library dependency to rpbar if possible. I expected this would be more palatable for a pull request as well as be a smaller change.

rpbar retrieves the window titles by running a command with popen() and then repeatedly running fgets() to retrieve the output. It reads into C++ std::strings which it eventually draws to the screen using XmbDrawString(), an Xlib function. I did not see any translation between character encodings.
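The popen() and fgets() pattern described above can be sketched roughly as follows. This is a minimal illustration in C rather than rpbar's actual code: the function name read_titles(), the TITLE_MAX limit, and the fixed-size buffers are all assumptions made for the example. The point it shows is that the bytes pass through untouched, so whatever encoding the command emits is the encoding the drawing code receives.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen() and pclose() */
#include <stdio.h>
#include <string.h>

#define TITLE_MAX 256  /* hypothetical limit on one title, in bytes */

/* Runs `cmd` through the shell and copies up to `max_lines` lines of its
 * output into `lines`, stripping the trailing newline from each one.
 * Returns the number of lines read, or -1 if the command could not be run.
 * The bytes are stored as-is: no character encoding conversion happens.
 * A line longer than TITLE_MAX - 1 bytes is split across several entries. */
static int read_titles(const char *cmd, char lines[][TITLE_MAX], int max_lines)
{
    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;

    int count = 0;
    while (count < max_lines && fgets(lines[count], TITLE_MAX, pipe) != NULL) {
        lines[count][strcspn(lines[count], "\n")] = '\0';
        count++;
    }
    pclose(pipe);
    return count;
}
```

With this shape, a title containing ™ arrives as the three UTF-8 bytes 0xE2 0x84 0xA2, which is what a UTF-8 aware drawing function should expect.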

Looking at Xlib, I found several functions available to draw text to the screen:

Could using a different one of these fix the problem? I thought the UTF-8 one in particular sounded promising.

Attempt #1: Xlib, locale, and alternate draw string functions

While rpbar's use of XmbDrawString() seemed correct, since UTF-8 is a multibyte encoding, I decided to try switching to the Xutf8DrawString() function. The intended difference between the two functions is not clear to me, and I did not find any documentation about why the developers created the second. The man pages for these functions do not give much to go on.

I believe now that either should work (for a given value of work) when working with UTF-8. I suspect Xutf8DrawString() works without needing to rely on global locale information, whereas XmbDrawString() supports different multibyte encodings depending on your locale. Either way, I expect UTF-8 character encoding throughout. But I decided to try switching to see the behaviour.

Unfortunately I saw no difference in behaviour after changing to Xutf8DrawString().

I started looking at how rpbar loaded the font to use. It called an Xlib function XCreateFontSet(3). The man page for this function says this: "The font set is bound to the current locale when XCreateFontSet is called". What locale was rpbar using? There was no setlocale(3) call, so it was running in the C locale!

I added a setlocale(LC_ALL, "") call in main(). This loads locale information from the environment. This immediately improved things. I could now see some non-ASCII characters correctly, such as the right single quotation mark ’ (U+2019).
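The change amounts to something like the sketch below. The helper name use_environment_locale() is made up for illustration; the only real call is the standard setlocale(). It would need to run before XCreateFontSet(), since the font set is bound to whatever locale is current at that moment.

```c
#include <locale.h>
#include <stddef.h>

/* Switches the process from the default "C" locale to the one named by the
 * environment (LC_ALL, LANG, and so on) and returns the name of the locale
 * now in effect. If the environment names a locale that is not installed,
 * setlocale() returns NULL and leaves the locale unchanged, so in that case
 * we return the name of the locale that is still active. */
static const char *use_environment_locale(void)
{
    const char *name = setlocale(LC_ALL, "");
    if (name == NULL)
        name = setlocale(LC_ALL, NULL);  /* NULL means "query, do not change" */
    return name;
}
```

A C program always starts in the "C" locale, which is why the missing call went unnoticed: nothing fails, the program simply treats every byte as a single-byte character.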

However there were still issues. ™ still did not display correctly. In its place was "b. Weird. Why?

When you try to draw a character that is not available in the loaded font set, then XmbDrawString() draws either a replacement string, or nothing. XCreateFontSet() tells you what this replacement string is. In my case the replacement string was blank, so nothing should show if the problem was a missing character.

If you try to draw an invalid codepoint then the behaviour is undefined. However I did not believe this to be the case since the character displayed fine elsewhere (using printf for example!).

Attempt #2: Xlib with different fonts and encodings

At this point I felt stumped, so I tried a few ideas.

Nothing I tried drew this character.

I decided I should look at how others solved this problem.

The rpbar author based parts of it on dmenu. I looked at dmenu's text drawing functionality. I found that in 2015 the dmenu developers replaced their Xlib text drawing with Xft. Looking back at the last Xlib-based version of dmenu, I did not find anything useful that would solve this problem. In fact dmenu was using almost the same method as rpbar. Maybe dmenu's Xlib version had this very problem. Maybe that is why the dmenu developers switched to Xft. I checked their mailing list for a rationale for the switch, but could not find any.

Attempt #3: Moving to Xft, part one

I decided to try Xft. Luckily Xft can work on top of Xlib so I did not need to rip everything apart. I could replace only the text drawing parts.

I replaced the XmbDrawString() call with Xft's XftDrawStringUtf8(). I removed all of the old font code, which (initially) simplified rpbar a fair amount. For one thing, there was no more falling back to XLoadQueryFont() if XCreateFontSet() failed.

I needed to replace the method used to determine font height and width, but happily there were counterparts I could borrow from dmenu.

The conversion to Xft was quite simple. There is a man page, Xft(3), and I also found a tutorial written by the author. Other than these the documentation is limited. I relied heavily on dmenu's code.

I was drawing text with Xft. With the font Inconsolata I found that all characters I tried worked well, including the previously problematic ™. Hurray!

I tried another font, the one used by default in rpbar, Terminus. Frustratingly I found that none of the non-ASCII Unicode characters displayed correctly, not even the ones that were working with Xlib. In their places rpbar showed only an "n".

The solution: Multiple Xft fonts

I decided to go back to dmenu to see if they did anything I was not. I remembered there was a complex function to draw text, and I had used only the simplest parts.

Sure enough the solution was there in the dmenu source. They use an Xft function called XftCharExists() to check each character to see if it is available in the font. If a character is not available, then they perform some Fontconfig magic to load an additional font that does have the character, and draw it with that font.

I added the same logic into rpbar and the strings displayed perfectly in each font I tried. When using Terminus rpbar now loads a second font when a string contains a missing character. (I had to adjust the method to measure the width of a string of text to use this same logic since measuring it with a single font was no longer guaranteed to be accurate).

This means that each time we draw text to the screen it could be in multiple fonts, but at least it means all characters will show up (if any available font has them!).
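The core of the per-character check can be sketched without any X dependency. The code below is a simplified illustration, not dmenu's or rpbar's implementation. It assumes a fixed list of already loaded fonts, whereas the real logic asks Fontconfig for a new font on demand. Each "font" is represented by a hypothetical has_glyph callback; with Xft that callback would wrap XftCharExists(dpy, font, codepoint). The names utf8_decode(), split_runs(), struct text_run, latin1_font(), and full_font() are all invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence from s (len > 0 bytes available). Stores the
 * code point in *cp and returns the bytes consumed. Malformed input yields
 * U+FFFD and consumes a single byte. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    static const uint32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t need;
    uint32_t c;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; c = s[0] & 0x07; }
    else { *cp = 0xFFFD; return 1; }

    if (need > len) { *cp = 0xFFFD; return 1; }
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond Unicode. */
    if (c < min_for_len[need] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }
    *cp = c;
    return need;
}

/* Stand-in for a loaded font: nonzero if it has a glyph for cp. */
typedef int (*has_glyph_fn)(uint32_t cp);

/* A byte range of the text that can be drawn with a single font. */
struct text_run { size_t start; size_t len; int font; };

/* Splits text into runs, choosing for each character the first font that
 * has it (font 0 if none does). Returns the number of runs written. */
static size_t split_runs(const char *text, size_t len,
                         const has_glyph_fn *fonts, int nfonts,
                         struct text_run *runs, size_t max_runs)
{
    size_t nruns = 0, pos = 0;
    while (pos < len) {
        uint32_t cp;
        size_t n = utf8_decode((const unsigned char *)text + pos, len - pos, &cp);
        int font = 0;
        for (int i = 0; i < nfonts; i++) {
            if (fonts[i](cp)) { font = i; break; }
        }
        if (nruns > 0 && runs[nruns - 1].font == font) {
            runs[nruns - 1].len += n;       /* extend the current run */
        } else {
            if (nruns == max_runs)
                break;
            runs[nruns].start = pos;
            runs[nruns].len = n;
            runs[nruns].font = font;
            nruns++;
        }
        pos += n;
    }
    return nruns;
}

/* Two hypothetical fonts: one limited to Latin-1, one that has everything. */
static int latin1_font(uint32_t cp) { return cp < 0x100; }
static int full_font(uint32_t cp) { (void)cp; return 1; }
```

Each resulting run would then be drawn, and measured, with its own font. Measuring by run is the reason the width calculation has to share this logic.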

Thoughts on Xft

Switching from Xlib to Xft for loading fonts and drawing text was easy. As a bonus, Xft has a nicer way to specify fonts. Xlib uses the opaque XLFD syntax, which looks like -Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1. There are also fonts that work with Xft but not with Xlib, such as my favoured font, Inconsolata.
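To make the XLFD name a little less opaque: it is a list of 14 hyphen-separated fields (foundry, family, weight, slant, set width, additional style, pixel size, point size, X resolution, Y resolution, spacing, average width, charset registry, charset encoding). The sketch below splits a name into those fields. The function xlfd_split() and its size limits are invented for this illustration and are not part of Xlib.

```c
#include <stddef.h>
#include <string.h>

#define XLFD_FIELDS 14     /* number of fields in a full XLFD name */
#define XLFD_FIELD_MAX 64  /* hypothetical per-field buffer size */

/* Splits an XLFD name into its 14 fields. Empty fields are kept as empty
 * strings (strtok() would silently skip them, shifting later fields).
 * Returns 0 on success, or -1 if the name does not start with '-', does
 * not have exactly 14 fields, or a field does not fit the buffer. */
static int xlfd_split(const char *xlfd, char fields[XLFD_FIELDS][XLFD_FIELD_MAX])
{
    if (xlfd[0] != '-')
        return -1;

    const char *p = xlfd + 1;
    for (int i = 0; i < XLFD_FIELDS; i++) {
        const char *end = strchr(p, '-');
        int last = (i == XLFD_FIELDS - 1);

        /* The last field must not contain '-'; every other one must end in it. */
        if (last ? end != NULL : end == NULL)
            return -1;

        size_t n = last ? strlen(p) : (size_t)(end - p);
        if (n >= XLFD_FIELD_MAX)
            return -1;
        memcpy(fields[i], p, n);
        fields[i][n] = '\0';
        if (!last)
            p = end + 1;
    }
    return 0;
}
```

For the name above, field 6 is the pixel size (13), field 7 is the point size in tenths of a point (120), and fields 12 and 13 together name the charset (ISO10646 and 1).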

There is a downside though. The method in rpbar to draw text is a lot more complicated now with the need to check each character and load additional fonts. I can't rely on a single Xft call to draw a string of text. Possibly this is something that would be good to include in Xft's API. I expect higher level libraries and frameworks would do this for you.

Tips for working with Xft, Xlib, and fonts

A few things I found useful:

Unanswered questions

While I solved the problem I wanted to, I did not answer every question I came across. I made several assumptions.

Xlib's XmbDrawString() vs. Xutf8DrawString()

Which should I use? What's the difference?

After experimentation I've found that XmbDrawString() can draw more characters than Xutf8DrawString(). With the -misc-fixed-medium-r-semicondensed font, some characters that render correctly with the mb function show up as "boxes" with the utf8 function. I don't know why. Viewing the font with xfd I can see the characters.

Update:

I looked at the Xlib source code. The two functions are implemented in omText.c by _XmbGenericDrawString() and _Xutf8GenericDrawString(). They both call _XomGenericDrawString() but with different type arguments. The first passes the type as XOMMultiByte while the second passes XOMUtf8String. _XomGenericDrawString() converts the input to XChar2b.

When using a UTF-8 locale, both the mbDrawString and utf8DrawString functions end up expecting UTF-8 input. In both cases conversion goes through omXChar.c's _XomInitConverter() to lcUTF8.c where the main conversion happens.

However there is a difference. In the mbDrawString case we convert from multibyte to fontcharset (open_utf8tofcs()). In the utf8DrawString case we convert from utf8string to charset (open_utf8tocs()). I'm not sure of the full extent of the differences, nor why the two paths differ. The effect though is that some characters don't render correctly.

While it would make sense that the two functions would be identical with a UTF-8 locale, Xlib internally performs the conversion differently in these two cases. From what I observe, Xlib supports the mbDrawString case better. First piece of evidence: I can render more characters! Second piece of evidence: The utf8DrawString function uses an "indirect" conversion (see omXChar.c).

Xlib's missing character behaviour

With Xlib, is it possible to check if a font/fontset can draw the character correctly?

With Xft I can check each character for whether it is available in a font. I wonder if this can be done with Xlib.

The only way I could see to know this was to rely on XCreateFontSet()'s replacement string behaviour. But you can only know this after you try to draw the text to the screen rather than beforehand as with Xft.

Xlib and certain characters not rendering

Why can't I draw certain characters with the Xlib *DrawString() functions?

For example, ™ renders as "b (or not at all) whatever I try.

As far as I can tell the fonts I test with have this character available. I can see it if I open up the font in xfd. (The fonts I primarily tried were Terminus and the -Misc-Fixed-* fonts, as recommended by Markus Kuhn.) I can also draw the character fine if I use the Terminus font with Xft. Is it simply that Xlib is broken for some characters? Why do some non-ASCII characters render and others don't? The UTF-8 encoded version of ™ is 3 bytes, but others that are 3 bytes render fine. There are characters with higher code points that render as well, so it is not a question of a cutoff.

Update:

I looked at the Xlib source code and figured out what was happening.

What happens for FontSets is that Xlib opens the XLocale Database and finds there a list of encodings to try to convert the input to.

For en_US.UTF-8 the database file is here on my Debian system: /usr/share/X11/locale/en_US.UTF-8/XLC_LOCALE. It lists, in order, ISO8859-1 ... KSC5601.1987-0 ... ISO10646-1. (I omitted a few).

I found that for the majority of inputs, Xlib converted them to charsets such as ISO8859-1 or ISO10646-1. However for the ones I have had problems rendering, like ™, Xlib ends up converting to the charset KSC5601.1987-0 (in the case of ™ anyway). I don't know if the character would show correctly if the fontset I loaded had this charset, but unfortunately it is one my fontset reported as missing (from XCreateFontSet()). (I suspect it would show something else if I had the charset available).

If I edit Xlib to skip this charset, or to always try ISO10646-1, then the character correctly converts and displays.

This explains what is happening. The input happens to match characters in KSC5601.1987-0 as well as ISO10646-1 and we try the former first. The question is whether this is a bug. It seems so to me given the behaviour. I mean, this is a South Korean charset and my locale is en_CA.UTF-8. Most importantly the fontset I'm using doesn't even have the charset available. It would make sense to me to check if the fontset has the charset before converting the character to it (try another, for example), or to more intelligently choose the charsets to use. If we list the ISO10646-1 charset first then the problems go away since we convert to that charset.
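The difference between the behaviour I observed and the behaviour I would expect can be shown with a toy model. This is not Xlib code, and nothing in it comes from the Xlib source: the struct, the function names, and especially the coverage tests (which pretend the Korean charset contains only ™) are assumptions invented to illustrate "first match wins" versus "first match that the fontset can actually draw".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry in a toy version of the locale's ordered charset list. */
struct toy_charset {
    const char *name;
    int (*covers)(uint32_t cp);  /* nonzero if cp can be converted to it */
};

/* Invented coverage rules, for illustration only. */
static int in_latin1(uint32_t cp)  { return cp < 0x100; }
static int in_toy_ksc(uint32_t cp) { return cp == 0x2122; }  /* just the TM sign */
static int in_bmp(uint32_t cp)     { return cp <= 0xFFFF; }

/* Same relative order as the three charsets named in the text. */
static const struct toy_charset locale_order[] = {
    { "ISO8859-1",      in_latin1  },
    { "KSC5601.1987-0", in_toy_ksc },
    { "ISO10646-1",     in_bmp     },
};
#define N_CHARSETS (sizeof locale_order / sizeof locale_order[0])

/* Observed behaviour: take the first charset that covers the character,
 * whether or not the fontset has a font for that charset. */
static const char *pick_first_match(uint32_t cp)
{
    for (size_t i = 0; i < N_CHARSETS; i++) {
        if (locale_order[i].covers(cp))
            return locale_order[i].name;
    }
    return NULL;
}

/* Expected behaviour: skip charsets the fontset reported as missing. */
static const char *pick_available(uint32_t cp,
                                  const char *const *loaded, size_t nloaded)
{
    for (size_t i = 0; i < N_CHARSETS; i++) {
        if (!locale_order[i].covers(cp))
            continue;
        for (size_t j = 0; j < nloaded; j++) {
            if (strcmp(locale_order[i].name, loaded[j]) == 0)
                return locale_order[i].name;
        }
    }
    return NULL;
}
```

In this model, a fontset holding only ISO8859-1 and ISO10646-1 fonts gets ™ converted to KSC5601.1987-0 by the first function and to ISO10646-1 by the second.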

I filed a bug about this in the freedesktop bugzilla.

Relatedly, I found that I can draw all these characters too if I convert from UTF-8 to XChar2b (an Xlib type) myself and then use XDrawString16(), skipping fontsets altogether (that is, loading the font with XLoadQueryFont()).

This is another solution, though not as good as using Xft because as I understand it XChar2b cannot represent Unicode fully as it is limited to 16 bits. There is an algorithm to convert to XChar2b here. This way may be better than using fontsets which have the other problems I described.
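A conversion along those lines might look like the sketch below. It is not the algorithm linked above, just one way to do it. To keep the example free of X headers it uses a stand-in struct char2b with the same two fields as Xlib's XChar2b (byte1 is the high byte, byte2 the low byte); the function name utf8_to_char2b() is also invented. It shows the 16-bit limitation directly: anything beyond U+FFFF is replaced with U+FFFD.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the two fields of Xlib's XChar2b. */
struct char2b {
    unsigned char byte1;  /* high byte of the 16-bit value */
    unsigned char byte2;  /* low byte of the 16-bit value */
};

/* Converts UTF-8 to 16-bit characters for an ISO10646-1 font. Returns the
 * number of characters written. Malformed bytes, and code points above
 * U+FFFF (which cannot fit in 16 bits), become U+FFFD. This is a sketch,
 * not a validator: it does not reject overlong forms or surrogates. */
static size_t utf8_to_char2b(const char *utf8, size_t len,
                             struct char2b *out, size_t max_out)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t i = 0, n = 0;

    while (i < len && n < max_out) {
        uint32_t cp = 0xFFFD;
        size_t used = 1;

        if (s[i] < 0x80) {
            cp = s[i];
        } else if ((s[i] & 0xE0) == 0xC0 && i + 1 < len &&
                   (s[i + 1] & 0xC0) == 0x80) {
            cp = ((uint32_t)(s[i] & 0x1F) << 6) | (s[i + 1] & 0x3F);
            used = 2;
        } else if ((s[i] & 0xF0) == 0xE0 && i + 2 < len &&
                   (s[i + 1] & 0xC0) == 0x80 && (s[i + 2] & 0xC0) == 0x80) {
            cp = ((uint32_t)(s[i] & 0x0F) << 12) |
                 ((uint32_t)(s[i + 1] & 0x3F) << 6) | (s[i + 2] & 0x3F);
            used = 3;
        } else if ((s[i] & 0xF8) == 0xF0 && i + 3 < len &&
                   (s[i + 1] & 0xC0) == 0x80 && (s[i + 2] & 0xC0) == 0x80 &&
                   (s[i + 3] & 0xC0) == 0x80) {
            used = 4;  /* well-formed but beyond 16 bits: keep U+FFFD */
        }

        out[n].byte1 = (unsigned char)(cp >> 8);
        out[n].byte2 = (unsigned char)(cp & 0xFF);
        n++;
        i += used;
    }
    return n;
}
```

With real Xlib types, the output array and the returned count are what would be handed to XDrawString16() as its string and length arguments.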

Xft and iso8859-1 vs. Unicode

Why does Xft not load the Unicode version of a font by default?

Terminus has both iso8859-1 and Unicode versions, yet by default Xft loads the iso8859-1 version (using XftFontOpenName()).

If I delete the iso8859-1 version (you can see the filename using FcPatternPrint()), then Xft loads the Unicode version. I can also load the Unicode version if I find a character not in the iso8859-1 version, and ask Xft for a font that has the character.

I could not find a way to specify which charset or version of the font to load (only the language). It seems like it would make sense to prefer loading the Unicode one if it is available rather than the iso8859-1 one. It would avoid needing to load additional fonts. (Well, it would probably always be best to check.)

Perhaps preferring iso8859-1 is an optimization. If so it seems a bit misguided. I have seen some projects, such as ratpoison, that do not try to load additional fonts, and so end up not being able to draw these characters.

There are probably configuration options that I could tweak to load the Unicode version by default. But if so it strikes me as an unfriendly default.

Xlib and iso10646-1 charset fonts

Why does using Xlib and requesting an iso10646-1 charset font to load (with XCreateFontSet()) cause ASCII characters to not render?

If I choose a font that has this charset, but don't specify a charset when loading the font, then ASCII characters render correctly. (Without a charset specified, XCreateFontSet() loads several charsets into the font set, but if you specify one, then it loads only that charset.) But if I specify that only iso10646-1 should be loaded, then ASCII characters are missing. This is Unicode (essentially), so it seems to me it should be the one with the most characters available.

The strange part is several non-ASCII characters render. Looking at the font in this charset with xfd shows all characters present.

There is some information about this charset on Markus Kuhn's Unicode fonts and tools for X11 page. From reading that, it sounds to me like loading only an iso10646-1 charset font should be sufficient.

Update:

I figured this out through examining the Xlib code.

The reason is similar to the one behind the characters that would not render: Xlib's conversion code does not care what charsets are available in the fontset you are using. It converts to various charsets, in order, based on those listed in the X Locale Database. If you then try to display text with your fontset, then it's just too bad if it converted to a charset not available in your fontset.

Take the character a. If we load only an ISO10646-1 charset font into the fontset, and then try to display it, Xlib takes the input as UTF-8 and converts it internally to ISO8859-1, and then tries to display it using this charset in your fontset. But your fontset does not have a charset listed for ISO8859-1 as that is one of the charsets the fontset is missing.

Even though Xlib would be able to convert the a to ISO10646-1, it accepts the match to ISO8859-1 first and does not try any others.

We might think that a is the same in both of these encodings (if we take ISO10646-1 to be UTF-8), but in Xlib this is not the case. I suspect it may be because internally ISO10646-1 is actually UCS-2, at least for fonts. There are comments to that effect in lcUTF8.c anyway.

You can see this is the problem by playing with lcUTF8.c's create_tofontcs_conv() where we set up the encodings to try.

Conclusion

Interacting with X and fonts was more complicated than I expected. There are several libraries at every level, and more levels than I knew about. I think the X.Org recommendation to use a higher level library makes a lot of sense.

I think part of the reason this was difficult is the limited documentation for Xlib and Xft. I suppose it is because of the age as well as the amount of cruft that has built up over time, combined with it being less common to use these libraries directly these days.
