X, fonts, and rpbar
Recently I fixed a text display issue in an X program I use called rpbar. I learned about X libraries, drawing text, and using fonts in X. I'm going to talk a bit about what I learned through describing how I resolved the problem.
The problem
rpbar is a task bar. It is a horizontal bar with buttons, where each button represents a window you are running. The text on each button is the window's title.
I noticed that titles with non-ASCII characters would not display correctly.
Sometimes rpbar truncated titles starting at the non-ASCII character. For example, it showed a title with the trademark symbol ™ (U+2122) only up to where the ™ should be. Other times it showed characters but they were clearly corrupted, such as showing extra characters that were not present.
This looked to me like improper handling of character encoding. My system is set to use UTF-8 encoding, so I expected that rpbar was in some way mistranslating its input/output. I thought to trace through the program and make sure it correctly expected UTF-8 throughout. Specifically I wanted to see where the window titles were coming from, and examine their path through to showing on the screen.
Initial investigation
rpbar uses the Xlib library for its interaction with X. This is a C library interface to the X protocol. Its documentation is available online, and many functions also have man pages.
The X.Org developers deprecated this library for new development. It is still available, but they recommend transitioning away from it. They recommend using something like GTK+ or Qt instead, both of which are fairly heavyweight in comparison. There are also two other lower level libraries. One is XCB, the currently recommended replacement for Xlib, and the other is Xt, a toolkit layered on top of Xlib, which like Xlib they have deprecated.
I hoped to avoid adding a new library dependency to rpbar if possible. I expected this would be more palatable for a pull request as well as be a smaller change.
rpbar retrieves the window titles by running a command with popen() and then repeatedly calling fgets() to retrieve the output. It reads into C++ std::strings which it eventually draws to the screen using XmbDrawString(), an Xlib function. I did not see any translation between character encodings.
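As a rough sketch of that kind of reading loop (this is not rpbar's actual code; read_titles and MAX_TITLE are names made up for the example, and the command is whatever the caller passes in):

```c
#define _POSIX_C_SOURCE 200809L  /* for popen() and pclose() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TITLE 256  /* hypothetical limit on a title's length in bytes */

/* Run cmd and store up to max lines of its output in titles, stripping
 * the trailing newline. Returns the number of lines read, or -1 if the
 * command could not be started. Lines longer than MAX_TITLE - 1 bytes
 * end up split across entries. */
int read_titles(const char *cmd, char titles[][MAX_TITLE], int max)
{
    assert(cmd != NULL && titles != NULL);

    FILE *fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;

    int count = 0;
    while (count < max && fgets(titles[count], MAX_TITLE, fp) != NULL) {
        /* The bytes are kept exactly as received: nothing here
         * converts between character encodings. */
        titles[count][strcspn(titles[count], "\n")] = '\0';
        count++;
    }

    pclose(fp);
    return count;
}
```

The point is that the title is only ever a sequence of bytes on this path. If the command emits UTF-8, then UTF-8 is what reaches the drawing call.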
Looking at Xlib, I found several functions available to draw text to the screen:
- XDrawString()
- XDrawString16()
- XmbDrawString() - Drawing multibyte strings
- XwcDrawString() - Wide character strings (wchar_t)
- Xutf8DrawString()
Could using a different one of these fix the problem? I thought the UTF-8 one in particular sounded promising.
Attempt #1: Xlib, locale, and alternate draw string functions
While rpbar using XmbDrawString() seemed correct since UTF-8 is a multibyte encoding, I decided to try switching to the Xutf8DrawString() function. The intended difference between the two functions is not clear to me, and I did not find any documentation about why the developers created the second. The man pages for these do not give much to go on.
I believe now that either should work (for a given value of work) when working with UTF-8. I suspect Xutf8DrawString() works without needing to rely on global locale information, whereas XmbDrawString() supports different multibyte encodings depending on your locale. Either way, I expect UTF-8 character encoding throughout. But I decided to try switching to see the behaviour.
Unfortunately I saw no difference in behaviour after changing to Xutf8DrawString().
I started looking at how rpbar loaded the font to use. It called an Xlib function, XCreateFontSet(3). The man page for this function says this: "The font set is bound to the current locale when XCreateFontSet is called". What locale was rpbar using? There was no setlocale(3) call, so it was running in the C locale!
I added a setlocale(LC_ALL, "") call in main(). This loads locale information from the environment. This immediately improved things. I could now see some non-ASCII characters correctly, such as the right single quotation mark ’ (U+2019).
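To illustrate the locale behaviour in plain C, here is a small sketch. The helper names current_locale and use_environment_locale are made up for the example; setlocale() itself is standard C.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Copy the name of the current locale into buf. Returns 1 on success. */
int current_locale(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    /* Passing NULL queries the locale without changing it. */
    const char *name = setlocale(LC_ALL, NULL);
    if (name == NULL)
        return 0;
    snprintf(buf, size, "%s", name);
    return 1;
}

/* Switch to the locale named by the environment (LANG, LC_ALL, ...).
 * Returns 1 on success, 0 if the environment names an unknown locale. */
int use_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}
```

A program that never calls setlocale() reports "C" from current_locale(), whatever the environment says. What it reports after use_environment_locale() depends on the system, so the call needs to happen before anything locale-bound such as XCreateFontSet().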
However there were still issues. ™ still did not display correctly. In its place was "b. Weird. Why?
When you try to draw a character that is not available in the loaded font set, XmbDrawString() draws either a replacement string, or nothing. XCreateFontSet() tells you what this replacement string is. In my case the replacement string was blank, so nothing should show if the problem was a missing character.
If you try to draw an invalid codepoint then the behaviour is undefined. However I did not believe this to be the case since the character displayed fine elsewhere (using printf for example!).
Attempt #2: Xlib with different fonts and encodings
At this point I felt stumped, so I tried a few ideas.
- I tried loading different fonts to see if maybe the problem was something to do with the font I was using.
- I tried forcing use of a font with the iso10646-1 (mostly equivalent to Unicode) charset.
- I tried switching between the different *DrawString() functions.
- I tried converting between different encodings with iconv() (in particular I tried UCS-2 and UCS-4).
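For reference, a conversion with iconv() can look roughly like the sketch below. It is not the code I tried in rpbar. The function name utf8_to_utf32be is made up, and it uses the "UTF-32BE" encoding name, which for these characters gives the same bytes as big-endian UCS-4.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string to big-endian UTF-32.
 * Returns the number of bytes written to out, or (size_t)-1 on failure
 * (unsupported encoding, malformed input, or out too small). */
size_t utf8_to_utf32be(const char *in, unsigned char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-32BE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() takes non-const pointers but does not modify the input. */
    char *src = (char *)in;
    size_t src_left = strlen(in);
    char *dst = (char *)out;
    size_t dst_left = out_size;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_size - dst_left;
}
```

Feeding it the three UTF-8 bytes of ™ gives the four bytes 00 00 21 22, that is U+2122. So the input itself is valid and converts cleanly.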
Nothing I tried drew this character.
I decided I should look at how others solved this problem.
The rpbar author based parts of it on dmenu. I looked at dmenu's text drawing functionality. I found that in 2015 the dmenu developers replaced their Xlib text drawing with Xft. Looking back at dmenu's last Xlib-based version, I did not find anything that would solve this problem. In fact dmenu was using almost the same method as rpbar. Maybe dmenu's Xlib version had this very problem. Maybe that is why the dmenu developers switched to Xft. I checked their mailing list for the rationale behind the switch, but could not find any.
Attempt #3: Moving to Xft, part one
I decided to try Xft. Luckily Xft can work on top of Xlib so I did not need to rip everything apart. I could replace only the text drawing parts.
I replaced the XmbDrawString() call with Xft's XftDrawStringUtf8(). I removed all of the old font code, which (initially) simplified rpbar a fair amount (no more falling back to XLoadQueryFont() if XCreateFontSet() failed, for one).
I needed to replace the method used to determine font height and width, but happily there were counterparts I could borrow from dmenu.
The conversion to Xft was quite simple. There is a man page, Xft(3), and I also found a tutorial written by the author. Other than these the documentation is limited. I relied heavily on dmenu's code.
I was drawing text with Xft. With the font Inconsolata I found that all characters I tried worked well, including the previously problematic ™. Hurray!
I tried another font, the one used by default in rpbar, Terminus. Frustratingly I found that none of the non-ASCII Unicode characters displayed correctly, not even the ones that were working with Xlib. In their places rpbar showed only an "n".
The solution: Multiple Xft fonts
I decided to go back to dmenu to see if they did anything I was not. I remembered there was a complex function to draw text, and I had used only the simplest parts.
Sure enough the solution was there in the dmenu source. They use an Xft function called XftCharExists() to check each character to see if it is available in the font. If a character is not available, then they perform some Fontconfig magic to load an additional font that does have the character, and draw it with that font.
I added the same logic into rpbar and the strings displayed perfectly in each font I tried. When using Terminus rpbar now loads a second font when a string contains a missing character. (I had to adjust the method to measure the width of a string of text to use this same logic since measuring it with a single font was no longer guaranteed to be accurate).
This means that each time we draw text to the screen it could be in multiple fonts, but at least it means all characters will show up (if any available font has them!).
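The core of the approach can be modelled without X at all. The sketch below is a simplified stand-in, not dmenu's or rpbar's code. struct font and its has_glyph predicate are hypothetical and play the role that XftCharExists() plays for a real font. The fallback font is assumed to be already in the list, whereas dmenu asks Fontconfig for one on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a loaded font: a name plus a predicate
 * saying whether the font has a glyph for a code point. */
struct font {
    const char *name;
    int (*has_glyph)(uint32_t codepoint);
};

/* Index of the first font that has a glyph for cp, or -1 if none does. */
int pick_font(const struct font *fonts, size_t nfonts, uint32_t cp)
{
    assert(fonts != NULL);
    for (size_t i = 0; i < nfonts; i++)
        if (fonts[i].has_glyph(cp))
            return (int)i;
    return -1;
}

/* Split text into runs of consecutive code points drawn with the same
 * font. run_start and run_font must each have room for len entries.
 * Returns the number of runs. */
size_t split_runs(const struct font *fonts, size_t nfonts,
                  const uint32_t *text, size_t len,
                  size_t *run_start, int *run_font)
{
    size_t nruns = 0;
    for (size_t i = 0; i < len; i++) {
        int f = pick_font(fonts, nfonts, text[i]);
        if (nruns == 0 || run_font[nruns - 1] != f) {
            run_start[nruns] = i;  /* a new run begins here */
            run_font[nruns] = f;
            nruns++;
        }
    }
    return nruns;
}

/* Toy coverage for the example: a Latin-1-only font and a catch-all. */
static int latin1_only(uint32_t cp) { return cp <= 0xFF; }
static int everything(uint32_t cp) { (void)cp; return 1; }
```

With a Latin-1-only primary font and a catch-all fallback, the text T, M, ™, ! splits into three runs: primary, fallback, primary. Each run would then be drawn, and measured, with its own font, which is why the width calculation needs the same logic.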
Thoughts on Xft
Switching to using Xft instead of Xlib for loading fonts and drawing text was easy. As a bonus, Xft also has a nicer way to specify fonts than Xlib, which uses the opaque XLFD syntax. An XLFD looks like -Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1. There are also fonts that work with Xft but not with Xlib, such as my favoured font, Inconsolata.
There is a downside though. The method in rpbar to draw text is a lot more complicated now with the need to check each character and load additional fonts. I can't rely on a single Xft call to draw a string of text. Possibly this is something that would be good to include in Xft's API. I expect higher level libraries and frameworks would do this for you.
Tips for working with Xft, Xlib, and fonts
A few things I found useful:
- You can see what Xft font you loaded with XftFontOpenName() by using FcPatternPrint(font->pattern). This is useful because Xft/Fontconfig do their best to always load something, so it's not always clear if the font you tried to load is the one actually loaded.
- With Xft, you should check that each character is available in the font(s) you are using, and if not, load another font that does have the character. Otherwise there will be some characters you cannot draw. dmenu does this, as do I in this rpbar pull request.
- A couple useful documentation links for Xft.
- You can list Xlib fonts with xlsfonts, and Xft fonts with fc-list. (There are other useful fc-* programs too.)
- You can look at the characters in a font either with FontForge or with the program xfd. An example opening a font with xfd: xfd -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1 (the -misc-fixed-* part is an XLFD font string).
Unanswered questions
While I solved the problem I wanted to, I did not answer every question I came across. I made several assumptions.
Xlib's XmbDrawString() vs. Xutf8DrawString()
Which should I use? What's the difference?
After experimentation I've found that XmbDrawString() can draw more characters than Xutf8DrawString(). With the utf8 function, some characters show up as "boxes" in the -misc-fixed-medium-r-semicondensed font, whereas they appear correctly when using the mb function. I don't know why. Viewing the font with xfd I can see the characters.
Update:
I looked at the Xlib source code. The source for these two functions ends up being these two in omText.c: _XmbGenericDrawString() and _Xutf8GenericDrawString(). They both call _XomGenericDrawString() but with different type arguments. The first passes the type as XOMMultiByte while the second passes XOMUtf8String. This last function converts the input to XChar2b.
When using a UTF-8 locale, both the mbDrawString and utf8DrawString functions end up expecting UTF-8 input. In both cases conversion goes through omXChar.c's _XomInitConverter() to lcUTF8.c where the main conversion happens.
However there is a difference. In the mbDrawString case we convert from multibyte to fontcharset (open_utf8tofcs()). In the utf8DrawString case we convert from utf8string to charset (open_utf8tocs()). I'm not sure of the full extent of the differences, nor why this is the case. The effect though is that some characters don't render correctly.
While it would make sense that the two functions would be identical with a UTF-8 locale, Xlib internally performs the conversion differently in these two cases. From what I observe, Xlib supports the mbDrawString case better. First piece of evidence: I can render more characters! Second piece of evidence: The utf8DrawString function uses an "indirect" conversion (see omXChar.c).
Xlib's missing character behaviour
With Xlib, is it possible to check if a font/fontset can draw the character correctly?
With Xft I can check each character for whether it is available in a font. I wonder if this can be done with Xlib.
The only way I could see to know this was to rely on XCreateFontSet()'s replacement string behaviour. But you can only know this after you try to draw the text to the screen rather than beforehand as with Xft.
Xlib and certain characters not rendering
Why can't I draw certain characters with the Xlib *DrawString() functions? For example, ™ renders as "b (or not at all) whatever I try.
As far as I can tell the fonts I test with have this character available. I can see it if I open up the font in xfd. (The fonts I primarily tried were Terminus and the -Misc-Fixed-* fonts recommended by Markus Kuhn.) I can also draw the character fine if I use the Terminus font with Xft. Is it simply that Xlib is broken for some characters? Why do some non-ASCII characters render and others don't? The UTF-8 encoded version of ™ is 3 bytes, but others that are 3 bytes render fine. There are characters with higher code points that render as well, so it is not a question of a cut off.
Update:
I looked at the Xlib source code and figured out what was happening.
What happens for FontSets is that Xlib opens the XLocale Database and finds there a list of encodings to try to convert the input to.
For en_US.UTF-8 the database file is here on my Debian system: /usr/share/X11/locale/en_US.UTF-8/XLC_LOCALE. It lists, in order, ISO8859-1 ... KSC5601.1987-0 ... ISO10646-1. (I omitted a few.)
I found that for the majority of inputs, Xlib converted them to charsets such as ISO8859-1 or ISO10646-1. However for the ones I have had problems rendering, like ™, Xlib ends up converting to the charset KSC5601.1987-0 (in the case of ™ anyway). I don't know if the character would show correctly if the fontset I loaded had this charset, but unfortunately it is one my fontset reported as missing (from XCreateFontSet()). (I suspect it would show something else if I had the charset available.)
If I edit Xlib to skip this charset, or to always try ISO10646-1, then the character correctly converts and displays.
This explains what is happening. The input happens to match characters in KSC5601.1987-0 as well as ISO10646-1 and we try the former first. The question is whether this is a bug. It seems so to me given the behaviour. I mean, this is a South Korean charset and my locale is en_CA.UTF-8. Most importantly the fontset I'm using doesn't even have the charset available. It would make sense to me to check if the fontset has the charset before converting the character to it (try another, for example), or to more intelligently choose the charsets to use. If we list the ISO10646-1 charset first then the problems go away since we convert to that charset.
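The selection behaviour can be modelled in a few lines. This is a toy model of what I observed, not Xlib's code. struct charset, the coverage predicates, and both pick functions are made up for the example, and the KSC5601 predicate knows only about ™ where the real charset has thousands of characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one entry in the locale's charset list. */
struct charset {
    const char *name;
    int (*contains)(uint32_t cp);  /* can this charset encode cp? */
    int in_fontset;                /* did the font set load a font for it? */
};

static int in_latin1(uint32_t cp) { return cp <= 0xFF; }
static int in_ksc5601_toy(uint32_t cp) { return cp == 0x2122; }  /* toy */
static int in_bmp(uint32_t cp) { return cp <= 0xFFFF; }

/* Same order as in the locale database excerpt above; the font set
 * in this model is missing KSC5601.1987-0, as mine was. */
const struct charset locale_order[] = {
    { "ISO8859-1",      in_latin1,      1 },
    { "KSC5601.1987-0", in_ksc5601_toy, 0 },
    { "ISO10646-1",     in_bmp,         1 },
};

/* Observed behaviour: take the first charset that can encode the
 * character, whether or not the font set can draw it. */
const struct charset *pick_first_match(const struct charset *cs, size_t n,
                                       uint32_t cp)
{
    assert(cs != NULL);
    for (size_t i = 0; i < n; i++)
        if (cs[i].contains(cp))
            return &cs[i];
    return NULL;
}

/* What I would expect instead: skip charsets the font set lacks. */
const struct charset *pick_available(const struct charset *cs, size_t n,
                                     uint32_t cp)
{
    assert(cs != NULL);
    for (size_t i = 0; i < n; i++)
        if (cs[i].in_fontset && cs[i].contains(cp))
            return &cs[i];
    return NULL;
}
```

In this model pick_first_match() sends ™ to KSC5601.1987-0, which the font set cannot draw, while pick_available() falls through to ISO10646-1. Moving ISO10646-1 to the front of the list has the same effect.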
I filed a bug about this in the freedesktop bugzilla.
Relatedly, I found that if I convert from UTF-8 to XChar2b (an Xlib type) myself, then use XDrawString16() and skip fontsets altogether (that is, use XLoadQueryFont()), I can draw all these characters too.
This is another solution, though not as good as using Xft because as I understand it XChar2b cannot represent Unicode fully as it is limited to 16 bits. There is an algorithm to convert to XChar2b here. This way may be better than using fontsets which have the other problems I described.
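A conversion along these lines might look like the sketch below. It is not the linked algorithm. struct char2b is a stand-in with the same two fields as Xlib's XChar2b so that the example compiles without X headers, and the function names are made up. Replacing code points above U+FFFF with U+FFFD is my own choice for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Xlib's XChar2b: high byte first, then low byte. */
struct char2b {
    unsigned char byte1;
    unsigned char byte2;
};

/* Decode one UTF-8 sequence from a NUL-terminated string. Stores the
 * code point in *cp and returns the bytes consumed, or 0 if malformed.
 * Minimal on purpose: it does not reject overlong encodings. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
        return 4;
    }
    return 0;
}

/* Convert UTF-8 text into at most max 16-bit characters. Code points
 * above U+FFFF do not fit and become U+FFFD. Returns the number of
 * characters written, or -1 on malformed input. */
int utf8_to_char2b(const char *utf8, struct char2b *out, int max)
{
    assert(utf8 != NULL && out != NULL);

    const unsigned char *s = (const unsigned char *)utf8;
    int n = 0;
    while (*s != '\0' && n < max) {
        uint32_t cp;
        size_t used = utf8_decode(s, &cp);
        if (used == 0)
            return -1;
        if (cp > 0xFFFF)
            cp = 0xFFFD;
        out[n].byte1 = (unsigned char)(cp >> 8);
        out[n].byte2 = (unsigned char)(cp & 0xFF);
        n++;
        s += used;
    }
    return n;
}
```

For ™ this gives byte1 = 0x21 and byte2 = 0x22. With the real XChar2b type, the resulting array and its length are what XDrawString16() takes.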
Xft and iso8859-1 vs. Unicode
Why does Xft not load the Unicode version of a font by default?
Terminus has both iso8859-1 and Unicode versions, yet by default Xft loads the iso8859-1 version (using XftFontOpenName()).
If I delete the iso8859-1 version (you can see the filename using FcPatternPrint()), then Xft loads the Unicode version. I can also load the Unicode version if I find a character not in the iso8859-1 version, and ask Xft for a font that has the character.
I could not find a way to specify a charset or version of the font to load (only a language). It seems like it would make sense to prefer loading the Unicode one if it is available rather than the iso8859-1 one. It would avoid needing to load additional fonts. (Well, it would probably always be best to check.)
Perhaps preferring iso8859-1 is an optimization. If so it seems a bit misguided. I have seen some projects, such as ratpoison, that do not try to load additional fonts, and so end up not being able to draw these characters.
There are probably configuration options that I could tweak to load the Unicode version by default. But if so it strikes me as an unfriendly default.
Xlib and iso10646-1 charset fonts
Why does using Xlib and requesting an iso10646-1 charset font to load (with XCreateFontSet()) cause ASCII characters to not render?
If I choose a font that has this charset, but don't specify a charset when loading the font, then ASCII characters render correctly. (Without specifying a charset, XCreateFontSet() loads several charsets into the font set, but if you specify one, then it loads only that charset.) But if I specify loading only iso10646-1, then ASCII characters are missing. This is Unicode (essentially), so it seems to me it should be the one with the most characters available.
The strange part is that several non-ASCII characters render. Looking at the font in this charset with xfd shows all characters present.
There is some information about this charset on Markus Kuhn's Unicode fonts and tools for X11 page. From reading that, it sounds to me like loading only an iso10646-1 charset font should be sufficient.
Update:
I figured this out through examining the Xlib code.
The reason is similar to the problem I describe with the problem characters: Xlib's conversion code does not care what charsets are available in the fontset you are using. It converts to various charsets, in order, based on those listed in the X Locale Database. If you then try to display text with your fontset, then it's just too bad if it converted to a charset not available in your fontset.
Take the character a. If we load only an ISO10646-1 charset font into the fontset, and then try to display it, Xlib takes the input as UTF-8 and converts it internally to ISO8859-1, and then tries to display it using this charset in your fontset. But your fontset does not have a charset listed for ISO8859-1 as that is one of the charsets the fontset is missing.
Even though Xlib would be able to convert the a to ISO10646-1, it accepts the match to ISO8859-1 first and does not try any others.
We might think that a is the same in both of these encodings (if we take ISO10646-1 to be UTF-8), but in Xlib this is not the case. I suspect it may be because internally ISO10646-1 is actually UCS-2, at least for fonts. There are comments to that effect in lcUTF8.c anyway.
You can see this is the problem by playing with lcUTF8.c's create_tofontcs_conv() where we set up the encodings to try.
Conclusion
Interacting with X and fonts was more complicated than I expected. There are several libraries at every level, and more levels than I knew about. I think the X.Org recommendation to use a higher level library makes a lot of sense.
I think part of the reason this was difficult is the limited documentation for Xlib and Xft. I suppose it is because of the age as well as the amount of cruft that has built up over time, combined with it being less common to use these libraries directly these days.