Monday, July 31, 2006

type sizes in C vs bit sizes

An interesting mismatch between C type sizes and the architectures they're hosted on has come up in a rather annoying way while I've been working on byteswapping builtins for gcc. The standard library functions (for integers at least) come in plain, long, and long long styles, e.g. ctz, ctzl, ctzll. This has some odd side effects for operations that usually return a value the size of a register or the size of the input. Writing a general routine is difficult when you're using a cross compiler, because it depends on the size of the type on the target machine, which isn't always readily available. This is why a lot of these routines should be based on the size that was wanted - on the types in stdint.h, for example.
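To make the problem concrete, here is a minimal sketch of what a caller has to do today to get a count-trailing-zeros routine keyed on bit width rather than on C type. The wrapper names ctz32 and ctz64 are hypothetical, made up for this example; the __builtin_ctz family is gcc-specific and its result is undefined for an input of 0.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical wrappers (not part of gcc): choose the builtin by the
   width that is wanted instead of by the C type name. */
static int ctz32(uint32_t x)
{
#if UINT_MAX >= 0xFFFFFFFFu
    return __builtin_ctz(x);    /* unsigned int has at least 32 bits here */
#else
    return __builtin_ctzl(x);   /* unsigned long always has at least 32 bits */
#endif
}

static int ctz64(uint64_t x)
{
    /* unsigned long long always has at least 64 bits */
    return __builtin_ctzll(x);
}
```

The preprocessor test is exactly the kind of target-dependent question that gets awkward in a cross compiler, where the host's limits are not the target's.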

For the new byte swapping builtins I followed this idea: we now have __builtin_bswap32 and __builtin_bswap64, which take and return types of int32_t and int64_t respectively. We needed to add some additional size-specific types to gcc for this, but it'll help when we want to specify additional builtins of this sort. Hopefully future revisions of the various standards will have standard libraries that require sizes and types instead of just types.
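For comparison, this is roughly what a size-based byte swap looks like when written by hand against the stdint.h types. It is only a sketch: the names swap32_portable and swap64_portable are hypothetical, and it uses the unsigned fixed-width types so the shifts are well defined. It is not how gcc implements the builtins.

```c
#include <stdint.h>

/* Hypothetical helpers: reverse the byte order of a fixed-width value. */
static uint32_t swap32_portable(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

static uint64_t swap64_portable(uint64_t x)
{
    /* Swap each 32-bit half, then exchange the halves. */
    return ((uint64_t)swap32_portable((uint32_t)x) << 32) |
            (uint64_t)swap32_portable((uint32_t)(x >> 32));
}
```

Because the width is in the name and in the type, nothing here depends on how big int or long happens to be on the target.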

Wednesday, July 26, 2006

mythical man month

This post on silver bullets is a great one. It makes you realize that while some software is just poorly designed, software engineering is an incremental process, and anyone who thinks differently is fooling themselves. The author is mostly talking about fad technologies, but I also remember another paper on the problem with "throw it away and design it again" software engineering. Sometimes you can do it better the second time. Most of the time you can't, and you just end up wasting a lot of time and money. The Mythical Man-Month should be required reading for all software engineering managers - and not as a "we can do better than that" document.

Sunday, July 23, 2006

current projects

I've been working on various bits of compiler work for Apple lately. A lot of the public work that's been seen has been of the cleanup variety - fixing up testcase failures from previous releases. However, I've also been working on some bugs relating to the object file format. We've made some recent moves toward a saner definition in the toolchain - moving toward linker-generated GOTs and PLTs - but for a great deal of the existing toolchains this would require a lot of work.

In my previous work I hadn't needed to work around limitations in the object file format - at least not to any great degree. Sure, the MIPS ABI could use some changes, and there definitely aren't enough bits in the ELF flags field for all of the various instruction sets or processors for the target. In the grand scheme of things these are small peanuts. Mach-O (or macho) is the format that was chosen at work around the time that work on OS X began.

Mach-O isn't very flexible. The object file format contains bitfields, not the least of which is the one for relocations. There are 4 bits available for the relocation type on a single architecture. For those of you counting at home, this gives you 16 relocations. The fact that Mach-O defines five of them by default leaves an architecture 11 additional relocations that it can define. Under ELF even the x86 port has 38 relocations, including the ones for TLS. The PowerPC Mach-O port is out of relocations completely, and even if we had another 4 bits we probably couldn't do all of the TLS relocations we should have. If we want to get rid of the compiler-generated stubs, or have compiler-generated thread-local storage, a new way of defining relocations is needed. Or we could move to ELF.
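The arithmetic is easy to see with a small sketch. The struct below is loosely modeled on a Mach-O relocation entry packed into bitfields; the struct, field, and constant names are made up for illustration and are not copied from the real headers. The count of five generic relocations is the figure given above.

```c
/* Illustrative relocation entry: everything besides the address is
   packed into one 32-bit word, leaving only 4 bits for the type. */
struct reloc_sketch {
    int          address;          /* offset of the item to relocate */
    unsigned int symbolnum : 24;   /* symbol or section index */
    unsigned int pcrel     : 1;    /* PC-relative? */
    unsigned int length    : 2;    /* log2 of the size in bytes */
    unsigned int is_extern : 1;    /* refers to an external symbol? */
    unsigned int type      : 4;    /* relocation type: only 16 values */
};

enum {
    RELOC_TYPE_BITS     = 4,
    RELOC_TYPE_COUNT    = 1 << RELOC_TYPE_BITS,                  /* 16 */
    GENERIC_RELOC_TYPES = 5,       /* defined by default, per the text */
    ARCH_RELOC_TYPES    = RELOC_TYPE_COUNT - GENERIC_RELOC_TYPES /* 11 */
};

/* Store a type number in the 4-bit field and read it back; anything
   above 15 is silently reduced modulo 16. */
static unsigned int store_reloc_type(unsigned int t)
{
    struct reloc_sketch r = {0};
    r.type = t;
    return r.type;
}
```

A seventeenth relocation type simply has nowhere to go: store_reloc_type(16) comes back as 0, which would collide with an existing type.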