Asian Inspiration

Whenever I end up in Hong Kong and talk to friends there I am generally inspired to work on a few projects of various sorts. Having just finished some travel in China and Hong Kong for work, I’ve come up with a good reason to get back to the PCB I was designing, but it’s a project that will go better if I can breadboard it out first. I’m hoping to build a small ARM system, smaller than a typical USB stick, so that I can more easily do ARM work while traveling. As it stands right now, with my typical Olimex LPC board I need two mini USB cables to handle both flashing and pulling out UART for a console. That’s on top of the 10 pin connector bridging the daughter board over via UEXT. Unfortunately, breadboarding a 32 pin Cortex-M3 is a bit difficult, so I imagine the first step will be to design a breakout board for the chip I want to use. Once I create a simple DIP adapter for it and solder it all together, I hope to use it to test the other components of the software build before doing a final push for a form factor USB PCB. I’ll also need to write some software to pass the UART over to my laptop via USB. That might be a good excuse to play with Go a bit and break out of my C shell, though I’ll have to see whether it has libusb support at this time.

Hopefully next update I’ll at least have a schematic up for the DIP board.

Magic Smoke

At this point I’ve worked on enough hardware projects at a few companies to have a good idea of trends in the industry. One of the first things I learned while doing factory support and validation at Palm was how incredible it is that anything electronic gets built at all within the madness that is the hardware design and fabrication supply chain. But more to the point, I was also amazed by how many things in hardware are as hackish as you’d expect of zero hour software patches for products on their ship day.

One of my favorite stories along these lines comes from one of the first products I worked on. When you are building a device with a radio and a battery there are a number of compliance tests you need to pass, most of them coming from the FCC or CTIA. Generally you assemble a preproduction unit, make some custom firmware, and ship them off to a third party testing house, which runs the tests and sends back a report you can use as needed. However, especially in the case of CTIA, some defined tests were vague enough that one testing house could interpret the spirit of a test differently from another. It was a frustrating process to see something that passed before fail in a new round, but you eventually got the hang of it and began to assume the worst case for each test. But on the product I am referring to, there was a very specific test we could not pass. If I remember correctly, one of the requirements was that if you had a charger plugged in and shut the device off, it needed to stay off and not charge the battery past a certain safe threshold.

The test itself was fairly mundane, but the Power Management IC (PMIC) provided to us by the chip vendor had a design flaw. Essentially, if power was applied to the voltage lines the PMIC would power up the main CPU and the device would boot, thus failing the test. This meant that if the charger was attached and the user shut the device off it would immediately restart and boot back into the main OS, no matter what we tried when it came to configuring the PMIC. In the end, the only solution I could come up with was to have the device effectively ‘pretend’ it was off. If the system was shut down from Linux with the charger attached it would write a specific value into the retention RAM. When the bootloader came up it would check for that value and, if it saw it, would not boot Linux. It would also power down the display, radio and other peripherals in the device and begin charging the battery. If the battery hit 95% charged it would physically disconnect it via the charging FET interface. Otherwise, in this state it would for all intents and purposes appear completely off and would only be powering itself via the USB charger. But if the user pressed the power button for ~1 second it would detect it and boot to Linux as if the device were booting cold in the first place. If the charger was ever pulled the device would completely power off and the value in memory would be gone, so the next boot would be a proper cold boot anyway. This passed every test, and to date I’m still not entirely sure whether the testers would have cared if they realized how much we were cheating on that specific part. I like to think that because it was mostly about battery safety they wouldn’t mind, but I really have no idea.
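
For flavor, here is a very rough sketch of the bootloader side of that trick. The address, the marker value and every helper function below are invented for illustration; the real product code looked nothing like this:

#include <stdint.h>

/* Hypothetical sketch only -- the retention ram address, marker value and
 * helpers below are all made up. */
#define RETENTION_FLAG ((volatile uint32_t *)0x40001000)
#define FAKE_OFF_MAGIC 0x0FF1FACE

void bootloader_main(void)
{
    if (*RETENTION_FLAG == FAKE_OFF_MAGIC && charger_attached()) {
        *RETENTION_FLAG = 0;                 /* consume the marker */
        peripherals_power_down();            /* display, radio, etc. stay dark */
        while (!power_button_held_1s()) {
            charge_battery_to_limit(95);     /* stop charging near 95% */
            if (!charger_attached())
                device_power_off();          /* charger pulled: really turn off */
        }
    }
    boot_linux();                            /* otherwise boot as normal */
}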

The more time I spend working on hardware the more I discover this sort of thing isn’t that uncommon. But on the plus side, it’s at least satisfying to come up with clever solutions to things you’re told are impossible.

Working Harder Isn’t My Answer

The last couple weeks have been a bit brutal as I’ve been deeply entrenched in a particularly difficult project to debug. In the end, throwing more hours at it was less effective than simply stepping back and reminding myself of the basic things I know I should be doing but sometimes forget when I’m writing large swaths of embedded, kernel or OS code. This list is as much a reminder for myself as for anyone else who happens to read this.

  • Never assume a variable on the stack is initialized unless you did it yourself
  • Always build with -Wall and fix even the most mundane of warnings before diving into debugging serious issues
  • If you have to make a temporary change or hack always throw in a // XXX and a WIP checkpoint commit into git
  • Make sure the code is valgrind clean at every stage of the process
  • ulimit -c unlimited. Having core dumps for difficult to reproduce races makes a night and day difference
  • export MALLOC_CHECK_=2. Catch the memory errors before they’re entrenched
  • Use memory guards around important structures so you can watch them if needed (see the sketch after this list)
  • Verify in the disassembly or symbol dump that what you think you compiled was actually compiled
  • Verify that you’re using the proper byte order
  • Double check the byte order because when it matters it really matters
  • Don’t merely skim the datasheet. Don’t overlook timing tables. Know the details and be able to explain them to anyone who asks
  • Don’t assume, verify
  • Don’t let yourself waste time on errors a compiler could have caught
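
On the memory guard point, this is roughly the kind of thing I mean; the magic value and the struct here are arbitrary, purely for illustration:

#include <assert.h>
#include <stdint.h>

#define GUARD_MAGIC 0xDEADBEEF   /* any distinctive pattern works */

struct conn_state {
    uint32_t guard_front;        /* set to GUARD_MAGIC, checked for underruns */
    char     rx_buf[128];        /* ... the fields you actually care about ... */
    uint32_t guard_back;         /* set to GUARD_MAGIC, checked for overruns  */
};

static void conn_state_check(const struct conn_state *s)
{
    assert(s->guard_front == GUARD_MAGIC);
    assert(s->guard_back  == GUARD_MAGIC);
}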

More Time Planning

Unfortunately, due to the holidays and finding myself engrossed in a work project I haven’t had time to finish the timer design for the ARM system I’m working on. There are some core bits I can’t decide on in a way that satisfies me, so I’ve been kicking around different ideas. To start, if you have X hardware timers but want to support essentially infinite software timers, what is the best way to do it? The common approach in a situation like this is to use the system tick timer (systick) and have it tick every Y milliseconds. When its associated interrupt fires you check whether any timers should have their callbacks / events handled and then go back to work. Unfortunately, this also means your timing system can be off by nearly Y every time, since the tick may fire and reschedule itself right before the deadline you were trying to meet, leaving that timer to wait almost a full period. For that reason I’ve been pondering a system where you utilize all the hardware timers available in a FIFO sort of design, keeping in mind the constraints of each. For example, on the NXP chip I mentioned in a previous post there are two 16 bit timers and two 32 bit timers. A data structure I had in mind for keeping track of all of them is:

typedef struct {
        uint32_t        *base_addr;     /* memory mapped base of the timer block */
        uint32_t        cnt_max;        /* largest value the counter can hold    */
        uint8_t         in_use;         /* is this timer currently allocated?    */
        void            *callback;      /* function to run when the timer fires  */
        void            *arg;           /* argument handed to the callback       */
} timer_t;

timer_t system_timers[] = {
    { LPC_TM16B0, 0xFFFF, 0, NULL, NULL },
    { LPC_TM16B1, 0xFFFF, 0, NULL, NULL },
    { LPC_TM32B0, 0xFFFFFFFF, 0, NULL, NULL },
    { LPC_TM32B1, 0xFFFFFFFF, 0, NULL, NULL }
};

Under this setup, when a timer is requested the code can walk the timer list and grab the first entry that is both unused and has enough range to fit the requested interval; since the 16 bit timers are listed first, short timers prefer them and the 32 bit timers are kept free for longer intervals. Another benefit is that all four timers can be mapped to the same interrupt handler, which can walk the list to figure out which one fired and handle it appropriately. That would require adding some sort of identifier to each entry, though, so it may be easier to just have a thin interrupt handler mapped to each vector via a macro that fills in those characteristics.
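
As a rough sketch of that allocation walk (assuming a ticks_needed value already converted from microseconds, and ignoring locking and the interrupt plumbing for now):

/* Find the first free timer whose counter can hold the requested number of
 * ticks. Returns NULL if everything is busy. */
static timer_t *timer_alloc(uint32_t ticks_needed, void *callback, void *arg)
{
    for (unsigned i = 0; i < sizeof(system_timers) / sizeof(system_timers[0]); i++) {
        timer_t *t = &system_timers[i];
        if (!t->in_use && ticks_needed <= t->cnt_max) {
            t->in_use = 1;
            t->callback = callback;
            t->arg = arg;
            return t;
        }
    }
    return NULL;    /* all hardware timers busy -- the overflow case below */
}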

My problem right now stems from how I want to handle the infinite timer issue. If timers are configured to fire in some number of microseconds, what happens if all the hardware timers are in use and I try to set up one that should expire in the middle of them? Do I find one that would fire after my new one and replace it? That would accrue a timing penalty as I swapped the data out and reconfigured the hardware block, and I would also need to add a field to track the expected time to fire and do a number of comparisons. Considering how sensitive the constraints in that situation are, I think a hybrid approach would be better: something like using three of them for timers as needed, with the fourth dedicated to systick type behavior for the overflow cases. When hardware timers become free they can grab from the waiting timer pool as available. I’ll have to put more thought into it and perhaps prototype it to see how it feels.

Relocation and Binary Angst

A few weeks back I was testing out the flash controller for an NXP chip I had lying around and ran into unexpected issues. On chips with on-board flash memory it’s typical for there to be functions included in the chip’s rom that user code can call to modify flash. On bigger systems you would tend to have an actual external controller for access to disk and memory, but on smaller things like a Cortex-M it is generally just included in the chip itself. In the case of this chip, the rom entry point is hardcoded at 0x1FFF1FF1 in memory and it is possible to call it via a function pointer like so:

void (*flash_cmd)(uint32_t *input, uint32_t *output) = (void *)0x1FFF1FF1;

This should have been a simple case, but the following code would often result in a hardfault:

uint32_t flash_read_partid(void)
{
    uint32_t cmd[5], ret[4];

    __disable_irq();        /* the IAP call shouldn't be interrupted */
    cmd[0] = 54;            /* IAP command 54: read part identification number */
    flash_cmd(cmd, ret);    /* part id comes back in ret[1], status in ret[0] */
    __enable_irq();
    return ret[1];
}

At the time it didn’t make a lot of sense to me. The address in flash_cmd was correct, and the pointers to the stack buffers cmd and ret were both valid. But every time the function was called the system would fault with a seemingly random address in the PC register. As it turns out, this had nothing to do with the flash controller at all and everything to do with how data is stored in binaries and how it actually ends up in memory at runtime.

To start, I suppose I should go into binary formats a bit. On OSX programs are generally compiled into the Mach-O format, on Windows the PE format, and on Linux the ELF format. When working with cross compiled toolchains and embedded systems we typically generate a final flat binary, which is simply the code and data laid out with no metadata, but before it reaches that point it is typically an ELF binary. An ELF file can have dozens of sections, but the four most important here are:

.text: code is stored in this section
.rodata: variables that are constant and immutable
.data: values for initialized variables are stored here
.bss: variables that are initialized to zero

To put it simply, your code is in the .text section and your variables are in a mixture of the other three depending on their type. In the case of my flash_cmd above, it is an initialized, writable pointer, so it ends up in .data (had it been declared const it would have landed in .rodata instead).
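
For a quick illustration of how typical C declarations map onto those sections (section names can shift a bit depending on compiler flags, and the hex_tbl here is just a guess at the one that shows up in the symbol dump below):

#include <stdint.h>

const char hex_tbl[] = "0123456789abcdef";   /* .rodata: constant, immutable         */
uint32_t   boot_count = 1;                   /* .data:   has a nonzero initial value */
uint32_t   irq_count;                        /* .bss:    implicitly zeroed           */

void blink(void)                             /* .text:   code                        */
{
    irq_count++;
}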

When you compile a binary to something like ELF it contains metadata telling the loader to place sections of it at certain memory addresses. On the NXP chip I’m using, code is stored starting at address 0x00000000 and RAM starts at 0x10000000. As an ELF file my binary would have said that data and bss need to be located at 0x10000000 (the start of RAM) and that the code itself lives at 0x00000000. With no virtual memory or relative memory addressing, the program will read flash_cmd from 0x10000000 even though its initial value only exists in the 0x00000000 flash space, so it picks up whatever garbage happens to be sitting in RAM and bad things happen. It has the same effect as casting a random memory address to a function pointer and crossing your fingers. For this reason you need to copy the data into place yourself so that everything is where the code expects it by the time it is used. If we look at the symbol dump of the binary we can see flash_cmd is placed properly at 0x10000000, but we’re still getting a junk jump and a hardfault when accessing it:

000005bc 00000024 T putchar
000005e0 00000011 r hex_tbl
0000071c R __text_end__
10000000 D __data_start__
10000000 00000004 d flash_cmd
10000004 B __bss_start__
10000004 D __data_end__
10000004 00000400 b cm3_stack
10000404 B __bss_end__

To better explain why this is happening it’s easiest to take a look at the linker script:

OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
OUTPUT_ARCH(arm)
ENTRY(cm3_start)

MEMORY {
    FLASH (rx)  : ORIGIN = 0x00000000, LENGTH = 32K
    RAM (rwx)   : ORIGIN = 0x10000000, LENGTH = 8K
}

SECTIONS {
    /* straightforward, put the vector table in front and lay out text/rodata first */
    .text : {
        . = ALIGN(4);
        KEEP (*(.text.vector_table))
        KEEP (*(.text.vector_table_platform))
        *(.text)
        *(.text.*)
        . = ALIGN(4);
    } >FLASH

    .rodata : {
        . = ALIGN(4);
        *(.rodata)
        *(.rodata.*)
        . = ALIGN(4);
        __text_end__ = . ;
    } >FLASH

    /* initialized memory values:
       place the data at __text_end__ in flash, but link so references are in ram */
    .data : AT(__text_end__) {
        . = ALIGN(4);
        __data_start__ = . ;
        *(.data)
        *(.data.*)
        . = ALIGN(4);
        __data_end__ = . ;
    } >RAM

    /* data to clear on boot (ex: the stack) */
    .bss : {
        . = ALIGN(4);
        __bss_start__ = . ;
        *(.bss)
        *(.bss.*)
        . = ALIGN(4);
        __bss_end__ = . ;
    } >RAM
}

There’s a lot going on here, but the important details to notice are the two MEMORY regions and the sections I mentioned earlier. You’ll also see symbols I’m creating like __data_start__, and those are the key piece. Since everything in the binary is laid out in flash, these symbols tell me where the initial values of the executable’s data are stored and where they need to live in RAM. At runtime I can then copy from the flash addresses to the actual addresses in RAM so the data can be accessed without faults and bad jumps. If this isn’t done, the memory won’t contain the properly initialized values when it is accessed and issues like the one I was seeing will appear. An example of how to do this is in the first function the PC points to at boot:

void cm3_start(void)
{
    /* copy the initial values for .data from their load address in flash
       (right after __text_end__) into their runtime home in ram */
    uint32_t *src = &__text_end__, *dest = &__data_start__;
    while (dest < &__data_end__)
        *dest++ = *src++;

    /* zero .bss, since ram contents are unknown at power on */
    uint32_t *bss = &__bss_start__;
    while (bss < &__bss_end__)
        *bss++ = 0;

    platform_init();
    main();
}

Hopefully it’s straightforward with the earlier explanation in mind. Another detail to note is that I am zeroing out the bss section. This is because you cannot assume memory is cleared on boot, whereas values in bss are expected to start out as zero. For this reason the entire section is zeroed by hand.

In the end, my flash issues were because I had incorrectly assigned one of my linker symbols for relocation. But like most things in embedded, figuring out the problem is usually a great learning experience.

Keeping Time on ARM

Implementing support for high resolution timers on ARM platforms is an interesting thought exercise if you’re used to having your OS do the work for you. In this case, I’m using a Cortex-M3 as part of an LPC1343 package. Timers can vary from vendor to vendor, but the Cortex-M3 spec at the very least defines a Systick timer. In the case of the LPC1343 there are also two 16 bit timers as well as two 32 bit timers. Systick can be configured to interrupt every cycle or every N cycles based on a reload value, but it’s not a great fit for most application level timing: if excessive time is spent in the interrupt handler the tick itself can be delayed or skewed. Fortunately there are better options on this chip.
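
For reference, a 1 ms systick is usually set up with the stock CMSIS helper, something like the sketch below (this assumes the vendor’s LPC13xx.h header, that SystemCoreClock has been set by the startup code, and that the vector table routes the systick exception to SysTick_Handler):

#include "LPC13xx.h"                /* CMSIS device header, assumed */

volatile uint32_t ticks_ms;

void SysTick_Handler(void)
{
    ticks_ms++;                     /* one tick per millisecond */
}

void systick_setup(void)
{
    /* reload = core clock / 1000 -> an interrupt every 1 ms */
    SysTick_Config(SystemCoreClock / 1000);
}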

The four timers mentioned earlier all have similar configuration registers and interfaces. The only difference between the two sets is the maximum value the counter can reach. A tick by itself isn’t that useful for timing purposes, but mapping ticks to actual time isn’t very difficult. A quick primer:

Imagine a 1 MHz cpu:
1 MHz = 1,000,000 cycles per second
1 second = 1,000,000 microseconds
1 cycle = 1 microsecond

The main registers of interest for our purposes are the Timer Control Register, Prescale Register and Match Register. The first controls enabling/disabling/resetting and not much else; the prescale and match registers are more interesting. A value in the prescale register holds off incrementing the timer counter until that many clock cycles have passed (strictly, the counter increments every PR + 1 cycles). This is crucial for sane timings given varying clock speeds: if the prescale value is set to the cpu’s clock rate in MHz minus one, then every count of the timer is a microsecond. From there we can wire up interrupt handlers or simple wait loops to trigger on a change.
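
A minimal sketch of that setup on one of the 32 bit timers, using the register names from the LPC13xx CMSIS header; the 72 MHz clock and the clock gating bit are assumptions about my particular configuration:

#include "LPC13xx.h"                           /* CMSIS device header, assumed    */

#define CPU_CLOCK_MHZ 72                       /* assumed core/peripheral clock   */

void timer32b0_init_us(void)
{
    LPC_SYSCON->SYSAHBCLKCTRL |= (1 << 9);     /* clock to CT32B0 (check the manual) */

    LPC_TMR32B0->TCR = 0x2;                    /* hold the counter in reset          */
    LPC_TMR32B0->PR  = CPU_CLOCK_MHZ - 1;      /* one timer tick per microsecond     */
    LPC_TMR32B0->TCR = 0x1;                    /* enable counting                    */
}

/* crude busy wait built on top of it */
void delay_us(uint32_t us)
{
    uint32_t start = LPC_TMR32B0->TC;
    while ((LPC_TMR32B0->TC - start) < us)
        ;
}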

More interestingly, you can also configure a match value to fire off an event when the timer counter reaches it. These events can be interrupts, counter resets or changes on a digital io line. For example, we could configure a pin to toggle every time a certain amount of time has passed (in effect a PWM output on the line).

1ms pwm
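
Continuing with the assumed register names from the sketch above, a toggle every 1000 microseconds on the CT32B0 match 0 output might look like this (the match line still has to be routed to a physical pin via IOCON, which is board specific and left out):

void timer32b0_toggle_1ms(void)
{
    LPC_TMR32B0->TCR = 0x2;                    /* reset and hold                       */
    LPC_TMR32B0->PR  = CPU_CLOCK_MHZ - 1;      /* 1 us per tick, as before             */
    LPC_TMR32B0->MR0 = 1000;                   /* match after 1000 us = 1 ms           */
    LPC_TMR32B0->MCR |= (1 << 1);              /* reset the counter on an MR0 match    */
    LPC_TMR32B0->EMR |= (3 << 4);              /* toggle the CT32B0_MAT0 line on match */
    LPC_TMR32B0->TCR = 0x1;                    /* go                                   */
}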