Wednesday 20 May 2015

Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 2 - RedBoot reverse engineering and APEX hacking

(continuation of Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 1; meanwhile, my article was mentioned briefly in BSDNow Episode 89 - Exclusive Disjunction around minute 36:25)

Choosing to call RedBoot from a hacked Apex


As I was saying in my previous post, in order to be able to automate the booting of the NetBSD image via TFTP, I opted for using a 2nd stage bootloader (planning to flash it in the NSLU2 instead of a Linux kernel), and since Debian was already using Apex, I chose Apex, too.

The first problem I found was that the networking support in Apex was relying on an old version of the Intel NPE library which I couldn't find on Intel's site. The new version was incompatible/not building with the old build wrapper in Apex, so I was faced with 3 options:
  1. Fight with the availabel Intel code and try to force it to compile in Apex
  2. Incorporate the NPE driver from NetBSD into a rump kernel to be included in Apex instead of the original Intel code, since the NetBSD driver only needed an easily compilable binary blob
  3. Hack together an Apex version that simulates the typing necessary RedBoot commands to load via TFTP the netbsd image and execute it.
After taking a look at the NPE driver buildsystem, I concluded there were very few options less attractive that option 1, among which was hammering nails through my forehead as a improvement measure against the severe brain damage which I would probably be likely to be inflicted with after dealing with the NPE "build system".

Option 2 looked like the best option I could have, given the situation, but my NetBSD foo was too close to 0 to even dream to endeavor on such a task. In my opinion, this still remains the technically superior solution to the problem since is very portable and a flexible way to ensure networking works in spite of the proprietary NPE code.

But, in practice, the best option I could implement at the time was option 3. I initially planned to pre-fill from Apex my desired commands into the RedBoot buffer that stored the keyboard strokes typed by the user:

load -r -b 0x200000 -h 192.168.0.2 netbsd-nfs.bin
g
Since this was the first time ever for me I was going to do less than trivial reverse engineering in order to find the addresses and signatures of interesting functions in the RedBoot code, it wasn't bad at all that I had a version of the RedBoot source code.

When stuck with reverse engineering, apply JTAG


The bad thing was that the code Linksys published as the source of the RedBoot running inside the NSLU2 was, in fact, a different code which had some significant changes around the code pieces I was mostly interested in. That in spite of the GPL terms.

But I thought that I could manage. After all, how hard could it be to identify the 2-3 functions I was interested in and 1 buffer? Even if I only had the disassembled code from the slug, it shouldn't be that hard.

I struggled with this for about 2-3 weeks on the few occasions I had during that time, but the excitement of leaning something new kept me going. Until I got stuck somewhere between the misalignment between the published RedBoot code and the disassembled code, the state of the system at the time of dumping the contents from RAM (for purposes of disassemby), the assembly code generated by GCC for some specific C code I didn't have at all, and the particularities of ARM assembly.

What was most likely to unblock me was to actually see the code in action, so I decided attaching a JTAG dongle to the slug and do a session of in-circuit-debugging was in order.

Luckily, the pinout of the JTAG interface was already identified in the NSLU2 Linux project, so I only had to solder some wires to the specified places and a 2x20 header to be able to connect through JTAG to the board.


JTAG connections on Kinder (the NSLU2 targeting NetBSD)

After this was done I tried immediately to see if when using a JTAG debugger I could break the execution of the code on the system. The answer was sadly, no.

The chip was identified, but breaking the execution was not happening. I tried this in OpenOCD and in another proprietary debugger application I had access to, and the result was the same, breaking was not happening.
$ openocd -f interface/ftdi/olimex-arm-usb-ocd.cfg -f board/linksys_nslu2.cfg
Open On-Chip Debugger 0.8.0 (2015-04-14-09:12)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.sourceforge.net/doc/doxygen/bugs.html
Info : only one transport option; autoselect 'jtag'
adapter speed: 300 kHz
Info : ixp42x.cpu: hardware has 2 breakpoints and 2 watchpoints
0
Info : clock speed 300 kHz
Info : JTAG tap: ixp42x.cpu tap/device found: 0x29277013 (mfg: 0x009,
part: 0x9277, ver: 0x2)
[..]

$ telnet localhost 4444
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger
> halt
target was in unknown state when halt was requested
in procedure 'halt'
> poll
background polling: on
TAP: ixp42x.cpu (enabled)
target state: unknown
Looking into the documentation I found a bit of information on the XScale processors[X] which suggested that XScale processors might necessarily need the (otherwise optional) SRST signal on the JTAG interface to be able to single step the chip.

This confused me a lot since I was sure other people had already used JTAG on the NSLU2.

The options I saw at the time were:
  1. my NSLU2 did have a fully working JTAG interface (either due to the missing SRST signal on the interface or maybe due to a JTAG lock on later generation NSLU-s, as was my second slug)
  2. nobody ever single stepped the slug using OpenOCD or other JTAG debugger, they only reflashed, and I was on totally new ground
I even contacted Rod Whitby, the project leader of the NSLU2 project to try to confirm single stepping was done before. Rod told me he never did that and he only reflashed the device.

This confused me even further because, from what I encountered on other platforms, in order to flash some device, the code responsible for programming the flash is loaded in the RAM of the target microcontroller and that code is executed on the target after a RAM buffer with the to be flashed data is preloaded via JTAG, then the operation is repeated for all flash blocks to be reprogrammed.

I was aware it was possible to program a flash chip situated on the board, outside the chip, by only playing with the chip's pads, strictly via JTAG, but I was still hoping single stepping the execution of the code in RedBoot was possible.

Guided by that hope and the possibility the newer versions of the device to be locked, I decided to add a JTAG interface to my older NSLU2, too. But this time I decided I would also add the TRST and SRST signals to the JTAG interface, just in case single stepping would work.

This mod involved even more extensive changes than the ones done on the other NSLU, but I was so frustrated by the fact I was stuck that I didn't mind poking a few holes through the case and the prospect of a connector always sticking out from the other NSLU2, which was doing some small, yet useful work in my home LAN.

It turns out NOBODY single stepped the NSLU2

 

After biting the bullet and soldering JTAG interface with also the TRST and the SRST signals connected as the pinout page from the NSLU2 Linux wiki suggested, I was disappointed to observe that I was not able to single step the older NSLU2 either, in spite of the presence of the extra signals.

I even tinkered with the reset configurations of OpenOCD, but had not success. After obtaining the same result on the proprietary debugger, digging through a presentation made by Rod back in the hay day of the project and the conversations on the NSLU2 Linux Yahoo mailing list, I finally concluded:
Actually nobody single stepped the NSLU2, no matter the version of the NSLU2 or connections available on the JTAG interface!
So I was back to square 1, I had to either struggle with disassembly, reevaluate my initial options, find another option or even drop entirely the idea. At that point I was already committed to the project, so dropping entirely the idea didn't seem like the reasonable thing to do.

Since I was feeling I was really close to finish on the route I had chosen a while ago, I was not any significantly more knowledgeable in the NetBSD code, and looking at the NPE code made me feel like washing my hands, the only option which seemed reasonable was to go on.

Digging a lot more through the internet, I was finally able to find another version of the RedBoot source which was modified for Intel ixp42x systems. A few checks here and there revealed this newly found code was actually almost identical to the code I had disassembled from the slug I was aiming to run NetBSD on. This was a huge step forward.

Long story short, a couple of days later I had a hacked Apex that could go through the RedBoot data structures, search for available commands in RedBoot and successfully call any of the built-in RedBoot commands!

Testing with loading this modified Apex by hand in RAM via TFTP then jumping into it to see if things woked as expected revealed a few small issues which I corrected right away.

Flashing a modified RedBoot?! But why? Wasn't Apex supposed to avoid exactly that risky operation?


Since the tests when executing from RAM were successful, my custom second stage Apex bootloader for NetBSD net booting was ready to be flashed into the NSLU2.

I added two more targets in the Makefile in the code on the dedicated netbsd branch of my Apex repository to generate the images ready for flashing into the NSLU2 flash (RedBoot needs to find a Sercomm header in flash, otherwise it will crash) and the exact commands to be executed in RedBoot are also print out after generation. This way, if the command is copy-pasted, there is no risk the NSLU2 is bricked by mistake.

After some flashing and reflashing of the apex_nslu2.flash image into the NSLU2 flash, some manual testing, tweaking and modifying the default built in APEX commands, checking that the sequence of commands 'move', 'go 0x01d00000' would jump into Apex, which, in turn, would call RedBoot to transfer the netbsd-nfs.bin image from a TFTP to RAM and then execute it successfully, it was high time to check NetBSD would boot automatically after the NSLU was powered on.

It didn't. Contrary to my previous tests, no call made from Apex to the RedBoot code would return back to Apex, not even the execution of a basic command such as the 'version' command.

It turns out the default commands hardcoded into RedBoot were 'boot; exec 0x01d00000', but I had tested 'boot; go 0x01d0000', which is not the same thing.

While 'go' does a plain jump at the specified address, the 'exec' command also does some preparations so it allows a jump into the Linux kernel and those preparations break some environment the RedBoot commands expect. I don't know which those are and didn't had the mood or motivation to find out.

So the easiest solution was to change the RedBoot's built-in command and turn that 'exec' into a 'go'. But that meant this time I was actually risking to brick the NSLU, unless I
was able to reflash via JTAG the NSLU2.


(to be continued - next, changing RedBoot and bisecting through the NetBSD history)

[X] Linksys NSLU2 has an XScale IXP420 processor which is compatible at ASM level with the ARMv5TEJ instruction set

No comments: