Patching the timebomb in an R4 flashcart

Published on 22/7/2023

Years ago I got a 2DS with the intention of running homebrew on it. ntrboot, the only exploit available at the time, required a compatible flashcart. Flashcarts are meant to play DS backups, but they can also be used for brick recovery, so it's generally a good idea to have one lying around. The one I bought is the R4i-SDHC 3DS RTS.

There are many flashcarts, and some of them include a "timebomb": they stop working after a certain date, and you're stuck with an error message unless you update the firmware or replace the cart. Unfortunately, the one I bought had a timebomb. Bypassing it is actually really easy: just set the console's date to the past, and the check will pass. Needless to say, this can break other things that rely on the console date (for example, games that check it for cheat prevention), so it can be a major annoyance.

YSMenu and TWiLight Menu can be used as menu replacements for many flashcarts, and they don't implement a timebomb. The usual recommendation for the average user is to replace the stock menu with one of them. Instead, I decided to patch the check out of the original kernel, both out of nostalgia (I had the same flashcart as a kid, so I wanted to keep everything original) and because I thought it would be easy.

Starting out

The flashcart firmware, better known as the kernel, is installed on the microSD card, and updates can be downloaded from the vendor's website. By diffing two kernel packages, one version apart, we can see that only two files are different: R4.dat and R4iMenu/map.bin.

R4.dat is a Nintendo DS ROM, so we can easily dump its code with Tinke. The DS has two processors: an arm7 CPU, responsible for managing peripherals, and an arm9 CPU, which runs the actual game code. Arm7 code is usually very boring: it's supposed to handle interaction with much of the hardware, and it barely contains anything application specific (supposedly Nintendo wouldn't allow developers to modify it). It feels logical to check the arm9 binary first.
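For the curious, this is roughly what a tool like Tinke reads to locate the two binaries. The publicly documented NDS ROM header stores, for each processor, the file offset, entry point, load address and size of its binary. The sketch below pulls those fields out of a header buffer. The type and function names are made up for illustration, and the offsets come from community documentation of the header format, not from anything specific to this kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Location of one CPU's binary, as described by the NDS ROM header. */
typedef struct {
    uint32_t rom_offset;  /* where the binary starts inside the ROM file */
    uint32_t entry;       /* entry point address */
    uint32_t ram_address; /* where the binary gets loaded in memory */
    uint32_t size;        /* size in bytes */
} NdsBinaryInfo;

/* Read a little-endian 32-bit value without relying on host endianness. */
uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The arm9 fields start at header offset 0x20, the arm7 fields at 0x30.
 * Returns 0 on success, -1 if the buffer is too short. */
int nds_binary_info(const uint8_t *header, size_t len, int is_arm7,
                    NdsBinaryInfo *out)
{
    size_t base = is_arm7 ? 0x30 : 0x20;
    if (len < base + 16)
        return -1;
    out->rom_offset  = read_u32le(header + base + 0);
    out->entry       = read_u32le(header + base + 4);
    out->ram_address = read_u32le(header + base + 8);
    out->size        = read_u32le(header + base + 12);
    return 0;
}
```

With the first 0x40 bytes of R4.dat in a buffer, calling nds_binary_info(buf, 0x40, 0, &info) would give the range of the file to carve out as the arm9 binary.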

After setting up tooling and whatnot, we're ready to find the check. The message that appears when the timebomb is triggered is this one:

timebomb_text.png

I thought of looking for those red strings, but I couldn't find them. What I found instead was the function responsible for printing text on screen. It's called like this:

char v1[20];
char v2[77];
// ...
memcpy(v2, &unk_2087600, sizeof(v2));
sub_201599C(v2, v1);
drawText(1, 20u, 100u, 255, 192, v1, 0x801Fu, 1u, 1u, 100);

Garbage-looking data is copied to the stack, then processed before being passed to the drawText() function. This strongly suggests that the strings are obfuscated.

String obfuscation

Let's dive into sub_201599C, AKA decodeString:

void decodeString(u16 const* input, char *out)
{
  u32 strSize = sub_2014588(*input) & 0xFF;
  u32 counter = 0;
  
  do
  {
    u16 nextHWord = input[counter + 1];
    out[counter] = (sub_2014588(nextHWord) - 
        (u8)(word_208F81C[(counter + (strSize & 0xF) + (strSize >> 4))]));
    counter++;
  }
  while (strSize != counter);
  out[strSize] = 0;
}

The first item of the input data is passed to sub_2014588, which returns the final string size. Every other item of the input data goes through a decoding algorithm which uses a table of half words, which later turned out to be the table for crc16.

sub_2014588, which I called decodeU16, is an extremely interesting function; here's the disassembly:

and r3, r0, #0xff
bic r2, r4, #0xff
push {r4,r5,lr}
orr r4, r3, r2
and r3, r4, #1
bic r1, lr, #4
orr lr, r1, r3,lsl#2
mov r2, r4,lsl#4
and r3, r4, #2
bic r12, r12, #2
orr r12, r3, r12
and r2, r2, #0x80
mov r3, r4,lsl#5
bic lr, lr, #0x80
orr lr, r2, lr
bic r1, r12, #0x80
and r3, r3, #0x80
orr r12, r3, r1
bic r2, lr, #1
mov r3, r4,lsl#27
orr lr, r2, r3,lsr#31
bic r1, r5, #0xff
mov r2, r4,lsl#25
mov r3, r4,lsr#4
bic r12, r12, #1
orr r5, r1, r0,lsr#8
orr r12, r12, r2,lsr#31
bic r1, lr, #2
and r3, r3, #2
orr lr, r3, r1
mov r2, r4,lsr#3
and r3, r5, #1
bic r1, r12, #0x40
orr r12, r1, r3,lsl#6
bic r0, lr, #0x10
and r2, r2, #0x10
mov r3, r5,lsl#2
orr lr, r2, r0
bic r1, r12, #0x20
and r3, r3, #0x20
mov r2, r5,lsl#2
orr r12, r3, r1
bic r0, lr, #8
and r2, r2, #8
mov r3, r5,lsr#2
orr lr, r2, r0
bic r1, r12, #4
and r3, r3, #4
mov r2, r5,lsl#3
orr r12, r3, r1
bic r0, lr, #0x20
and r2, r2, #0x20
mov r3, r5,lsr#1
orr lr, r2, r0
and r3, r3, #0x10
mov r1, r5,lsr#1
bic r12, r12, #0x10
orr r12, r3, r12
and r1, r1, #0x40
mov r2, r5,lsr#3
bic lr, lr, #0x40
orr lr, r1, lr
bic r0, r12, #8
and r2, r2, #8
orr r12, r2, r0
and r3, lr, #0xff
mov r3, r3,lsl#8
and r0, r12, #0xff
add r0, r0, r3
mov r0, r0,lsl#16
mov r0, r0,lsr#16
pop {r4,r5,lr}
bx lr

Reversing this function was trickier than I expected. Although it only needs r0 as an argument, it uses r4 and r5 as temporary storage, and neither of them is a scratch register under the default ARM calling convention. That is enough to break decompilers: I tried several, and none of them seemed to handle this function correctly.

I wrote a horrible Python script to translate each instruction to C code, then asked in a couple of places if anyone could help me optimize it. stuckpixel hit me with this:

u16 decodeU16(u16 in) {
  u16 out = 0;

  out |= ((in & 0x0040) >>  6);
  out |= ((in & 0x0002) >>  0);
  out |= ((in & 0x1000) >> 10);
  out |= ((in & 0x4000) >> 11);
  out |= ((in & 0x2000) >>  9);
  out |= ((in & 0x0800) >>  6);
  out |= ((in & 0x0100) >>  2);
  out |= ((in & 0x0004) <<  5);
  out |= ((in & 0x0010) <<  4);
  out |= ((in & 0x0020) <<  4);
  out |= ((in & 0x0001) << 10);
  out |= ((in & 0x0200) <<  2);
  out |= ((in & 0x0080) <<  5);
  out |= ((in & 0x0400) <<  3);
  out |= ((in & 0x8000) >>  1);
  out |= ((in & 0x0008) << 12);
  
  return out;
}
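Reading the masks and shifts above, every input bit lands on exactly one output bit, so decodeU16 is a fixed 16-bit permutation. As a sketch (the table and the names decode_u16_table and encode_u16 are illustrative, transcribed from the C version above and not taken from the kernel), the same mapping can be written as a lookup table, which also makes the inverse easy to build:

```c
#include <stdint.h>

/* DEST_BIT[i] is the output bit that input bit i is moved to,
 * transcribed from the mask/shift pairs of decodeU16. */
static const uint8_t DEST_BIT[16] = {
    10, 1, 7, 15, 8, 9, 0, 12, 6, 11, 13, 5, 2, 4, 3, 14
};

/* Table-driven equivalent of decodeU16. */
uint16_t decode_u16_table(uint16_t in)
{
    uint16_t out = 0;
    for (int i = 0; i < 16; i++) {
        if (in & (1u << i))
            out |= (uint16_t)(1u << DEST_BIT[i]);
    }
    return out;
}

/* Inverse permutation: moves bit DEST_BIT[i] back to bit i. */
uint16_t encode_u16(uint16_t decoded)
{
    uint16_t in = 0;
    for (int i = 0; i < 16; i++) {
        if (decoded & (1u << DEST_BIT[i]))
            in |= (uint16_t)(1u << i);
    }
    return in;
}
```

The inverse would only matter if you wanted to write modified strings back into the binary; for reading them, the decoding direction is enough.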

Lovely. Now we can reimplement the algorithm to decode every text that is printed on screen. Once we find the one we want, we can no-op the timebomb check and call it a day. Sounds easy, right? Unfortunately, modifying any byte in the arm9 binary seems to make the kernel freeze on boot.
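Here is a minimal sketch of what such a reimplementation could look like, assuming the key table (word_208F81C in the kernel) has already been dumped from the arm9 binary and is passed in by the caller. The function names, the compact bit-permutation form of decodeU16 and the bounds checks are additions for illustration; the kernel's own routine has no such checks. An encoder is included only so the decoder can be exercised without real kernel data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Input bit i of decodeU16 ends up at output bit DEST_BIT[i]. */
static const uint8_t DEST_BIT[16] = {
    10, 1, 7, 15, 8, 9, 0, 12, 6, 11, 13, 5, 2, 4, 3, 14
};

uint16_t permute_decode(uint16_t in)
{
    uint16_t out = 0;
    for (int i = 0; i < 16; i++)
        if (in & (1u << i))
            out |= (uint16_t)(1u << DEST_BIT[i]);
    return out;
}

uint16_t permute_encode(uint16_t v)
{
    uint16_t in = 0;
    for (int i = 0; i < 16; i++)
        if (v & (1u << DEST_BIT[i]))
            in |= (uint16_t)(1u << i);
    return in;
}

/* Decode one obfuscated string. key/key_len describe the halfword table.
 * Returns the decoded length, or -1 if a buffer or the table is too small. */
int decode_string(const uint16_t *input, const uint16_t *key, size_t key_len,
                  char *out, size_t out_cap)
{
    size_t size = permute_decode(input[0]) & 0xFF;
    size_t base = (size & 0xF) + (size >> 4); /* starting index into the table */
    if (out_cap < size + 1 || base + size > key_len)
        return -1;
    for (size_t i = 0; i < size; i++) {
        uint8_t k = (uint8_t)key[base + i];   /* only the low byte is used */
        out[i] = (char)(uint8_t)(permute_decode(input[i + 1]) - k);
    }
    out[size] = '\0';
    return (int)size;
}

/* Inverse of decode_string, for round-trip testing. Returns 0 on success. */
int encode_string(const char *text, const uint16_t *key, size_t key_len,
                  uint16_t *out, size_t out_cap)
{
    size_t size = strlen(text);
    size_t base = (size & 0xF) + (size >> 4);
    if (size > 0xFF || out_cap < size + 1 || base + size > key_len)
        return -1;
    out[0] = permute_encode((uint16_t)size);
    for (size_t i = 0; i < size; i++) {
        uint8_t k = (uint8_t)key[base + i];
        out[i + 1] = permute_encode((uint8_t)((uint8_t)text[i] + k));
    }
    return 0;
}
```

Running decode_string over every blob that gets passed to the decoding function in the binary is one way to list all on-screen messages and find the timebomb one.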

Integrity checks

Remember when I said that the arm7 binary usually doesn't contain anything special? As it turns out, it's the one responsible for generating checksums for both arm9 and arm7 binaries. Whoops, my bad.

Both the arm9 and arm7 checksums are crc16 checksums, and you may ask:

How can arm7 verify itself? It would need to embed the checksum, which would change the checksum itself.

In fact, it doesn't: arm9 is responsible for verification.

But isn't the problem just shifted to the other binary?

To answer that, let's look at how the checksum for arm9 is generated:

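u16 crc = 0xFFFF;

for (u32 i = 0; i < sizeInWords; i++) {
  u16 currHWord = 0;
  while (1) {
    // Read the current halfword and the one right after it
    currHWord = ((u16*)0x2000000)[i];
    u16 tmp = ((u16*)0x2000002)[i];

    // Not the magic marker: process this halfword normally
    if (currHWord != 0x2F3F || tmp != 0x4023)
      break;

    // Magic found: skip 8 halfwords (16 bytes) and check again
    i += 8;
  }

  if (sizeInWords - 0x900 >= i || sizeInWords - 0x200 <= i)
    crc = (crc >> 8) ^ crc16_table[(currHWord ^ crc)];
}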

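The difference from standard crc16 is that when the magic value 0x3F2F2340 is encountered (the bytes 3F 2F 23 40 in memory, read above as the halfwords 0x2F3F and 0x4023), the algorithm skips 16 bytes, magic included. It also does some additional bounds checks, but I'm not sure what their purpose is.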
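To see why a skipped region solves the "checksum that contains itself" problem, here is a simplified sketch. It is not the kernel's exact routine: it works byte by byte, leaves out the bounds checks, and assumes the table is the common reflected CRC-16 one (polynomial 0xA001), which I have not confirmed against the kernel. The function names are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One step of a reflected CRC-16 with polynomial 0xA001, computed bit by bit.
 * Assumption: this matches the table found in the kernel. */
static uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        if (crc & 1)
            crc = (uint16_t)((crc >> 1) ^ 0xA001);
        else
            crc = (uint16_t)(crc >> 1);
    }
    return crc;
}

/* CRC-16 with initial value 0xFFFF that jumps over every 16-byte block
 * starting, at an even offset, with the marker bytes 3F 2F 23 40. */
uint16_t crc16_skip_marked(const uint8_t *data, size_t len)
{
    static const uint8_t MARKER[4] = { 0x3F, 0x2F, 0x23, 0x40 };
    uint16_t crc = 0xFFFF;
    size_t i = 0;
    while (i < len) {
        if ((i & 1) == 0 && len - i >= 4 && memcmp(data + i, MARKER, 4) == 0) {
            i += 16; /* marker plus the 12 bytes that follow it */
            continue;
        }
        crc = crc16_update(crc, data[i]);
        i++;
    }
    return crc;
}
```

Any byte stored in the 12 bytes after the marker can change without affecting the result, so that region is a safe place to keep the checksum of the binary that contains it.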

As it turns out, there is a "secret area" which contains some data that doesn't affect the arm9 checksum. After a couple of days of research I figured out the contents of the secret area as a whole:

struct SecretArea {
    u32 magic;
    u16 checksum9;
    u16 checksum7;
    u16 checksumLdr;
    u32 checksumArea;
    u32 dldiOffset;
};
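As a rough sketch of how a patching tool could find this area and rewrite the 16-bit checksums, the code below scans a binary for the magic bytes and reads or writes halfwords relative to it. The helper names are hypothetical. Only the three u16 fields are covered, because their offsets (4, 6 and 8) follow directly from the struct, while the offsets of the two trailing u32 fields depend on padding, which isn't stated here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets of the 16-bit checksums relative to the start of the magic. */
enum { OFF_CHECKSUM9 = 4, OFF_CHECKSUM7 = 6, OFF_CHECKSUM_LDR = 8 };

/* Returns the offset of the first halfword-aligned magic
 * (bytes 3F 2F 23 40) with at least 16 bytes available, or -1. */
long find_secret_area(const uint8_t *bin, size_t len)
{
    static const uint8_t MAGIC[4] = { 0x3F, 0x2F, 0x23, 0x40 };
    for (size_t i = 0; i + 16 <= len; i += 2) {
        if (memcmp(bin + i, MAGIC, 4) == 0)
            return (long)i;
    }
    return -1;
}

/* Little-endian halfword accessors, independent of host endianness. */
uint16_t load_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

void store_u16le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}
```

After patching code, a tool would recompute the relevant checksum and call something like store_u16le(bin + off + OFF_CHECKSUM9, newCrc), where off is the value returned by find_secret_area.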

checksumArea deserves a special mention. It's the checksum of all the other checksums:

void cipher(char const* input, u8 *outWords) {
  u8 output0[8];
  des_encrypt(output0, &DES_KEY_1, input);

  u8 output1[8];
  des_encrypt(output1, &DES_KEY_2, output0);
  des_encrypt(output0, &DES_KEY_2, output1);
  
  for (int i = 0; i != 8; i++) {
    outWords[i] = output0[i];
    outWords[i + 8] = output1[i];
  }
}

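u32 genAreaChecksum(u32 mixedChk, u32 ldrChk)
{
  // Format the mixed checksum as 8 lowercase hex digits
  char tmpBuffer[9];
  sprintf(tmpBuffer, "%08lx", mixedChk);

  // Copy the digits into the DES input block
  char s[9];
  for (int i = 0; i != 8; ++i)
    s[i] = tmpBuffer[i];

  s[8] = 0;

  // cipher() writes 16 bytes, viewed here as 4 words
  u32 outputWords[4];
  cipher(s, (u8 *)outputWords);

  if (outputWords[0] >= 0x2000000u)
    return outputWords[0] - ldrChk;
  else
    return outputWords[0] + ldrChk;
}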

//...
genAreaChecksum((checksum7 << 16) | checksum9, checksumLdr);

It's generated by putting together checksum9 and checksum7 as a word, then formatting this value as a hex string. The result is used as the input block for DES encryption, which is done three times, each time with an input that is based on the output of the previous step. The keys are fixed and stored in the arm9 binary, while one of the tables used by the algorithm is custom. Finally, the output blocks are joined to make up 4 words, and the first one is compared against 0x2000000: if it's less, checksumLdr is added to it; otherwise it's subtracted. Fun stuff.
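The parts of this that don't involve DES are simple enough to show in isolation. The sketch below covers packing the two checksums, formatting the input block, and the final add-or-subtract step; the DES rounds with the custom table are deliberately left out. The function names are mine, not the kernel's.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Pack the two 16-bit checksums into one word:
 * checksum7 in the high half, checksum9 in the low half. */
uint32_t mix_checksums(uint16_t checksum9, uint16_t checksum7)
{
    return ((uint32_t)checksum7 << 16) | checksum9;
}

/* Format the mixed value as 8 lowercase hex digits plus a terminator.
 * The 8 digits are what gets fed to the cipher as its input block. */
void make_des_block(uint32_t mixed, char block[9])
{
    snprintf(block, 9, "%08" PRIx32, mixed);
}

/* Final step: combine the first output word of the cipher with checksumLdr.
 * Arithmetic wraps around modulo 2^32, as unsigned math does in C. */
uint32_t finish_area_checksum(uint32_t first_word, uint32_t ldr_chk)
{
    if (first_word >= 0x2000000u)
        return first_word - ldr_chk;
    return first_word + ldr_chk;
}
```

For example, checksum9 = 0x1234 and checksum7 = 0xABCD would be packed as 0xABCD1234 and formatted as the string "abcd1234" before encryption.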

Once the integrity check has been figured out, we can write a tool that automatically fixes the checksums, so we can patch code without problems. And with that, we can finally boot a kernel meant to "explode" in 2024, in 2025:

success.jpg

Conclusion

This started as an afternoon project and ended up taking a week. I didn't really expect it to be this convoluted, but I'm not complaining; I had limited debugging powers as the kernel requires the cartridge to run, so it was a good exercise for my reverse engineering intuition. Most importantly, I had lots of fun, especially reimplementing all the algorithms, which you can find as a standalone tool here.