Mouse Adventures #3: Writing a Disassembler

Published 12th Dec 2018

Wherein I rip apart the firmware file and write a disassembler for Holtek's weird RISC-ish microcontroller architecture, in a quest to learn some secrets from the mouse's code.

Last time on this cursed quest...

In Part 1, I introduced the TeckNet HyperTrak Gaming Mouse and described how it receives status notifications from the mouse. In Part 2, I tracked down some variants of the same hardware (all sold under different brands, and one costing more than twice as much as mine - shocking!), figured out the basics of the main protocol, and made a start on extracting the firmware.

We've now got an MTP file which presumably contains the firmware. It's a couple of hundred bytes over 32KB, which strongly suggests it's 32KB worth of flash data along with a bit of extra metadata. How can we find out? Well, there are a few reasonable things we can try at this stage...

Tearing apart the MTP file

ash@krompfty:/mnt/c/src/re/tecknet/mtp_experiment$ file Tecknet.mtp
Tecknet.mtp: data

Okay, that's not useful, but it's still usually worth a shot just in case. Let's try strings.

ash@krompfty:/mnt/c/src/re/tecknet/mtp_experiment$ strings Tecknet.mtp
    HASM 2.93
V8.6
:B( )
0_eN
)!xW(
s!}E"!|
(K#q`
(!|!
h:|Vz
(VvMd"G#E$E%E&E
) f<
=Q)A"Te
&1f<
9f8L)
9 )$
W*)&
9E)$
=L)f<
&1'5
7  _$
!        
14$      *
!    E" 
|3*f8
9<*!x
p!p>*!t>*
PW*^*RG
SOGPE
!w!|
)=1P
*-G,B
@+GQB
t_u_v_w_
7 uG
        @   
@e_f_g_
81+aG
@a_b_c_d_
ZdGd
+2s4s6s
s2wuG
2w4w6w
+"_#_
@%#aG
+$_%_
@%#aG
,#G"E%E$E
@g_'G
@"G#D
@$G%D
x'_"_$_#_%_
#G"E
%G$E
D%  G
9%  G
D%Wi
}$- _
8W-wG
<W-wG
=m-wG
9`-/G
@c_'G
@c_oG
/.E.H.K.P.R.
V.].f.i.l.o.
0:.c_d_e_f_g_h_
@9.b_
bG(e
/2/9/@/G/N/bG
dD_E
T/bG
T/bG
T/bGe
aGbCcCdCeCfCgChC
m;q/
bG(e
=;h`G@
TBhrp
        @   
FtG1D
huG2D?
1G3D
@2G4D
r8_8Gg`
i8Gg`
z i"i
, -[i
h   GbaWi   G
vivivivivi
<Wi 
TQGg`RF
UQGg`RF
@QGg`RF
j$jQG
YQGRF
j2jQG
XQGRF
j2jQG
=$jQG
=8jQG
PRjWi
WiWjmj
jQGRB
TQUg`wF
jQGQUg`wF
jQWpjwj
UQUg`wF
jQGQUg`wF
jQGRB
@QUg`wF
jWiWi
y1k G
&k\khk
yfk G
yvk G9%+
_   G9%+
9%Wi
89-bi
QCtG
RCtG
@QG B
;G(eA_
T:u:p
p@_Wi
W\l:X
t:ycl:u9m
Pxl:q
l:u9e
;G(eA_
lAG?
lAG>
lAG?
lAG>
G(eA_
@SetG
6H4H6
H_We 
HuHs
]@jd}Hp
@I_J_JG>f
@E_F_G_d
H0GG
qHvN_NG
qHrN_NG.n
E|%nE]
    $   A   HG
Jn\n
nH}yn
SJG}
nIGh
mHuJG
TJG>f
@NGHz
H~EB
<N_NGHz
fH~.f
HwHs
pHud
gQGt|RG
<(5)5
=(6)6
QDRE
6H{~
@Cn~G
f6f5
  ` 
  !`!`!`!
!@!@!@!
! "`"
#4 t 
 4!t!t!t!
!X!X!X!
0g<4
qwitiG
vqvn_
ipiG
s/g51
/g6-
Pqpl
eGg`iF
|qqjX
eGg`
|k2~
9qpi_j
#   1   ?   
l]eG
'   5   C   
L   q   
l]st
lGgB
gUcEdEhE
eGg`rF
gGhEdE
eGg`
gUhEdE
-...
y.~.
`G_B
    0   1u
    0   1u
    wg+
ypfF
bd9#
[eqV
~\Nl
27ZG
Ucg8
5(3b
@v$kP
@g")qj
~=;e
Sd'Q
Ky@;_|We
2s9%
6+6n
v-n[
9qg~~O
 c():
4U.5
et2kPx
`PDco=
u<#4
`6BO])
i&%'
#`jE
x[CW
    Oug"E
o^}@
+;8XB
    39y
#X\y
$ARe
kh  W0:
`ZnO
^%u]
l5}C
vn?M
>yMn;A
)OiSv
[&I1
8/|f
4J N@M
8&nv
=<.^3;
M,PU
^Xsm
    q`B
aGb3/

Mostly garbage, except for the first two strings. HASM 2.93 sounds like it could be the signature of an assembler. Let's check for UTF-16 strings as well, just in case...

ash@krompfty:/mnt/c/src/re/tecknet/mtp_experiment$ strings -el Tecknet.mtp
E-Signal
USB Gaming Mouse
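For reference, the -el flag tells strings to look for 16-bit little-endian characters instead of single bytes. A rough Python sketch of that kind of scan is below. It is a simplification (ASCII-range characters only), not how GNU strings is actually implemented, and the function name is made up for this example.

```python
def utf16le_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII characters stored as UTF-16LE."""
    run = []
    i = 0
    while i + 1 < len(data):
        lo, hi = data[i], data[i + 1]
        if hi == 0 and 0x20 <= lo < 0x7F:
            # Printable character followed by a zero high byte
            run.append(chr(lo))
            i += 2
        else:
            if len(run) >= min_len:
                yield "".join(run)
            run = []
            i += 1  # resync one byte at a time, since alignment is unknown
    if len(run) >= min_len:
        yield "".join(run)


if __name__ == "__main__":
    with open("Tecknet.mtp", "rb") as f:
        for text in utf16le_strings(f.read()):
            print(text)
```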

Now that's interesting - two readable, sensible strings. Let's hexdump the file and see what we get.

00000000: c02d 00ee 0001 0010 0008 00ff 3fee 0011  .-..........?...
00000010: 0008 00ff 3f17 0212 0008 00ff 3f00 4013  ....?.......?.@.
00000020: 0008 00ff 3f00 0014 0008 00ff 3f10 0015  ....?.......?...
00000030: 880e 0000 0948 4153 4d20 322e 3933 0000  .....HASM 2.93..
00000040: 4c88 0900 000a 5638 2e36 0000 7388 0500  L.....V8.6..s...
00000050: 000b 0787 da88 0600 0014 0200 025a 8808  .............Z..
00000060: 0000 1266 0001 6001 9665 0580 0000 0000  ...f..`..e......
00000070: 0a3e 8a3a 4228 2029 0400 8517 0528 0300  .>.:B( ).....(..
00000080: 0400 840f 0528 0000 00ae 200f 0528 0000  .....(.... ..(..
00000090: 60a8 050f 0528 0000 0400 420f 0528 0000  `....(....B..(..
000000a0: 0400 0b0f 0528 0000 0400 010f 2a28 0000  .....(......*(..
000000b0: 0400 1220 0810 4111 0400 5923 d7b4 65f2  ... ..A...Y#..e.
000000c0: 0400 040f f540 745f 051f 0100 e63d 0300  .....@t_.....=..
000000d0: 8517 2d28 f457 2d28 f557 2d28 0300 8b40  ..-(.W-(.W-(...@
000000e0: 8e0f 0100 6638 e63d 0300 8517 3928 8b57  ....f8.=....9(.W
000000f0: 3828 0300 0100 560f 8e00 663a 39a1 7f37  8(....V...f:9..7
00000100: 6837 e832 5d21 8d21 6e21 0da0 4a60 e831  h7.2]!.!n!..J`.1
00000110: ff33 6632 9031 1030 5f65 4ea0 8f25 640f  .3f2.1.0_eN..%d.
00000120: 3720 1f70 b429 2178 5728 1f74 010f f440  7 .p.)!xW(.t...@
00000130: f640 030f f540 4665 c55f c65f c75f c875  .@...@Fe._._._.u
00000140: 9473 0100 663c f528 f178 6928 907f 7728  .s..f<.(.xi(..w(
00000150: 9073 2c67 820f 9b67 8040 1073 9072 1078  .s,g...g.@.s.r.x
00000160: 907c b528 9079 8628 9075 0047 d140 820f  .|.(.y.(.u.G.@..
00000170: 8e67 1034 ea67 4a67 0da0 1030 107b 8e28  .g.4.gJg...0.{.(
00000180: 1077 1034 8d21 c3a1 5f65 1030 107a 9e28  .w.4.!.._e.0.z.(
00000190: 1076 1034 0047 80a5 ad67 7267 ea67 0047  .v.4.G...grg.g.G
000001a0: 80a5 15a0 19a0 2da0 0da0 1030 907a b528  ......-....0.z.(
000001b0: 9076 1034 0047 80a5 020f 8400 800f 8300  .v.4.G..........
000001c0: 200f b267 810f 9e67 8440 860f 0043 9b67   ..g...g.@...C.g
000001d0: 8140 1030 2171 1170 c873 217d 4522 217c  .@.0!q.p.s!}E"!|
000001e0: 1178 bb28 e92a 9178 bf28 9174 0767 c87f  .x.(.*.x.(.t.g..
000001f0: 3d65 947b c928 9477 850f 9b67 2631 053c  =e.{.(.w...g&1.<

Nothing really stands out here, aside from the strings we already saw. We're probably looking at code in some form. Trouble is, without any knowledge of the architecture, we can't really tell what's code and what isn't. The HASM signature we see at offset 0x35 could be something that the build tools incorporate into the binary that gets flashed onto the mouse, or it could just be part of the MTP file's header.

We've got two options here. We can further investigate the updater and ISPDLL.dll, or we can go look for some build tools. Let's try the latter!

Much to my surprise, it turns out that Holtek publishes a lot of development tools on their website. Free download, no authentication required, no need to fill in a form with your name and email address and company name so they can spam you. Incredible. A quick read through the "MCU Tools" section and there's some promising stuff:

ICE Software: includes HT-IDE3000, which is "Integrated development Environment software for all series of Holtek MCU"
Programmer Software: includes HOPE3000 (software for Holtek's programmers) and I3000 ("Bootloader ISP Programming Tool")
Development Platform Software: includes "Holtek USB Workshop" which promises to be a "Development Platform for USB MCU"

Let's grab HT-IDE3000 and Holtek USB Workshop, as those seem to be the most relevant to us. We know we're working with a USB MCU because of the controller model names we saw in the EFORMAT.INI file; it's a pretty safe bet that this mouse includes one of those.

The USB Workshop

Opening up the Holtek USB Workshop gets you a wizard which doesn't even render properly on a high-DPI display. "This wizard helps you create a USB solution project", it boasts. "Click Next to continue, or Cancel to exit the wizard", it continues. I'd hope that anyone programming a USB microcontroller would at least be able to figure out a wizard with Next and Cancel buttons.. or am I expecting too much? 🤔

We're first asked to choose a project type. Our options are I/O Control, Data, Event, Virtual COM Port, and "Holtek HT68FB550 Gaming Mouse". That's ever-so-slightly curious. It's worth a look though, even if it's just out of sheer curiosity.

Selecting the mouse and clicking Next skips past all the options to "Generated Files", which tells you that it's going to give you a bunch of assembly files, some documentation in PDF form and a GamingMouse.exe file to run on the host.

(This is the point where I just give up and change my system DPI to 100%.. because fuck, Windows is bad at HiDPI.)

Cool, so there's actually a lot of stuff in this project. Will it help us in any way? Hopefully! The code is fairly readable as far as this kinda stuff goes. It's all assembly, but it's fairly well-commented. The PDF includes a description of what it is and how it works.

"This text introduces how to use the HT68FB5x0 series devices to implement a USB Gaming Mouse function. This application uses the HT68FB550 as a master device, which together with the AVAGO ADNS-3050 small form factor entry-gaming optical navigation sensor, can implement gaming mouse solutions, which contain a full range of extensive functions and which have higher performance than standard mouse products. Functions like RGB backlight colour control and mouse key function definitions etc., can be achieved using PC application programs. The HT68FB5x0 series are Holtek’s Flash MCUs which contain both USB and SPI interfaces."

Yeah, that seems similar-ish to ours, so there's probably some info we can glean from this. The PDF also describes the protocol it uses, including something rather curious: they seem to implement the same scheme we saw in our mouse's protocol where the topmost bit of a command ID being set means it's a "read" command. Perhaps that's just a coincidence, though...

What about the MTP file, anyway?

If we build this sample project in HT-IDE3000, it produces a .CV file and an .MTP file. Examining the resulting .MTP file shows that it's almost certainly the same format as the firmware we extracted from FwUpdate.exe: it's got the same starting structure and the same magic strings (with higher version numbers). So we can now be confident that this file contains our firmware. Now, how do we get anything out of it?

I've noticed over the years that tool developers have a weird habit of including lots of assorted EXEs and tools, so it's worth taking a look at what Holtek include.

The USB Workshop isn't particularly interesting: it's got a 'mcu' subfolder containing C headers and assembly include files for different kinds of Holtek microcontrollers, but that's about it.

On the other hand, HT-IDE3000 includes some more fun bits and pieces! There's a bunch of assorted tools for things like converting binary files to data that can be included from an assembly source file, managing libraries, and managing voice resources (not relevant to us since our chip doesn't include voice support, alas). There's a couple of disassembly-related DLLs, but those seem to be exclusively for the IDE's consumption. Inside the 'MCU' folder we find .cfg and .fmt files for a lot of Holtek's chips. We'll come back to these in a bit.

Finally, there's OptionViewer.exe, which lets us open an MTP file and view information about it. Now we're getting somewhere: it tells us what chip the firmware is for (the HT68FB560), and displays the magic strings we already saw in the file along with some other properties.

(I've got to admit something silly: I didn't realise OptionViewer existed until just now while I was writing this post -- I somehow managed to miss it! It wasn't a big issue, because I was able to get the info I needed through the next section...)

I3000: USB Programmer

We now know what chip the mouse is using, but that's about it. But... Remember how the firmware updater was pulling in a bunch of functions from ISPDLL.dll? This might be documented somewhere!

A quick search for "loadprogdata" online brings up a random page where somebody's uploaded the "I3000 User's Guide". Where have we seen that before? Well... Holtek's website offers I3000, the "Bootloader ISP Programming Tool".

If we download and install that, we get a programming tool (what a surprise), a manual, and a directory called ISPDLL. That directory includes x86 and x64 versions of ISPDLL.dll, ISPDLL.h and ISPDLL.lib... or in other words, this lets us easily compile a tool that calls into that DLL.

Furthermore, the manual explains how to use it. As we guessed from the FwUpdate tool, ISPDLL is just a DLL that deals with flashing devices that are in the bootloader mode. Using the LoadProgdata function, we can get the code out of the MTP file without having to figure out the format.

I wrote a little utility that uses LoadProgdata and GetMCUInfo to extract the stuff from the MTP. This can be compiled from the Visual Studio developer command prompt by simply running cl experiment.cpp ispdll.lib.

#include <stdio.h>
#include <stdint.h>
#include <Windows.h>
#include "ISPDLL.h"

// Size of Tecknet.mtp in bytes
#define MTP_SIZE 32961

int main(int argc, char **argv) {
    uint8_t bits[MTP_SIZE];

    // Read the whole MTP file into memory
    FILE *f = fopen("Tecknet.mtp", "rb");
    if (!f) {
        perror("Tecknet.mtp");
        return 1;
    }
    fread(bits, 1, MTP_SIZE, f);
    fclose(f);

    // Let ISPDLL split the MTP into its program, option and data sections
    PBYTE programBuf, optionBuf, dataBuf;
    WORD programSize, optionSize, dataSize;
    int result = LoadProgdata(bits, MTP_SIZE, programBuf, programSize, optionBuf, optionSize, dataBuf, dataSize);
    printf("Result: %d\n", result);
    printf("ProgramSize: %d\n", programSize);
    printf("OptionSize: %d\n", optionSize);
    printf("DataSize: %d\n", dataSize);

    // Dump each section to its own file.
    // The sizes are doubled here, which treats them as counts of 16-bit words.
    f = fopen("program.bin", "wb");
    fwrite(programBuf, 1, programSize * 2, f);
    fclose(f);
    f = fopen("option.bin", "wb");
    fwrite(optionBuf, 1, optionSize * 2, f);
    fclose(f);
    f = fopen("data.bin", "wb");
    fwrite(dataBuf, 1, dataSize * 2, f);
    fclose(f);

    // Ask the DLL which chip the loaded image targets
    MCUINFO m;
    m.cbSize = sizeof(m);
    GetMCUInfo(&m);
    printf("MCU:%s PageSize:%d MaxProgramPage:%d MaxLockPage:%d BootloaderSize:%d\n", m.szMcuName, m.nPageSize, m.nMaxProgramPage, m.nMaxLockPage, m.nBootloaderSize);

    return 0;
}

Armed with this, we now have our prize: a 32KB program.bin file containing the mouse's firmware! Next up...

Disassembling the Firmware

How do we translate this into something we can read? We know what the syntax looks like, since we've got a bunch of sample code. We know what chip we're using. However, we don't have a disassembler - that's a bit of a sticking point.

Let's have a read through the chip datasheet. It's satisfyingly comprehensive. A lot of the information in it isn't necessarily relevant to us right now, though -- much of it is really only necessary if you're writing code for it (we're not at that level just yet!), or you're building a device using it. I'll try and summarise some of the more relevant bits here. This is a bit of a departure from the norm for me, because I'm used to architectures like PowerPC and ARM, so I'll also try to highlight things that differ from what I'm used to.

The HT68FB560 chip has 32KB of program memory (word-addressable; 16K words) and 768 bytes of data memory (byte-addressable). These are both accessible in different ways, which I'll describe below.

Program Memory: Code, Data

The 'program memory' is flash memory, split into two banks: bank 0 contains locations 0000h to 1FFFh, bank 1 contains locations 2000h to 3FFFh. These contain code, data, tables and interrupt entries.

Code will always be executed from the program memory. Data can be read from the program memory by loading the 16-bit address into registers TBHP:TBLP and then using the instruction TABRD [m]; it copies the low byte into the data memory location [m], and the high byte into the special register TBLH.
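To make the table read concrete, here's a tiny Python model of what TABRD does according to the description above. It's only a sketch: the function name, the dictionaries and the list standing in for flash are all invented for illustration, and it ignores any masking of unused address bits that the real chip may do.

```python
def tabrd(program_words, data_mem, special, m):
    """Model TABRD [m]: read one 16-bit word from program memory.

    program_words: list of 16-bit words standing in for flash
    data_mem:      dict of data memory address -> byte
    special:       dict holding the TBHP, TBLP and TBLH registers
    """
    address = (special["TBHP"] << 8) | special["TBLP"]
    word = program_words[address]
    data_mem[m] = word & 0xFF        # low byte goes to [m]
    special["TBLH"] = word >> 8      # high byte goes to TBLH


# Example: put a word at 1234h and read it back into [80h]
flash = [0] * 0x4000
flash[0x1234] = 0xBEEF
regs = {"TBHP": 0x12, "TBLP": 0x34, "TBLH": 0}
ram = {}
tabrd(flash, ram, regs, 0x80)
```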

The program memory can be written to, since it is flash, but this is a fairly convoluted process that involves erasing a 32-byte or 64-byte block, reprogramming the whole thing, etc - as such it is a high-cost operation that should be avoided, and it's best used for bulk/permanent data storage like configuration.

The reset vector (where the CPU jumps to after a reset) is at 0000h. The interrupt vectors are located within the range 0004h to 0028h.

Data Memory: Registers, Short-Lived Data

The 'data memory' serves two purposes: it contains both special registers (listed on page 41 of the datasheet) and general-purpose RAM. The general-purpose RAM is split into 6 banks of 128 bytes each.

Most instructions allow you to specify an 8-bit address within data memory (called 'direct memory addressing') using the notation [XXh]. This allows you to access all registers (through addresses 00h-7Fh) and all data in Bank 0 (through addresses 80h-FFh).

In order to access a dynamically calculated address (e.g. for arrays) or anything in Banks 1-5, you need to use indirect addressing. There are two pairs of registers for this: MP0/IAR0 and MP1/IAR1.

IAR0 acts as a 'proxy' for whatever address is specified in MP0 within Bank 0; for example, writing 9Fh to MP0 and then adding 3 to IAR0 is equivalent to just adding 3 to [9Fh].

IAR1 and MP1 work the same way, but allow you to access any bank. The bank they use is selected by the bank pointer in register BP.
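The indirect addressing scheme is easier to follow with a toy model. The class below is a simplified Python sketch, not a faithful emulation: it keeps every register in bank 0 and uses the whole BP value as the bank number, both of which are assumptions made for brevity. The register addresses (IAR0 at 00h, MP0 at 01h, and so on) match the include file shown later in this post.

```python
class DataMemory:
    IAR0, MP0, IAR1, MP1, BP = 0x00, 0x01, 0x02, 0x03, 0x04

    def __init__(self):
        self.cells = {}  # (bank, address) -> byte

    def _resolve(self, addr):
        """Turn a direct 8-bit address into the (bank, address) it really touches."""
        if addr == self.IAR0:
            # IAR0 proxies whatever MP0 points at, always in bank 0
            return (0, self.cells.get((0, self.MP0), 0))
        if addr == self.IAR1:
            # IAR1 proxies whatever MP1 points at, in the bank chosen by BP
            return (self.cells.get((0, self.BP), 0), self.cells.get((0, self.MP1), 0))
        return (0, addr)

    def read(self, addr):
        return self.cells.get(self._resolve(addr), 0)

    def write(self, addr, value):
        self.cells[self._resolve(addr)] = value & 0xFF


mem = DataMemory()
mem.write(0x9F, 10)
mem.write(DataMemory.MP0, 0x9F)
# "Adding 3 to IAR0" really adds 3 to [9Fh]
mem.write(DataMemory.IAR0, mem.read(DataMemory.IAR0) + 3)
```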

Other Noteworthy Registers

ACC is the accumulator. It's basically the chip's main temporary register... but it's also accessible as if it were part of memory, because this chip treats every register as part of memory. That's just wild.

PCL is the low 8 bits of the program counter. You can do dynamic jumps (jump tables, etc) by modifying this. You can't jump outside of the current 256-word page this way, though, as the upper bits of the program counter can only be changed by executing a JMP or CALL instruction. Ah well.

STATUS stores a bunch of flags: there's zero/carry/overflow (quite standard) and some system management stuff (see pages 45-46).

Finally, the stack pointer is noteworthy because it's not an accessible register -- in fact, the stack isn't accessible either! It's purely an internal structure within the CPU. You cannot store arbitrary values on it, you can't push/pop things. The only way to manipulate it is by using the CALL and RET instructions, or by having an interrupt fire. You've just got to be really careful not to go over 12 stack levels.

Instruction Set

This is described from page 179 onwards. Pages 181 and 182 have a handy table of every single instruction, which you can see below.

It's fairly basic, really. Branching is implemented using "Skip" instructions: if a particular condition is true, the instruction directly after the skip is skipped, so a typical sequence is something like SZ [9Fh]; JMP varWasNonZero.
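If the skip idea feels backwards, this toy interpreter might help. It's a Python sketch that only understands SZ, JMP and RET (everything else is treated as a NOP), and the tuple-based program format is invented purely for this example.

```python
def run(program, memory, max_steps=32):
    """Run a toy program of (mnemonic, operand) tuples; return the PCs visited."""
    pc, visited = 0, []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, arg = program[pc]
        visited.append(pc)
        if op == "SZ":
            # Skip the next instruction if the memory location is zero
            pc += 2 if memory.get(arg, 0) == 0 else 1
        elif op == "JMP":
            pc = arg
        elif op == "RET":
            break
        else:
            pc += 1  # treat anything else as a NOP
    return visited


PROGRAM = [
    ("SZ", 0x9F),    # 0: skip the JMP when [9Fh] == 0
    ("JMP", 3),      # 1: only reached when [9Fh] != 0
    ("RET", None),   # 2: the "variable was zero" path
    ("RET", None),   # 3: varWasNonZero
]

print(run(PROGRAM, {0x9F: 0}))  # zero: the JMP at 1 is skipped
print(run(PROGRAM, {0x9F: 5}))  # non-zero: the JMP at 1 is taken
```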

One curious quirk is that some instructions (SZ, SNZ, SET, CLR) can work with a specified bit of a register/memory location, rather than working with the entire byte. This makes it incredibly easy to have single-bit flags... which is pretty important in a system where you have less than 1KB of RAM!

What Now?

If you've been following the datasheet, you'll see that there's precise descriptions of every single instruction and exactly what that instruction does. We're missing one very important thing we need to write a disassembler, though: we don't know how the instructions are encoded!

We do have an assembler, so we could do a bunch of experimenting with that to figure out how each and every instruction is encoded. That would work, but it's time-consuming. Thankfully, we don't quite need to do that!

Earlier in this post I mentioned a directory in the Holtek IDE's installation folder which included .cfg and .fmt files for a ton of Holtek chips. Let's come back to those now.

There's a pair of files for our chip, the HT68FB560. The CFG file is binary; that's not particularly interesting. The FMT file, however, is human-readable, and describes the opcode format for this chip. Jackpot!

Holtek HT68FB560 Cross-Assembler description file
        generated by Holtek MCUWizard Ver 2.049

%microcomputer
version 100
date    April. 15, 2011
type    word
rom     0000H, 03fffH, 0ffffH, 0000H
ram     00080H, 000ffH, 00008H
ram     00140H, 00146H, 00008H, Special
ram     00180H, 001ffH, 00008H
ram     00280H, 002ffH, 00008H
ram     00380H, 003ffH, 00008H
ram     00480H, 004ffH, 00008H
ram     00580H, 005ffH, 00008H

%register
a, wdt, wdt1, wdt2

%operand
0, 3, 0, 0ffffH, 0, 0, 0, 0ffffH
1, 7, 0, 000ffH, 0, 0, 0, 007fH, 0, 7, 0eH, 1
2, 3, 0,   0ffh,  0, 0, 0,  0ffh
3, 1, 0, 03fffH,  0, 0, 0, 07ffh,  0, 0bh, 0eh, 3
4, 5, 0, 007ffH,  0, 0, 7, 7,  0, 3, 0, 007fH,  0, 0ah, 0eH, 1

%mnemonic
1,        0000h, 0ffffH,        nop
1,        0001h, 0ffffH,        clr     wdt
1,        0001h, 0ffffH,        clr     wdt1
1,        0005h, 0ffffH,        clr     wdt2
1,        0003h, 0ffffH,        ret
1,        0004h, 0ffffH,        reti
1,        0002h, 0ffffH,        halt
1,        0080h, 0bf80H,        mov     &1, a
1,        0100h, 0bf80H,        cpla    &1
1,        0180h, 0bf80H,        cpl     &1
1,        0200h, 0bf80H,        sub     a, &1
1,        0280h, 0bf80H,        subm    a, &1
1,        0300h, 0bf80H,        add     a, &1
1,        0380h, 0bf80H,        addm    a, &1
1,        0400h, 0bf80H,        xor     a, &1
1,        0480h, 0bf80H,        xorm    a, &1
1,        0500h, 0bf80H,        or      a, &1
1,        0580h, 0bf80H,        orm     a, &1
1,        0600h, 0bf80H,        and     a, &1
1,        0680h, 0bf80H,        andm    a, &1
1,        0700h, 0bf80H,        mov     a, &1
1,        1000h, 0bf80H,        sza     &1
1,        1080h, 0bf80H,        sz      &1
1,        1100h, 0bf80H,        swapa   &1
1,        1180h, 0bf80H,        swap    &1
1,        1200h, 0bf80H,        sbc     a, &1
1,        1280h, 0bf80H,        sbcm    a, &1
1,        1300h, 0bf80H,        adc     a, &1
1,        1380h, 0bf80H,        adcm    a, &1
1,        1400h, 0bf80H,        inca    &1
1,        1480h, 0bf80H,        inc     &1
1,        1500h, 0bf80H,        deca    &1
1,        1580h, 0bf80H,        dec     &1
1,        1600h, 0bf80H,        siza    &1
1,        1680h, 0bf80H,        siz     &1
1,        1700h, 0bf80H,        sdza    &1
1,        1780h, 0bf80H,        sdz     &1
1,        1800h, 0bf80H,        rla     &1
1,        1880h, 0bf80H,        rl      &1
1,        1900h, 0bf80H,        rra     &1
1,        1980h, 0bf80H,        rr      &1
1,        1a00h, 0bf80H,        rlca    &1
1,        1a80h, 0bf80H,        rlc     &1
1,        1b00h, 0bf80H,        rrca    &1
1,        1b80h, 0bf80H,        rrc     &1
1,        1d00h, 0bf80H,        tabrd   &1
1,        1d00h, 0bf80H,        tabrdc  &1
1,        1e80h, 0bf80H,        daa     &1
1,        1f00h, 0bf80H,        clr     &1
1,        1f80h, 0bf80H,        set     &1
1,        0900h, 0ff00H,        ret     a, &2
1,        0a00h, 0ff00H,        sub     a, &2
1,        0b00h, 0ff00H,        add     a, &2
1,        0c00h, 0ff00H,        xor     a, &2
1,        0d00h, 0ff00H,        or      a, &2
1,        0e00h, 0ff00H,        and     a, &2
1,        0f00h, 0ff00H,        mov     a, &2
1,        2000h, 03800H,        call    &3
1,        2800h, 03800H,        jmp     &3
1,        3000h, 0bc00H,        set     &4
1,        3400h, 0bc00H,        clr     &4
1,        3800h, 0bc00H,        snz     &4
1,        3c00h, 0bc00H,        sz      &4

This is split into four sections: %microcomputer, %register, %operand and %mnemonic. The first two sections are pretty self-explanatory. The operand section is a bit nonsensical. The mnemonic section lists every possible kind of instruction.

There's entries in %operand from 0 to 4, and the instructions reference operands from 1 to 4, so it's probably a safe bet to say that these entries describe how the individual operands are encoded (and possibly also parsed from the input assembly?).

Parsing Mnemonics

We can make an educated guess at the other fields in the %mnemonic section. The second 16-bit value looks very much like a mask, as it's shared between many instructions, and almost exclusively contains consecutive bits. We see these masks used:

Hex Mask	Binary Mask	Instructions Observed
`FFFF`	`1111 1111 1111 1111`	Special instructions with no operands
`BF80`	`1011 1111 1000 0000`	Instructions with one operand, type 1
`FF00`	`1111 1111 0000 0000`	Instructions with one operand, type 2
`3800`	`0011 1000 0000 0000`	Instructions with one operand, type 3
`BC00`	`1011 1100 0000 0000`	Instructions with one operand, type 4

There's a very obvious correlation here. Furthermore, the first 16-bit value always lies exclusively within the bits that are set in the mask. I'm therefore assuming that if (opcode & mask_second_value) == magic_first_value for a given instruction, then opcode is an instance of that instruction, and the bits that lie outside of that mask will tell us what the operands are.
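Here's that matching rule as a minimal Python sketch. The table is a hand-copied subset of the %mnemonic entries above, and the function name is just for illustration; it isn't necessarily how the finished disassembler is structured.

```python
# (value, mask, template) triples copied from the %mnemonic section
MNEMONICS = [
    (0x0003, 0xFFFF, "ret"),
    (0x0004, 0xFFFF, "reti"),
    (0x0080, 0xBF80, "mov &1, a"),
    (0x1780, 0xBF80, "sdz &1"),
    (0x0F00, 0xFF00, "mov a, &2"),
    (0x2000, 0x3800, "call &3"),
    (0x2800, 0x3800, "jmp &3"),
    (0x3C00, 0xBC00, "sz &4"),
]


def identify(opcode):
    """Return the template of the first entry whose masked bits match, else None."""
    for value, mask, template in MNEMONICS:
        if opcode & mask == value:
            return template
    return None


for word in (0x2842, 0x1785, 0x0F84, 0xB4D7):
    print(f"{word:04x} -> {identify(word)}")
```

Anything that matches no entry comes back as None, which is a reasonable way to flag words that are probably data rather than code.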

The first field is always 1. I had a look at some of the other .fmt files supplied with HT-IDE and some of these contain a higher number -- but crucially, if the number is higher, then there's also more 16-bit magic/mask pairs. This leads me to believe that these are used for instructions that span multiple words. We don't have any of those in the simple world of the HT68FB560, so we can just ignore it and pretend the 1 isn't there.
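Rather than typing the table in by hand, the %mnemonic lines can be parsed straight out of the .fmt file. The Python sketch below assumes the layout guessed at above: a word count, then that many value/mask pairs, then the assembly template. The ordering of the pairs for multi-word instructions is an assumption, since this chip never uses them.

```python
def parse_hex(token):
    """Parse Holtek-style hex such as '0bf80H' or '1d00h'."""
    return int(token.strip().rstrip("hH"), 16)


def parse_mnemonic_line(line):
    """Split one %mnemonic line into ([(value, mask), ...], template)."""
    count_text, rest = line.split(",", 1)
    count = int(count_text)
    # The template itself may contain commas ("mov &1, a"), so cap the split
    fields = rest.split(",", 2 * count)
    pairs = [
        (parse_hex(fields[2 * i]), parse_hex(fields[2 * i + 1]))
        for i in range(count)
    ]
    template = " ".join(fields[-1].split())  # collapse the column padding
    return pairs, template


print(parse_mnemonic_line("1,        0080h, 0bf80H,        mov     &1, a"))
```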

We now know enough to write a really simple disassembler that iterates through every word in the file and tells us what instruction it is. OK, that's not quite enough, as we still need operands, but it's a starting point!
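The "iterate through every word" part might look something like this in Python. The byte order is an inference: reading the bytes at offset 0x70 of the hexdump earlier in this post as little-endian words gives exactly the opcodes at the start of the listing further down, so little-endian it is. The helper name is made up for this sketch.

```python
import struct


def iter_words(blob):
    """Yield (word_address, opcode) for each 16-bit little-endian word."""
    usable = blob[: len(blob) & ~1]  # drop a trailing odd byte, if any
    for address, (opcode,) in enumerate(struct.iter_unpack("<H", usable)):
        yield address, opcode


# Bytes copied from offset 0x70 of the MTP hexdump
SAMPLE = bytes.fromhex("0a3e8a3a422820290400")

for address, opcode in iter_words(SAMPLE):
    print(f"{address:04x} : {opcode:04x}")
```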

Parsing Operands

We return to this mess.

0, 3, 0, 0ffffH, 0, 0, 0, 0ffffH
1, 7, 0, 000ffH, 0, 0, 0, 007fH, 0, 7, 0eH, 1
2, 3, 0,   0ffh,  0, 0, 0,  0ffh
3, 1, 0, 03fffH,  0, 0, 0, 07ffh,  0, 0bh, 0eh, 3
4, 5, 0, 007ffH,  0, 0, 7, 7,  0, 3, 0, 007fH,  0, 0ah, 0eH, 1

I've got a confession here: I spent a while staring at this and had no clue what was going on, so I ended up figuring out the mappings myself through trial and error, by using HT-IDE to assemble test code and then looking at what opcodes it produced. It was a lot easier to do this 4 times for 4 operand types than 60 times for 60 instructions!
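For the curious, here's a Python sketch of operand mappings that reproduce the disassembly listing later in this post. Treat it as a reconstruction rather than gospel: the bit placements were worked out by comparing opcodes against that listing, and the two that the listing doesn't exercise (bit 14 for jump targets, bit 14 for bit-operand addresses) are guesses based on the trailing fields of the %operand lines.

```python
def decode_operand(kind, opcode):
    """Extract the operand of the given %operand type from a 16-bit opcode."""
    if kind == 1:
        # Data memory address: low 7 bits, with opcode bit 14 as address bit 7
        return (opcode & 0x7F) | (((opcode >> 14) & 1) << 7)
    if kind == 2:
        # 8-bit immediate
        return opcode & 0xFF
    if kind == 3:
        # Jump/call target: low 11 bits, with opcode bits 14-15 as address bits 11-12
        return (opcode & 0x7FF) | (((opcode >> 14) & 3) << 11)
    if kind == 4:
        # Bit operand: address as in type 1, bit number in opcode bits 7-9
        address = (opcode & 0x7F) | (((opcode >> 14) & 1) << 7)
        return address, (opcode >> 7) & 7
    raise ValueError(f"unknown operand type {kind}")


print(hex(decode_operand(1, 0x40F5)))  # mov [0F5h], a
print(hex(decode_operand(3, 0xAE00)))  # jmp 1600h
print(decode_operand(4, 0x3E0A))       # sz on a bit of [0Ah]
```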

This got me to the point where I could write something that produced correct, if ugly, disassembly. A lot of the ugliness is because we have no names for the registers. Luckily, the Holtek SDK saves us once again by providing us with HT68FB560.inc, which includes names for the benefit of assembly programmers, like these...

IAR0    EQU [00H]
R0  EQU [00H]   ;old style declaration, not recommended for use
MP0 EQU [01H]
IAR1    EQU [02H]
R1  EQU [02H]   ;old style declaration, not recommended for use
MP1 EQU [03H]
BP  EQU [04H]
ACC EQU [05H]
PCL EQU [06H]
TBLP    EQU [07H]
TBLH    EQU [08H]
TBHP    EQU [09H]
STATUS  EQU [0AH]
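Pulling those names into a lookup table only takes a regular expression. The Python sketch below handles just the plain register declarations shown above (anything with a bit suffix after the bracket is skipped), keeps the first name seen for each address so that IAR0 wins over the old-style R0, and formats unnamed addresses in the assembler's hex style. The helper names are invented for this example.

```python
import re

INC_SAMPLE = """\
IAR0    EQU [00H]
R0  EQU [00H]   ;old style declaration, not recommended for use
MP0 EQU [01H]
ACC EQU [05H]
TBLH    EQU [08H]
STATUS  EQU [0AH]
"""

# NAME EQU [xxH], but not NAME EQU [xxH].n
EQU_RE = re.compile(r"^\s*(\w+)\s+EQU\s+\[([0-9A-Fa-f]+)H\](?!\.)", re.MULTILINE)


def load_register_names(text):
    names = {}
    for name, address in EQU_RE.findall(text):
        names.setdefault(int(address, 16), name)  # keep the first name seen
    return names


def holtek_hex(value, width=2):
    """Format as e.g. '41h' or '0F5h' (leading 0 when the first digit is a letter)."""
    text = f"{value:0{width}X}h"
    return "0" + text if text[0].isalpha() else text


def format_address(address, names):
    return names.get(address) or f"[{holtek_hex(address)}]"


NAMES = load_register_names(INC_SAMPLE)
print(format_address(0x05, NAMES), format_address(0xF5, NAMES))
```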

A bit more parsing code in, and we've got something that looks pretty sensible. Here's the first 64 words of the result.

0000 : 3e0a : sz PDF
0001 : 3a8a : snz TO
0002 : 2842 : jmp 0042h
0003 : 2920 : jmp 0120h
0004 : 0004 : reti 
0005 : 1785 : sdz ACC
0006 : 2805 : jmp 0005h
0007 : 0003 : ret 
0008 : 0004 : reti 
0009 : 0f84 : mov a, 84h
000a : 2805 : jmp 0005h
000b : 0000 : nop 
000c : ae00 : jmp 1600h
000d : 0f20 : mov a, 20h
000e : 2805 : jmp 0005h
000f : 0000 : nop 
0010 : a860 : jmp 1060h
0011 : 0f05 : mov a, 05h
0012 : 2805 : jmp 0005h
0013 : 0000 : nop 
0014 : 0004 : reti 
0015 : 0f42 : mov a, 42h
0016 : 2805 : jmp 0005h
0017 : 0000 : nop 
0018 : 0004 : reti 
0019 : 0f0b : mov a, 0Bh
001a : 2805 : jmp 0005h
001b : 0000 : nop 
001c : 0004 : reti 
001d : 0f01 : mov a, 01h
001e : 282a : jmp 002Ah
001f : 0000 : nop 
0020 : 0004 : reti 
0021 : 2012 : call 0012h
0022 : 1008 : sza TBLH
0023 : 1141 : swapa [41h]
0024 : 0004 : reti 
0025 : 2359 : call 0359h
0026 : b4d7 : <<UNKNOWN>>
0027 : f265 : <<UNKNOWN>>
0028 : 0004 : reti 
0029 : 0f04 : mov a, 04h
002a : 40f5 : mov [0F5h], a
002b : 5f74 : clr [0F4h]
002c : 1f05 : clr ACC
002d : 0001 : clr wdt
002e : 3de6 : sz RESUME
002f : 0003 : ret 
0030 : 1785 : sdz ACC
0031 : 282d : jmp 002Dh
0032 : 57f4 : sdz [0F4h]
0033 : 282d : jmp 002Dh
0034 : 57f5 : sdz [0F5h]
0035 : 282d : jmp 002Dh
0036 : 0003 : ret 
0037 : 408b : mov [8Bh], a
0038 : 0f8e : mov a, 8Eh
0039 : 0001 : clr wdt
003a : 3866 : snz SUSP
003b : 3de6 : sz RESUME
003c : 0003 : ret 
003d : 1785 : sdz ACC
003e : 2839 : jmp 0039h
003f : 578b : sdz [8Bh]

We can see interrupt handlers (those reti instructions are a dead giveaway) in the region 0004h to 0028h, as we were led to expect by the datasheet. We've got sensible register accesses. We've got a bunch of conditional jumps (sdz and jmp combinations, sz and ret combinations, etc).

It's a good start. There's one major omission: we can't tell what's code and what's data, so at certain locations we come across data that gets decoded as nonsense instructions. We'd kind of need better analysis to make that work properly though, and that's out of scope for what's basically a Python script that loops through every word in a file and turns it independently into a string.

You can find the source code for my disassembler (and a slightly cleaned up version of this post's MTP extractor) on GitHub here: https://github.com/Treeki/TM155-tools

Next Time on Mouse Adventures

So, we've got a disassembler, and we've got a disassembled copy of the firmware. We have no clue what's going on right now, but we can fix that! In Part 4 I'll delve into the TeckNet configuration tool, the rest of the protocol, and start trying to make sense of the firmware.

If you've got any feedback on these posts, then please let me know on Twitter (@_Ninji) or elsewhere. I'd like to get better at writing this stuff :3

If you've really enjoyed this series of posts and want to help fund my hot chocolate habit, feel free to send me a couple of pounds: Ko-fi | Monzo.me (UK) | PayPal.me
