SPA Attack Experiment 1
In late 2021 a book named “Hardware Hacking Handbook” was published. I don’t remember exactly how I came across it, but after checking the ToC and getting feedback from some friends, I decided to buy it. I quickly looked through the contents and stumbled on chapter 9, which includes a simple power analysis lab experiment. It’s an introduction to power analysis, and I was curious to try it in practice. So why not write about my experience replicating that lab?
During high school and my first university years, I spent some time playing with Atmel microcontrollers. Even though I’ve been a bit distant from this scene, I’ve followed some of the Arduino evolution. Thus, the lab apparatus wasn’t entirely new to me, but the experiment was!
Scenario
In this lab, a microcontroller asks for a password through a serial port. The attack uses the power supply noise to (somehow) learn the correct password. This type of attack is not new; according to the book, it dates back to the late ’90s, in the context of cryptographic side-channel attacks. However, the exponential proliferation of microcontrollers and computers around us will make these types of attacks increasingly common.
Power Analysis
As mentioned, the attack relies on analyzing the power consumption signal of a target device using a reasonably precise probe. One of the easiest ways of doing this (although there are others) is to place a shunt resistor between the target and the power source, as shown in the following drawing:
We then measure the voltage drop in the shunt resistor (Rs), and by Ohm’s law, we can get the current flowing into the target:
$$I=\frac{V_+-V_-}{Rs}$$
To avoid using a differential oscilloscope, which is more expensive, one could measure the two voltage potentials of the resistor and subtract them to get the voltage drop. However, this can be further simplified by assuming that the voltage source (Vcc) is ideal and constant: we use a single probe, measure the voltage from the negative node V- to ground, and subtract that from an ideal Vcc to ground:
$$V_{Rs} = (V_{CC}-V_{GND}) - (V_- -V_{GND}) = V_{CC} - V_-$$
So, the current consumption analysis can be done by capturing the voltage at the negative node of the resistor. We’d expect this voltage to show some differences when the correct character of the password is used compared to incorrect password characters.
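As a small illustration of the two formulas above, here is a sketch that converts captured V- samples into current values. The supply voltage and shunt value are placeholders I picked for the example (assumptions, not values from the lab), and the function names are hypothetical.

```python
# Assumed values for illustration only; they are not taken from the lab.
VCC = 4.5        # ideal supply voltage in volts
R_SHUNT = 10.0   # shunt resistance in ohms (hypothetical)


def shunt_current(v_minus, vcc=VCC, r_shunt=R_SHUNT):
    """Current through the shunt for one V- sample: I = (Vcc - V-) / Rs."""
    return (vcc - v_minus) / r_shunt


def trace_to_current(samples, vcc=VCC, r_shunt=R_SHUNT):
    """Convert a list of V- samples (volts) into currents (amperes)."""
    return [shunt_current(v, vcc, r_shunt) for v in samples]


if __name__ == "__main__":
    # Three made-up V- readings, single-ended against ground.
    print(trace_to_current([4.40, 4.35, 4.42]))
```

Because Vcc is treated as ideal, any real ripple on the supply shows up as an error in the computed current.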
Simplifying
Measuring the current consumption is very reliable and should give the most accurate results. Alternatively, we can also measure the voltage variations caused by small and fast current sinks on the target. This is only possible because we’re in a real world that doesn’t have ideal power sources: they take time to restore the voltage after a change in current draw. That’s one of the reasons to use decoupling capacitors on the power source rails. These help keep the voltage as stable as possible.
That said, I won’t be measuring the current variations but instead voltage source noise. We need to get rid of any capacitor on the power source rails to do this. Initially, I was using the STK500 programmer board with external Vtarget, but I missed that even this rail was connected to some smoothing capacitors. Thus, any voltage noise information was being wiped by the capacitors.
Now, we need to power the target somehow. With no capacitors on the power line, a switching converter (buck, boost, and variants) would inject unwanted noise into our captured signal. Alternatively, we could use a linear voltage regulator (with lower noise), but I had none at hand.
So the solution I used was a cleaner power supply… what’s the most basic low-noise power supply we know? A battery or power cells. So, I connected 3xAAA power cells to get around 4.5V, which was sufficient to power my target microcontroller.
Lab Preparation
I used the following parts:
- Digital Scope (I used PicoScope 2406B, 1GS/s, 32MS)
- ATmega32 microcontroller
- USB to Serial converter (FTDI)
- 3xAAA power cells
- AVR programmer board (STK500, but any will do)
- Breadboard and a few wires
The lab setup is basic and described in the following diagram:
After mounting this setup, it looks like the following picture (the power from power cells is disconnected just to avoid wasting battery):
Before we can perform the experiments on noise analysis, we need to prepare the target. That’s described in the next section.
Target Software
I’ve used the same sample code as the one from the book with a few changes:
- Removed the random delays;
- Added code to set the trigger pin low right after the comparison.
Since it’s my first experiment, I wanted to confirm I was seeing the signal noise at the right time and not, for example, missing data because I didn’t capture for long enough. Even though we wouldn’t have these shortcuts in a realistic scenario, other hacks can help overcome these limitations (some ideas are shared at the end).
The code used for the target was the following (based on code from the book):
// Trigger is PB0 (pin 1)
int triggerPin = PB0;
String known_passwordstr = String("ilovecheese");
String input_passwordstr;
char input_password[20];
char tempchr;
int index;

// the setup routine runs once when you press reset:
void setup() {
  // initialize serial communication at 9600 bits per second:
  Serial.begin(9600);
  pinMode(triggerPin, OUTPUT);
  tempchr = '0';
  index = 0;
}

// the loop routine runs over and over again forever:
void loop() {
  // Wait a little bit after startup & clear everything
  digitalWrite(triggerPin, LOW);
  delay(250);
  Serial.flush();
  Serial.write("Enter Password:");

  // wait for last character
  while ((tempchr != '\n') && (index < 19)) {
    if (Serial.available() > 0) {
      tempchr = Serial.read();
      input_password[index++] = tempchr;
    }
  }

  // Null terminate and strip non-characters
  input_password[index] = '\0';
  input_passwordstr = String(input_password);
  input_passwordstr.trim();
  index = 0;
  tempchr = 0;

  // Trigger goes high just before the comparison and low right after it
  digitalWrite(triggerPin, HIGH);
  if (input_passwordstr == known_passwordstr) {
    digitalWrite(triggerPin, LOW);
    Serial.write("Password OK\n");
  } else {
    digitalWrite(triggerPin, LOW);
    // Delay up to 500ms randomly
    // delay(random(500));
    Serial.write("Password Bad\n");
  }
}
It’s worth mentioning that the password comparison line will call String.equals(), and this operator’s code is the following (taken from here):
...
unsigned char String::equals(const String &s2) const
{
return (len == s2.len && compareTo(s2) == 0);
}
...
It first compares the lengths of both strings, then it calls the compareTo function. The latter then uses strcmp, which is known to be inadequate for comparing passwords because it stops at the first differing character. This means that one part of the attack (not explored in this post) is to find the correct password size.
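To make the leak concrete, here is a small Python model of an early-exit, strcmp-style comparison that counts how many characters are examined before the loop stops. It is only an illustration of the behavior, not the AVR libc implementation, and the function name compare_steps is hypothetical.

```python
def compare_steps(candidate, secret):
    """Count loop iterations of an early-exit, strcmp-style comparison.

    Mirrors the C semantics: both strings are treated as NUL-terminated,
    and the loop stops at the first differing byte or at the terminator.
    """
    a = candidate.encode() + b"\0"
    b = secret.encode() + b"\0"
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y or y == 0:
            break
    return steps


if __name__ == "__main__":
    secret = "ilovecheese"
    for guess in ("aaaaaaaaaaa", "iaaaaaaaaaa", "ilaaaaaaaaa"):
        # Each additional correct leading character costs one more iteration.
        print(guess, compare_steps(guess, secret))
```

The number of iterations grows with the number of correct leading characters, and that difference is what the power trace exposes.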
Refer to Appendix A for information on how I’ve programmed the target. I felt it was a bit off-topic to include in the main text. Once the target is programmed, we’re ready to do the first tests.
First tests and results…
I’ve used a Linux serial client (screen, minicom, etc.) to test if the target is working as expected. To communicate with the target, we need to ensure the baud rate is set to 9600. Then I typed a few characters until the password buffer was filled and waited for the response.
Once I verified that the serial communication was working as expected, I made the first captures from the oscilloscope. I’ve used channel A for the voltage noise capture (AC mode, 100mV, 2u to 5u) and channel B for the trigger (DC mode, 10V, 2V trigger on rising edge).
The following graph shows the voltage noise for two password attempts:
- aaaaaaaaaaa (represented by “V(a)”)
- iaaaaaaaaaa (represented by “V(i)”)
I’ve included the trigger signals which allow us to see approximately when the comparison ends.
To avoid having too much information in the time graph, I’ve cropped everything before 16us. Both signals are very similar from the trigger and start of the capture until around 18us. The grid spacing is 125ns, one period of the 8MHz target clock (most of the target instructions take a single cycle). From this preliminary time graph, we can already make a couple of comments.
- It was surprising to me to find that the noise signals are very similar for the same internal target behavior. I was expecting more noise, more differences between the signals!
- Even though we passed 11 characters in length, we know that the comparison ends at the first different character found. This means we won’t see the voltage noise for the comparison of all characters in the graph, and we have to guess one password character at a time.
- We can see that the trigger goes low later for the iaaaaaaaaaa password, which is in line with a string comparison that went a bit further in the target. From that offset, it takes around 8 clock cycles between two character comparisons.
- The exit path of the comparison (comparison function return and trigger switch) should use similar instructions. Hence, we can see two identical voltage signals, offset in time …
Can you spot where the string comparison deviates? It seems to happen right after 18us, at the C1 mark in the following time chart.
We can see that the pair of spikes is lower when the password character is correct. To help confirm this hypothesis, we can look at the signals for all other wrong characters (A-Z except I) and see that the behavior is the same: they all have a pair of spikes above 2V in this time region.
Moving to the next character, we have the exact same behavior as shown in the following graph: approximately 8 clock cycles after the first comparison (in C2), we see the two distinct spikes again:
To gain more confidence that these spikes are related to the comparison, we can check in the assembly listing whether the number of cycles between character comparisons is around 8. Refer to Appendix B for this analysis.
To implement a brute force algorithm, we should look for data points above ~2V at specific periods in time, given by t, for the particular character index n, given the target clock period C:
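A minimal sketch of that timing relation, built only from the numbers quoted in this post: the first comparison shows up around 18us after the trigger, each further character adds roughly 8 clock cycles, and a wrong character produces spikes above ~2V. The window width, the assumption that sample 0 sits at the trigger, and all names below are mine; this is not the code from spacode_crack.py.

```python
CLOCK_HZ = 8_000_000      # target clock (8 MHz internal oscillator)
CYCLES_PER_CHAR = 8       # approximate cycles between two character comparisons
T_FIRST = 18e-6           # rough position of the first comparison (C1), seconds
THRESHOLD_V = 2.0         # spike level observed for a wrong character


def comparison_time(n, t_first=T_FIRST, clock_hz=CLOCK_HZ,
                    cycles=CYCLES_PER_CHAR):
    """Estimated time of the comparison for character index n (0-based)."""
    return t_first + n * cycles / clock_hz


def char_is_wrong(samples, dt, n, window_cycles=2, threshold=THRESHOLD_V):
    """True if any sample near comparison_time(n) exceeds the threshold.

    samples: voltages, with sample 0 assumed to be at the trigger edge.
    dt: sampling interval in seconds.
    window_cycles: assumed width of the search window, in clock cycles.
    """
    start = comparison_time(n)
    first = int(round(start / dt))
    last = int(round((start + window_cycles / CLOCK_HZ) / dt))
    return any(v > threshold for v in samples[first:last + 1])


if __name__ == "__main__":
    dt = 4e-9                      # sampling interval from the capture log
    trace = [0.0] * 8750           # synthetic flat trace
    trace[4510] = 2.5              # synthetic spike just after 18us
    print(char_is_wrong(trace, dt, 0), char_is_wrong(trace, dt, 1))
```

On real captures the offset and window would need tuning against a few known-wrong attempts first.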
I’ve used a Python script (spacode_crack.py) to brute-force the password programmatically. The output, after some trial and error, was:
:::plaintext
Found the following picoscope:
DriverVersion : PS2000A Linux Driver, 2.1.78.3011
...
timebase = 2
timebase_dt = 4e-09
noSamples = 8750
Sampling interval = 35.000000 us
Current password: i
Current password: il
Current password: ilo
Current password: ilov
Current password: ilove
Current password: ilovec
Current password: ilovech
Current password: iloveche
Current password: ilovechee
Current password: ilovechees
Found password: ilovecheese
$
It’s not printing all the attempts, but you can easily enable those. I had to try a few times (3..4) until it completely matched the password. The code is very simple and doesn’t use any technique to be more tolerant of electrical errors and deviations. For example, we could use the average of a sample of several captures for the same character attempt instead of using a single capture.
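As a sketch of the averaging idea mentioned above, the following function takes several captures of the same password attempt and returns their point-by-point mean, which should reduce uncorrelated noise. It assumes the captures are equal-length lists of voltages aligned on the trigger; the name average_traces is hypothetical and not part of the original script.

```python
def average_traces(traces):
    """Point-by-point mean of several equal-length captures."""
    if not traces:
        raise ValueError("need at least one trace")
    length = len(traces[0])
    if any(len(t) != length for t in traces):
        raise ValueError("all traces must have the same length")
    count = len(traces)
    # zip(*traces) walks the captures column by column (same sample index).
    return [sum(column) / count for column in zip(*traces)]


if __name__ == "__main__":
    captures = [
        [0.0, 2.4, 0.1],
        [0.2, 2.0, 0.1],
        [0.1, 2.2, 0.1],
    ]
    print(average_traces(captures))
```

The trade-off is speed: averaging N captures per candidate character means N times more attempts sent to the target.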
Final Words
This experiment was just a proof-of-concept: a few simplifications were used, deviating a bit from real-world scenarios. For instance, it wasn’t described how we could find the correct password size. There is a trigger signal to help know when the password comparison starts. I also used this signal to locate the end of the comparison more accurately, even though this information won’t be used in the brute-force script.
However, overcoming these difficulties shouldn’t be impossible either. There are always ways, techniques, and little hacks that can help. I can think of the following for these two problems:
- Trigger signal (used to know when the comparison is about to start): we know that as soon as we send the newline that terminates the password, the target receives that character and compares the password. We would start capturing a bit early and would probably need to extend the capture period until we learn where the comparison is made.
- Password length: I think a timing attack could be used to find the correct length. When the attempt has the correct length, the code calls strcmp, which delays the “Password Bad” response. By comparing the “Password Bad” response times, it might be possible to tell when the correct password size is used.
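The length idea could be sketched as follows: collect several response-time measurements per candidate length and pick the length whose typical response is slowest. This is untested against the real target. The extra delay is only a handful of clock cycles, so whether it is measurable at all (over a 9600 baud serial link it probably isn’t, and a scope capture may be needed) is an open question. The function name and the input format are assumptions.

```python
from statistics import median


def likely_length(timings):
    """Pick the candidate length with the slowest typical response.

    timings: dict mapping a candidate password length to a list of measured
    response times in seconds (several measurements per length).
    """
    medians = {length: median(values)
               for length, values in timings.items() if values}
    if not medians:
        raise ValueError("no measurements")
    # The median is less sensitive to a single outlier than the mean.
    return max(medians, key=medians.get)


if __name__ == "__main__":
    # Made-up numbers, only to show the expected input shape.
    fake = {
        10: [1.00e-3, 1.01e-3, 1.00e-3],
        11: [1.05e-3, 1.04e-3, 1.06e-3],
        12: [1.00e-3, 1.00e-3, 1.02e-3],
    }
    print(likely_length(fake))
```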
Nevertheless, it was a fun challenge, and I look forward to doing other hardware security experiments.
Cheers!
Appendix A
Preparing the Target
First we need to install the Arduino core for the corresponding chip model. Arduino does not directly support all Atmel chips. Fortunately, the community has ported the Arduino core to many other Atmel chips, including those I had stored in my electronics parts boxes since my university years. I had several spare microcontrollers, but the one that I found to be working was an ATmega32.
To install the Arduino core, a quick Google search for a port leads us to:
https://github.com/MCUdude/MightyCore
We first add this core index URL to the local Arduino configuration:
$ arduino-cli config add board_manager.additional_urls \
https://mcudude.github.io/MightyCore/package_MCUdude_MightyCore_index.json
Now we refresh the index with:
$ arduino-cli core update-index
Updating index: package_index.json downloaded
Updating index: package_index.json.sig downloaded
Updating index: package_drazzy.com_index.json downloaded
Updating index: package_MCUdude_MightyCore_index.json downloaded
Let’s confirm we now have support for Atmega32 with:
$ arduino-cli core search atmega32
ID Version Name
atmel-avr-xminis:avr 0.6.0 Atmel AVR Xplained-minis
MightyCore:avr 2.1.3 MightyCore
We ask arduino-cli to download and install the core with:
$ arduino-cli core install MightyCore:avr
Tool arduino:[email protected] already installed
Tool arduino:[email protected] already installed
Tool arduino:[email protected] already installed
Downloading packages...
MightyCore:avr@2.1.3 downloaded
Installing platform MightyCore:avr@2.1.3...
Configuring platform....
Platform MightyCore:avr@2.1.3 installed
Alright, I didn’t need to install the bootloader because I use ICSP directly to program the microcontroller. When we use the arduino-cli compile/upload ... command, it already includes the bootloader in the binary, as will be shown next.
Compiling
First we grab the Lab1 code from the Hardware Hacking Handbook and put it in an Arduino source code file (I called mine PsaSketch).
Now we run the arduino-cli command to compile, providing the chip information:
$ arduino-cli compile -b MightyCore:avr:32:clock=8MHz_internal PsaSketch.ino
We’re ready to upload the code to the chip.
Uploading Code
You can use any programmer you prefer; I’m just sharing what I used, the STK500. I don’t think it’s a commonly used board nowadays, and it’s probably easier to use a USBasp or an Arduino as ISP. But it really doesn’t matter as long as you can program the chip.
Since I’m using the STK500 to program the chip, the following description only applies if you’re using the same board… otherwise you may skip to the next section. I have to connect the ISP6PIN header to the SPROG3 header (the red one, where the ATmega32 is plugged in), as shown by the arrow in the following picture:
The square highlight marks the spare RS232 connection and the circle highlight marks the connection to the “trigger” LED (PB0).
Now that the board is ready, we can call the Arduino cli tool to upload the binary:
$ arduino-cli upload -b MightyCore:avr:32:clock=8MHz_internal \
-P stk500 -p /dev/ttyUSB0 -v .
...
avrdude: Version 6.3-20201216
Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
Copyright (c) 2007-2014 Joerg Wunsch
...
Using Port : /dev/ttyUSB0
Using Programmer : stk500
AVR Part : ATmega32
Chip Erase delay : 9000 us
PAGEL : PD7
BS2 : PA0
RESET disposition : dedicated
RETRY pulse : SCK
serial program mode : yes
parallel program mode : yes
Timeout : 200
StabDelay : 100
CmdexeDelay : 25
SyncLoops : 32
ByteDelay : 0
PollIndex : 3
PollValue : 0x53
Memory Detail :
                       Block Poll               Page                    Polled
Memory Type Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW  ReadBack
----------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------
eeprom         4    10    64    0 no       1024    4      0  9000  9000 0xff 0xff
flash         33     6    64    0 yes     32768  128    256  4500  4500 0xff 0xff
lfuse          0     0     0    0 no          1    0      0  2000  2000 0x00 0x00
hfuse          0     0     0    0 no          1    0      0  2000  2000 0x00 0x00
efuse          0     0     0    0 no          0    0      0     0     0 0x00 0x00
lock           0     0     0    0 no          1    0      0  2000  2000 0x00 0x00
signature      0     0     0    0 no          3    0      0     0     0 0x00 0x00
calibration    0     0     0    0 no          4    0      0     0     0 0x00 0x00
Programmer Type : STK500V2
Description : Atmel STK500
Programmer Model: STK500
Hardware Version: 2
Firmware Version Master : 2.10
Topcard : Unknown
Vtarget : 5.1 V
SCK period : 35.3 us
Varef : 3.2 V
Oscillator : 3.686 MHz
avrdude: AVR device initialized and ready to accept instructions
Reading | ################################################## | 100% 0.01s
avrdude: Device signature = 0x1e9502 (probably m32)
avrdude: NOTE: "flash" memory has been specified, an erase cycle will be performed
To disable this feature, specify the -D option.
avrdude: erasing chip
avrdude: reading input file "/tmp/arduino-sketch-AFA799AD3D8937B186F6188F7AC20AAA/PsaSketch.ino.with_bootloader.hex"
avrdude: writing flash (32768 bytes):
Writing | ################################################## | 100% 6.97s
avrdude: 32768 bytes of flash written
avrdude done. Thank you.
As soon as we upload the code, we can see the LED light up (the chip code starts running right away). This happens because the VTARGET jumper on this board is shorted, so the chip is powered directly from the STK500.
Appendix B
Assembly Instructions
The piece of code that performs the character comparison can be analyzed from the target binary disassembly, in the strcmp function:
:::text
strcmp():
102c: fb 01 movw r30, r22 ; Z=r31:r30
102e: dc 01 movw r26, r24 ; X=r27:r26
1030: 8d 91 ld r24, X+ ; r24=X, X+=1 (2C)
1032: 01 90 ld r0, Z+ ; r0=Z, Z+=1 (2C)
1034: 80 19 sub r24, r0 ; r24-=r0 (1C)
1036: 01 10 cpse r0, r1 ; if r0 == 0 ? (!1C,2C)
1038: d9 f3 breq .-10 ; 0x1030 <strcmp+0x4>
103a: 99 0b sbc r25, r25 ; r25=r25-r25=0
103c: 08 95 ret
If we imagine a trace starting from the cpse instruction, we count 7 to 8 clock cycles:
- cpse: since the condition is false (r0 is not the string terminator), it doesn’t skip the next instruction and consumes one clock cycle;
- the next instruction is breq, which takes the branch whenever the two characters matched, because the preceding sub left the zero flag set and cpse doesn’t modify the flags; a taken branch costs two clock cycles;
- then, two ld instructions that take four clock cycles in total;
- the subtraction, which takes one clock cycle;
- …and we reach the cpse instruction again, after 8 clock cycles.
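The count above can be written out as a short worked calculation. The per-instruction cycle counts are the ones annotated in the listing; the 8 MHz clock comes from the compile options. The dictionary and function names are only for illustration.

```python
CLOCK_HZ = 8_000_000  # clock=8MHz_internal from the compile command

# Cycles for one pass through the strcmp loop when the characters match.
LOOP_CYCLES = {
    "cpse r0, r1 (no skip)": 1,
    "breq (branch taken)": 2,
    "ld r24, X+": 2,
    "ld r0, Z+": 2,
    "sub r24, r0": 1,
}


def loop_cycles(table=LOOP_CYCLES):
    """Total clock cycles for one matching-character iteration."""
    return sum(table.values())


def loop_time_s(clock_hz=CLOCK_HZ):
    """Duration of one iteration in seconds at the given clock."""
    return loop_cycles() / clock_hz


if __name__ == "__main__":
    print(loop_cycles(), "cycles per character")
    print(loop_time_s() * 1e6, "us per character")
```

At 8 MHz this works out to 1us per matching character, which agrees with the spacing seen between the C1 and C2 marks.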