SPA Attacks Without Triggers

In my initial side-channel attack experiment using Simple Power Analysis (SPA), I worked with a target system designed and configured to demonstrate the attack. This configuration incorporated an artificial trigger within the target code, a common feature in many proof-of-concept demonstrations for side-channel attacks. The trigger helps to pinpoint the exact moment when a particular operation of interest, such as an encryption function, password comparison, or flag-checking step, is executed.

However, modifying the firmware to include such a trigger in real-world scenarios may be infeasible. The absence of a trigger signal significantly increases the difficulty of executing the same attack. In this blog post, I delve into the challenges of conducting a SPA attack without the convenience of a backdoor trigger, the strategies I employed to overcome these obstacles, and potential avenues for future exploration in this area.

An important aspect to consider when analyzing side-channel attacks on CPU execution, instead of cryptographic algorithms, is the difference in the patterns generated during the operation. Cryptographic algorithms typically involve complex mathematical operations, often producing more pronounced and easily discernible current consumption patterns. These patterns are relatively easy to identify and correlate with specific actions within the cryptographic process.

On the other hand, detecting relevant patterns in CPU execution, especially when focusing on a few instructions such as a password comparison, is a more intricate task. The power consumption variations caused by these instructions may be subtler and harder to distinguish from the background noise, making the relevant patterns significantly more challenging to identify and analyze. This key difference emphasizes the need for a more refined approach when targeting CPU execution in a side-channel attack, and the importance of leveraging advanced data analysis techniques and innovative strategies to uncover the hidden patterns within the power consumption data.

Hardware Setup

To collect the signal captures, I used a set of laboratory hardware components and connections similar to the ones used in the previous experiment (SPA Part 1). The diagram below illustrates the setup.

rx_tx_trigger.png

The USB oscilloscope was connected to a host PC for data acquisition. Four probes were used to capture the signals. Two probes were connected to the power line, one probe to the target TX and one to the artificial trigger at port PB0. With this setup, I collected accurate and reliable signal data, which was then used for further analysis and experimentation.

lab_setup_diagram1.svg

It's not mandatory to use the same hardware specifications.

Software Setup

Through its serial line, the target shows a prompt on the host using the target's TX:

Password:

The host can then send the password to the target's RX. When the password is wrong, the target sends the following message to the host through the TX line:

Password Bad!

We know the comparison happens between sending the password and receiving the result message from the target. These events can be seen in the following oscilloscope capture.

From the capture information, around 350us elapse between completely sending the password and receiving the response.

First Captures

Before starting any attack, it's a good idea to examine the raw signals and ensure nothing is wrong; for instance, that all signals show activity and none is accidentally grounded due to some mistake.

I’ve used the following stimulus and setup:

  • 350us long capture
  • 95% pre-trigger to capture more content before the trigger
  • trigger at 2V, falling edge, on probe C that is connected to target TX pin
  • send password ‘x’ to target

Given this scenario, the capture resulted in the signals shown in the following plot.

figure_hyRyyOfS.png

[figure_hyRyyOfS]

For those who may not be familiar with these plots, here's a brief explanation of what each signal represents:

  • The orange signal corresponds to the target's TX (transmit) line. Since the signal is transmitted using the UART protocol, the voltage level on the pin changes to a low level to indicate the start of transmission, called the start bit. In this experiment, the TX signal is captured because it provides insight into the moment the target is about to check whether the password is correct.
  • The green signal represents a hidden or backdoor trigger. This trigger may not exist in a real-world scenario unless the target code is modified. However, the backdoor trigger was added in this experiment to confirm the hypothesis that the 350us range includes the password comparison code.
  • Finally, the purple and blue signals correspond to the voltage at the shunt resistor nodes. This allows us to calculate the current passing through the circuit using Ohm's law (V=IR), assuming a static resistance. These signals provide insight into the current consumption of the target device. They can infer which operations are being performed at a given time.

If we zoom in, we can see that the voltage closer to the power source is more stable. In contrast, the negative node, which is closer to the load, oscillates with a larger amplitude:

figure_SewMH8C1.png

[figure_SewMH8C1]

That's why, considering the positive node, assuming a static voltage is sometimes a reasonable approximation.

Now we subtract the two node voltages and divide by the shunt resistance (4.6R) to get the current (not strictly necessary, but included for the sake of completeness), as shown in the following plot:

figure_apsoHwsE.png

[figure_apsoHwsE]
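
As a minimal sketch of this step (the voltage samples below are made up for illustration; the real ones come from the oscilloscope probes at the shunt resistor nodes), the current computation is just Ohm's law:

```python
import numpy as np

# Hypothetical node-voltage samples in volts; in practice these are the
# oscilloscope captures at the two shunt resistor nodes.
v_pos = np.array([3.30, 3.29, 3.31, 3.30])  # node closer to the power source
v_neg = np.array([3.28, 3.26, 3.27, 3.28])  # node closer to the load

R_SHUNT = 4.6  # shunt resistance in ohms

# Ohm's law: the voltage drop across the shunt divided by its resistance.
current_a = (v_pos - v_neg) / R_SHUNT
current_ma = current_a * 1e3  # milliamperes, convenient for plotting
```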

Now, we can have a closer look at the current consumption signal:

figure_u4RFhwzK.png

[figure_u4RFhwzK]

Apart from a low-frequency envelope, the signals look good. In the plot, each pair of peaks corresponds to an instruction lasting one target CPU clock period, which will be reviewed in the frequency analysis below.

Multiple Captures

I took a few captures using the same password input and hardware setup, trying to figure out how much interference there was and if subtracting the signals directly could work.

I calculated the current for these multiple captures, but keep in mind that filtering and post-processing still need to be done on the signals. The result can be seen in the plot below.

figure_1EdK2xc8.png

[figure_1EdK2xc8]

Looking closer at the signals, we see that there are multiple mismatches:

figure_um0MVVim.png

[figure_um0MVVim]

There's some resemblance between the signals, but they do not overlap. There's a phase shift, and we see glitches in multiple places. This is more evident if we do a closer zoom.

figure_5z4uS36l.png

[figure_5z4uS36l]

So, counting on these captures to spot differences between different inputs just won't work, because there are inconsistencies all over the place even when the inputs are the same!

Frequency Analysis

Computing the Fast Fourier Transform (FFT) and generating a frequency-domain graph is a well-established technique for identifying the frequency components within a given signal. When applied to the current set of signals, this method produced the visualization displayed in the subsequent figure.

figure_KE5TRhEO.png

[figure_KE5TRhEO]

From this frequency plot, we can infer that:

  • The peak frequency components are very similar between the different captures;
  • The first peak is at around 210kHz;
  • There are multiple peaks until we reach a very high peak at 8MHz, followed by another at 16MHz, and then 24MHz;
  • There are many frequency components with considerable magnitude (10dB) at frequencies below 8MHz.

Given this information, there's a high chance the target is configured to operate at a CPU frequency of 8MHz (a 125ns period per instruction). The peaks at 8MHz, 16MHz, and 24MHz constitute the harmonics of an 8MHz square wave: the clock, in this case.

The extra frequency components at 10dB reveal that the current consumption has numerous noise sources, contributing to signal disturbance seen in the time series plots.

As a result, frequency plots are unsuitable for detecting a single-instruction difference, despite their similarity across multiple captures. However, they can still provide insight into the frequency components of the current consumption signal.
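
The spectrum analysis above can be sketched as follows. The 8MHz sine here is a synthetic stand-in for a real current capture; the 250MHz sampling rate matches the one used in this experiment:

```python
import numpy as np

FS = 250e6  # sampling rate: 250 MHz (4 ns sampling period)

# Synthetic stand-in for a current capture: an 8 MHz clock component
# buried in noise.
t = np.arange(100_000) / FS
capture = np.sin(2 * np.pi * 8e6 * t) + 0.1 * np.random.randn(t.size)

# One-sided FFT magnitude in dB, normalized to the strongest component.
spectrum = np.abs(np.fft.rfft(capture))
freqs = np.fft.rfftfreq(capture.size, d=1 / FS)
spectrum_db = 20 * np.log10(spectrum / spectrum.max())

peak_hz = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
```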

Signal Filtering

One of the essential steps in signal processing involves filtering a particular range of frequencies to enhance the signal's clarity. In this instance, a significant portion of the frequency noise content is present below 8 MHz. Consequently, I employed a high-pass filter with a cutoff frequency of 7 MHz. The frequency response of the resulting filter is illustrated in the following figure.

figure_3XWxstJ2_freqz.png

[figure_3XWxstJ2_freqz]

This 5th-order filter attenuates approximately 80dB one decade below 7MHz (i.e., at 700kHz), which means 1/10000 of the current amplitude.

figure_3XWxstJ2.png

[figure_3XWxstJ2]
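
A sketch of this filtering step using SciPy. A Butterworth design is assumed here, and the zero-phase `sosfiltfilt` application doubles the effective attenuation, so the exact numbers may differ from the filter used in this post:

```python
import numpy as np
from scipy import signal

FS = 250e6     # 250 MHz sampling rate
CUTOFF = 7e6   # 7 MHz high-pass cutoff
ORDER = 5      # 5th-order filter

# Design in second-order sections for numerical stability.
sos = signal.butter(ORDER, CUTOFF, btype="highpass", fs=FS, output="sos")

# Synthetic check: a 210 kHz envelope (to be removed) plus an 8 MHz
# clock component (to be kept).
t = np.arange(50_000) / FS
envelope = np.sin(2 * np.pi * 210e3 * t)
clock = 0.5 * np.sin(2 * np.pi * 8e6 * t)
filtered = signal.sosfiltfilt(sos, envelope + clock)
```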

Let's include the different current captures, employing the same filter.

figure_ajVNpIrn.png

[figure_ajVNpIrn]

At first sight, the signals look a little different, but if we look closer:

figure_ajVNpIrn_zoom.png

[figure_ajVNpIrn_zoom]

There is a lag between the signals. This lag makes it harder to visualize the signals' differences. We can try to correct this lag with cross-correlation analysis.

Correlation Analysis

Cross-correlation is a mathematical operation that measures the similarity between two signals. When used to correct the lag between signals, cross-correlation aligns them by finding the time shift that maximizes their similarity. This process involves sliding one of the signals across time, comparing it with the other signal at each point, and measuring the degree of similarity between them.

By performing cross-correlation, the time lag between two signals can be accurately determined and corrected. This is particularly useful in this case, where it is necessary to synchronize or align two signals that may be out of phase.
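
As a sketch, SciPy's `correlate` and `correlation_lags` can find the lag that best aligns two captures. The circular shift via `np.roll` is a simplification; real captures would need edge handling:

```python
import numpy as np
from scipy import signal

def best_lag(v, u, max_lag=None):
    """Return the lag (in samples) that best aligns u with v."""
    corr = signal.correlate(v, u, mode="full")
    lags = signal.correlation_lags(v.size, u.size, mode="full")
    if max_lag is not None:
        keep = np.abs(lags) <= max_lag
        corr, lags = corr[keep], lags[keep]
    return lags[np.argmax(corr)]

# Example: a capture and a copy of it delayed by 7 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = np.roll(x, 7)
lag = best_lag(x, y)
aligned = np.roll(y, lag)  # shifts y back into alignment with x
```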

The cross-correlation between the first and second current captures is shown in the following plot.

figure_2mkHAM9s.png

[figure_2mkHAM9s]

It doesn't look the way I was expecting… This cross-correlation says we get the best correlation if we keep the signals unchanged. If we restrict the lag to between -500 and 500, we get the following:

figure_2LDh5vUK.png

[figure_2LDh5vUK]

The differences between the signals are so small that comparing them only starts to make a difference (less correlation) when the lag increases substantially.

Let's look again at the time range from the figure above where the signals were out of sync.

figure_xoDo7Lwx.png

[figure_xoDo7Lwx]

At first sight, the signals look much better. To get these corrected signals, I cross-correlated each of the other current signals with fhp_irs_x_0. The lag that led to the best correlation value was used to correct each signal's lag.

I tried the same steps but instead considered a reduced capture period: between 200us and 204us. The result is shown in the following figure.

figure_sm656gw6.png

[figure_sm656gw6]

The lag correction has improved, which is not surprising: choosing a single lag value that maximizes the signal correlation over the entire capture is harder than over a shorter window. The lag correction for the entire capture was 26 units, while for the 4-microsecond window it was -4 units.

Rather than calculating and applying a single lag to the entire signal, we could use individual lags for each window. The result for a 40us crop is shown in the figure below.

figure_wFN0k819.png

[figure_wFN0k819]

It's hard to tell how good the alignment is, other than that there is a degree of variation between the different captures. The signals don't completely overlap, and there are some spikes, even though this is a filtered signal.

Looking at a shorter time interval at the end of the previous capture:

figure_wFN0k819_zoom.png

[figure_wFN0k819_zoom]

I've chosen the time interval to purposely capture the window transition in the middle of the plot at 236us, where the lag correction may change. There's a small spike, but I don't think it impacts the overall signal behavior. Due to the lag difference, the dominant signals (the ones most visible in the plot) change from red and blue to red and green.
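
The per-window alignment used above can be sketched as follows. The window length and maximum lag match the values from this experiment; `np.roll`'s circular shift is again a simplification at window edges:

```python
import numpy as np
from scipy import signal

WINDOW = 10_000  # 40 us at a 4 ns sampling period

def align_windows(ref, sig, window=WINDOW, max_lag=10):
    """Align sig to ref one window at a time, each with its own lag."""
    out = np.copy(sig)
    for start in range(0, sig.size - window + 1, window):
        v = ref[start:start + window]
        u = sig[start:start + window]
        corr = signal.correlate(v, u, mode="full")
        lags = signal.correlation_lags(v.size, u.size, mode="full")
        keep = np.abs(lags) <= max_lag
        lag = lags[keep][np.argmax(corr[keep])]
        out[start:start + window] = np.roll(u, lag)
    return out
```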

Now that the signals overlap better, we may try to identify differences more accurately.

Difference Analysis

The first idea to identify the signals' differences is simply subtracting the two signals we want to compare. Regions that are equal or similar will lead to difference values close to zero, while less similar regions will lead to higher amplitudes. What do we get?

figure_yg78FWB3.png

[figure_yg78FWB3]

These differences are higher than the original signal amplitudes (~5mA). Looking at the shorter interval (figure below), we can see that two different lag corrections led to very different patterns. The lag correction on the right looks better (less difference amplitude).

figure_yg78FWB3_zoom.png

[figure_yg78FWB3_zoom]

The goal should be the minimum difference amplitude since these signals were generated from the same stimulus (password "x"). We'd like to have a very similar output.

Before attempting alternative difference calculation methods, it could be worth returning to signal lag correction and trying to improve it. An idea from this difference analysis is that we could use a metric other than the correlation to decide on the lag correction value.

Signal Lag Improvement - Take 2

Taking the idea from the previous analysis, let's try to improve the lag correction so that the signals overlap more, based on the absolute error rather than the correlation. The sum of absolute errors (SAE) at a specific lag is given by:

$$ SAE(l)=\sum_{n=0}^{N}|v_n - u_{n+l}| $$

For a pair of current vectors $v_n$ and $u_n$, and a capture of dimension $N$. In this experiment, I'm using a sampling period of 4ns, so a capture of 40us corresponds to 10000 points ($N$=10000). One of the vectors is kept as is ($v_n$), while the other ($u_n$) has its lag $l$ varied between -10 and 10.
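
A direct sketch of this search (the circular shift via `np.roll` approximates $u_{n+l}$ at the capture edges):

```python
import numpy as np

def sae(v, u, lag):
    """Sum of absolute errors between v and u shifted by lag samples."""
    # np.roll(u, -lag)[n] == u[n + lag], matching the SAE formula.
    return np.sum(np.abs(v - np.roll(u, -lag)))

def best_lag_sae(v, u, max_lag=10):
    """Lag in [-max_lag, max_lag] that minimizes the SAE."""
    lags = np.arange(-max_lag, max_lag + 1)
    errors = [sae(v, u, lag) for lag in lags]
    return lags[np.argmin(errors)]
```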

Using the SAE instead of cross-correlation to determine the best lag value, I got the following plot of the phase-corrected currents.

figure_bXzsSoFw_dcorr.png

[figure_bXzsSoFw_xcorr]

We can see that, given the cyclic behavior of the SAE, periodic lags can score almost as well as the true one, and the first current signal ended up out of sync with the others. To improve this, I've added a penalizing term that decreases the chances of a periodic lag being chosen over the others. The final correlation score is given by:

$$ S(l)=SAE(l)+wP(l,T,\sigma) $$

The penalizing term depends on the lag, on the cyclic period of the lag given by $T$, and on $\sigma$, the standard deviation of the Gaussian used in function $P$. A weight factor $w$ is used to make the penalizing term more or less impactful in the correlation score.

The $P$ formula is written as:

import numpy as np

def gaussian_periodic_penalty(lag: int, period: int, sigma: float):
    # Distance from the lag to the nearest multiple of the period.
    abs_lag = np.abs(lag)
    min_distance = np.minimum((abs_lag - period) % period,
                              period - ((abs_lag - period) % period))
    # Gaussian bump centered on each multiple of the period.
    penalty = np.exp(-(min_distance ** 2) / (2 * sigma ** 2))
    # Lags smaller than half a period are never penalized.
    penalty = np.where(abs_lag < period / 2, 0, penalty)
    return penalty
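
Putting the pieces together, a sketch of the full score $S(l)$ and the lag search. The default weight $w$ here is my own assumption (on the scale of the SAE term); a period of ~31 samples would correspond to the 8MHz clock at the 250MHz sampling rate:

```python
import numpy as np

def gaussian_periodic_penalty(lag, period, sigma):
    # Same P as defined above: a Gaussian bump near each period multiple.
    abs_lag = np.abs(lag)
    min_distance = np.minimum((abs_lag - period) % period,
                              period - ((abs_lag - period) % period))
    penalty = np.exp(-(min_distance ** 2) / (2 * sigma ** 2))
    return np.where(abs_lag < period / 2, 0, penalty)

def score(v, u, lag, period, sigma=2.0, w=None):
    """S(l) = SAE(l) + w * P(l, T, sigma)."""
    if w is None:
        w = np.sum(np.abs(v))  # assumed weight, on the scale of the SAE term
    sae = np.sum(np.abs(v - np.roll(u, -lag)))
    return sae + w * gaussian_periodic_penalty(lag, period, sigma)

def best_lag_scored(v, u, period, max_lag=40):
    lags = range(-max_lag, max_lag + 1)
    return min(lags, key=lambda l: score(v, u, l, period))
```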

Plotting the five current signals again after lag correction, we get:

figure_0bghZBcO.png

[figure_0bghZBcO]

No periodic lag jumps occurred, and all signals overlap. The penalizing function is working as expected. The difference plot of the first two current signals is shown in the following figure.

figure_fG1NbGwG.png

[figure_fG1NbGwG]

The lag was OK in the first window but got slightly worse in the second one. Instead of using individual lags for each window, an alternative is to take the most common lag from all windows and apply it to the entire signal.

I'm using the same penalizing function from the previous plot but looking for the most frequent lag and using it to correct the phase of the signals. We get the following plot as a result.

figure_S3mGenKh_zoom.png

[figure_S3mGenKh_zoom]

The plot shows the first current signal (x_0) and the second current signal (x_1), lag corrected. In yellow, we can see the absolute error signal (x_01) between x_0 and x_1. In terms of absolute error, there is no visible transition between the first and second windows, as happened before (as shown in figure_fG1NbGwG). We also don't see a window-transition glitch caused by different lag corrections between the windows.
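
Selecting the most frequent lag can be sketched as follows (the per-window lag values below are made up for illustration):

```python
import numpy as np

def most_common_lag(window_lags):
    """Return the most frequent per-window lag, to be applied globally."""
    values, counts = np.unique(window_lags, return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical per-window lags from the SAE-based search; one outlier.
window_lags = [-4, -4, -3, -4, 26, -4]
global_lag = most_common_lag(window_lags)
# The whole capture is then shifted once by global_lag (e.g. with np.roll).
```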

Considering the bigger picture, the five current signals (following figure) overlap reasonably well.

figure_S3mGenKh_xcorr.png

[figure_S3mGenKh_xcorr]

All is going great, right?

Well... for a longer range of time, the absolute errors for multiple signal pairs are shown in the following figure.

figure_ytHtPOfu.png

[figure_ytHtPOfu]

While it's true that for some specific regions, the absolute error is low (meaning the signals are very similar), there are other regions where it is pretty high. So, the game is not over yet... 🙂

Ending Words

This experiment aimed to investigate scenarios where a precise trigger signal is absent in the target and where larger capture analysis is required to identify artifacts that aid in locating interesting regions of the signals.

The analysis predominantly focused on signal processing using simple algorithms. Signal processing is a vast field, and exploring other approaches in this experiment, such as a combination of machine learning and signal processing algorithms, may be worthwhile.

For instance, we could use a loss function, such as absolute error (mean, maximum, cumulative, etc.), in a supervised machine learning algorithm to minimize the error. The model parameters could include signal processing tools such as lag correction, y-offset, high pass, or low pass cut-off frequency.
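
As a rough sketch of that idea (a plain grid search rather than a full machine-learning model; the closed-form median offset is my own simplification), lag and y-offset could be fitted jointly by minimizing the mean absolute error:

```python
import numpy as np

def fit_lag_offset(ref, sig, max_lag=10):
    """Grid-search the lag; for each lag, the y-offset that minimizes
    the mean absolute error is the median of the residual."""
    best = (0, 0.0, np.inf)
    for lag in range(-max_lag, max_lag + 1):
        shifted = np.roll(sig, lag)
        offset = np.median(ref - shifted)
        loss = np.mean(np.abs(ref - shifted - offset))
        if loss < best[2]:
            best = (lag, offset, loss)
    return best

# Example: recover a known misalignment and offset.
rng = np.random.default_rng(4)
ref = rng.standard_normal(5_000)
sig = np.roll(ref, -3) + 0.2  # lagged and offset copy of ref
lag, offset, loss = fit_lag_offset(ref, sig)
```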

One interesting observation that emerged from this experiment is the limitation of breadboards in capturing signals with frequencies higher than 10MHz. This realization should have been obvious but was overlooked initially. Moving forward, an avenue to explore in future experiments would be to use a target on a PCB rather than a breadboard, as PCBs are better equipped to handle higher-frequency signals. This could lead to more accurate and reliable signal readings, consequently with less noise.

It's worth noting that the high sampling rate used in this experiment (250MHz) helped to reduce the effects of aliasing noise in the signal captures. This allowed me to obtain more accurate readings and helped to minimize the potential for errors due to aliasing. However, a low-pass filter at the oscilloscope probe may still be advisable to ensure optimal signal quality in future experiments.

You may have noticed I've named the figures in this post with random IDs. These figures were built using a tool I created while working through this experiment: hwpwn. The figures' original data is attached to this post, along with the hwpwn flow configurations, so anyone can easily run other experiments with this data. For more information about hwpwn, I plan to publish another post soonish… stay tuned.

PS: If you find any error, have a suggestion, or just want to comment, feel free to send me an email or a PM on Twitter.

jemos / Apr, 4 2023