Hack The Box / Challenges / Crypto / Ebola

This challenge took me a while to solve. In this writeup I try to describe most of my reasoning along the way.

We have the following challenge description:

"We suspect that some terrorists have a plan to use the Ebola virus. We have managed to collect an encrypted message and its key. Can you help us decrypt the message?"

We are given two files:

key.txt
encrypted.bin

key.txt appears to be a DNA sequence. According to [1], it is in plain format:

A sequence in plain format may contain only IUPAC characters and spaces (no numbers!).

Note: A file in plain sequence format may only contain one sequence, while most other formats accept several sequences in one file.

An example sequence in plain format is:

ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCC
CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGC
CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGG
AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCC
CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAG
TTTAATTACAGACCTGAA

The IUPAC characters are:

A = adenine           
C = cytosine            
G = guanine             
T = thymine

U = uracil
R = G A (purine)        
S = G C 
V = G C A
K = G T (keto)    
B = G T C
D = G A T
M = A C (amino)
W = A T 
H = A C T
N = A G C T (any)
Y = T C (pyrimidine)

The key.txt contents are:

CTGAAATGTTCCGCGAGCCGAACCGATTCACCGCCTAGAAACGTATTGTGCTGGTGTGCGGCGGTTAGAGATATT
AGGTAGCGCCGTTACTCTAACATTTCGAATCAACCTTTCAGGGGAGTCACTGCCATCGTAAGTAGAGTACTTAGC
ATCGATGGCCATGCCTACTAATTACAGGCTGAATGACACTAAACCTTAGTTCACTGACCCGTTTTGTCATGTACT
...

While reading [2] I got a better understanding of DNA sequencing in general. It is quite an interesting problem, even though in the end we don't need this knowledge to solve the challenge! I wasn't aware of how DNA sequencing is done or of its difficulties.

Anyway, this seems to be an assembled sequence, not intermediate data such as reads, overlaps or contigs.

The first idea that comes to mind is to translate these characters into binary data. I don't even know which algorithm was used for the encryption yet; maybe the key can be used in its original form? The key is 1024 bytes long. A common key size with this number is 1024-bit RSA (asymmetric encryption).

Maybe we can translate each character into a bit? But each DNA base can take 4 possible values, so that would give 4^1024 possible keys. From [3]:

"Therefore each base pair represents one of four values as the cell reads through the DNA; twice as many values as one computer bit."

Let's try converting (A, C, G, T) to (00, 01, 10, 11) and then converting the resulting bit string into binary data.

That leads to:

00000000: 9c07 5aec eb0a c58a e930 2d17 79f7 7bef  ..Z......0-.y.{.
00000010: 5331 14f4 eeb5 2642 15b0 60a5 63fc d89e  S1....&B..`.c...
00000020: 86d0 d334 94e1 b1fa 1e92 4148 f9c1 c890  ...4......AH....
00000030: 294d 6272 ad57 61d2 65df 47f6 5805 c671  )Mbr.Wa.e.G.X..q
00000040: 7f0c 4c00 be64 69fd b3a6 a1c3 8d83 febb  ..L..di.........
00000050: 3c96 c497 a9e6 ce21 540e 5168 8493 252a  <......!T.Qh..%*
00000060: abed db49 f0f5 0299 735c 6720 a4cb b7a2  ...I....s\g ....
00000070: d777 f13d 8998 568b 3f28 c295 f81a 3516  .w.=..V.?(....5.
00000080: 789b 7eba 669f 8ff2 bd0d 5ed1 6ebf cdf3  x.~.f.....^.n...
00000090: 2f1b 6d22 756c 0155 cf1f 338c e45f 102e  /.m"ul.U..3.._..
000000a0: 5d23 d604 914e 0b7a 2be2 3919 9d5b 8703  ]#...N.z+.9..[..
000000b0: 3ad4 18b2 d506 c088 4ac7 d92c 1144 6b12  :.......J..,.Dk.
000000c0: 436f aeb8 da32 e846 e570 caa0 fb08 50af  Co...2.F.p....P.
000000d0: 4b38 27bc dea7 8136 b674 0fea 3e4f e013  K8'....6.t..>O..
000000e0: 6a40 24cc 7d82 b91d ff85 9adc c98e b452  j@$.}..........R
000000f0: 373b 45a8 09aa 59dd 1ca3 76e7 807c ace3  7;E...Y...v..|..
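A minimal Python sketch of this conversion, assuming key.txt holds only A, C, G, T plus line breaks. The function name `dna_to_bytes` and its `order` parameter are my own illustration, not code from the challenge; `order` lists the bases that get 00, 01, 10 and 11, which makes it easy to try other assignments.

```python
def dna_to_bytes(dna: str, order: str = "ACGT") -> bytes:
    """Pack 4 bases per byte, 2 bits per base, most significant bits first.

    `order` lists the bases that map to 00, 01, 10 and 11 respectively.
    """
    code = {base: i for i, base in enumerate(order)}
    seq = [b for b in dna.upper() if b in code]  # ignore newlines and any non-base characters
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | code[base]
        out.append(byte)
    return bytes(out)

print(dna_to_bytes("CTGAAATG").hex())          # A,C,G,T -> 00,01,10,11
print(dna_to_bytes("CTGAAATG", "ATCG").hex())  # A,T,C,G -> 00,01,10,11
```

With the default A,C,G,T order the first two bytes come out as 78 0e, whereas the dump above starts with 9c 07, which is what the A,T,C,G order produces. This discrepancy becomes important at the end of the writeup.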

Now we have a potential 256-byte key. Which cipher was used? Some ciphers have specific key length constraints (e.g. AES), so we'll have to try the ones that accept 256-byte keys. There are also three broad families to consider: stream ciphers, block ciphers and asymmetric crypto.

A 256-byte key is 2048 bits, which rings a bell: that is a typical RSA key size.

While trying to convert the big number in key.bin to decimal so I could try it as an RSA key, I found something interesting. I had to change the hex digits from lowercase to uppercase (a to A, b to B, ...), and vim reported 32 substitutions for every letter I replaced. Plugging the hex output into dcode.fr's frequency analysis tool gives:

6       32×     6.25%
C       32×     6.25%
0       32×     6.25%
B       32×     6.25%
A       32×     6.25%
5       32×     6.25%
D       32×     6.25%
7       32×     6.25%
4       32×     6.25%
3       32×     6.25%
1       32×     6.25%
E       32×     6.25%
2       32×     6.25%
F       32×     6.25%
8       32×     6.25%
9       32×     6.25%
#N : 16 Σ = 512.00      Σ = 100.00

They look random: the distribution is perfectly flat, with every hex digit appearing exactly 32 times, which made me think the key was generated with a random number generator. As expected, the original gene sequence shows the same flat distribution:

C       256×    25%
T       256×    25%
G       256×    25%
A       256×    25%

Even the bigrams have equal frequency, so it was a good random generator.
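These counts can be reproduced with a few lines of Python. Since key.bin isn't reproduced here in full, the sketch uses a random permutation of the 256 byte values as a stand-in (a hypothetical input, not the real key). It illustrates that any such permutation gives exactly these flat numbers: 32 of each hex digit and 256 of each base.

```python
import random
from collections import Counter

def hex_digit_counts(data: bytes) -> Counter:
    """Count hex digits in the uppercase hex dump of `data`."""
    return Counter(data.hex().upper())

def base_counts(data: bytes, order: str = "ACGT") -> Counter:
    """Re-encode each byte as four bases (2 bits each, MSB first) and count them."""
    return Counter(order[(b >> shift) & 3] for b in data for shift in (6, 4, 2, 0))

# Stand-in for key.bin: every byte value exactly once, in random order (an assumption)
perm = bytes(random.sample(range(256), 256))
print(hex_digit_counts(perm))  # every hex digit exactly 32 times
print(base_counts(perm))       # every base exactly 256 times
```

A truly random 256-byte string would almost never be this perfectly balanced, so exactly equal counts are also a hint that every byte value appears exactly once.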

Looking at the ciphertext, which is quite big (2010 bytes just to hold a flag), we also see some repeated patterns.

00000000: f3d3 8309 0748 15ee b309 8144 5dea a409  .....H.....D]...
00000010: 26b3 eaa4 83a4 09b3 6e09 b326 ea75 8323  &.......n..&.u.#
00000020: 09a4 835d 4415 eaa4 0944 eeee 6e83 a4a4  ...]D....D..n...
00000030: 0967 d344 26d3 0944 a409 154f 7583 6e09  .g.D&..D...Ou.n.
00000040: 4fb3 75b3 ee09 444f 09ea 6e75 5d83 b375  O.u...DO..nu]..u
00000050: 83c3 f009 0748 15ee b309 8144 5dea a409  .....H.....D]...
00000060: c344 a483 b3a4 8309 5007 60b7 dc09 4f44  .D......P.`...OD
00000070: 5da4 7509 b32f 2f83 b35d 83c3 0944 6e09  ].u..//..]...Dn.
00000080: bf28 ab91 0944 6e09 df09 a444 d8ea ee75  .(...Dn....D...u
00000090: b36e 8315 eaa4 0915 ea75 485d 83b3 88a4  .n.......uH]....
000000a0: 2309 156e 8309 446e 0967 d3b3 7509 44a4  #..n..Dn.g..u.D.
000000b0: 096e 1567 2309 ae85 b35d b323 094a 15ea  .n.g#....].#.J..
000000c0: 75d3 094a eac3 b36e 2309 b36e c309 75d3  u..J...n#..n..u.
000000d0: 8309 1575 d383 5d09 446e 09c2 b3d8 48ea  ...u..].Dn....H.
...

We see a lot of "D" and "Dn" in the ASCII column; there seem to be lots of repeated patterns. At the end we see:

000007b0: 1fd8 83a4 f0da da9b f34d ac54 bc1b 88ae  .........M.T....
000007c0: 9e67 1bd3 e967 1b75 151b 269e 6ef3 5d9e  .g...g.u..&.n.].
000007d0: ee1b 0748 9eee b251 dada                 ...H...Q..
                              ^^^^

Maybe this data is an image... or sound. The file ends with 0xdada, but the same bytes appear in other places too, so it doesn't seem to be anything interesting.

Looking at the ciphertext statistics, only 60 of the 256 possible byte values are used. The alphabet has 26 letters, so this is not much more than twice 26. This led me to believe that this isn't an advanced cipher: if it were, we would see a much more even distribution over all the possible values.
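A sketch of these statistics in Python, run on the first 16 bytes copied from the hexdump above (the helper name `byte_stats` is hypothetical). On the real file one would call it on `open("encrypted.bin", "rb").read()`.

```python
from collections import Counter

def byte_stats(data: bytes):
    """Return the number of distinct byte values and the three most common ones."""
    counts = Counter(data)
    return len(counts), counts.most_common(3)

# First 16 bytes of encrypted.bin, copied from the hexdump above
sample = bytes.fromhex("f3d38309074815eeb30981445deaa409")
distinct, top = byte_stats(sample)
print(distinct, [(hex(b), n) for b, n in top])
```

Even in this short sample, 0x09 already stands out as the most frequent byte.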

But I don't know any classical cipher that produces binary ciphertext...

   C  T  G  A  A  A  T  G  T  T  C  C  G  C  G  A

  f3 d3 83 09 07 48 15 ee b3 09 81 44 5d ea a4 09

11110011 11010011 10000011 00001001

I kept looking for possible ciphers, comparing the key and the ciphertext.

If I had to guess which byte translates to a space, I'd say 0x09, since it is the most frequent one.

In the Caesar cipher we convert each letter to its index in the alphabet, add the key and wrap around the alphabet size to get the ciphertext letter:

c_i = (m_i + k_j) % sizeof(alphabet)
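A minimal sketch of this shift with a single key k over the 26 uppercase letters (with a different k_j per position, the same formula describes a Vigenère cipher). The function and its defaults are illustrative only.

```python
import string

def caesar(text: str, k: int, alphabet: str = string.ascii_uppercase) -> str:
    """c_i = (m_i + k) mod len(alphabet); characters outside the alphabet pass through."""
    n = len(alphabet)
    out = []
    for ch in text:
        i = alphabet.find(ch)
        out.append(alphabet[(i + k) % n] if i >= 0 else ch)
    return "".join(out)

print(caesar("EBOLA", 3))              # encrypt: HEROD
print(caesar(caesar("EBOLA", 3), -3))  # decrypt with the opposite shift: EBOLA
```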

Since only 60 values are used, what if the alphabet also includes digits and lowercase letters?

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890

That is a 62-character alphabet. The other problem is that the byte values used are scattered over 0-255 instead of forming a contiguous range. Still, if this is a classical cipher, a length of 2010 characters makes sense: it gives enough text for a statistical analysis of a substitution cipher.

I mapped each distinct byte value to a character of a custom printable alphabet and got:

7tb BMD5m aLU3h Hm3hbh mX mH3YbG hbULD3h L55Xbhh WtLHt Lh DPYbX PmYm5 LP 3XYUbmYbr6 
BMD5m aLU3h rLhbmhb QBVny PLUhY mKKbmUbr LX pIie LX z hLw35YmXbD3h D3YMUbmdhG DXb LX 
WtmY Lh XDWG kcmUmG ND3Yt N3rmXG mXr Ytb DYtbU LX qmwM3d3G nbwDHUmYLH 8bK3M5LH DP uDX1D6 
7tb 5mYYbU DHH3UUbr LX m aL55m1b XbmU Ytb BMD5m 8LabUG PUDw WtLHt Ytb rLhbmhb Ymdbh LYh 
...
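The writeup doesn't say exactly how symbols were assigned, so this sketch makes an assumption: each new byte value gets the next symbol of the 62-character alphabet in order of first appearance, and 0x09 (the suspected space) is rendered as a real space. Any consistent assignment works for a solver, since only the pattern of repeats matters. The helper name is hypothetical.

```python
import string

# The 62-symbol alphabet from the text above
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + "1234567890"

def to_symbols(data, space_byte=0x09):
    """Render each distinct byte as one printable symbol.

    Assumptions: symbols are handed out in order of first appearance, and
    `space_byte` (the suspected word separator) is shown as a real space.
    """
    table = {} if space_byte is None else {space_byte: " "}
    symbols = iter(ALPHABET)
    out = []
    for b in data:
        if b not in table:
            try:
                table[b] = next(symbols)
            except StopIteration:
                raise ValueError("more distinct byte values than symbols") from None
        out.append(table[b])
    return "".join(out)

print(repr(to_symbols(bytes.fromhex("f3d38309074815eeb309"))))  # 'ABC DEFGH '
```

On the first ten ciphertext bytes this yields "ABC DEFGH ", the same word shapes as "7tb BMD5m " above, just with different letters.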

Running this through several basic cipher solvers on dcode.fr, I wasn't able to crack it. But when I tried guballa.de (a very good substitution cipher solver), I had more luck.

7he EAO5a vIR3s Sa3ses aN aS3TeC seRIO3s I55Ness MhISh Is OFTeN FaTa5 IF 3NTReaTed6 
EAO5a vIR3s dIsease GEUlt FIRsT aPPeaRed IN fZzw IN k sIm35TaNeO3s O3TAReaosC ONe 
IN MhaT Is NOMC pbaRaC LO3Th L3daNC aNd The OTheR IN gamA3o3C lemOSRaTIS 8eP3A5IS 
OF rON1O6 7he 5aTTeR OSS3RRed IN a vI55a1e NeaR The EAO5a 8IveRC FROm MhISh The dIsease
...

We still have to convert some of the characters to fix the text, but first I searched the Internet for the original text and found it:

The Ebola virus causes an acute, serious illness which is often fatal if untreated.
Ebola virus disease (EVD) first appeared in 1976 in 2 simultaneous outbreaks, one
in what is now, Nzara, South Sudan, and the other in Yambuku, Democratic Republic of
Congo. The latter occurred in a village near the Ebola River, from which the disease
takes its name.

The 2014–2016 outbreak in West Africa was the largest and most complex Ebola outbreak
since the virus was first discovered in 1976. There were more cases and deaths in this
outbreak than all others combined. It also spread between countries, starting in Guinea
then moving across land borders to Sierra Leone and Liberia.

The virus family Filoviridae includes three genera: Cuevavirus, Marburgvirus, and
Ebolavirus. Within the genus Ebolavirus, five species have been identified: Zaire,
Bundibugyo, Sudan, Reston and Taï Forest. The first three, Bundibugyo ebolavirus,
Zaire ebolavirus, and Sudan ebolavirus have been associated with large outbreaks
in Africa. The virus causing the 2014–2016 West African outbreak belongs to the Zaire
ebolavirus species.

The first thing to check is that fixing one mapping won't break other words. For example, the word "aNd" appears in several places and always uses the same code (with an uppercase N). The same goes for "EAO5a", which stands for "Ebola". Another example is "7he" versus "The": it shows that uppercase and lowercase are distinct, so "The" actually stands for "the" and "7he" for "The".

Using the text found on the Internet as known plaintext, we can fix some of the mappings. The first 350 characters give:

The Ebola virus causes an acute, serious illness which is often fatal if untreated.
Ebola virus disease (EVD) first appeared in 1976 in 2 simultaneous outbreaks, one
in what is now, Nzara, South Sudan, and the other in Yambuku, Democratic Republic of
Congo. The latter occurred in a village near the Ebola River, from which the disease
takes its name.

nn4t is thought that fruit bats of the xteropodidae familY are natural Ebola virus 
hosts. Ebola is introduced into the human population through close contact with the
blood, secretions, organs or other bodilY fluids of infected animals such as 
chimpanzees, gorillas, fruit bats, monkeYs, forest antelope and porcupines found ill
...
the stools). HaboratorY findings include low white blood cell and platelet counts and 
elevated liver enzYmes.nnyTJqXjWkNcwWh2wWtoWccnTrclWEbcliDnn

After 350 characters we get some collisions and breaks, because the ciphertext deviates from the text we provided. Searching the Internet for the second paragraph, we find it on the same page as the previous one:
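The known-plaintext step can be sketched as follows (the function name is hypothetical). It aligns the ciphertext symbols with the expected text and records every position where the mapping would contradict itself; the first such position is where the ciphertext stops following the assumed text, as happens after about 350 characters here.

```python
def recover_mapping(cipher: str, plain: str):
    """Align ciphertext and known plaintext symbol by symbol.

    In a simple substitution each ciphertext symbol must always decode to the
    same plaintext symbol, and no two ciphertext symbols may share one.
    Returns the mapping and the positions that violate either rule.
    """
    mapping, reverse, conflicts = {}, {}, []
    for pos, (c, p) in enumerate(zip(cipher, plain)):
        if mapping.setdefault(c, p) != p or reverse.setdefault(p, c) != c:
            conflicts.append(pos)
    return mapping, conflicts

m, bad = recover_mapping("7tb 7tb", "The The")
print(m["7"], bad)                          # T []
print(recover_mapping("abab", "xyxz")[1])   # [3]: 'b' cannot be both 'y' and 'z'
```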

It is thought that fruit bats of the Pteropodidae family are natural Ebola virus hosts.
Ebola is introduced into the human population through close contact with the blood,
secretions, organs or other bodily fluids of infected animals such as chimpanzees,
gorillas, fruit bats, monkeys, forest antelope and porcupines found ill or dead or
in the rainforest.

Ebola then spreads through human-to-human transmission via direct contact (through
broken skin or mucous membranes) with the blood, secretions, organs or other bodily
fluids of infected people, and with surfaces and materials (e.g. bedding, clothing)
contaminated with these fluids.

The "nn4t" sits between the two paragraphs, so I mapped "n" to "\n". This leads to:

The Ebola virus causes an acute, serious illness which is often fatal if untreated.
Ebola virus disease (EVD) first appeared in 1976 in 2 simultaneous outbreaks, one
in what is now, Nzara, South Sudan, and the other in Yambuku, Democratic Republic of
Congo. The latter occurred in a village near the Ebola River, from which the disease
takes its name.

4t is thought that fruit bats of the xteropodidae familY are natural Ebola virus hosts.
Ebola is introduced into the human population through close contact with the blood,
secretions, organs or other bodilY fluids of infected animals such as chimpanzees,
gorillas, fruit bats, monkeYs, forest antelope and porcupines found ill or dead or
in the rainforest.

Ebola then spreads through humanutouhuman transmission via direct contact (through
broken skin or mucous membranes) with the blood, secretions, organs or other bodilY
fluids of infected people, and with surfaces and materials (e.g. bedding, clothing)
contaminated with these fluids.yealthucare workers have freQuentlY been infected
while treating patients with suspected or confirmed EVD. This has occurred through
close contact with patients when infection control precautions are not strictlY
practiced.Jurial ceremonies that involve direct contact with the bodY of the deceased
can also contribute in the transmission of Ebola. xeople remain infectious as long as
their blood contains the virus.

The incubation period, that is, the time interval from infection with the virus to
onset of sYmptoms is 2 to 21 daYs. yumans are not infectious until theY develop sYmptoms.
Kirst sYmptoms are the sudden onset of fever fatigue, muscle pain, headache and sore
throat. This is followed bY vomiting, diarrhoea, rash, sYmptoms of impaired kidneY and
liver function, and in some cases, both internal and eVternal bleeding (e.g. oozing 
from the gums, blood in the stools). HaboratorY findings include low white blood cell 
and platelet counts and elevated liver enzYmes.

yTJqXjWkNcwWh2wWtoWccnTrclWEbcliD

Now I add the second paragraph (and the following ones, up to "contains the virus.") to the expected message and keep building the mapping. Some newlines and spaces need fixing, but we get the following text:

The Ebola virus causes an acute, serious illness which is often fatal if untreated. Ebola
virus disease (EVD) first appeared in 1976 in 2 simultaneous outbreaks, one in what is 
now, Nzara, South Sudan, and the other in Yambuku, Democratic Republic of Congo. The latter
occurred in a village near the Ebola River, from which the disease takes its name.

It is thought that fruit bats of the Pteropodidae family are natural Ebola virus hosts. 
Ebola is introduced into the human population through close contact with the blood, 
secretions, organs or other bodily fluids of infected animals such as chimpanzees, 
gorillas, fruit bats, monkeys, forest antelope and porcupines found ill or dead or 
in the rainforest.

...
Haboratory findings include low white blood cell and platelet counts and elevated liver enzymes.

HTBqXjWkNcwWh2wWtoWccnTrclWEbcliD

Some mappings in the last paragraph still need fixing. We found this paragraph in the same article [4]:

The incubation period, that is, the time interval from infection with the virus to onset 
of symptoms is 2 to 21 days. Humans are not infectious until they develop symptoms. First 
symptoms are the sudden onset of fever fatigue, muscle pain, headache and sore throat. 
This is followed by vomiting, diarrhoea, rash, symptoms of impaired kidney and liver 
function, and in some cases, both internal and external bleeding (e.g. oozing from the 
gums, blood in the stools). Laboratory findings include low white blood cell and platelet
counts and elevated liver enzymes.

Taking this last one into account...

The Ebola virus causes an acute, serious illness which is often fatal if untreated. Ebola 
virus disease (EVD) first appeared in 1976 in 2 simultaneous outbreaks, one in what is now, 
Nzara, South Sudan, and the other in Yambuku, Democratic Republic of Congo. The latter 
occurred in a village near the Ebola River, from which the disease takes its name.

It is thought that fruit bats of the Pteropodidae family are natural Ebola virus hosts. 
...
Laboratory findings include low white blood cell and platelet counts and elevated liver enzymes.

HTBqXjWkNcwWh2wWtoWccnTrclWEbcliD

We have a possible flag. Not all of its characters are in the mapping yet, so we may still need to do something. Let's give it a shot...

HTB{qXjWkNcwWh2wWtoWccnTrclWEbcliD}
    ....  . . . .  ...   . .  . ..

Does not work. :(

The missing characters are:

  • character 2 not found in mapping!
  • character c not found in mapping!
  • character D not found in mapping!
  • character i not found in mapping!
  • character j not found in mapping!
  • character q not found in mapping!
  • character W not found in mapping!

The current key for substitution cipher is:

 2         D                  W     c     ij      q    
1 345678 AC EFGHIJKLMNOPQRSTUV YZ ab defgh  klmnop rstuvwxyz
g uIl.TR b, Ef(LiBFSwnopqrctVx y9 az de1Yh  2Dm kN Cs)-v6PH7

sorted: 12  67 9 abcdefghi klmnopqrstuvwxyz BCDEFHILNPRSTVY

With n = \n.                                                   EAO5a
The original flag text: eNbYmes6nn  y7JqXjWopcMWh2MWTOWScN7Rc5WEAc5iD n  n
Which stands for:       enzymes.\n\nHTB{   kN w h w to c nTr l Eb l } \n\n

Assuming q maps to { and D maps to }, this does look like text, but some characters don't make much sense. For instance, EAO5a should be Ebola, but here we have i in place of a. Maybe i translates to uppercase A?

Going back to the key: it decoded to a 256-byte file. That fits perfectly with a key covering the full byte range (0-255).

00000000: 6c0b a5dc d705 ca45 d630 1e2b b6fb b7df  l......E.0.+....
00000010: a332 28f8 dd7a 1981 2a70 905a 93fc e46d  .2(..z..*p.Z...m
00000020: 49e0 e338 68d2 72f5 2d61 8284 f6c2 c460  I..8h.r.-a.....`
00000030: 168e 91b1 5eab 92e1 9aef 8bf9 a40a c9b2  ....^...........
00000040: bf0c 8c00 7d98 96fe 7359 52c3 4e43 fd77  ....}...sYR.NC.w
00000050: 3c69 c86b 56d9 cd12 a80d a294 4863 1a15  <i.kV.......Hc..
00000060: 57de e786 f0fa 0166 b3ac 9b10 58c7 7b51  W......f....X.{Q
00000070: ebbb f23e 4664 a947 3f14 c16a f425 3a29  ...>Fd.G?..j.%:)
00000080: b467 bd75 996f 4ff1 7e0e ade2 9d7f cef3  .g.u.oO.~.......
00000090: 1f27 9e11 ba9c 02aa cf2f 334c d8af 201d  .'......./3L.. .
000000a0: ae13 e908 628d 07b5 17d1 3626 6ea7 4b03  ....b.....6&n.K.
000000b0: 35e8 2471 ea09 c044 85cb e61c 2288 9721  5.$q...D...."..!
000000c0: 839f 5d74 e531 d489 dab0 c550 f704 a05f  ..]t.1.....P..._
000000d0: 8734 1b7c ed5b 4239 79b8 0fd5 3d8f d023  .4.|.[B9y...=..#
000000e0: 9580 18cc be41 762e ff4a 65ec c64d 78a1  .....Av..Je..Mx.
000000f0: 3b37 8a54 0655 a6ee 2c53 b9db 40bc 5cd3  ;7.T.U..,S..@.\.

The initial ciphertext is:

00000000: f3d3 8309 0748 15ee b309 8144 5dea a409  .....H.....D]...
00000010: 26b3 eaa4 83a4 09b3 6e09 b326 ea75 8323  &.......n..&.u.#

0xf3 is 0x54 = T
0xd3 is 0x7c = |
0x83 is 0x75 = u
...

(In hindsight, this was the solution to the challenge, but a bug in my code was leading to bad results! :( )
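The lookup being tried here treats the 256-byte key as a substitution table indexed by the ciphertext byte (a direction inferred from the 0xf3 -> 0x54 example). In Python, `bytes.translate` performs exactly this lookup. The two tables below are hypothetical partial keys holding only entries quoted in this writeup: the wrong one from the dump above and the right one from the final key at the end.

```python
def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Full-byte-range substitution: plaintext byte = key[ciphertext byte]."""
    if len(key) != 256:
        raise ValueError("the key must be a 256-entry lookup table")
    return ciphertext.translate(key)  # bytes.translate does the key[c] lookup

# Hypothetical partial tables holding only entries quoted in this writeup
wrong_key = bytearray(256)
wrong_key[0xf3], wrong_key[0xd3], wrong_key[0x83], wrong_key[0x09] = 0x54, 0x7c, 0x75, 0x30
right_key = bytearray(256)
right_key[0xf3], right_key[0xd3], right_key[0x83], right_key[0x09] = 0x54, 0x68, 0x65, 0x20

ct = bytes.fromhex("f3d38309")
print(decrypt(ct, bytes(wrong_key)))  # b'T|u0'
print(decrypt(ct, bytes(right_key)))  # b'The '
```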

It doesn't seem to match anything we got. If we build a partial key from the substitution mapping recovered so far (00 marking unknown entries), it looks like this:

00000000: 7800 0000 0000 0045 0020 0000 0000 0000  x......E. ......
00000010: 0000 0000 006f 0000 0000 0057 0000 0079  .....o.....W...y
00000020: 0000 002c 0000 6300 3971 0000 0000 0070  ...,..c.9q.....p
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 6900 0000 6200 5300 0042 0066  ....i...b.S..B.f
00000050: 2844 0000 5800 0000 0000 0000 4c72 0000  (D..X.......Lr..
00000060: 5600 0000 0000 0077 0000 0000 0000 6e00  V......w......n.
00000070: 0000 0000 0074 0046 0000 0000 0000 0000  .....t.F........
00000080: 0076 0065 007a 0000 6b00 0000 0000 0000  .v.e.z..k.......
00000090: 0036 0000 0000 0000 0000 0048 0000 6300  .6.........H..c.
000000a0: 0000 0000 7300 0000 0000 0037 7100 4e00  ....s......7q.N.
000000b0: 0000 6961 0000 0044 0000 0000 6a00 0031  ..ia...D....j..1
000000c0: 0000 5964 0000 0000 0000 0050 0000 0000  ..Yd.......P....
000000d0: 0000 0068 0000 432d 6d00 0a00 2900 0032  ...h..C-m...)..2
000000e0: 0000 0000 0000 6700 0032 7500 0049 6c00  ......g..2u..Il.
000000f0: 2e00 0054 0000 0000 0052 0000 0000 0000  ...T.....R......

To build this key, I reversed the substitution cipher key together with the mapping I had used between the binary data and the text.
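Building the partial key amounts to setting key[c] = p for every aligned ciphertext/plaintext byte pair and leaving the rest at 0x00. A sketch, using the first 16 ciphertext bytes and the known text "The Ebola virus " (the helper name is hypothetical):

```python
def build_partial_key(cipher: bytes, plain: bytes) -> bytearray:
    """Fill key[c] = p for each aligned byte pair; unknown entries stay 0x00.

    Raises ValueError if one ciphertext byte would need two plaintext values.
    """
    key = bytearray(256)
    for c, p in zip(cipher, plain):
        if key[c] not in (0, p):
            raise ValueError(f"byte {c:#04x} maps to both {key[c]:#04x} and {p:#04x}")
        key[c] = p
    return key

cipher = bytes.fromhex("f3d38309074815eeb30981445deaa409")
key = build_partial_key(cipher, b"The Ebola virus ")
print(key[:16].hex())  # 00000000000000450020000000000000
```

Entries 0x07 (0x45, "E") and 0x09 (0x20, space) agree with the first row of the partial key above.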

I tried different variations of (A,T,C,G):

(00, 01, 10, 11) key_1
(01, 10, 11, 00) key_2
(10, 11, 00, 01) key_3
(11, 00, 01, 10) key_4

Then I converted each bit string into a binary file and got:

key_1:
00000000: 9c07 5aec eb0a c58a e930 2d17 79f7 7bef  ..Z......0-.y.{.
00000010: 5331 14f4 eeb5 2642 15b0 60a5 63fc d89e  S1....&B..`.c...
...

key_2:
00000000: e158 af31 3c5f 1adf 3e45 7268 8e08 8c30  .X.1<_..>Erh...0
00000010: a446 6909 33ca 7b97 6ac5 b5fa b401 2de3  .Fi.3.{.j.....-.
...

key_3:
00000000: 36ad f046 41a0 6f20 439a 87bd d35d d145  6..FA.o C....].E
00000010: f99b be5e 441f 8ce8 bf1a ca0f c956 7234  ...^D........Vr4
...

key_4:
00000000: 4bf2 059b 96f5 b075 94ef d8c2 24a2 269a  K......u....$.&.
00000010: 0eec c3a3 9960 d13d c06f 1f50 1eab 8749  .....`.=.o.P...I

None of them seems to contain the partial key content:

00000000: 7800 0000 0000 0045 0020 0000 0000 0000  x......E. ......
00000010: 0000 0000 006f 0000 0000 0057 0000 0079  .....o.....W...y

I continued looking at possible ciphers. The cipher must:

  1. Produce non-random-looking ciphertext even though the key is random (this excludes one-time-pad-like ciphers and many modern stream and block ciphers);
  2. Always produce the same ciphertext code for a repeated plaintext character (this excludes Vigenère-type ciphers with rotating keys);
  3. Use a big key, much bigger than the alphabet.

Trying several monoalphabetic ciphers on dcode.fr, just by feeding them a repeating message like "aaaaaaaaaaaaaa...", I found the following candidates:

  • Affine cipher (e(m) = (m*A + B) % 26)
  • Hill cipher (M.P = C mod 26)

To rule out the affine cipher, we can plot the estimated key: for an affine cipher it should show pieces of linear functions (with a new piece each time m*A+B wraps around 26 or 256, depending on the alphabet size).

Also, knowing two points we can recover A and B, assuming an alphabet size of 256.

Say m=0 leads to 120: 0*A + B = 120 -> B = 120.
Then m=9 leads to 32: 9*A + 120 ≡ 32 (mod 256) -> 9*A ≡ 168 (mod 256). Since 9*57 = 513 ≡ 1 (mod 256), A ≡ 168*57 ≡ 104 (mod 256).

Let's test these coefficients on another (m, c) pair:

m=7 leads to 69, but 7*104 + 120 = 848 ≡ 80 (mod 256)... it does not match 69, so this is not an affine cipher.
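The two-point fit can be checked mechanically. This sketch (a hypothetical helper; Python 3.8+ for `pow(x, -1, n)`) solves the congruences modulo 256 with a modular inverse rather than ordinary division, using the pairs (0, 120) and (9, 32), then predicts the value for m = 7.

```python
def solve_affine(p1, p2, n=256):
    """Fit y = (A*x + B) mod n through two points; x2 - x1 must be invertible mod n."""
    (x1, y1), (x2, y2) = p1, p2
    A = (y2 - y1) * pow(x2 - x1, -1, n) % n  # modular inverse instead of division
    B = (y1 - A * x1) % n
    return A, B

A, B = solve_affine((0, 120), (9, 32))
print(A, B)                # 104 120
print((A * 7 + B) % 256)   # 80, but the partial key says 69
```

Since B and A are uniquely forced by the first two points (9 is odd, hence invertible mod 256), a single mismatching third point is enough to rule out any affine map.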

I went back to the missing characters and the flag.

The original flag text:

eNbYmes6nn  y7JqXjWopcMWh2MWTOWScN7Rc5WEAc5iD n  n

Which stands for:

enzymes.\n\nHTB{   kNow h w to c nTr l Eb l } \n\n

Our current alphabet of replaced characters, based on the key, is:

12  67 9 abcdefghi klmnopqrstuvwxyz  BCDEF HI  LN   P RSTV  Y    (line 1)
                  j                 A     G  JK  MNO Q    WX Z   (line 2)

We can only translate the missing characters to characters that were not used yet (line 2). Say c is O, and i is A (based on some analysis of the text and the plausible letters).

 12   67 9 abcdefghi klmnopqrstuvwxyz ABCDEF HI  LN  OP RSTVW Y    (line 1)
0  345  8           j                       G  JK  MN  Q     X Z   (line 2)

The substitution key becomes:

1 345678 AC EFGHIJKLMNOPQRSTUV YZ abcdefghi klmnop rstuvwxyz
g uIl.TR b, Ef(LiBFSwnopqrctVx y9 azOde1YhA 2Dm kN Cs)-v6PH7

And the text:

eNbYmes6nn  y7JqXjWopcMWh2MWTOWScN7Rc5WEAc5iD n  n
enzymes.\n\nHTB{Xj-kNOw-h2w-to-cOnTrOl-EbOlA} \n\n

This is getting more interesting. I'd say W is a word separator such as an underscore or a hyphen.

We have three problems:

  1. The missing characters j, X and 2;
  2. W, which should be the separator character;
  3. The 2 is strange: "how" seems to be the only possible word.

Let's organize the missing characters and build a set of possible values, assuming the final message is something like "we know how to control ebola". (If we search Google for "Wj kNOw how to cOnTrOl EbOlA", its fuzzy search gives us something plausible; the first match is:

(PDF) What Do We Know About Controlling Ebola Virus Disease ...
https://www.researchgate.net/.../317804878_What_Do_We_Kno...
07/11/2017 - What Do We Know About Controlling Ebola Virus Disease Outbreaks? Article (PDF ... 
The West African Ebola virus disease (EVD) outbreak was. unparalleled in size ......
Mabey D, Flasche S, Edmunds WJ. Airport screening ...

This led me to "we" as the first word!) So, back to our set of possible flags: first, what is missing?

HTB{W1!kN2w!h3w!to!c2nTr2l!Eb2l4}
  • ! is a single separator character: ! => {_, -}

  • 1 must be something like e, E, or 3 if we consider l33t characters. Only 3 is available, since e and E are already mapped: 1 => {3}

  • 2 must be o, O or 0. Of these, O and 0 are still unmapped: 2 => {O, 0}

  • 3 must also be o, O or 0, but it must differ from 2, since two different ciphertext characters can't map to the same plaintext character: 3 => {O, 0} != 2

  • 4 must be a, A or 4. Since a and A are already used, it can only be 4: 4 => {4}

This results in:

HTB{W3!kN2w!h3w!to!c2nTr2l!Eb2l4}

So the remaining unknowns are the placeholders !, 2 and 3 (the 3 right after W is now a literal 3).

A simple Python script enumerates all possible outcomes:

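from itertools import product

# cex: word separator; c2 / c3: the two different "o" look-alikes (must differ)
for cex, c2, c3 in product(['-', '_'], ['O', '0'], ['O', '0']):
  if c2 == c3:
    continue
  print("HTB{W3%skN%sw%sh%sw%sto%sc%snTr%sl%sEb%sl4}" % (cex, c2, cex, c3, cex, cex, c2, c2, cex, c2))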

There are just four possible values; trying them on HTB:

HTB{W3-kNOw-h0w-to-cOnTrOl-EbOl4} - incorrect
HTB{W3-kN0w-hOw-to-c0nTr0l-Eb0l4} - incorrect
HTB{W3_kNOw_h0w_to_cOnTrOl_EbOl4} - incorrect
HTB{W3_kN0w_hOw_to_c0nTr0l_Eb0l4} - correct!

This completes the challenge... but I still didn't understand the cipher, so I might come back to it.

Ending Words

While solving the challenge I contacted alamot, the creator of this challenge, who provided me with its source (an encrypted zip file whose password is the flag). Looking at the source, I could see I had been close to the solution of the cipher.

The first groups of four bases in the key are:

CTGA AATG TTCC GCGA GCCG AACC GATT CACC ...

Let's try converting (A, C, G, T) to (00, 01, 10, 11) and then converting the bit string into binary data.

C=01, T=11, G=10, A=00 => 01111000 => 0x78, which is right: it matches the first byte of the partial key!

So what went wrong with my conversion of the key to binary? It gave 0x6c...

Looking at my code, I had A,T,C,G instead of A,C,G,T. But I tried the 4 variations, right? Wrong. I only tried four rotations of the codes for a single ordering of the bases, while there are 4! = 4*3*2*1 possible orderings...

( {A, T, C, G} , {00, 01, 10, 11} )

Pairing every ordering of the bases with every ordering of the codes gives 4! * 4! combinations, right? Let's generate them in Python:

import itertools

bases = ['A', 'T', 'C', 'G']
codes = ['00', '01', '10', '11']
maps = []
# every ordering of the bases, paired with every ordering of the 2-bit codes
for cs in itertools.permutations(bases):
  for bs in itertools.permutations(codes):
    maps.append(cs + bs)
    print('{%s, %s, %s, %s} => {%s, %s, %s, %s}' % (cs + bs))
print("len=", len(maps))
len= 576

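That is 4! * 4! = 576 tuples. Strictly speaking, many of them describe the same assignment (listing the bases in a different order, with the codes permuted accordingly, gives an identical mapping), so there are only 4! = 24 distinct base-to-code mappings. Either way, my four rotations covered only 4 of them. So back to my code: I failed because I didn't use the right mapping...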
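A sketch of a more systematic search, assuming the first four bases of key.txt (CTGA) and the first partial-key entry (0x78) as the known data: it enumerates the distinct base-to-code assignments, confirms that the 576 (ordering, code) tuples collapse to 24 mappings, and keeps only the assignment consistent with the partial key. Helper names are hypothetical.

```python
from itertools import permutations

CODES = ("00", "01", "10", "11")

def all_mappings(bases="ACGT"):
    """Every distinct base -> 2-bit code assignment: 4! = 24 of them."""
    return [dict(zip(bases, perm)) for perm in permutations(CODES)]

def decode(dna, mapping):
    """Concatenate the 2-bit codes and cut the bit string into bytes."""
    bits = "".join(mapping[b] for b in dna if b in mapping)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - 7, 8))

def matching_mappings(dna, known):
    """Keep mappings whose decoded key agrees with every known {index: byte} entry."""
    return [m for m in all_mappings()
            if all(decode(dna, m)[i] == v for i, v in known.items())]

# The 576 (base ordering, code ordering) tuples collapse to 24 distinct mappings
pairs = {frozenset(zip(cs, bs)) for cs in permutations("ATCG") for bs in permutations(CODES)}
print(len(all_mappings()), len(pairs))        # 24 24
print(matching_mappings("CTGA", {0: 0x78}))   # only A=00, C=01, G=10, T=11 survives
```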

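#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void f4(void)
{
    int fkey = open("key.txt", O_RDONLY);
    unsigned char key[1024];
    ssize_t n = read(fkey, key, sizeof(key));  // n < 0 if the file is missing
    close(fkey);

    // A=00, T=01, C=10, G=11 (the buggy order; correct is A=00, C=01, G=10, T=11)
    // O_CREAT requires a mode argument, otherwise the file permissions are garbage
    int fout = open("key_1.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    for (ssize_t i = 0; i < n; i++) {
        const char *w = "XX";
        switch (key[i])
        {
        case 'A': // OK
            w = "00"; break;
        case 'T': // Wrong! should be 11
            w = "01"; break;
        case 'C': // Wrong.. should be 01
            w = "10"; break;
        case 'G': // Wrong.. should be 10
            w = "11"; break;
        }
        printf("%c => %s\n", key[i], w);
        write(fout, w, 2);
    }
    close(fout);
}

int main(void)
{
    f4();
    return 0;
}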

After correcting the mapping, I regenerated key_1.txt and converted it to binary, which gives:

00000000: 780e f598 9605 8f45 9720 1b3e e7ae e69a  x......E. .>....
00000010: f223 3cac 996f 1dc1 3f60 d05f d2a8 b479  .#<..o..?`._...y
00000020: 4db0 b22c 7c93 63a5 3971 c3c4 a783 8470  M..,|.c.9q.....p
00000030: 17cb d1e1 5bfe d3b1 dfba cead f40f 8de3  ....[...........
00000040: ea08 c800 69dc d7ab 625d 5382 4b42 a966  ....i...b]S.KB.f
00000050: 287d 8c7e 579d 8913 fc09 f3d4 4c72 1f15  (}.~W.......Lr..
00000060: 569b b6c7 a0af 0177 e2f8 de10 5c86 6e51  V......w....\.nQ
00000070: beee a32b 4774 fd46 2a14 817f a435 2f3d  ...+Gt.F*....5/=
00000080: e476 e965 dd7a 4aa1 6b0b f9b3 d96a 8ba2  .v.e.zJ.k....j..
00000090: 1a36 db11 efd8 03ff 8a3a 2248 9cfa 3019  .6.......:"H..0.
000000a0: fb12 bd0c 73c9 06e5 1691 2737 7bf6 4e02  ....s.....'7{.N.
000000b0: 25bc 3461 bf0d 8044 c58e b718 33cc d631  %.4a...D....3..1
000000c0: c2da 5964 b521 94cd 9fe0 8550 a604 f05a  ..Yd.!.....P...Z
000000d0: c624 1e68 b95e 432d 6dec 0a95 29ca 9032  .$.h.^C-m...)..2
000000e0: d5c0 1c88 eb41 673b aa4f 75b8 8749 6cf1  .....Ag;.Ou..Il.
000000f0: 2e26 cf54 0755 f7bb 3852 ed9e 40e8 5892  .&.T.U..8R..@.X.

Comparing it to the partial key we computed earlier:

00000000: 7800 0000 0000 0045 0020 0000 0000 0000  x......E. ......
00000010: 0000 0000 006f 0000 0000 0057 0000 0079  .....o.....W...y
00000020: 0000 002c 0000 6300 3971 0000 0000 0070  ...,..c.9q.....p
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 6900 0000 6200 5300 0042 0066  ....i...b.S..B.f
00000050: 2844 0000 5800 0000 0000 0000 4c72 0000  (D..X.......Lr..
00000060: 5600 0000 0000 0077 0000 0000 0000 6e00  V......w......n.
00000070: 0000 0000 0074 0046 0000 0000 0000 0000  .....t.F........
00000080: 0076 0065 007a 0000 6b00 0000 0000 0000  .v.e.z..k.......
00000090: 0036 0000 0000 0000 0000 0048 0000 6300  .6.........H..c.
000000a0: 0000 0000 7300 0000 0000 0037 7100 4e00  ....s......7q.N.
000000b0: 0000 6961 0000 0044 0000 0000 6a00 0031  ..ia...D....j..1
000000c0: 0000 5964 0000 0000 0000 0050 0000 0000  ..Yd.......P....
000000d0: 0000 0068 0000 432d 6d00 0a00 2900 0032  ...h..C-m...)..2
000000e0: 0000 0000 0000 6700 0032 7500 0049 6c00  ......g..2u..Il.
000000f0: 2e00 0054 0000 0000 0052 0000 0000 0000  ...T.....R......

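It matches every entry recovered from the known text; the only differences are the flag characters that were still placeholders in the partial key (for example, entry 0xac holds "q" in the partial key and "{" in the real one). Using this key we get: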

$ ./test
The Ebola virus causes an acute, serious illness which is often fatal if untreated.
Ebola virus disease (EVD) first appeared in 1976 in 2 simultaneous outbreaks, one in 
...
stools). Laboratory findings include low white blood cell and platelet counts and 
elevated liver enzymes.

HTB{W3_kN0w_hOw_to_c0nTr0l_Eb0l4}
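Putting it together, a compact end-to-end sketch: decode key.txt with A,C,G,T -> 00,01,10,11 and use the result as a lookup table over encrypted.bin. It assumes both challenge files sit in the current directory and that key.txt contains only bases and whitespace; this is my reconstruction, not alamot's source.

```python
import os

def dna_to_bytes(dna, order="ACGT"):
    """Pack four bases per byte; `order` gives the bases for 00, 01, 10, 11."""
    code = {base: i for i, base in enumerate(order)}
    seq = [code[b] for b in dna.upper() if b in code]  # skip newlines and other noise
    return bytes(
        (seq[i] << 6) | (seq[i + 1] << 4) | (seq[i + 2] << 2) | seq[i + 3]
        for i in range(0, len(seq) - 3, 4)
    )

def solve(dna, ciphertext):
    """Decode the DNA key and use it as a table: plaintext byte = key[ciphertext byte]."""
    key = dna_to_bytes(dna)
    if len(key) != 256:
        raise ValueError(f"expected a 256-byte key, got {len(key)} bytes")
    return ciphertext.translate(key)

if __name__ == "__main__" and os.path.exists("key.txt") and os.path.exists("encrypted.bin"):
    with open("key.txt") as f, open("encrypted.bin", "rb") as g:
        print(solve(f.read(), g.read()).decode("utf-8", errors="replace"))
```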

References

[1] https://www.genomatix.de/online_help/help/sequence_formats.html

[2] https://towardsdatascience.com/dna-sequence-data-analysis-starting-off-in-bioinformatics-3dba4cea04f?gi=b5c3df04ab71

[3] https://www.decodedscience.org/comparing-genetic-code-dna-binary-code/55476

[4] https://www.who.int/en/news-room/fact-sheets/detail/ebola-virus-disease

jemos / Apr, 4 2020