Cryptanalysis Using Pattern Recognition Tools
Abstract
For cryptanalysis, an important task is to identify the encryption algorithm that was used to encrypt a plain-text file. The main objective of this dissertation is to find the classification algorithm that best identifies the encryption method used, for both block and stream cipher algorithms.
This work compares the classification of encryption output across different block and stream cipher algorithms, and evaluates ECB and CBC modes using 8-bit and 16-bit codes. It reports results for different numbers of keys for all the block and stream cipher algorithms analysed, and determines the classification accuracy in each case.
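The practical difference between the two modes can be illustrated with a short Python sketch. Because the Python standard library has no block cipher, `toy_block_encrypt` is a hypothetical 4-round Feistel network built from HMAC-SHA256, used purely as a stand-in for DES, AES and the other ciphers studied; it is not secure, and all names here are illustrative rather than taken from the thesis.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes (AES-sized; DES and IDEA use 8-byte blocks)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block_encrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Hypothetical toy Feistel permutation standing in for a real block cipher.

    It is keyed and invertible but NOT secure; it only lets the modes run.
    """
    left, right = block[:8], block[8:]
    for i in range(rounds):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor_bytes(left, f)
    return left + right


def pkcs7_pad(data: bytes) -> bytes:
    n = BLOCK - len(data) % BLOCK  # always appends 1..16 padding bytes
    return data + bytes([n]) * n


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # ECB: every block is encrypted independently under the same key.
    data = pkcs7_pad(plaintext)
    return b"".join(toy_block_encrypt(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # CBC: each plaintext block is XORed with the previous ciphertext block
    # (the IV for the first block) before being encrypted.
    data = pkcs7_pad(plaintext)
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        prev = toy_block_encrypt(key, xor_bytes(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


key, iv = b"k" * 16, bytes(16)  # fixed zero IV only to keep the demo reproducible
msg = b"A" * 32                 # two identical plaintext blocks
ecb, cbc = ecb_encrypt(key, msg), cbc_encrypt(key, iv, msg)
print(ecb[:16] == ecb[16:32])  # True: ECB repeats the ciphertext block
print(cbc[:16] == cbc[16:32])  # False: chaining hides the repetition
```

In ECB, identical plaintext blocks yield identical ciphertext blocks, which is the kind of structure a classifier could in principle exploit; CBC chaining removes that repetition.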
We created an encryption dataset for the experimental evaluation. Different block and stream cipher algorithms were used to encrypt the source dataset, a random sample of text files taken from the Internet in 2010 that included various types of data such as reports, papers, news, website text and journals. The samples ranged in size from 100 bytes to 10,000 bytes. An initial analysis of the encrypted text shows that the data is random in nature: the Frequency Test shows a uniform distribution for the encrypted text, and the Chi-square test likewise indicates that the distribution of character codes is uniform. A compression test using the PPM text compression algorithm also shows that the encrypted text is incompressible, which is further evidence of randomness. Together, these tests show that the encrypted data is difficult to classify.
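The three randomness checks described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis's code: the chi-square statistic is Pearson's $\chi^2 = \sum_{v=0}^{255} (O_v - E)^2 / E$ with $E = n/256$, the critical value uses the Wilson-Hilferty approximation, and `zlib` (DEFLATE) is swapped in for PPM because the standard library has no PPM compressor.

```python
import os
import zlib
from collections import Counter
from statistics import NormalDist


def byte_frequencies(data: bytes) -> list[int]:
    """Frequency test: occurrence count of each of the 256 byte values."""
    counts = Counter(data)
    return [counts.get(v, 0) for v in range(256)]


def chi_square_uniform(data: bytes) -> float:
    """Pearson chi-square statistic against a uniform byte distribution."""
    expected = len(data) / 256
    return sum((o - expected) ** 2 / expected for o in byte_frequencies(data))


def chi_square_critical(df: int, alpha: float = 0.05) -> float:
    """Wilson-Hilferty approximation to the upper chi-square critical value."""
    z = NormalDist().inv_cdf(1 - alpha)
    h = 2 / (9 * df)
    return df * (1 - h + z * h ** 0.5) ** 3


def compression_ratio(data: bytes) -> float:
    """Compressed size / original size; values near 1.0 mean incompressible.

    zlib is only a stand-in here for the PPM compressor used in the thesis.
    """
    return len(zlib.compress(data, 9)) / len(data)


text_like = b"the quick brown fox " * 500  # 10,000 highly repetitive bytes
random_like = os.urandom(10_000)           # stand-in for encrypted output
critical = chi_square_critical(255)        # 256 byte values -> 255 degrees of freedom
for name, sample in [("text", text_like), ("random", random_like)]:
    stat = chi_square_uniform(sample)
    print(name, round(stat, 1), "uniform?", stat < critical,
          "ratio", round(compression_ratio(sample), 3))
```

For readable text the statistic is far above the critical value (about 293.25 at the 5% level) and the compression ratio is small; for random-looking bytes the statistic usually falls below the critical value and the ratio is close to 1.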
The block and stream cipher algorithms were applied to the data using 8-bit and 16-bit codes. The study included two groups of block cipher algorithms. The first group comprised DES (64-bit), IDEA (128-bit), AES (128, 192 and 256-bit) and RC2 (42, 84 and 128-bit). The second group comprised seven block cipher algorithms, all with the same 128-bit key size: RC2, RC6, Blowfish, Twofish, XTEA, CAST and DESede (3DES). The following stream cipher algorithms were also investigated: Grain (128-bit), HC (128-bit), RC4 (128-bit), VMPC (128-bit) and Salsa20 (128-bit).
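One plausible way to turn ciphertext into classifier input is a normalized histogram of its 8-bit or 16-bit symbols. Whether this matches the thesis's use of "8-bit and 16-bit codes" is an assumption; `code_histogram` is a hypothetical helper, and reading byte pairs as big-endian 16-bit codes (dropping a trailing odd byte) is an arbitrary choice made for this sketch.

```python
def code_histogram(data: bytes, bits: int = 8) -> list[float]:
    """Hypothetical feature extractor: normalized histogram of 8- or 16-bit codes.

    For bits=16, consecutive byte pairs are read as big-endian 16-bit codes
    and a trailing odd byte is dropped.
    """
    if bits == 8:
        codes = list(data)
    elif bits == 16:
        codes = [(data[i] << 8) | data[i + 1] for i in range(0, len(data) - 1, 2)]
    else:
        raise ValueError("bits must be 8 or 16")
    hist = [0.0] * (1 << bits)  # 256 or 65,536 bins
    for c in codes:
        hist[c] += 1
    n = len(codes)
    return [h / n for h in hist] if n else hist


sample = bytes([0x00, 0x01, 0x00, 0x01, 0xFF])
h8, h16 = code_histogram(sample, 8), code_histogram(sample, 16)
print(len(h8), h8[0], h8[1], h8[255])  # 256 0.4 0.4 0.2
print(len(h16), h16[0x0001])           # 65536 1.0
```

A 16-bit histogram has 65,536 bins, so for files of only a few hundred bytes most bins stay empty; the 8-bit version is dense but coarser.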
The results of the classification experiments show that pattern recognition techniques are useful tools for cryptanalysis as a means of identifying the encryption algorithm used to encrypt the data. The results also show that increasing the number of encryption keys reduces classification accuracy, although some classifiers can still achieve an accuracy above 40% when each file is encrypted with different numbers of keys using block ciphers. Increasing the number of files used also improves accuracy. The RoFo classifier performed best at identifying the encryption method for ciphered data, while IBL performed worst. Moreover, the classifiers performed significantly better when the identification task involved four different algorithms. The three versions of AES (128, 192 and 256-bit) could not be distinguished from one another, whereas RC2 (128-bit) did not match the other versions of RC2 (42 and 84-bit).
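IBL presumably refers to instance-based learning, in which a file is labelled by comparison with stored training examples. The Python sketch below is a minimal 1-nearest-neighbour classifier over byte-frequency histograms that illustrates that family only; it is not the thesis's IBL or RoFo implementation, and the features, the L1 distance and the synthetic "cipher" data are all assumptions made for the example.

```python
import random
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Normalized frequency of each of the 256 byte values."""
    counts = Counter(data)
    return [counts.get(v, 0) / len(data) for v in range(256)]


def l1_distance(a: list[float], b: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def nn_classify(train: list[tuple[list[float], str]], hist: list[float]) -> str:
    """Instance-based (1-nearest-neighbour) prediction under L1 distance."""
    _, label = min(train, key=lambda item: l1_distance(item[0], hist))
    return label


def accuracy(train: list[tuple[list[float], str]],
             test: list[tuple[list[float], str]]) -> float:
    """Fraction of held-out samples whose predicted label is correct."""
    return sum(nn_classify(train, h) == y for h, y in test) / len(test)


# Hypothetical, deliberately separable data: "cipher_a" files use only low
# byte values and "cipher_b" files only high ones. Real ciphertext is far
# closer to uniform, so accuracy on it would be expected to be much lower.
rng = random.Random(0)


def fake_file(label: str, size: int = 200) -> bytes:
    low = 0 if label == "cipher_a" else 128
    return bytes(rng.randrange(low, low + 128) for _ in range(size))


labelled = [(byte_histogram(fake_file(lbl)), lbl)
            for lbl in ["cipher_a", "cipher_b"] * 20]
train_set, test_set = labelled[:30], labelled[30:]
print(accuracy(train_set, test_set))  # 1.0 on this toy data
```

With k equally likely classes, random guessing scores about 1/k, which is the baseline any reported accuracy should be compared against.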
For stream cipher algorithms, the results show that encrypted output is more difficult to classify than for block cipher algorithms. This is attributed to the bit-based streaming approach adopted by these algorithms and the randomly distributed characters it consequently produces in the encrypted output.
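The structural difference from block ciphers is visible in RC4, one of the stream ciphers studied, which is short enough to sketch in full. This is a textbook Python implementation given for illustration, not the code used in the thesis: a keystream is generated from the key and XORed with the data, so the ciphertext has exactly the plaintext's length and no block boundaries or padding.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes: key scheduling (KSA), then generation (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):  # KSA: permute the state using the key
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    while True:           # PRGA: emit one keystream byte per step
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def rc4(key: bytes, data: bytes) -> bytes:
    # Ciphertext = plaintext XOR keystream, byte by byte; applying the same
    # function again with the same key decrypts.
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


ct = rc4(b"Key", b"Plaintext")
print(ct.hex())                      # bbf316e8d940af0ad3
print(rc4(b"Key", ct))               # b'Plaintext'
print(len(ct) == len(b"Plaintext"))  # True: no padding, unlike a padded block mode
```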
Details
Original language | English |
---|---|
Awarding Institution | |
Supervisors/Advisors | |
Thesis sponsors | |
Award date | Feb 2013 |