Cryptanalysis Using Pattern Recognition Tools

    Student thesis: Doctor of Philosophy

    Abstract

    For cryptanalysis, an important task is to identify the encryption algorithm that
    was used to encrypt a plain-text file. The main objective of this dissertation is to
    find the best classification algorithm that can identify the encryption method for
    block and stream cipher algorithms.
    This work compares how well the encrypted output of different types of block and
    stream cipher algorithms can be classified, and evaluates ECB against CBC mode using 8-bit and 16-bit codes. It reports results for different numbers of keys for all the encryption algorithms analysed (block and stream cipher algorithms) and determines the classification accuracy in each case.
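
    The abstract does not say how the 8-bit and 16-bit codes were turned into classifier inputs. One plausible reading, shown here purely as an assumption, is that the ciphertext is split into fixed-width codes and the relative frequency of each code is used as a feature vector. The function name `code_histogram` and its parameters are hypothetical.

```python
from collections import Counter


def code_histogram(data: bytes, code_bits: int = 8) -> list[float]:
    """Relative frequency of each fixed-width code found in `data`.

    Assumption: a "code" is a big-endian group of 1 byte (8-bit)
    or 2 bytes (16-bit); the thesis may define it differently.
    """
    if code_bits not in (8, 16):
        raise ValueError("code_bits must be 8 or 16")
    width = code_bits // 8
    # Ignore a trailing byte that does not fill a whole 16-bit code.
    usable = len(data) - len(data) % width
    codes = [int.from_bytes(data[i:i + width], "big")
             for i in range(0, usable, width)]
    counts = Counter(codes)
    total = len(codes) or 1  # avoid dividing by zero on empty input
    return [counts.get(code, 0) / total for code in range(2 ** code_bits)]
```

    With 8-bit codes the vector has 256 entries; with 16-bit codes it has 65536, which is much sparser for files of only 100 to 10000 bytes.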
    We have created an encryption dataset that was used for the experimental evaluation. Different block and stream cipher algorithms were used to encrypt the
    source dataset, which was a random sample of text files taken from the Internet in 2010 and included various types of data such as reports, papers, news, text from websites and journals. The samples ranged in size from 100 bytes to 10000 bytes. An initial analysis of the encrypted text shows that the data is random in nature. The Frequency Test shows a uniform distribution for the encrypted text, and the Chi-square test also indicates that the distribution of character codes is uniform. A compression test using the PPM text compression algorithm shows that the encrypted text is incompressible, which is further evidence of randomness. These tests indicate that the encrypted data is difficult to classify.
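
    The randomness checks described above can be sketched as follows. This is a minimal illustration, not the thesis's procedure: it uses `zlib` (DEFLATE) as a stand-in for PPM, which is not in the Python standard library, and seeded pseudo-random bytes as a stand-in for real ciphertext. The function names are hypothetical.

```python
import random
import zlib
from collections import Counter


def chi_square_uniform(data: bytes) -> float:
    """Pearson chi-square statistic of byte counts against a uniform distribution."""
    if not data:
        raise ValueError("data must not be empty")
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (zlib here, NOT PPM)."""
    return len(zlib.compress(data, 9)) / len(data)


# Assumption: seeded pseudo-random bytes imitate the encrypted output.
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(10_000))
plain = b"The quick brown fox jumps over the lazy dog. " * 200

for name, sample in (("random-like", random_like), ("plain text", plain)):
    print(name, chi_square_uniform(sample), compression_ratio(sample))
```

    With 256 possible byte values the statistic has 255 degrees of freedom, so a value near 255 is consistent with a uniform distribution, while plain text gives a far larger value. A compression ratio close to or above 1 means the compressor found nothing to exploit.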
    The block and stream cipher algorithms used to encrypt the data used 8-bit and
    16-bit codes. The study included two groups of block cipher algorithms. The first
    group comprised DES (64-bit), IDEA (128-bit), AES (128, 192, 256-bit) and RC2 (42, 84, 128-bit).
    The second group comprised another seven block cipher algorithms: RC2, RC6, Blowfish, Twofish, XTA, CAST and DESede (3DES), all with the same key size (128-bit). In addition, the following stream cipher algorithms were investigated: Grain 128-bit, HC 128-bit, RC4 128-bit, VMPC 128-bit and Salsa20 128-bit.
    The results from the classification experiment show that Pattern Recognition techniques are useful tools for cryptanalysis as a means of identifying the type of
    encryption algorithm used to encrypt the data. The results also show that increasing the number of encryption keys reduces the classification
    accuracy, and that it is possible to achieve an accuracy above
    40% with some classifiers when each file is encrypted with different numbers of
    keys using block ciphers. Increasing the number of files used
    also improves accuracy. The RoFo classifier had the best performance when identifying the encryption method for ciphered data, while IBL's performance was the worst. Moreover, the performance of the classifiers improved significantly when the task was to identify four different algorithms. It was noted that the three versions of AES (128, 192 and 256-bit) could not be distinguished from one another. Further, RC2 (128-bit) does not match the other versions of the same cipher, RC2 (42 and 84-bit).
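
    To make the notion of classification accuracy concrete, here is a minimal sketch of a nearest-neighbour classifier, assuming that IBL refers to an instance-based (nearest-neighbour) learner, which the abstract does not spell out. The feature vectors and the labels `cipher_A` and `cipher_B` are made up for illustration and are not data from the thesis.

```python
import math


def nearest_neighbour_predict(train, sample):
    """Return the label of the training vector closest to `sample` (1-NN)."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, sample)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def accuracy(train, test):
    """Fraction of test samples whose predicted label equals the true label."""
    hits = sum(nearest_neighbour_predict(train, x) == y for x, y in test)
    return hits / len(test)


# Toy, hypothetical feature vectors; NOT results or data from the thesis.
train = [([0.9, 0.1], "cipher_A"), ([0.8, 0.2], "cipher_A"),
         ([0.1, 0.9], "cipher_B"), ([0.2, 0.8], "cipher_B")]
test = [([0.85, 0.15], "cipher_A"), ([0.15, 0.85], "cipher_B")]
print(accuracy(train, test))
```

    On real ciphertext the feature vectors of different ciphers lie much closer together than in this toy example, which is why the accuracies reported in the abstract are far from perfect.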
    For stream cipher algorithms, the results show that the encrypted output is more difficult to classify than that of block cipher algorithms. This is due to the bit-based streaming approach adopted by the algorithms and the randomly distributed characters that it produces in the encrypted output.
    Date of Award: Feb 2013
    Original language: English
    Awarding Institution: Bangor University
    Sponsors: Ministry of Higher Education and Scientific Research, Kurdistan Regional Government
    Supervisors: Saad Mansoor & Ludmila Kuncheva
