Compression-based Methods for the Automatic Cryptanalysis of Classical Ciphers

Noor Al-Kazaz
PhD thesis, School of Computer Science and Electronic Engineering

Research areas: compression, PPM, cryptanalysis, plain text recognition, word segmentation

Abstract

The study documented in this thesis investigates the effectiveness of
compression in cryptanalysis, specifically in the automatic cryptanalysis of
classical ciphers, initially for the English language. Several new
compression-based cryptanalysis methods have been developed against these
ciphers. The new methods use a well-known compression scheme, prediction
by partial matching (PPM), and have been applied to the automatic
cryptanalysis of three main classical ciphers: simple substitution,
transposition and Playfair. The extensive set of case studies adopted in this
research has validated the new methods, which have proven very effective,
with high success rates: for substitution ciphers, 92% of the cryptograms were
correctly solved with no errors and 100% with three or fewer errors; a 100%
decryption success rate was achieved for transposition ciphers and 87% for
Playfair ciphers. This study also led to the decipherment of more challenging
cases, such as very short ciphertexts with no probable words. The Gzip
compression scheme has also been applied to the automatic decryption of
simple substitution and transposition ciphers, but the results showed that
Gzip was less effective than PPM. A third compressor, Bzip2, could not be
used because the nature of that scheme made its use unfeasible.
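
The core idea of compression-based cryptanalysis can be illustrated with a
short sketch: each candidate decryption is scored by how many extra bytes a
compressor needs to encode it after a sample of reference text, and the
candidate that compresses best is kept. The sketch applies this to a columnar
transposition cipher with a brute-force key search. It is a hypothetical
illustration, not the thesis's implementation: it uses Python's zlib module
(DEFLATE, the algorithm behind Gzip) because the standard library has no PPM
compressor, and the toy reference text, the helper names (cost, encrypt,
decrypt, crack) and the assumption that the message fills the transposition
grid exactly are all invented for the example.

```python
import itertools
import zlib

# Hypothetical toy reference text (lowercase, no spaces) standing in for a
# trained language model. Assumption: it happens to contain the hidden message;
# a real system would train on a large corpus and would use PPM, not zlib.
REFERENCE = ("thequickbrownfoxjumpsoverthelazydog"
             "packmyboxwithfivedozenliquorjugs"
             "howvexinglyquickdaftzebrasjump") * 20


def cost(candidate: str, reference: str = REFERENCE) -> int:
    """Extra compressed bytes needed to encode `candidate` after `reference`.

    A lower value means the candidate looks more like the reference language.
    """
    base = len(zlib.compress(reference.encode(), 9))
    return len(zlib.compress((reference + candidate).encode(), 9)) - base


def encrypt(plain: str, order: tuple[int, ...]) -> str:
    """Columnar transposition: write rows of width k, read columns in `order`."""
    k = len(order)
    if len(plain) % k:
        raise ValueError("this sketch assumes the text fills the grid exactly")
    return "".join(plain[c::k] for c in order)


def decrypt(cipher: str, order: tuple[int, ...]) -> str:
    """Invert `encrypt` for a given column order."""
    k = len(order)
    rows = len(cipher) // k
    cols = [""] * k
    for j, c in enumerate(order):
        # The j-th block of the ciphertext is original column `c`.
        cols[c] = cipher[j * rows:(j + 1) * rows]
    # Read the restored grid row by row.
    return "".join(cols[c][r] for r in range(rows) for c in range(k))


def crack(cipher: str, k: int) -> str:
    """Try every column order and keep the decryption that compresses best."""
    candidates = (decrypt(cipher, p) for p in itertools.permutations(range(k)))
    return min(candidates, key=cost)


if __name__ == "__main__":
    ciphertext = encrypt("thequickbrownfoxjumpsoverthelazydog", (2, 0, 4, 1, 3))
    print(crack(ciphertext, 5))
```

A wrong column order breaks the text into short fragments that the compressor
cannot match against the reference, so its extra cost is higher than that of
the correct order. Exhaustive search is only practical for short keys; for
longer keys, or for substitution and Playfair ciphers, a heuristic search over
keys would have to replace the brute-force loop.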

The PPM compression-based cryptanalysis methods offered significant
improvements in decryption accuracy across a diverse range of experiments
while being computationally more efficient than previously published
techniques. In addition, extensive investigations were conducted to determine
the most appropriate type of PPM scheme for the cryptanalysis of these
ciphers. These findings highlight why better models are of vital importance
in cryptology. In particular, the study has shown how a good model of the
source (i.e. the PPM compression model), a method that performs well across
different language modelling tasks, can also be used effectively in the
automatic decryption of different classical ciphers.

Because spaces have traditionally been omitted from ciphertexts, a full
cryptanalysis mechanism that also automatically inserts spaces into decrypted
texts, again using a compression-based approach, has been proposed to make
the output readable.
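
Inserting spaces into a decrypted text is a segmentation problem: among all
ways of splitting the letter string into words, choose the one the model finds
cheapest to encode. The following is a minimal sketch of that idea using
dynamic programming over a hypothetical unigram word model, where a word's
cost is its code length in bits, -log2 P(word). It swaps in a much simpler
model than the PPM-based approach described in the abstract; the word counts,
the out-of-vocabulary penalty and the function names are assumptions made for
illustration.

```python
import math

# Hypothetical toy word counts; a real system would use a trained PPM model.
COUNTS = {"the": 50, "cat": 5, "sat": 4, "on": 20, "mat": 3,
          "a": 30, "at": 10, "he": 8}
TOTAL = sum(COUNTS.values())


def word_bits(word: str) -> float:
    """Code length in bits of `word` under the unigram model."""
    if word in COUNTS:
        return -math.log2(COUNTS[word] / TOTAL)
    # Assumed penalty for out-of-vocabulary strings, growing with length.
    return 10.0 * len(word) + 20.0


def segment(text: str, max_len: int = 12) -> list[str]:
    """Return the split of `text` with the smallest total code length."""
    n = len(text)
    best = [0.0] + [math.inf] * n  # best[i]: minimum bits to encode text[:i]
    back = [0] * (n + 1)           # back[i]: start of the last word in that optimum
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            bits = best[j] + word_bits(text[j:i])
            if bits < best[i]:
                best[i], back[i] = bits, j
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

For example, `segment("thecatsatonthemat")` returns
`['the', 'cat', 'sat', 'on', 'the', 'mat']`: every other split contains at
least one unknown string, whose penalty outweighs the cost of the known words.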

This work has also investigated whether the newly devised cryptanalysis
methods are applicable to another language, specifically Arabic, chosen
because it is unrelated to English. Arabic is a morphologically rich language
with its own characteristics that differentiate it from other languages. This
study adapted the new compression-based methods for the automatic
cryptanalysis of classical ciphers in Arabic (simple substitution,
transposition and Playfair). Although the experiments with Arabic ciphers have
generally been less effective than those with English ciphers, excellent
results have been achieved: for Arabic substitution ciphers, 72% of the
cryptograms were solved without any errors and over 91% with three or fewer
errors; a 97% decryption success rate was achieved for Arabic transposition
ciphers and 73% for Arabic Playfair ciphers.

Details

Original language: English
Thesis sponsors
  • Iraqi Ministry of Higher Education and Scientific Research
  • University of Baghdad
Award date: 3 Jun 2019
