Compression-based Methods for the Automatic Cryptanalysis of Classical Ciphers
- PhD, School of Computer Science and Electronic Engineering, compression, PPM, cryptanalysis, plain text recognition, word segmentation
Abstract
The study documented in this thesis investigates the effectiveness of compression in cryptanalysis, specifically in the automatic cryptanalysis of classical ciphers, initially for the English language. Several new
compression-based cryptanalysis methods are developed against these ciphers.
The new methods use a well-known compression scheme, prediction
by partial matching (PPM), and have been applied to the automatic cryptanalysis
of three main classical ciphers: simple substitution, transposition and
Playfair. The extensive set of case studies adopted in this research
has validated the new methods, which proved very effective, with high success
rates: for substitution ciphers, 92% of the cryptograms were solved with no
errors and 100% with three errors or fewer; the decryption success rate was
100% for transposition ciphers and 87% for Playfair ciphers.
The study also achieved the decipherment of more challenging cases, such as very
short ciphertexts containing no probable words. The Gzip compression scheme
was also applied to the automatic decryption of simple substitution
and transposition ciphers, but the results showed that Gzip was less effective
than PPM. A third compressor, Bzip2, could not be used because the nature of
that scheme made its use infeasible.
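The core idea behind compression-based cryptanalysis can be illustrated with a minimal sketch: each candidate decryption is scored by how many extra bits it costs to compress once appended to a sample of the target language, and the key whose candidate is cheapest wins. PPM has no implementation in Python's standard library, so this sketch swaps in `zlib` (DEFLATE, the algorithm behind Gzip), i.e. the compressor the thesis found less effective than PPM. The columnar transposition variant, the exhaustive search over column orders and all function names (`conditional_cost`, `crack_columnar`, etc.) are illustrative assumptions, not the thesis's implementation.

```python
import itertools
import zlib


def compressed_bits(text: str) -> int:
    """Size in bits of `text` after DEFLATE compression (the algorithm behind Gzip)."""
    return 8 * len(zlib.compress(text.encode("utf-8"), 9))


def conditional_cost(training: str, candidate: str) -> int:
    """Extra bits to compress `candidate` after `training`: C(training + candidate) - C(training).
    Lower cost means the candidate looks more like the training language."""
    return compressed_bits(training + candidate) - compressed_bits(training)


def columnar_encrypt(plaintext: str, key: tuple[int, ...]) -> str:
    """Assumed cipher variant: write row by row into len(key) columns, read columns in key order.
    Assumes len(plaintext) is a multiple of len(key) (no padding handling)."""
    n = len(key)
    return "".join(plaintext[col::n] for col in key)


def columnar_decrypt(ciphertext: str, key: tuple[int, ...]) -> str:
    n = len(key)
    rows = len(ciphertext) // n
    columns = [""] * n
    for i, col in enumerate(key):
        # The i-th block of the ciphertext is column number key[i]
        columns[col] = ciphertext[i * rows:(i + 1) * rows]
    return "".join(columns[c][r] for r in range(rows) for c in range(n))


def crack_columnar(ciphertext: str, width: int, training: str) -> tuple[int, ...]:
    """Try every column order and keep the one whose decryption compresses best."""
    return min(
        itertools.permutations(range(width)),
        key=lambda k: conditional_cost(training, columnar_decrypt(ciphertext, k)),
    )


if __name__ == "__main__":
    ciphertext = columnar_encrypt("meetmeatnoon", (2, 0, 3, 1))
    print(ciphertext)  # eaommnttneeo
    print(columnar_decrypt(ciphertext, (2, 0, 3, 1)))  # meetmeatnoon
```

In practice `training` would be a large sample of English text, e.g. `crack_columnar(ciphertext, 4, training)`. Exhaustive search is only practical for small widths, since there are `width!` column orders to score.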
The PPM compression-based cryptanalysis methods offered significant
improvements in decryption accuracy across a diverse range of experiments while
being computationally more efficient than previously published techniques. In addition, extensive investigations were conducted to determine
the most appropriate type of PPM scheme for the cryptanalysis
of these ciphers. These findings highlight why better models are
of vital importance in cryptology. In particular, the study has shown how
a good model of the source (here, the PPM compression model), a method
that performs well across different language modelling tasks, can also be
used effectively for the automatic decryption of different classical ciphers.
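The claim that a good source model doubles as a cryptanalytic tool can be made concrete with a small character model. The sketch below is a simplified PPM: it counts symbols in contexts of up to `order` preceding characters, predicts from the longest context seen in training, and pays an escape cost (escape method C) each time it has to back off to a shorter context, ending in a uniform order -1 model. It omits exclusions and, unlike a real PPM coder, does not update its counts while encoding; the class `SimplePPM` and its methods are hypothetical. A candidate decryption with a smaller code length in bits is judged more language-like.

```python
import math
from collections import Counter, defaultdict


class SimplePPM:
    """Simplified PPM character model: escape method C, no exclusions,
    counts frozen after training. Illustrative only."""

    def __init__(self, order: int, alphabet: str):
        self.order = order
        self.alphabet = alphabet
        # contexts[k][ctx] counts the symbols seen after the k-character context ctx
        self.contexts = [defaultdict(Counter) for _ in range(order + 1)]

    def train(self, text: str) -> None:
        for i, symbol in enumerate(text):
            for k in range(min(self.order, i) + 1):
                self.contexts[k][text[i - k:i]][symbol] += 1

    def symbol_bits(self, history: str, symbol: str) -> float:
        """Bits needed to encode `symbol` after `history`, longest context first."""
        bits = 0.0
        for k in range(min(self.order, len(history)), -1, -1):
            counts = self.contexts[k].get(history[len(history) - k:])
            if not counts:
                continue  # context never seen in training: back off at no cost
            total = sum(counts.values())
            distinct = len(counts)
            if symbol in counts:
                return bits - math.log2(counts[symbol] / (total + distinct))
            bits -= math.log2(distinct / (total + distinct))  # escape cost (method C)
        return bits + math.log2(len(self.alphabet))  # order -1: uniform over the alphabet

    def code_length(self, text: str) -> float:
        """Total bits to encode `text`; lower means closer to the training text."""
        return sum(self.symbol_bits(text[:i], symbol) for i, symbol in enumerate(text))


if __name__ == "__main__":
    model = SimplePPM(order=2, alphabet="abcdefghijklmnopqrstuvwxyz ")
    model.train("the quick brown fox jumps over the lazy dog " * 10)
    for candidate in ["the lazy dog", "gdo yzal eht"]:
        print(candidate, round(model.code_length(candidate), 1))
```

Choosing among candidate decryptions then reduces to `min(candidates, key=model.code_length)`.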
Because spaces are traditionally omitted from ciphertexts, a full cryptanalysis
mechanism that also automatically restores spaces in the decrypted text,
again using a compression-based approach, has also been proposed to make the
output readable.
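Restoring spaces is a word segmentation problem, which dynamic programming solves exactly: for every prefix of the decrypted text, keep the cheapest way found so far to split it into words. The thesis scores splits with a compression-based (PPM) model; to stay short, this sketch swaps in a simpler cost, the unigram word cost -log2 P(word), plus a fixed penalty for unknown words whose constants are arbitrary assumptions. `make_segmenter` and the toy word counts are hypothetical.

```python
import math


def make_segmenter(word_counts: dict[str, int], max_word_len: int = 20):
    """Return a function that inserts word boundaries into unspaced text.
    Assumption: a unigram word model stands in for the thesis's PPM model."""
    total = sum(word_counts.values())

    def word_bits(word: str) -> float:
        if word in word_counts:
            return -math.log2(word_counts[word] / total)
        return 10.0 + 5.0 * len(word)  # arbitrary penalty for unknown words

    def segment(text: str) -> list[str]:
        # best[i] holds (cost in bits, words) for the cheapest split of text[:i]
        best = [(0.0, [])] + [(math.inf, [])] * len(text)
        for end in range(1, len(text) + 1):
            for start in range(max(0, end - max_word_len), end):
                cost = best[start][0] + word_bits(text[start:end])
                if cost < best[end][0]:
                    best[end] = (cost, best[start][1] + [text[start:end]])
        return best[-1][1]

    return segment


if __name__ == "__main__":
    counts = {"the": 50, "cat": 10, "sat": 10, "cats": 2, "at": 5, "on": 20, "mat": 5}
    segment = make_segmenter(counts)
    print(" ".join(segment("thecatsatonthemat")))  # the cat sat on the mat
```

The example also shows why a statistical cost is needed: "thecatsat" can be split as "the cat sat" or "the cats at", and the model prefers the split with the higher joint probability.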
This work has also investigated whether the newly devised cryptanalysis
methods are applicable to another language, specifically Arabic, chosen because
it is unrelated to English. Arabic is a morphologically rich language
with its own characteristics that differentiate it from other languages. The
study adapted the new compression-based methods for
the automatic cryptanalysis of classical Arabic ciphers (simple substitution,
transposition and Playfair). Although the experiments with Arabic ciphers
were generally less effective than those with English ciphers, excellent
results were still achieved: for Arabic substitution ciphers, 72% of the
cryptograms were solved without any errors and over 91% with three errors or
fewer; the decryption success rate was 97% for Arabic transposition ciphers
and 73% for Arabic Playfair ciphers.
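One concrete thing that changes when a method moves to Arabic is the alphabet. The sketch below applies one simple-substitution routine to both languages and compares the number of possible keys, n! for an n-letter alphabet: 26! ≈ 4.03 × 10^26 for English versus 28! ≈ 3.05 × 10^29 for Arabic, a factor of 27 × 28 = 756. It assumes the 28 basic Arabic letters and ignores hamza forms, taa marbuta, alif maqsura and diacritics; how the thesis itself defines the Arabic alphabet is not stated here, and all function names are hypothetical.

```python
import math
import random

ENGLISH = "abcdefghijklmnopqrstuvwxyz"
# Assumption: the 28 basic Arabic letters only (no hamza variants, taa marbuta or diacritics)
ARABIC = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"


def random_key(alphabet: str, seed: int = 0) -> dict[str, str]:
    """A random simple-substitution key: a permutation of the alphabet."""
    shuffled = list(alphabet)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(alphabet, shuffled))


def substitute(text: str, key: dict[str, str]) -> str:
    """Replace each letter via `key`; spaces and other symbols pass through unchanged."""
    return "".join(key.get(ch, ch) for ch in text)


def invert(key: dict[str, str]) -> dict[str, str]:
    """Decryption key: map each cipher letter back to its plain letter."""
    return {cipher: plain for plain, cipher in key.items()}


def key_space(alphabet: str) -> int:
    """Number of distinct simple-substitution keys: n! for n letters."""
    return math.factorial(len(alphabet))


if __name__ == "__main__":
    key = random_key(ARABIC, seed=1)
    ciphertext = substitute("مرحبا بالعالم", key)
    print(ciphertext, "->", substitute(ciphertext, invert(key)))
    for name, alphabet in [("English", ENGLISH), ("Arabic", ARABIC)]:
        print(f"{name}: {len(alphabet)} letters, {key_space(alphabet):.3e} keys")
```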
Details
Original language | English |
---|---|
Award date | 3 Jun 2019 |
Research outputs (1)
- Visualisation Data Modelling Graphics (VDMG) at Bangor (published; contribution to conference, peer-reviewed paper)