Getting to Know CipherGAN

Machine learning is now used, to a greater or lesser extent, across many industries, and cryptanalysis is no exception. In this article, we'll look at CipherGAN, a generative adversarial network that infers the underlying cipher mapping from banks of unpaired ciphertext and plaintext.

CycleGAN

CipherGAN is based on another generative adversarial network, CycleGAN, which is used to transfer image style. CycleGAN can be trained to convert images from one domain (for example, Fortnite) to another (for example, PUBG). The training is unsupervised: the training set contains no one-to-one pairing of images across the two domains.

This makes many interesting applications possible, such as enhancing photo quality, colorizing images, and style transfer. All you need is a source dataset and a target dataset (each simply a collection of images). The general operating principle of this network is presented in the following diagram.
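The idea that lets CycleGAN train on unpaired data is cycle consistency: mapping a sample into the other domain and back should reproduce the original. Below is a minimal sketch of that loss term alone, in plain Python. The toy functions `to_target`, `to_source` and `bad_to_source` are hypothetical stand-ins for the generator networks, and the adversarial (discriminator) losses are left out.

```python
def cycle_consistency_loss(f, g, xs, ys):
    """L1 cycle loss: x -> f -> g should return x, and y -> g -> f should return y."""
    forward = sum(abs(g(f(x)) - x) for x in xs) / len(xs)
    backward = sum(abs(f(g(y)) - y) for y in ys) / len(ys)
    return forward + backward


# Hypothetical stand-ins for the two generators (real ones are neural networks).
def to_target(x):
    return 2.0 * x + 1.0


def to_source(y):
    return (y - 1.0) / 2.0


def bad_to_source(y):
    # Not an inverse of to_target, so the cycle does not close.
    return y / 2.0


xs = [0.0, 1.0, 2.0]  # unpaired samples from the source domain
ys = [1.0, 3.0]       # unpaired samples from the target domain

print(cycle_consistency_loss(to_target, to_source, xs, ys))      # exact inverse
print(cycle_consistency_loss(to_target, bad_to_source, xs, ys))  # larger loss
```

The loss is zero only when the two mappings invert each other, which is what pushes the generators toward a consistent translation without paired examples.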

But let's return to the main topic of our article, CipherGAN. As mentioned above, this network infers the underlying cipher mapping from banks of unpaired ciphertext and plaintext.

Substitution ciphers have the property of confusion, which obscures the relationship between the encryption key and the corresponding ciphertext. Confusion, together with diffusion, is one of the main building blocks of modern symmetric cryptography. Thus, understanding the structure and vulnerabilities of substitution ciphers can form a basis for the cryptanalysis of modern ciphers. Table 2 presents various studies on artificial intelligence-based cryptanalysis.

CipherGAN, for its part, can crack language data encrypted with shift ciphers and the Vigenère cipher with a high degree of accuracy, and for vocabularies significantly larger than previously achieved.

Let us recall what the Vigenère cipher is. Encryption can use a table of alphabets called a tabula recta, or Vigenère square. For the Latin alphabet, the table consists of 26 rows of 26 characters, with each row shifted one position to the left relative to the previous one.
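As a small illustration, the table can be built in a couple of lines of Python. This is only a sketch; the names `TABULA_RECTA` and `lookup` are chosen here for illustration.

```python
import string

ALPHABET = string.ascii_uppercase

# Row i is the alphabet rotated left by i positions.
TABULA_RECTA = [ALPHABET[i:] + ALPHABET[:i] for i in range(len(ALPHABET))]


def lookup(key_char, plain_char):
    """Return the letter at the intersection of row key_char and column plain_char."""
    row = ALPHABET.index(key_char)
    col = ALPHABET.index(plain_char)
    return TABULA_RECTA[row][col]


# Print the first three rows of the table.
for row in TABULA_RECTA[:3]:
    print(row)
```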

Each encryption step uses a different alphabet (row of the table), chosen according to the current keyword character. For example, suppose the source text looks like this:

ATTACKATDAWN

The sender repeats the keyword (“LEMON”) until its length matches the length of the source text:

LEMONLEMONLE

The first character of the source text (“A”) is encrypted with the alphabet in row L, because L is the first character of the key. The first character of the ciphertext (“L”) is found at the intersection of row L and column A in the Vigenère table. Similarly, the second character of the source text is encrypted with the second character of the key: the second character of the ciphertext (“X”) is found at the intersection of row E and column T. The rest of the plaintext is encrypted in the same way.

Plaintext:   ATTACKATDAWN
Key:         LEMONLEMONLE
Ciphertext:  LXFOPVEFRNHR
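The table lookup is equivalent to adding letter positions modulo 26, which makes the worked example easy to reproduce. The sketch below assumes uppercase A-Z input only; the function name `vigenere` is chosen here for illustration.

```python
import string

ALPHABET = string.ascii_uppercase


def vigenere(text, key, decrypt=False):
    """Encrypt (or decrypt) uppercase A-Z text with a repeating keyword."""
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        # The key letter selects the row, i.e. the shift for this position.
        shift = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % len(ALPHABET)])
    return "".join(out)


ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)                                   # LXFOPVEFRNHR
print(vigenere(ciphertext, "LEMON", decrypt=True))  # ATTACKATDAWN
```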

Of course, by modern standards this is, to put it mildly, not the strongest encryption algorithm, but it is well suited to educational purposes.

Installing CipherGAN

CipherGAN is written in Python, which makes it much easier to work with. To install its dependencies, run the following from the directory that contains the cloned CipherGAN repository:

pip install -r CipherGAN/requirements.txt

To train the network, we use data generators. Of particular interest is cipher_generator (https://github.com/for-ai/CipherGAN/blob/master/data/data_generators/cipher_generator.py), which can generate data for shift and Vigenère ciphers.

The generation settings are passed to the script as flags. For example, to generate a word-level Vigenère cipher (key: 345) on the Brown corpus with a vocabulary size of 200, call:

python CipherGAN/data/data_generators/cipher_generator.py \
  --cipher=vigenere \
  --vigenere_key=345 \
  --percentage_training=0.9 \
  --corpus=brown \
  --vocab_size=200 \
  --test_name=vigenere345-brown200-eval \
  --train_name=vigenere345-brown200-train \
  --output_dir=tmp/data \
  --vocab_filename=vigenere345_brown200_vocab.txt
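To make "word-level" concrete: the cipher operates on vocabulary indices rather than on letters. The sketch below assumes that each digit of the numeric key (3, 4, 5) is a shift added to a token's index modulo the vocabulary size. That is an assumption about how the key is interpreted, not a description of the generator's actual code, and the function name `vigenere_on_tokens` is hypothetical.

```python
def vigenere_on_tokens(token_ids, key_digits, vocab_size):
    """Shift each token index by the matching key digit, wrapping at vocab_size."""
    # Assumption: the key "345" means the repeating shifts 3, 4, 5.
    shifts = [int(d) for d in key_digits]
    return [
        (tok + shifts[i % len(shifts)]) % vocab_size
        for i, tok in enumerate(token_ids)
    ]


sample = [0, 1, 199, 50]  # hypothetical word indices in a 200-word vocabulary
print(vigenere_on_tokens(sample, "345", vocab_size=200))
```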

Let's start training

To train the network, you can use the train.py script shown below. It relies on TensorFlow 1.x APIs (tf.flags, tf.contrib.learn).

import shutil
import os
import tensorflow as tf

from .hparams.registry import get_hparams
from .models.registry import _MODELS
from .data.registry import _INPUT_FNS, get_dataset
from .metrics.registry import get_metrics
from .train_utils.lr_schemes import get_lr
from .train_utils.vocab_utils import read_vocab

tf.flags.DEFINE_string("model", "cycle_gan", "Which model to use.")
tf.flags.DEFINE_string("data", "cipher", "Which data to use.")
tf.flags.DEFINE_string("hparam_sets", "cipher_default", "Which hparams to use.")
tf.flags.DEFINE_string("hparams", "", "Run-specific hparam settings to use.")
tf.flags.DEFINE_string("metrics", "xy_mse",
                       "Dash separated list of metrics to use.")
tf.flags.DEFINE_string("output_dir", "tmp/tf_run",
                       "The output directory.")
tf.flags.DEFINE_string("data_dir", "tmp/data", "The data directory.")
tf.flags.DEFINE_integer("train_steps", 10000,
                        "Number of training steps to perform.")
tf.flags.DEFINE_integer("eval_steps", 100,
                        "Number of evaluation steps to perform.")
tf.flags.DEFINE_boolean("overwrite_output", False,
                        "Remove output_dir before running.")
tf.flags.DEFINE_string("train_name", "data-train*",
                       "The train dataset file name.")
tf.flags.DEFINE_string("test_name", "data-eval*", "The test dataset file name.")

FLAGS = tf.app.flags.FLAGS
tf.logging.set_verbosity(tf.logging.INFO)
 

def _run_locally(train_steps, eval_steps):
  """Run training, evaluation and inference locally.

  Args:
    train_steps: An integer, number of steps to train.
    eval_steps: An integer, number of steps to evaluate.
  """
  hparams = get_hparams(FLAGS.hparam_sets)
  hparams = hparams.parse(FLAGS.hparams)
  hparams.total_steps = FLAGS.train_steps
 

  if "vocab_file" in hparams.values():
    hparams.vocab = read_vocab(hparams.vocab_file)
    hparams.vocab_size = len(hparams.vocab)
    hparams.vocab_size += int(hparams.vocab_size % 2 == 1)
    hparams.input_shape = [hparams.sample_length, hparams.vocab_size]
 

  output_dir = FLAGS.output_dir
  if os.path.exists(output_dir) and FLAGS.overwrite_output:
    shutil.rmtree(FLAGS.output_dir)
 

  if not os.path.exists(output_dir):
    os.makedirs(output_dir)
 

  def model_fn(features, labels, mode):
    lr = get_lr(hparams)
    return _MODELS[FLAGS.model](hparams, lr)(features, labels, mode)
 

  train_path, eval_path = get_dataset(FLAGS.data_dir, FLAGS.train_name,
                                      FLAGS.test_name)
  train_input_fn = _INPUT_FNS[FLAGS.data](train_path, hparams, training=True)
  eval_input_fn = _INPUT_FNS[FLAGS.data](eval_path, hparams, training=False)
 

  run_config = tf.contrib.learn.RunConfig()
 

  estimator = tf.contrib.learn.Estimator(
      model_fn=model_fn, model_dir=output_dir, config=run_config)
 

  eval_metrics = get_metrics(FLAGS.metrics, hparams)
  experiment = tf.contrib.learn.Experiment(
      estimator=estimator,
      train_input_fn=train_input_fn,
      eval_input_fn=eval_input_fn,
      eval_metrics=eval_metrics,
      train_steps=train_steps,
      eval_steps=eval_steps)
  experiment.train_and_evaluate() 
 

def main(_):
  _run_locally(FLAGS.train_steps, FLAGS.eval_steps) 
 

if __name__ == "__main__":
  tf.app.run() 
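One detail of the script is worth isolating: when a vocabulary file is used, the vocabulary size is rounded up to an even number and the input shape becomes [sample_length, vocab_size]. The sketch below reproduces that rounding and shows what a sample of that shape could look like under the assumption that each token is one-hot encoded. The helper names `pad_to_even` and `one_hot` are hypothetical and not part of the repository.

```python
def pad_to_even(vocab_size):
    """Mirror of `vocab_size += int(vocab_size % 2 == 1)` from the script."""
    return vocab_size + int(vocab_size % 2 == 1)


def one_hot(token_ids, vocab_size):
    """Encode a token sequence as a [sample_length, vocab_size] nested list."""
    return [
        [1.0 if j == tok else 0.0 for j in range(vocab_size)]
        for tok in token_ids
    ]


vocab_size = pad_to_even(199)        # an odd vocabulary is padded by one
sample = one_hot([0, 2, 1], vocab_size)
print(vocab_size, len(sample), len(sample[0]))
```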

For example, to train the network on the data generated in the previous step, run:

python -m CipherGAN.train \
  --output_dir=runs/vig345 \
  --test_name="vigenere345-brown200-eval*" \
  --train_name="vigenere345-brown200-train*" \
  --hparam_sets=vigenere_brown_vocab_200

Conclusion

Of course, artificial intelligence cannot yet break AES or GOST. The CipherGAN network presented in this article can so far only break teaching ciphers such as the Vigenère cipher. In the future, however, neural networks may well become an effective assistant for cryptanalysts.


The article was prepared on the eve of the start of the course “Cryptographic Information Security”. Find out more about the course.
