Steganography

Steganography is the idea of hiding a message inside another message. Steganography differs from cryptography in that there is no key required. To find the message, you simply need to know where to look. Here is an example. A message is hiding in this text.

Steganography is frequently
examined in popular
culture as a way to
reflect upon how
easily one can
trick others.

In hidden message is the word “secret,” which can be found by looking at the first letter of each line. There are thousands of ways to hide messages. We will look at a few common ones. One simple trick to hide a message in a document is to change the font color to match the background color. This is the digital analog of invisible ink.

Hiding data in images

A very common technique is hiding data in an image. To understand how it works, we first need to know how colors work in computer images. Every color is broken into three components: red, green, and blue. Each of those components is commonly 8 bits, ranging from 0 to 255. In total, there are 2563 ≈ 16.8 million colors. The key to hiding things in images is that it is very difficult for the human eye to distinguish between all of the 16.8 million colors, especially if there is only a change in the least significant bits of a pixel's components, like if we go from RGB components of (160, 200, 80) to (160, 200, 79).

It will help to think of the 8-bit components in binary form, like 10001010 or 00101110. The leftmost bits are the most significant ones, and a change in those will have a dramatic difference on the color. For instance, the difference between 00000000 and 10000000 is the difference between 0 and 128. The rightmost bits are the least significant, and a change in those will be likely be imperceptible to the human eye. Changing 00000000 to 00000001 changes 0 to 1.

We can actually hide an entire image inside another. In order to do that, we will do some bitwise operations. There are four operations that we will need:

To hide an image, we will need an image to hide it in. We'll call that the carrier image. Let's assume that r1 is the red component of a pixel of the carrier image, and r2 is the red component in the message we want to hide. The following operation is part of hiding the image in the carrier:

r1 = (r1 & 0b11111000) ^ (r2 >> 5)

This operation first zeros out the bottom 3 bits from the carrier, leaving the top 5 bits alone. Then it shifts the most significant bits from the image we're hiding so that they are in the 3 least significant positions. Finally, we add the two together. In the result, the top 5 bits will be from the carrier and the bottom 3 will be from the hidden image. We do a similar thing for the green and blue components, and we do that for all the pixels in the image. Here is some Python code that will do the whole process.

from tkinter import *
from PIL import Image, ImageTk

def change():
    global image1, photo1, image2, photo2
    pix1 = image1.load()
    for i in range(photo1.width()):
        for j in range(photo1.height()):
            r1,g1,b1 = pix1[i,j]
            r2,g2,b2 = pix2[i,j]
            r1 = (r1 & 0b11111000) ^ (r2 >> 5)
            g1 = (g1 & 0b11111000) ^ (g2 >> 5)
            b1 = (b1 & 0b11111000) ^ (b2 >> 5)
            pix1[i,j] = r1,g1,b1
    photo1=ImageTk.PhotoImage(image1)
    canvas.create_image(0,0,image=photo1,anchor=NW)

def load_files():
    global image1, photo1, image2, photo2
    image1=Image.open('img1.jpg').convert('RGB')
    photo1=ImageTk.PhotoImage(image1)
    image2=Image.open('img2.jpg').convert('RGB')
    photo2=ImageTk.PhotoImage(image2)
    canvas.configure(width=photo1.width(), height=photo1.height())
    canvas.create_image(0,0,image=photo1,anchor=NW)

root = Tk()
button = Button(text='Change', font=('Verdana', 18), command=change)
canvas = Canvas()
canvas.grid(row=0)
button.grid(row=1)
load_files()
mainloop()

To run the program, you will first need to install the Python Imaging Library. Use pip install pillow at the command prompt to do that. Next, you will need two image files, preferably both the same size. They are called img1.jpg and img2.jpg in the code above. When you run the program and click the “Change” button, the second image will be hidden in the first. I haven't provided the code to extract that image, but you should be able to work out what it is based on the discussion above about how to hide the image.

You might try experimenting with the number of bits to use for the carrier and the number for the hidden image. The more bits for the carrier, the better the hidden image will be concealed, but the worse its quality will be when revealed. It also helps to make sure the carrier image is busy. A plain image, especially one with a sky or a lot of solid colors, will make it easier for human eyes to spot the hidden image.

One way to hide text in other text

To hide data, it often helps to convert it to a binary format. Here is a helper function that will convert a string of text into a binary string of 0s and 1s:

def to_binary(message):
    m = message.encode()
    h = bytes.hex(m)
    return ''.join('{:04b}'.format(int(x,16)) for x in h)

It starts by converting the string into a Python byte string. We then use bytes.hex to turn the byte string into a hex string. The last line converts the hex to binary. The purpose of using the call to the format method is to make sure leading 0s are preserved. For instance, we want hex number 6 (0110) to stay as 0110, and not become 110, which is what Python's bin function would do. Here is a helper function that takes a binary string and turns it back into text.

def from_binary(message):
    s = int(message, 2)
    h = hex(s)[2:]
    return bytes.fromhex(h)

Now we can look at a neat trick for hiding a message. The letter a has ASCII/Unicode value 97. Farther up in the Unicode table, at character 1072, there is another character that is pixel-for-pixel identical to the a at character 97. It's the letter a in the Cyrillic alphabet. We can use this fact to hide a message inside some carrier text.

Let's use “aardvarks are animals any day” as our carrier text. Let's try to hide the letter Z (ASCII/Unicode value 90). The value 90 translates to 01011010 in binary (and we can use the code above to do that). Let's overlay this on the carrier message, like below:

01   0    1   1   0   1    0
aardvarks are animals any day

Specifically, we've lined up the binary with the a's of the message. Every a that is lined up with a 0 will stay as an ordinary a. Every a that is lined up with a 1 will get changed to a Cyrillic (character code 1072) a. Here is code that will hidea message using this technique:

def hide_message(carrier, message):
    s = ''
    b = to_binary(message)
    i = 0

    for c in carrier:
        if c != 'a':
            s += c
        elif i < len(b):
            if b[i] == '0':
                s += 'a'
            else:
                s += chr(1072)
            i += 1
        else:
            s += 'a'
    return s

We have to be a little careful in the likely event that the carrier message has more a's than we have bits in the hidden message. That's what the i < len(b) part is for. To show a hidden message, we do things in reverse, like below:

def find_message(carrier):
    s = ''
    for c in carrier:
        if c == 'a':
            s += '0'
        elif c == chr(1072):
            s += '1'
    return from_binary(s)

Here is a little code to test this out:

message = 'Zebra'
carrier = 'aardvarks are animals any day '*20
h = hide_message(carrier, message)
print(h)
print(find_message(h))

Note that there will be some extra \x00 characters at the end of the message. If you want to, you can write a little code to remove them.

More about steganography

We have looked at a few simple ways of hiding data. There are so many more. One nice approach to hiding data used by the popular software steghide is as follows: Start with a passphrase. This is used to seed a PRNG that selects random pixels in a carrier image. The message to be hidden is converted to binary and hidden in the least significant digits of those pixels. The steghide program does more than just this to avoid detection, but that's the basic idea. And there's nothing special about images. Data can be hidden in audio and other types of files in a similar way.

Steganography can be detected by statistical and machine learning techniques. These techniques look at ordinary images and text and build up some statistics about what ordinary images and text looks like. If an image is hiding a significant amount of data, these tools will notice that the patterns of bits in the least significant bit portions of the pixels don't look like they would for an ordinary image. If you are only storing a very small amount of data or if you are clever about how you hide things, then you can get around these detection tools.