R package StegosauR - Capabilities

Basic Information

The current version of StegosauR, v.0.1.0, was developed under R 3.5.1. To operate, it requires a series of additional R packages (see installation instructions on GitHub). Reliability tests were performed with the following versions:

## [1] "jpeg 0.1-8"       "tiff 0.1-5"       "png 0.1-7"        "Unicode 12.0.0-1"
## [5] "dplyr 0.8.3"      "openssl 1.4.1"    "magrittr 1.5"

Compatibility test

So far, I have been able to test StegosauR on four different systems: three running Windows and one running Ubuntu. I ran my test using a completely white 100x100 pixel .jpg file. Having a completely white image helps to highlight how the file looks after a message is encoded. My test message was a 444-character, randomly generated Lorem Ipsum text. All machines were able to encode and decode the message correctly. The Windows computers took roughly 2.5 to 5 seconds to accomplish each task, with the most recent one (number 1 in Table 1) being the fastest. The Ubuntu machine is a 10-year-old Samsung netbook. It took a bit longer than the others to run StegosauR (a little over 30 seconds per task), but it did its job flawlessly nonetheless. A file encoded on one machine could be decoded correctly by all the other three.

Table 1 - Encoding and decoding times in seconds (444 characters) for different hardware/software setups.
machine	OS version	R version	RAM	CPU	encryption time (s)	decryption time (s)
1	Windows 10 Pro (x64)	3.5.0 (x64)	8GB	Intel i5-6500 3.2GHz	2.5	2.4
2	Windows 10 Home (x64)	3.5.1 (x64)	16GB	AMD FX-8320 3.5GHz	4.8	4.9
3	Windows 7 Enterprise (x64)	3.4.2 (x64)	8GB	Intel i5-2520M 2.5GHz	5.2	5.1
4	Ubuntu 18.04 LTS (i686)	3.4.4 (i686)	2GB	Intel Atom N270 1.6GHz	32.7	34.9

For reference, storing a 5000-word piece of text (33000+ characters) in a 1000x1000 pixel image took 230 seconds on machine number 1.

Storage test

I also ran some additional tests to understand how many characters can be stored in a given image. These tests were performed on a series of .jpg files ranging from 50x50 to 1000x1000 pixels. Every image is in RGB format, so it has three color channels. The maximum number of available places to encode a message is given by image width x height x number of channels. This means that a 100x100 pixel RGB image has a total of 30000 potential encoding slots. As shown in previous sections, StegosauR needs 12 slots to encode every single character (spaces, new line separators, etc. included), plus 12 additional slots to store the information on message length. This means that our 100x100 image could theoretically store 2499 characters (the original 30000, minus the necessary 12 for message length, all divided by 12).
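The arithmetic above can be written as a small function. This is only a sketch of the formula described in the text: max_chars() is a hypothetical helper name and is not part of StegosauR.

```r
# Theoretical capacity: every character takes 12 slots, and 12 more slots
# are reserved for the message length.
# max_chars() is a hypothetical helper, not a StegosauR function.
max_chars <- function(width, height, channels = 3, slots_per_char = 12) {
  total_slots <- width * height * channels
  (total_slots - slots_per_char) %/% slots_per_char
}

max_chars(100, 100)    # 2499
max_chars(50, 50)      # 624
max_chars(1000, 1000)  # 249999
```

These values match the theoretical capacities listed for the 50x50, 100x100 and 1000x1000 images in Table 2.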

Still, it is not really possible to fit a 2499-character text into a 100x100 pixel image. The pseudo-random generator employed by StegosauR has its limits, and it would require an impractical amount of computing time to generate all the unique coordinate sets within an image (i.e. unique combinations of width, height and channel). For this reason, the maximum message length for a 100x100 image is actually in the range of 600 characters. Future versions of StegosauR might include an option to sacrifice speed in order to get more storage space, but this would add yet another parameter that needs to be communicated to the decoding user (besides the passwords and the image itself), reducing the overall ease of use of this application.
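To give a rough feeling for why unique coordinate sets are harder to come by than raw slots, here is a minimal sketch. It does not reproduce StegosauR's own generator: it assumes a naive generator that simply draws slots at random with base R's sample.int(), and the seed and variable names are arbitrary choices made for illustration.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

width <- 100
height <- 100
channels <- 3
n_slots <- width * height * channels  # 30000 potential slots
n_draws <- 600 * 12                   # slots needed for 600 characters, 12 each

# Assumption: slots are drawn independently, so repeats are possible
draws <- sample.int(n_slots, n_draws, replace = TRUE)

# Convert the linear slot indices into (x, y, channel) coordinate sets
coords <- arrayInd(draws, .dim = c(width, height, channels))
colnames(coords) <- c("x", "y", "channel")

# Repeated coordinate sets cannot be used twice, so they are lost
n_unique <- nrow(unique(coords))
n_draws - n_unique  # number of draws wasted on repeated coordinates
```

Under this assumption, some of the draws always land on coordinates that were already taken, and the share of wasted draws grows as the message gets longer relative to the image.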

The pseudo-random generator produces a unique sequence of numbers based on the user-defined passwords (secret and salt, if any) and on message length. Different combinations of these parameters can therefore result in longer or shorter sequences of unique coordinate sets and, as a result, in more or fewer valid slots for character encoding. Given all these variables, the storage test proceeded in this way: I attempted to store increasingly long character sequences within each test image. Each character sequence was tested with 25 password combinations (5 different secrets and 5 different salts).

The 5 different secrets and salts include StegosauR default settings (secret = "StegosauR password", salt = NA), as well as single words, multiple random words, single characters and sequences of random characters.

#different secrets and salts used in the storage test.

secret <- c("StegosauR password", "secret", "!", ";>Tr8b]v{uD-!bF+", "multiply short post stick throw")

salt <- c(NA, "salt", "!", ";>Tr8b]v{uD-!bF+", "multiply short post stick throw")
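The 25 password combinations can be enumerated by crossing the two vectors. A minimal sketch using base R's expand.grid(); the name combos is just an illustrative choice, and this is not necessarily how the original test loop was written.

```r
secret <- c("StegosauR password", "secret", "!", ";>Tr8b]v{uD-!bF+",
            "multiply short post stick throw")
salt <- c(NA, "salt", "!", ";>Tr8b]v{uD-!bF+",
          "multiply short post stick throw")

# Every secret paired with every salt: 5 x 5 = 25 rows
combos <- expand.grid(secret = secret, salt = salt, stringsAsFactors = FALSE)

nrow(combos)              # 25
sum(is.na(combos$salt))   # 5 combinations use no salt at all
```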

Text storage was tested at increments of 10 characters for 50x50 and 100x100 pixel images, 100 characters for 200x200 and 300x300 images, 500 characters for 400x400 and 500x500 images, 1000 characters for 600x600 and 700x700 images, 2000 for the 800x800 image, 2500 for the 900x900 image and finally 5000 for the 1000x1000 image. These different steps are needed because testing large images with small increments would probably have taken a few years of uninterrupted computations. Here are the results.

Table 2 - Storage test with different RGB images.
image size (pixels)	max. theoretical capacity (characters)	min. tested capacity	max. tested capacity	average tested capacity	average efficiency (avg. tested capacity vs. max. theoretical capacity)
50x50	624	70	260	140	22%
100x100	2499	240	1130	590	23%
200x200	9999	500	4300	2080	20%
300x300	22499	1000	7400	4300	19%
400x400	39999	2500	13000	6960	17%
500x500	62499	3500	19500	10240	16%
600x600	89999	4000	27000	13680	15%
700x700	122499	5000	36000	17800	14%
800x800	159999	6000	44000	21840	13%
900x900	202499	5000	55000	25500	12%
1000x1000	249999	5000	65000	29000	11%
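As a quick sanity check, the efficiency column can be recomputed from the image sizes and the average tested capacities. This is a minimal sketch: truncating the percentages with floor(), rather than rounding them, is an assumption that happens to reproduce the values in the table.

```r
# Image side lengths and average tested capacities, copied from Table 2
side <- c(50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000)
avg_tested <- c(140, 590, 2080, 4300, 6960, 10240, 13680, 17800,
                21840, 25500, 29000)

# Theoretical capacity: (width x height x 3 channels - 12) / 12
max_theoretical <- (side * side * 3 - 12) / 12

# Assumption: percentages are truncated, not rounded
efficiency <- floor(100 * avg_tested / max_theoretical)

data.frame(image = paste0(side, "x", side),
           max_theoretical = max_theoretical,
           avg_tested = avg_tested,
           efficiency_pct = efficiency)
```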

And here are the same values in graphical form.

Figure: StegosauR storage test

These tests were carried out on machine number 2 in Table 1. The processing time increases almost linearly with message length.

Figure: StegosauR encoding times

So, well, StegosauR is not a terribly efficient application. I am sure that the random number generator could be tweaked to produce a higher number of unique combinations, but at this stage I am quite happy with the overall results.

Visual inspection

As illustrated in the previous sections, StegosauR encodes messages by slightly altering the color of multiple pixels within a digital image. But is this color change visible? Hiding some text in a perfectly white image does result in some visible artifacts. In this .tiff image it’s clearly visible that some pixels are noticeably darker than others. Before encryption, the image was completely white, so it’s easy to spot which pixels contain hidden information. Below is an attempt at showing this coloring effect using a screenshot of the original .tiff. The arrows point at some of the slightly darker pixels (admittedly, it is much clearer on the zoomed-in original .tiff).

Figure: StegosauR encoded color change

Using any non-monochromatic image, like a normal photograph, helps to hide these anomalous pixels among the rest of the noise. Probably, a dedicated piece of software would still be able to detect weird noise patterns and separate images that contain hidden messages from those that don’t, especially in the case of very long messages that alter a high number of pixels.

StegosauR offers an option to add random noise across all pixels prior to encryption: just add noise = TRUE when calling encosaur(). This functionality is quite aggressive though, and can cause some very visible artifacts. It can also slow down the encoding process quite a lot, since it cycles through all pixels and all channels.

Known Issues

In theory, StegosauR can process any Unicode character. In practice, it depends on the capabilities of the underlying Unicode package. Let’s take “Ä” as an example. This character can be converted into its corresponding Unicode value and back without any issue.

library(Unicode)

char <- "Ä"

#convert to UTF-8
char.utf8 <- iconv(char, Encoding(char), "UTF-8")

#convert to Unicode
as.u_char(utf8ToInt(char.utf8))

## [1] U+00C4

#back to the original character
intToUtf8(as.u_char(utf8ToInt(char.utf8)))

## [1] "Ä"

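The same conversion might not work with more exotic symbols. The only case I could spot so far involves character U+0092, which is sometimes used as an apostrophe (U+0092 itself is a control character in Unicode; the byte 0x92 stands for the typographic apostrophe ’, U+2019, in the Windows-1252 encoding). Its conversion into Unicode works well, but the step from Unicode back to the original character returns just its hexadecimal value.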

library(Unicode)

char <- "’"

#convert to UTF-8
char.utf8 <- iconv(char, Encoding(char), "UTF-8")

#convert to Unicode
as.u_char(utf8ToInt(char.utf8))

## [1] U+2019

#back to the original character
intToUtf8(as.u_char(utf8ToInt(char.utf8)))

## [1] "’"

In this situation, it is better to simply remove the problematic characters or replace them with similar-looking ones.

char <- "’"

#replace with gsub
gsub("’", "'", char)

## [1] "'"

Malicious content

All physical and digital devices able to carry information should always be treated with due caution. Do not open suspicious email attachments. Do not plug USB drives found on the ground into your computer. Images containing hidden messages are no exception. In the unlikely event that you receive an image encoded with StegosauR with instructions on how to decode it, don’t do it unless you trust the source.

It is possible to hide R code (or any code, probably) in an image using StegosauR, although so far I have not found a way to execute that code immediately upon decryption. External help, such as running decosaur() inside an eval(parse()) sequence, seems to be needed to achieve this. I can’t exclude that more knowledgeable people might still find a way to skip this step, though.
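To illustrate the point about eval(parse()): a decoded message is just a character string, and R does nothing with it until it is explicitly parsed and evaluated. A minimal sketch, in which the string decoded is a harmless stand-in for whatever decosaur() would return (decosaur() itself is not called here):

```r
# Harmless stand-in for a decoded message; in practice this text would
# come out of the image, so its content is not under your control.
decoded <- "1 + 1"

# On its own, the message is inert text
class(decoded)  # "character"

# Only an explicit parse + eval turns the text into running code
result <- eval(parse(text = decoded))
result          # 2
```

This is why a decoded message from an untrusted source should never be passed to eval(parse()).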