StegosauR conceals a message in a relatively straightforward way. Let’s take a sample text:
txt <- "This is StegosauR"
Since messages can be stored only in numeric format, the first step is to convert the given combination of letters and symbols into a sequence of numbers. The first part of this task is to convert every character of the message into its Unicode code point. Our sample text then becomes:
library(Unicode)
txt.u <- as.u_char(utf8ToInt(txt))
txt.u
## [1] U+0054 U+0068 U+0069 U+0073 U+0020 U+0069 U+0073 U+0020 U+0053 U+0074 U+0065 U+0067 U+006F
## [14] U+0073 U+0061 U+0075 U+0052
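The same hexadecimal values can also be obtained with base R alone, which may help when the Unicode package is not installed. This is only a minimal sketch and not part of StegosauR; `hex.codes` is a name introduced here for illustration.

```r
# utf8ToInt() returns the code points as integers; sprintf("%04X") prints
# each one as an upper-case hexadecimal string padded to four digits
txt <- "This is StegosauR"
hex.codes <- sprintf("%04X", utf8ToInt(txt))
hex.codes[1:4]
## [1] "0054" "0068" "0069" "0073"
```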
The advantage of code points is that they are written in hexadecimal format. This means that, while they still contain letters, only six of them (A to F) can occur alongside the digits 0 to 9. We can now remove the “U+” prefix from each code point and replace each of the sixteen possible hexadecimal digits with a unique two-digit number.
#remove "U+"
txt.u <- sub("U\\+", "", txt.u)
#group everything under a single vector, then split into individual elements
txt.u <- paste(txt.u,collapse="")
txt.u <- strsplit(txt.u, "")[[1L]]
#define what will replace what and proceed with the substitution.
#code for unlist substitution taken from: https://stackoverflow.com/questions/7547597/dictionary-style-replace-multiple-items
map <- setNames(c("22", "23","24","25","32","33","34","35","42","43","44","45","52","53","54","55"),
c("0", "1", "2","3","4","5","6","7","8","9","A","B","C","D","E","F"))
#look up the two-digit number of every hexadecimal digit in one vectorised step
v <- as.numeric(map[txt.u])
as.vector(v)
## [1] 22 22 33 32 22 22 34 42 22 22 34 43 22 22 35 25 22 22 24 22 22 22 34 43 22 22 35 25 22 22
## [31] 24 22 22 22 33 25 22 22 35 32 22 22 34 33 22 22 34 35 22 22 34 55 22 22 35 25 22 22 34 23
## [61] 22 22 35 33 22 22 33 24
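The encoding steps above can be gathered into a single function. The sketch below is a hypothetical helper (`encode_pairs` is not a StegosauR function) that uses base R only and assumes that every character has a code point of at most four hexadecimal digits.

```r
# hypothetical helper, not part of StegosauR
encode_pairs <- function(txt) {
  # the same substitution table as above: hexadecimal digit -> two-digit number
  map <- setNames(c("22", "23", "24", "25", "32", "33", "34", "35",
                    "42", "43", "44", "45", "52", "53", "54", "55"),
                  c(0:9, LETTERS[1:6]))
  # code points as four-digit hexadecimal strings (assumes code points <= U+FFFF)
  hex <- sprintf("%04X", utf8ToInt(txt))
  # split into single hexadecimal digits and substitute each one
  hex.digits <- unlist(strsplit(hex, ""))
  paste(map[hex.digits], collapse = "")
}

encode_pairs("R")
## [1] "22223324"
```

Every character contributes exactly eight digits, so the 17 characters of the sample text yield a string of 136 digits.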
Consistency is extremely important here: all elements must always have the same number of digits. The elements can then be pasted into a single number or split into individual digits and, because they all have the same length, the original message structure can always be rebuilt. For example:
v.collapsed <- paste(as.vector(v), collapse="")
v.collapsed
## [1] "2222333222223442222234432222352522222422222234432222352522222422222233252222353222223433222234352222345522223525222234232222353322223324"
This long string of digits can be converted back to the original message because we know that every hexadecimal digit is represented by two decimal digits, and every code point has four hexadecimal digits.
#split in groups of two digits
starts <- seq(1, nchar(v.collapsed), by = 2)
v.rebuilt <- substring(v.collapsed, starts, starts + 1)
as.vector(v.rebuilt)
## [1] "22" "22" "33" "32" "22" "22" "34" "42" "22" "22" "34" "43" "22" "22" "35" "25" "22" "22"
## [19] "24" "22" "22" "22" "34" "43" "22" "22" "35" "25" "22" "22" "24" "22" "22" "22" "33" "25"
## [37] "22" "22" "35" "32" "22" "22" "34" "33" "22" "22" "34" "35" "22" "22" "34" "55" "22" "22"
## [55] "35" "25" "22" "22" "34" "23" "22" "22" "35" "33" "22" "22" "33" "24"
#produce a data frame with four columns
df.v <- as.data.frame(matrix(v.rebuilt, nrow = length(v.rebuilt)/4, ncol = 4, byrow=TRUE))
df.v
## V1 V2 V3 V4
## 1 22 22 33 32
## 2 22 22 34 42
## 3 22 22 34 43
## 4 22 22 35 25
## 5 22 22 24 22
## 6 22 22 34 43
## 7 22 22 35 25
## 8 22 22 24 22
## 9 22 22 33 25
## 10 22 22 35 32
## 11 22 22 34 33
## 12 22 22 34 35
## 13 22 22 34 55
## 14 22 22 35 25
## 15 22 22 34 23
## 16 22 22 35 33
## 17 22 22 33 24
#substitute the two-digit elements with the original hexadecimal values
map <- setNames(c("0", "1", "2","3","4","5","6","7","8","9","A","B","C","D","E","F"),
c("22", "23","24","25","32","33","34","35","42","43","44","45","52","53","54","55"))
df.v[] <- map[as.vector(unlist(df.v))]
df.v
## V1 V2 V3 V4
## 1 0 0 5 4
## 2 0 0 6 8
## 3 0 0 6 9
## 4 0 0 7 3
## 5 0 0 2 0
## 6 0 0 6 9
## 7 0 0 7 3
## 8 0 0 2 0
## 9 0 0 5 3
## 10 0 0 7 4
## 11 0 0 6 5
## 12 0 0 6 7
## 13 0 0 6 F
## 14 0 0 7 3
## 15 0 0 6 1
## 16 0 0 7 5
## 17 0 0 5 2
#convert the hexadecimal values to Unicode code points
codes <- as.u_char(paste(df.v$V1, df.v$V2, df.v$V3, df.v$V4, sep=""))
codes
## [1] U+0054 U+0068 U+0069 U+0073 U+0020 U+0069 U+0073 U+0020 U+0053 U+0074 U+0065 U+0067 U+006F
## [14] U+0073 U+0061 U+0075 U+0052
#restore the original message
paste(sapply(codes, intToUtf8), collapse="")
## [1] "This is StegosauR"
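The decoding steps can be condensed in the same way. The following sketch is again a hypothetical helper (`decode_pairs` is a name introduced here, not a StegosauR function); it relies on base R only and assumes that the input consists of complete groups of eight digits, that is, four hexadecimal digits per character.

```r
# hypothetical helper, not part of StegosauR
decode_pairs <- function(digits) {
  # the fixed width is what makes decoding possible, so check it first
  stopifnot(nchar(digits) > 0, nchar(digits) %% 8 == 0)
  # inverse substitution table: two-digit number -> hexadecimal digit
  inverse <- setNames(c(0:9, LETTERS[1:6]),
                      c("22", "23", "24", "25", "32", "33", "34", "35",
                        "42", "43", "44", "45", "52", "53", "54", "55"))
  # cut the string into two-digit groups and translate each group
  starts <- seq(1, nchar(digits), by = 2)
  hex.digits <- inverse[substring(digits, starts, starts + 1)]
  # the matrix is filled column by column, so each column is one code point
  hex <- apply(matrix(hex.digits, nrow = 4), 2, paste, collapse = "")
  # hexadecimal strings -> integers -> text
  intToUtf8(strtoi(hex, base = 16L))
}

decode_pairs("22223332222234422222344322223525")
## [1] "This"
```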
This is just a proof of concept. StegosauR works a bit differently. For
instance, Unicode includes 1,114,112 potential code points, ranging from
U+0000 to U+10FFFF. This means that converting a text to code points
could result in a mix of four-, five- and six-digit codes. Since
consistency is key, StegosauR converts all code points into a six-digit
format. Then every digit of the six-digit code is converted into a
two-digit value as shown above. This means that eventually each
character in our message is represented by a twelve-digit number. Just
as an example:
letter <- "R"
letter.u <- as.u_char(utf8ToInt(letter))
letter.u <- sub("U\\+", "", letter.u)
letter.u
## [1] "0052"
map <- setNames(c("22", "23","24","25","32","33","34","35","42","43","44","45","52","53","54","55"),
c("0", "1", "2","3","4","5","6","7","8","9","A","B","C","D","E","F"))
letter.u <- strsplit(letter.u, "")[[1L]]
letter.u <- map[letter.u]
letter.u <- paste(letter.u,collapse="")
#adding 999900000000 prefixes the eight digits with 9999, giving twelve digits in total
letter.u <- as.numeric(letter.u) + 999900000000
letter.u
## [1] 999922223324
nchar(letter.u)
## [1] 12
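To see why a fixed width is needed, compare a character inside the four-digit range with one outside it. The sketch below uses base R and zero-padding purely for illustration; it is an assumption made here and not StegosauR's internal code, whose example above fills the two extra positions with 9999 instead.

```r
# "R" is U+0052, while U+1F600 (an emoji) needs five hexadecimal digits
cp <- c(utf8ToInt("R"), 0x1F600L)

# padding to four digits leaves the two codes with different lengths
sprintf("%04X", cp)
## [1] "0052"  "1F600"

# padding to six digits covers the whole range up to U+10FFFF
sprintf("%06X", cp)
## [1] "000052" "01F600"
```

With six hexadecimal digits per code point and two decimal digits per hexadecimal digit, every character occupies exactly twelve digits.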