How Stealth's Encrypted Search Works

Alexandre Helle
January 10, 2023

Creating a highly secure, scalable and flexible encrypted search isn’t easy. In our last blog post, we learned exactly why the world of cryptography has struggled for so long to create a high-quality, secure encrypted search.

Cyborg is tackling the encrypted search problem head-on by leveraging a powerful concept in cryptography: Forward Privacy. Here’s how we applied it to Stealth to deliver the first searchable, end-to-end encrypted cloud storage platform.

PART I: Indexing

In order to make data searchable, it first needs to be indexed. An index can be as simple as a book’s table of contents, or as complex as a cryptographic index like the one Stealth uses. While they differ in complexity, all indexes serve the same purpose: to help quickly access and retrieve data from a large collection.

Parsing & Normalization

The first step of indexing is parsing. Files come in all sorts of formats (PDF, Word documents, etc.), so when we parse data, we strip away its format and extract what’s left: the file’s plaintext. To do this, we use tools called parsers, which remove the formatting and metadata from a file and leave the bare plaintext, the part we’re interested in searching.

Once we have the plaintext, we normalize it, or transform it into a format that’s usable by our search engine. This process consists of tokenization (splitting text into words or tokens), stemming (transforming words like “eating” or “eats” into “eat”), and a few other technical steps which we’ll skip for now.

In the end, we’re left with a list of tokens found in the text we’re indexing and some information about each of these tokens (such as frequency, position, etc.).
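As a rough illustration of the steps above (not Stealth's actual code), here is a minimal Python sketch of tokenization and stemming. The names `naive_stem` and `normalize` are hypothetical, and the suffix-stripping rule is a deliberately crude stand-in for a real stemmer.

```python
import re
from collections import defaultdict


def naive_stem(word: str) -> str:
    """Toy stemmer: strip a couple of common English suffixes (illustration only)."""
    for suffix in ("ing", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> dict:
    """Return {token: {"frequency": n, "positions": [...]}} for a plaintext."""
    # Tokenization: lowercase the text and split it into alphanumeric words.
    words = re.findall(r"[a-z0-9]+", text.lower())
    tokens = defaultdict(lambda: {"frequency": 0, "positions": []})
    for position, word in enumerate(words):
        entry = tokens[naive_stem(word)]
        entry["frequency"] += 1
        entry["positions"].append(position)
    return dict(tokens)
```

Calling `normalize("Eating eats eat")` groups all three words under the single token `eat`, along with how often and where it appeared.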

Cryptographic Hashing

With the tokens from the previous steps, we can move on to a process called hashing. A hash function is a mathematical function that converts an input of any size into a fixed-size value. When we hash data, we mask the original data with another value and perform a mathematical operation on it. You can think of a hash value like a fingerprint. In the same way an individual’s fingerprint is unique, so too, for all practical purposes, are hash values, except rather than being a swirly design imprinted on our fingers, a hash value essentially consists of a very large number. Hashing is also deterministic, meaning that if you hash the same data on separate occasions, it will always produce the same hash value.

When a hash value is large enough and looks random enough, it is practically impossible to learn anything about the data itself from the hash value alone. A function that produces such values is known as a cryptographic hash function. Stealth uses a series of them when hashing data. These functions create hash values from the original input data, but with current technology it is infeasible to retrieve the input data knowing only the output hash. It would be like being asked to figure out exactly how much and in what order butter, flour and salt went into that delicious croissant you had for breakfast. You know the output value, the croissant, but it would be almost impossible to reverse-engineer the exact one you just ate even if you had an idea of what the inputs might be.
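To make the fingerprint idea concrete, here is a small sketch using SHA-256 from Python's standard library. SHA-256 is an assumption chosen for illustration; this post does not say which hash functions Stealth uses, and `hash_token` is a hypothetical helper name.

```python
import hashlib


def hash_token(token: str) -> str:
    """Return the SHA-256 digest of a token as a 64-character hex string."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


# Deterministic: hashing the same token twice gives the same value.
same = hash_token("croissant") == hash_token("croissant")

# A slightly different input gives an unrelated-looking value.
different = hash_token("croissant") != hash_token("croissants")
```

The digest always has the same length no matter how long the input is, and nothing in it hints at the original word.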

Hash values can exist for individual files, a group of files or a complete hard drive. When a file is uploaded to our server, different one-way hash values are generated based on tokens found in that file. Those cryptographic hashes are then uploaded to a Stealth server.

But what if two or more different files contain similar keywords? Wouldn’t that make it difficult to keep hashes organized and unique?

When a previously hashed keyword is found, we rotate the hash value so a keyword found in two different files will produce two hash values that cannot be linked. This rotation uses the user’s own search key, which ensures that no one (not Cyborg, another user or a hacker) can guess which hash values correspond to which keywords, or link hash values together.
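One simple way to picture this rotation is a keyed hash that mixes in an occurrence counter. The sketch below uses HMAC-SHA256 as a stand-in; it is an assumption for illustration, not a description of Stealth's actual construction, and `rotated_hash` is a hypothetical name.

```python
import hashlib
import hmac


def rotated_hash(search_key: bytes, keyword: str, occurrence: int) -> str:
    """Keyed hash of a keyword that changes with every occurrence.

    Without search_key, two outputs for the same keyword cannot be
    recomputed or recognized as belonging together.
    """
    # A zero byte separates the keyword from the fixed-width counter.
    message = keyword.encode("utf-8") + b"\x00" + occurrence.to_bytes(8, "big")
    return hmac.new(search_key, message, hashlib.sha256).hexdigest()


# Hypothetical key for the example; a real key would be randomly generated.
key = b"\x01" * 32
first_file = rotated_hash(key, "invoice", 0)   # keyword seen in a first file
second_file = rotated_hash(key, "invoice", 1)  # same keyword, second file
```

Here `first_file` and `second_file` are different values even though both come from the keyword "invoice", and only someone holding `key` can reproduce either one.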

Now imagine thousands and thousands of files all containing similar keywords. Wouldn’t keeping track of all those hash values get messy?

To keep things organized, we generate a map to keep track of all previously encountered keywords. The map is end-to-end encrypted with a key known only by the user, then uploaded to Cyborg servers for safe storage. With end-to-end encryption and one-way hashing, there is no way for us to know what your unique keywords are, what your files contain or how they are related.
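A map like this can be pictured as a per-keyword counter. The sketch below is a hypothetical `KeywordMap` class, not Stealth's data structure; it only covers the bookkeeping and serialization. The encryption step is left out because Python's standard library has no authenticated cipher; in a real client, the serialized bytes would be encrypted with the user's key before upload.

```python
import json


class KeywordMap:
    """Tracks how many times each keyword has been indexed so far."""

    def __init__(self) -> None:
        self.counts = {}

    def next_occurrence(self, keyword: str) -> int:
        """Return the counter to use for this occurrence, then advance it."""
        current = self.counts.get(keyword, 0)
        self.counts[keyword] = current + 1
        return current

    def serialize(self) -> bytes:
        """Plaintext bytes of the map; these must be encrypted before upload."""
        return json.dumps(self.counts, sort_keys=True).encode("utf-8")


keyword_map = KeywordMap()
a = keyword_map.next_occurrence("invoice")  # first file containing "invoice"
b = keyword_map.next_occurrence("invoice")  # second file containing "invoice"
c = keyword_map.next_occurrence("receipt")  # first file containing "receipt"
```

Each counter value can then feed the hash rotation for that keyword, so every occurrence gets its own unlinkable hash value.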

PART II: Searching

So let’s say you’re a user trying to find a file in Stealth. You type in your search query and, using your search key, Stealth generates something called a trapdoor. Remember how cryptographic hashing makes it impossible to get back to your data? Well, a trapdoor acts as a hidden method to quickly regain access to your data while remaining impossible for others to find. This is secure because only you, the user, can generate the trapdoor, since it uses your search key. Similar to how end-to-end encryption works, your search key is protected with a master key derived from your password, so no one but you can access it.
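The sketch below illustrates both ideas under stated assumptions: PBKDF2 stands in for whatever password-based key derivation Stealth uses, and the trapdoor is modeled as the list of keyed tokens for every recorded occurrence of a keyword. Real forward-private schemes typically use a more compact trapdoor. The function names and the iteration count are hypothetical, and wrapping the search key with the master key is omitted because it needs an authenticated cipher.

```python
import hashlib
import hmac
import os


def derive_master_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Stretch a password into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def make_trapdoor(search_key: bytes, keyword: str, occurrences: int) -> list:
    """Recompute the keyed token for each recorded occurrence of a keyword."""
    tokens = []
    for i in range(occurrences):
        message = keyword.encode("utf-8") + b"\x00" + i.to_bytes(8, "big")
        tokens.append(hmac.new(search_key, message, hashlib.sha256).hexdigest())
    return tokens


# Example values; a real client would generate these once and store them safely.
search_key = os.urandom(32)
trapdoor = make_trapdoor(search_key, "invoice", occurrences=3)
```

Anyone without `search_key` cannot produce these tokens, which is what keeps the trapdoor private to the user.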

With the generated trapdoor, Cyborg servers can return the encrypted file IDs that match the query you submitted. Your Stealth app then decrypts, scores and ranks these to present the search results to you.
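This exchange can be sketched as two small functions. Everything here is an assumption for illustration: the server index is modeled as a plain dictionary from tokens to opaque encrypted file IDs, the `decrypt` callable is supplied by the caller, and ranking by number of matches is a placeholder for whatever scoring Stealth actually applies.

```python
from collections import Counter
from typing import Callable


def server_lookup(index: dict, trapdoor: list) -> list:
    """Server side: return the opaque blobs stored under each trapdoor token.

    The server sees only tokens and ciphertext, never keywords or file names.
    """
    return [index[token] for token in trapdoor if token in index]


def rank_results(encrypted_ids: list, decrypt: Callable[[bytes], str]) -> list:
    """Client side: decrypt the file IDs and rank files by number of matches."""
    hits = Counter(decrypt(blob) for blob in encrypted_ids)
    return [file_id for file_id, _ in hits.most_common()]
```

The server only performs dictionary lookups on values it cannot interpret; decryption and ranking happen entirely on the user's device.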

Further Reading

For further technical information on the cryptography underlying Stealth's security promises, there are several papers on the topic. To learn more about the trapdoor function at the heart of our search engine, we suggest reading "Delegatable Pseudorandom Functions and Applications".

CONCLUSION

With Stealth's encrypted search, at no point does your data need to be put at risk by decrypting it simply to retrieve it.

Our technology promises that:

  • Your encrypted data will always be searchable, but only by you.
  • Your encrypted data will not be revealed or leaked by the cryptographic indexes.
  • Your search queries are also encrypted, so no one can know what you’re searching for.
  • Your search results are encrypted, so no one can know the results.

Although Stealth servers are storing your data, we’ve built our technology so that nobody, not even us, can see the contents of your data. Your data stays safe and also easy to access.

Try Stealth for yourself and get 25GB of free, end-to-end encrypted & searchable storage today!