Before we dive in, I want to talk about some basic concepts about storing sensitive data. Two ways to obscure data at rest are encryption and hashing. Encryption and hashing both obfuscate your data, but the one you choose depends on how you will need to access the data. We will talk about that in a bit. If you know all about encryption, hashing, and just want to see its application to databases, skip down to Applying Encryption and Hashing to Databases. If you would like a refresher, keep reading.
Encryption versus HashingFirst, we should talk about what encryption and hashing are and the differences between them. Encryption is applying an algorithm to a message in such a way that only authorized recipients can read the message. Encryption is a reversible process, meaning that you can derive the plaintext message from the encrypted message (ciphertext) as long as you have the key. There are two main types of encryption: symmetric and asymmetric.
Symmetric encryption uses the same key to encrypt and decrypt a message. Let's take a simple shift (or Caesar) cipher. To encrypt data with a shift cipher, you would choose a number between 1 and 26 (or however large your alphabet is) and use that number to "shift" the letters of the alphabet up or down. For example, if our number is 13, and we want to encrypt the word 'apple', we would turn 'apple' into numbers based on the letters' positions in the alphabet: 1, 16, 16, 12, 5. Then we add 13 to those numbers. If a number is greater than 26 (the number of letters in the alphabet), we loop around. So for example, 14 would become 1. In our example, 1, 16, 16, 12, 5 would become 14, 3, 3, 25, 18. If we translate that back into letters, we get 'nccyr'. This is the text we would send to the person we want to communicate with. We would also have to let them know that our 'key' is 13. That way, they can do the reverse operation on our ciphertext ('nccyr') to get the plaintext ('apple'). By the way, a shift cipher with a key of 13 is also called ROT-13.
Obviously, this is not a very secure way of transmitting messages because it is easy to break. If you did not know that it was encrypted using a shift cipher, you could use frequency analysis with enough cipher texts to figure out the key. For example, 'e' is the most common letter in the English language, so if we see the same letter a lot, we can assume that is the cipher text for 'e', figure out that letter's distance from 'e' and try that as a key.
On the other hand, asymmetric encryption (also called public-key cryptography) is where a person's public key is published so that anyone can encrypt a message with that public key. However, the only person that can read the message is the person with the corresponding private key. When I say corresponding, I mean that the public key and private key are mathematically related. Ideally, they are related in such a way that deriving one from the other is computationally unfeasible.
Hashing is similar to encryption in that it mathematically transforms a given plaintext into ciphertext. The difference is that hashing is not reversible like encryption is. If I give you the hash for a given piece of plaintext, you will not be able to recover that plaintext (unless it is a terrible hashing algorithm). Common hashing algorithms are MD5 (Message Digest 5), SHA-1 (Secure Hash Algorithm 1), and SHA-256 (part of SHA-2, SHA-256 produces a 256 bit (32 byte) digest). A digest is the product of putting a piece of plaintext through the algorithm. Hashing is great for verification of a transmitted piece of data. If you hash the same plaintext twice with the same algorithm, the digests will be the same.
You could figure out the plaintext from an arbitrary hash by generating all possible iterations of plaintext until you find the hash that matches. This sounds hard (and it is), but depending on the nature of the plaintext, you might have some shortcuts. For example, if you have a hash of a password, and you know the password meets certain criteria (like it has to be a certain number of characters and only contain certain characters), then that makes the number of hashes you need to generate smaller.
When Should I Use Encryption Instead of Hashing?Encryption is best when you have to transmit a message securely via an insecure channel (like the Internet) and you want to be able to get the message back on the other side. For example, an online merchant would encrypt your credit card number to transmit it across the Internet because the merchant needs to use the plaintext number to complete the sale. If the merchant hashed your credit card number, he would not be able to derive the plaintext credit card number from the digest. In order to find out the number, the merchant would have to generate every possible credit card number and check for the hash. Is it possible? Sure. Is this the best way to do it? Nope.
When Should I Use Hashing Instead of Encryption?Hashing shines when you want to verify that two messages are the same without the need to recover the original message. A good use for hashing is storing passwords. When you type your password into a web page, the web page just needs to verify that the password you supplied matches what it knows to be your password. Instead of storing the password in plaintext, it could store a hash of your password and compare that against a hash that you supply. So, instead of transmitting your password, all you have to do is transmit the hash of it. That is much more secure!
SaltingRemember that a determined attacker could compute the entire population of plaintexts in order to find the corresponding hash. Let's say an attacker got a hold of your database of hashed passwords. From the length of each hash, he is able to determine that you are using a SHA-256 hash. Because he has done this before, he has a list of commonly used passwords and their corresponding SHA-256 hashes. He will probably find one or two hits in your database, and those accounts would then be compromised.
Is there a way to make it harder for this attacker? Yes, by using a salt. A salt is a few additional bytes that augments the message you want to hash. Ideally, those bytes are not easily guessed ('1234' is not a good salt!). Typically, the salt is concatenated to the password, and that combination is hashed. For example, if our salt is 'salty!' and the password is 'password', we would hash 'passwordsalty!' before sending it to the server. The salt needs to be stored somewhere because it needs to be used to verify the hash. If it is used once and then thrown away, we will not be able to verify and hashes made with that salt. Instead of storing the salt, you could derive it from some other piece of information available to you, but be careful not to make it easily guessed. If an attacker can guess your salt, it is no longer effective. Ideally, your salt would vary per account or record.
You could also salt a message before encrypting it, and the idea is the same.
Applying Encryption and Hashing to DatabasesMost modern databases provide an option to encrypt or hash individual fields in a record before you store them in the database so that the plaintext is never stored. Choosing what to encrypt or hash depends on the value of the information you are protecting and could be dictated by regulation or industry standard. As a general rule of thumb, I would hash anything you only need to verify (like passwords) and encrypt anything you need to retrieve later (like credit card data, social security numbers, things like that). There are potential performance (you have to do more operations on the data before storing or retrieving it) and complexity (you have to encrypt / decrypt / hash records before storing or retrieving them) penalties for implementing this in your database, but if security of the data is important, accepting those penalties would be worth it.
Application Versus Database EncyptionThere is a question of whether you should let the database handle the encryption or hashing or if the application passing the information to the database should handle it. The answer is it depends. The application may have more ways of storing the data. Most databases are somewhat limited. I will show you an example with MariaDB, and it uses SHA2 and AES (it also uses other algorithms, but they are not as secure). If you wanted AES that is stronger than what the database provides or you want more control over it (like being able to specify certain parameters of the encryption), the application would be the way to go. Also, you have to worry about key storage if you encrypt using the database. Since you have to supply the key every time you store, update, or retrieve a record, that key needs to be accessible to the database application somehow. The application might already be built for that and have a more robust solution. It depends on how much you trust your application to have handled the cryptography correctly. If you are writing a simple application and do not want to (or cannot) introduce another application to handle the cryptography, using the database to do it is definitely better than nothing.
Filesystem EncryptionAnother layer of encryption you might want to introduce is filesystem encryption. This is helpful if you are worried about someone stealing the data in an offline attack (like physically taking the hard drive storing the database from the machine). Filesystem encryption will not help in an online attack because the database will need to be decrypted for the database software to process it.
Some databases offer complete table-level encryption. Microsoft is one, and MariaDB is another.
We will set this up for our MariaDB installation. For our examples, we will be using MariaDB 10.1.10 on CentOS 7. We will use the file_key_management plugin to enable table level encryption. We need to set up a few things to enable the plugin on the server:
- The key used to encrypt the data (we need to generate this)
- The passphrase to decrypt the key file
- The configuration options in the server's configuration file.
To set up the encryption key, we will use OpenSSL. MariaDB requires that we use 128, 192, or 256 bit AES key. We will use 256-bit keys because they are the most secure option supported:
openssl enc -aes-256-cbc -k SuperSecretPa$$word -md sha1 -P
- enc tells OpenSSL to encypt
- -aes-256-cbc is the cipher we want to use
- -k specifies the password to use
- -md specifies the hash to use with AES (we will use SHA1 for this example)
- -P tells OpenSSL to print the initialization vector (IV) and the key then exit (we will need these for the encryption we want to do in the database)
Let's see what we get:
MariaDB expects the key and IV in this format:
So in our example above, we would create a file that looks like this:
We will save this file as key.txt. You can save it as whatever you like.
Next, we will encrypt this key with a passphrase.
Encrypting the Key
We will use openssl again to encrypt the key. MariaDB requires that we use AES CBC with SHA1 to encrypt the key:
openssl enc -aes-256-cbc -md sha1 -k AnotherSecretPa$$word -in key.txt -out /etc/mgt.ek
- This time we are using a different password to encrypt the key file than the key we used to make the encryption key.
- As input, we will use the key.txt file we made above
- We will output the encrypted key file to /etc/mgt.ek Ideally, the key would be on a different machine so that if this machine was compromised, the attacker would have to compromise the machine with the key on it as well. Make sure that you do not create this file and move it later. I could not get it to work if I created the file then moved it into the directory.
sudo chown mysql:mysql /etc/mgt.ek sudo chmod 600 /etc/mgt.ek
At this point, you should get rid of the key.txt that contains your plaintext encryption key. We do not need it anymore, and we would not want it to fall into the wrong hands.
It is best to put the passphrase to decrypt the key in another file (instead of the server's configuration file). We will put it in /etc/pass.pp. It would also be good to put this file on a different server as we talked about above for the key.
chmod 600 /etc/pass.pp
Now we have to set up MariaDB to use encryption.
Setting up the MariaDB Configuration File
On CentOS 7, the MariaDB server configuration file is in /etc/my.cnf.d/server.cnf. Your distribution might put the file somewhere else. We will add the following to the [mariadb-10.1] section of the file. Some of the commands we will use only work on more recent versions of MariaDB (10.1.7 or later).
Save that file, and restart MariaDB:
[mariadb-10.1] # Encrypt aria tables if you are # using that database engine aria-encrypt-tables # Encrypt binary logs (records of changes to databases) # Requires MariaDB >= 10.1.7 encrypt-binlog # Encrypt innodb logs innodb-encrypt-log=ON # Force encryption of newly created tables innodb-encrypt-tables=FORCE # Use decryption key 1 by default (the key we made) innodb-default-encryption-key-id=1 # Encrypt temporary tables that MariaDB creates encrypt-tmp-disk-tables # Encrypt any temporary files that MariaDB creates encrypt-tmp-files # Load the file_key_management_plugin # This plugin provides encryption key management # functions. plugin-dir=/usr/lib64/mysql/plugin plugin-load-add=file_key_management_plugin.so file-key-management # Choose an encryption algorithm # MariaDB supports two: AES CBC and AES CTR # AES CBC is the default. AES CTR is recommended # because CBC is vulnerable to a "Padding Oracle" # attack file_key_management_encryption_algorithm=aes-ctr # If you made multiple keys, you can enable # these options to have the keys rotate # Threads specifies how many threads to use # for encryption. The value to choose depends # on how your machine is configured. # # The key age is defined in seconds. In this # example, the keys will be rotated every # 3600 seconds (60 minutes) innodb-encryption-threads=4 innodb-encryption-rotate-key-age=3600 # This is the path to the keys that will be used for # encryption file_key_management_filename=/etc/mgt.ek # This is the path to the passphrase used to encrypt # the key (MariaDB needs this to decrypt the key) file_key_management_filekey=FILE:/etc/pass.pp
sudo systemctl restart mariadb.service
Let's make a new database and table. The table will have the following fields:
- First Name
- Last Name
- Username (e-mail address)
- Credit card number
- Credit card expiration date
- Credit card CVV2 code
Now, we will put in some test data:
We should see if the database is truly encrypted. The database file lives in /var/lib/mysql/mydata (since mydata is the name we gave the database). If we try to find the strings we inserted into the database:
As you can see, we could not find some of the plaintext strings we inserted into the database. If you really wanted to make sure, you could copy the database files to another machine, and try to load them without the key.
If you would like more granular encryption, or an additional layer of protection, field-based obfuscation would be a good choice. You can certainly do both if protecting the data warrants it.
First, we should figure out what we want to encrypt, what we want to hash, and what can stay plaintext. Remember that we had the following fields:
- First Name
- Last Name
- Username (e-mail address)
- Credit card number
- Credit card expiration date
- Credit card CVV2 code
Now that we know what we want to do, let's work on how to get it done.
Encryption (AES_ENCRYPT / AES_DECRYPT)There are a few encryption and decryption functions in MariaDB, but the ones we will use are AES because they are the most secure (the others like DES, are not very secure). The documentation for AES_ENCRYPT is here. 128-bit AES is used by default, and according to the documentation, 256-bit is possible if you modify the source. For our purposes, 128-bit is fine. My personal recommendation is that if you want something stronger but do not want to recompile MariaDB, you should use something in your application that takes care of the encryption.
The basic syntax for AES_ENCRYPT is:
This brings up an important consideration. Key storage can be a tough problem. The key has to be accessible to the database so that it can encrypt and decrypt the data. If you store it in the database (say in a separate table), all of your encryption keys would be compromised in the event of a compromise of the database (or the database server).
So if you have to store the keys with the database (i.e. you are not using some sort of key management solution at the application level), we can try to make it a bit more secure. First, we will not use the same encryption key per record. This is because if that one key is compromised, all of the records are compromised. In our example above, every account would get its own encryption key. So how will we do that? We will derive the encryption key from certain account details. It is either that or generate the keys randomly and store them somewhere. I figure deriving the key is a little better since the attacker would need to work to figure out how we do it. You might argue that this is security through obscurity, but I am not relying on how I am generating the key to provide additional security. I want the attacker to have to work for the keys to the castle a bit. If the database is compromised, and the database has any part in generating the keys, the keys will be compromised.
For our example, we will use a SHA2 (SHA-256) digest of the last name concatenated with the e-mail address as the key. This choice is somewhat arbitrary, but I wanted to make the key a function of more than one field to add just a bit more variability.
For our Bob Smith example above, the key would be generated like so:
SHA2(CONCAT('Smith', 'email@example.com'), 256)
In addition, we will use a random one-time use string each time we encrypt. So if the record changes, we will use a new one-time use string. This string is called a nonce (a number used once). A nonce does not have to be a number. The reason for this nonce is to introduce some variability into each instance of our encryption. We will add this nonce to the key we created above when we encrypt, and we will store the nonce with the record we are encrypting (we could store it in another table, but that is not really any more secure than storing it with the record).
We will use OpenSSL to generate a random string. You can use whatever random source you like:
openssl rand -hex 32
which gives us:
We will take the first 16 characters (DEBEE0841E1C0BCE).
Since we will be storing this nonce with the encrypted records, we will have to alter our table to hold these values. The records for each of the three encrypted records will look like this:
nonce|HEX of <Encrypted Value>
We will store the hexadecimal equivalent of the encrypted value since that is easier to process and dump if we need to. The nonce is 16 bytes, the hex string is 2 times (the hex string is 2 bytes for each byte in the string we apply HEX to) the encrypted string. The encrypted string is 16 x ((length_of_string / 16) + 1) according to the documentation, and the separator ('|') adds one character.
So, for each of the fields we want to encrypt:
- Credit Card Number: 81 bytes
- Nonce is 16 bytes
- Separator is 1 byte
- Encrypted string is 16 x ((16 / 16) + 1) = 32 bytes
- HEX of encrypted string is 64 bytes
- Date: 49 bytes
- Nonce is 16 bytes
- Separator is 1 byte
- Encrypted string is 16 x ((7 / 16 + 1) = 16 bytes
- HEX of encrypted string is 32 bytes
- CVV2 code: 49 bytes
- Nonce is 16 bytes
- Separator is 1 bytes
- Encrypted string is 16 x ((3 / 16) + 1) = 16 bytes
- HEX of encrypted string is 32 bytes
ALTER TABLE accounts MODIFY COLUMN cc VARCHAR(81); ALTER TABLE accounts MODIFY COLUMN ccDate VARCHAR(49); ALTER TABLE accounts MODIFY COLUMN ccCVV VARCHAR(49);
Now let's try to add in our encrypted data. For this account, the password is SHA2(CONCAT('Smith', 'firstname.lastname@example.org'), 256), and the nonce for each of the fields is:
- Credit Card Number: DEBEE0841E1C0BCE
- Date: 53305B21F8A5B16D
- CVV2 Code: BD09A1CDD7880E43
UPDATE accounts SET cc=CONCAT('DEBEE0841E1C0BCE', '|', HEX(AES_ENCRYPT('1234567890123456', CONCAT(SHA2(CONCAT('Smith', 'email@example.com'), 256), 'DEBEE0841E1C0BCE')))) WHERE id=1; UPDATE accounts SET ccDate=CONCAT('53305B21F8A5B16D', '|', HEX(AES_ENCRYPT('11/2017', CONCAT(SHA2(CONCAT('Smith', 'firstname.lastname@example.org'), 256), '53305B21F8A5B16D')))) WHERE id=1; UPDATE accounts SET ccCVV=CONCAT('BD09A1CDD7880E43', '|', HEX(AES_ENCRYPT('444', CONCAT(SHA2(CONCAT('Smith', 'email@example.com'), 256), 'BD09A1CDD7880E43')))) WHERE id=1;
It might be a little hard to read, but you can see that the database is now storing our encrypted values:
That is great, but it is useless if we cannot get the data back out. So to decrypt, we use AES_DECRYPT. The syntax is as you would expect. We have to give it the encrypted string and our password. We will try to decrypt the credit card number. First, let's set up some variables to make things more readable:
SELECT @ccField:=(SELECT cc FROM accounts WHERE id=1); SELECT @nonce:=(SUBSTRING_INDEX(@ccField, '|', 1)); SELECT @encryptedCC:=(SUBSTRING_INDEX(@ccField, '|', -1)); SELECT @decryptKey:=(SHA2(CONCAT('Smith', 'firstname.lastname@example.org'), 256));
Remember that the stored credit card number was the hex of the encrypted credit card number where the key was the password (which we called @decryptKey) combined with the nonce. So to decrypt, all we have to do is:
SELECT AES_DECRYPT(UNHEX(@encryptedCC), CONCAT(@decryptKey, @nonce));
And there it is:
The other fields we encrypted work similarly.
Hashing (using SHA2)As we discussed above, we will hash the password field using SHA2. The documentation for SHA2 is here. We will use a similar method to hash the password that we used to encrypt the sensitive fields. We will salt the hash. This salt should vary per record (or account in this case). We will use the same hashing method we used above. Before we get down to it, we need to modify the schema of the database to accommodate the larger passwords we are going to store.
The salt will be 16 characters as it was before, we will have a separator character, and then the hash. We will be using SHA-2 256 which is 256 / 4 (bits per character) = 64 bytes. So we will need 81 (16 byte salt + 1 byte separator + 64 byte hash) characters to store the password.
The SQL for that:
ALTER TABLE accounts MODIFY COLUMN password VARCHAR(81);
I generated a salt / nonce using the same method as we used above: 739D2724C0684691.
To create our password, the SQL will look like this:
CONCAT('739D2724C0684691', '|', SHA2(CONCAT('739D2724C0684691', 'GreatPassword'), 256))
Let's take a look at the result:
To verify it, all you need to do is take the SHA-2 256 sum of the password the user presents with the salt from the database put in front. Then, verify it is the same as the hash after the separator from the record in the database. There should be no reason to retrieve the plaintext of the password, so we are not comparing plaintext to plaintext but rather hash + salt to hash + salt.
Let's take a look at the final record:
This is much better than the plaintext we had before.
Final ThoughtsIn this post, we went through a simple example of applying encryption and hashing to data in a database. There are certain things you need to take into account when thinking about applying this to your data:
- Any applications that work with the data will have to be aware of the encryption and hashing. That means they need to use queries to encrypt and decrypt data and hash data, and these queries have to correspond to whatever method was used to encrypt or hash the data in the first place.
- This will have a performance impact on your application / database because of the overhead associated with encryption, decryption, and hash operations. There are always tradeoffs to applying security to a system. You have to decide how much risk you are willing to accept given the password and complexity "drawbacks."
- If the attacker compromises the server that the database is on and steals the actual database files, they will be encrypted. Again, all they have to do is read the configuration files, see that the encryption keys are stored on this machine and pull those too. The solution there would be to store the keys on another machine. This is only a mitigation though because if the machine that the keys are stored on is compromised, then it is game over. With that being said, the name of the game is building layers of security and not putting all of your eggs in one basket.
- A more common way of stealing data from from a database is to compromise the application that interfaces with the database and dumping the database records using SQL injection or something similar. If an attacker dumped this database, they would get e-mails and names, but the sensitive information would be relatively secure. If they accessed the records through the application that knows how the encryption works, the data would not be safe. Again, this goes back to layers. Not only does the database need to be secured, but the machines and processes that interact with the database need to be secured.
Please let me know your thoughts. Did I miss something? Is there a better way to do this? Thanks for reading!