> ## Documentation Index
> Fetch the complete documentation index at: https://developer.upsun.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Sanitizing databases: MariaDB and Drupal

> Sanitize MariaDB data in preview environments directly or by using Drush.

Databases of live websites often contain personally identifiable information (PII)
such as full names, mailing addresses, and phone numbers.
To ensure people reviewing code changes can't access information they shouldn't,
sanitize your databases of any PII that they may contain.

This example goes through the process for a MySQL database using Drupal.

## Before you begin

You need:

* A project with a [MySQL database](/docs/add-services/mysql).
* A command interface installed:
  * If doing it manually, the [Upsun CLI](/cli).
  * Otherwise, make sure [Drush](https://www.drush.org/latest/install/) is installed in your environment.

This guide is about sanitizing MySQL databases.

This guide doesn't address:

* Sanitizing NoSQL Databases (such as [MongoDB](/docs/add-services/mongodb))
* Input validation and input sanitization, which both help prevent security vulnerabilities

## Sanitize the database

Make sure that you only sanitize preview environments and **never** the production environment.
Otherwise you may lose most or even all of the relevant data stored in your database.

First, take a [database dump](/docs/add-services/mysql#exporting-data) of your preview environment.
This is just a safety precaution.
Production data isn't altered.
To get a database dump, run the following command:

`upsun db:dump -e DEVELOPMENT_ENVIRONMENT_NAME`.

<Tabs>
  <Tab title="Manually">
    Assumptions:

    * `users` is the table where all of your PII is stored in the `staging` development database.
    * `staging` is an exact copy of your production database.

    1. Connect to the `staging` database by running `upsun sql -e staging`.

    2. Display all fields from your `users` table, to select which ones need to be redacted. Run the following query:

       ```sql theme={null}
       MariaDB [main]> SELECT * FROM users;
       ```

       You see output like the following:

       ```sql theme={null}
       +----+------------+---------------+---------------------------+---------------+
       | ID | first_name | last_name     | user_email                | display_name  |
       +----+------------+---------------+---------------------------+---------------+
       |  1 | admin      | admin         | admin@yourcompany.com     | admin         |
       |  2 | john       | doe           | john.doe@gmail.com        | john          |
       |  3 | jane       | doe           | janedoe@ymail.com         | jane          |
       +----+------------+---------------+---------------------------+---------------+
       ```

    3. Change the fields where PII is contained with the [`UPDATE` statement](https://mariadb.com/kb/en/update/).
       For example, to change the display name of users with an email address not in your company's domain
       to a random value, run the following query:

       ```sql theme={null}
       UPDATE users
       SET display_name==substring(md5(display_name||'$PLATFORM_PROJECT_ENTROPY') for 8);
       WHERE email NOT LIKE '%@yourcompany%'
       ```

       Adapt and run that query for all fields that you need to sanitize.
       If you modify fields that you shouldn't alter, [you can restore them](/docs/environments/restore) from the dump you took in step 1.

       You can create a script to automate the sanitization process to be run automatically on each new deployment.
       Once you have a working script, add your script to sanitize the database to [a `deploy` hook](/docs/configure-apps/hooks/hooks-comparison#deploy-hook):

       ```yaml theme={null}
       applications:
           myapp:

               # ...

               hooks:
                   deploy: |

                       # ...

                       cd /app/public
                       if [ "$PLATFORM_ENVIRONMENT_TYPE" = production ]; then
                           # Do whatever you want on the production site.
                       else
                           # The sanitization of the database should happen here (since it's non-production)
                           sanitize_the_database.sh
                       fi
       ```
  </Tab>

  <Tab title="With Drupal and Drush">
    To sanitize your database and get rid of sensitive, live information, use the `drush sql:sanitize` command.
    Add your script to sanitize the database to [a `deploy` hook](/docs/configure-apps/hooks/hooks-comparison#deploy-hook)
    for preview environments:

    ```yaml .upsun/config.yaml theme={null}
    applications:
      myapp:
        hooks:
          deploy: |

            # ...

            cd /app/public
            if [ "$PLATFORM_ENVIRONMENT_TYPE" = production ]; then
              # Do whatever you want on the production site.
            else
              drush -y sql:sanitize
            fi
            drush -y updatedb
    ```

    More options are available.
    These are described in the [Drush documentation](https://www.drush.org/latest/commands/sql_sanitize/).

    To sanitize only on the initial deploy and not all future deploys,
    use [Drush state](https://www.drush.org/latest/commands/state_set/) as in the following example:

    ```yaml .upsun/config.yaml theme={null}
    applications:
      myapp:
        hooks:
          deploy: |

            # ...

            cd /app/public
            if [ "$PLATFORM_ENVIRONMENT_TYPE" = production ] || [ "$(drush state:get --format=string mymodule.sanitized)" != yes ]; then
              # Do whatever you want on the production site.
            else
              drush -y sql:sanitize
              drush state:set --input-format=string mymodule.sanitized yes
            fi
    ```
  </Tab>
</Tabs>

## What's next

You learned how to remove sensitive data from a database.

To replace sensitive data with other meaningful data, you can add a `faker` to the process.
A `faker` is a program that generates fake data that looks real.
Having meaningful PII-free data allows you to keep your current Q\&A, external reviews, and other processes.
To add a faker, adapt your sanitizing queries to replace each value that contains PII with a new value generated by the faker.

You might also want to make sure that you [implement input validation](https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html#goals-of-input-validation).

If your database contains a lot of data, consider using the [`OPTIMIZE TABLE` statement](https://mariadb.com/kb/en/optimize-table/)
to reduce its size and help improve performance.
