Configure Data Masking
What is data masking?
Data masking is the process of obfuscating data in client systems so that it is no longer recognizable as PII (personally identifiable information).
For example, if a customer requests that you remove all information associated with their email, test@example.com, you might choose to "mask" that email with a random string, xgoi4301nkyi79fjfdopvyjc5lnbr9, and their associated address with another random string, 2ab6jghdg37uhkaz3hpyavpss1dvg2.
It's important to remember that masking does not equal anonymization. Since records are not deleted, a masked dataset is (at best) pseudonymized in most cases, and (at worst) may still be identifiable if the masking is reversible or easy to predict.
In fidesops, your options for pseudonymizing data are captured in "masking strategies". When used directly as an API, fidesops supports a range of masking strategies for different purposes, including HMAC, hash, AES encryption, string rewrite, random string rewrite, and null rewrite.
Why mask instead of delete?
Deleting customer data may mean deleting a whole record (all attributes of the entity), or permanently and irreversibly anonymizing the record by updating specific fields within it with masked values.
Using a masking strategy instead of straight deletion to obscure PII helps ensure referential integrity in your database. For example, you might have an orders table with a foreign key to user without cascade delete. If you first deleted a user with email test@example.com without addressing their orders, you could be left with orphaned rows in the orders table. Depending on how your tables are defined, using masking as a "soft delete" might be a safer strategy.
To retain referential integrity, any values that represent foreign keys must be updated consistently, with the same masked values, across all sources.
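A minimal sketch of why consistency matters, assuming a hypothetical users table and an orders table whose user_email column references users.email. Masking both columns with a keyed, deterministic function (HMAC here) keeps every order linked to its user, while masking each column with fresh random strings leaves the order orphaned. The table shapes, MASKING_KEY, and all helper names are illustrative assumptions, not fidesops code.

```python
import hashlib
import hmac
import secrets

# Hypothetical key; a real deployment would load this from a secret store.
MASKING_KEY = b"hypothetical-masking-key"

# Hypothetical tables: orders.user_email is a foreign key to users.email.
users = [{"email": "test@example.com", "name": "Test User"}]
orders = [{"order_id": 1, "user_email": "test@example.com"}]


def deterministic_mask(value: str) -> str:
    """Same input always yields the same output, so joins still line up."""
    return hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def random_mask(value: str) -> str:
    """Ignores the input and returns a fresh random token on every call."""
    return secrets.token_hex(16)


def mask_tables(mask):
    """Mask the user email and the order's foreign key with the same function."""
    masked_users = [{**u, "email": mask(u["email"])} for u in users]
    masked_orders = [{**o, "user_email": mask(o["user_email"])} for o in orders]
    return masked_users, masked_orders


def orphaned_orders(masked_users, masked_orders):
    """Orders whose foreign key no longer matches any user."""
    emails = {u["email"] for u in masked_users}
    return [o for o in masked_orders if o["user_email"] not in emails]


print(len(orphaned_orders(*mask_tables(deterministic_mask))))  # 0
print(len(orphaned_orders(*mask_tables(random_mask))))  # 1 (two independent random tokens)
```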
Other reasons to mask instead of delete include legal requirements that oblige you to retain certain data for a certain length of time.
Using fidesops as a masking service
If you just want to use fidesops as a masking service, you can send a PUT request to the masking endpoint with the value(s) you'd like pseudonymized. This endpoint is also useful for getting a feel for how the different masking strategies work.
Masking example
PUT /masking/mask
Response: 200 OK
The email has been replaced with a random string of 20 characters, while still preserving that the value is an email.
See the masking values API for details on using fidesops as a masking service.
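As a rough sketch of the request described above, the helper below builds a body that asks random_string_rewrite for a 20-character string with an email-like suffix. The top-level field names "values" and "masking_strategy", the "@masked.com" suffix, and the localhost base URL are assumptions; the "strategy"/"configuration" keys and their options mirror the ones documented later in this guide. Authentication, if your deployment requires it, is not shown.

```python
import json
import urllib.request

# Assumption: a local fidesops instance at this (hypothetical) host and port.
BASE_URL = "http://localhost:8080/api/v1"


def build_mask_request(values, length=20, suffix="@masked.com"):
    """Body for PUT /masking/mask using random_string_rewrite.

    The "values" and "masking_strategy" field names are assumptions.
    """
    return {
        "values": list(values),
        "masking_strategy": {
            "strategy": "random_string_rewrite",
            "configuration": {
                "length": length,
                "format_preservation": {"suffix": suffix},
            },
        },
    }


def send_mask_request(body):
    """PUT the body to the masking endpoint (needs a running server)."""
    request = urllib.request.Request(
        f"{BASE_URL}/masking/mask",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


print(json.dumps(build_mask_request(["test@example.com"]), indent=2))
```

Only build_mask_request runs here; call send_mask_request(body) against a live instance to see the masked values.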
Specifying Multiple Strategies
If you would like multiple strategies to be applied in sequence when using fidesops as a masking service, supply a list of strategy objects (each with its own "strategy" and "configuration") instead of a single strategy. Each strategy is applied across all values, in order.
In this example, the random_string_rewrite strategy runs on both values first, and then the hash masking strategy runs on both values output by random_string_rewrite.
PUT /masking/mask
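The sequencing semantics (random_string_rewrite on every value, then hash on every output) can be simulated locally. This is an illustrative sketch only: random_string_rewrite and hash_sha256 here are simplified stand-ins written for this example (the stand-in hash uses no salt), and apply_in_sequence is a hypothetical helper, not part of fidesops.

```python
import hashlib
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def random_string_rewrite(values, length=20, suffix=""):
    """Stand-in: replace each value with a random string plus an optional suffix."""
    return ["".join(secrets.choice(ALPHABET) for _ in range(length)) + suffix for _ in values]


def hash_sha256(values):
    """Stand-in: unsalted SHA-256 hex digest of each value."""
    return [hashlib.sha256(v.encode("utf-8")).hexdigest() for v in values]


def apply_in_sequence(values, strategies):
    """Apply each strategy to all values, feeding its output to the next one."""
    for strategy in strategies:
        values = strategy(values)
    return values


masked = apply_in_sequence(
    ["test@example.com", "hello@example.com"],
    [lambda vs: random_string_rewrite(vs, 20, "@masked.com"), hash_sha256],
)
print(masked)  # two 64-character hex digests
```

Because the hash runs last, the final values no longer carry the email-like suffix; the order of the list determines what the output looks like.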
Configuration
Erasure requests mask data with the chosen masking strategy.
To configure the masking strategy used by a Policy, create an erasure rule that captures that strategy for the Policy.
PATCH /policy/policy_key/rule
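A sketch of what an erasure rule carrying a masking strategy might look like. It assumes the endpoint accepts a JSON list of rule objects with name, key, action_type, and masking_strategy fields; those field names, the build_erasure_rule helper, and the example name, key, and rewrite value are assumptions. Check the Policy guide for the authoritative schema.

```python
import json


def build_erasure_rule(name, key, strategy, configuration=None):
    """One erasure rule for PATCH /policy/policy_key/rule (field names assumed)."""
    return {
        "name": name,
        "key": key,
        "action_type": "erasure",
        "masking_strategy": {
            "strategy": strategy,
            "configuration": configuration or {},
        },
    }


# Assumption: the endpoint takes a list of rules.
rules = [
    build_erasure_rule(
        name="String rewrite erasure rule",  # hypothetical
        key="string_rewrite_rule",  # hypothetical
        strategy="string_rewrite",
        configuration={"rewrite_value": "MASKED"},
    )
]
print(json.dumps(rules, indent=2))
```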
Supported masking strategies
Null rewrite
Masks the input value with a null value.
strategy: null_rewrite
No configuration needed.
String rewrite
Masks the input value with a default string value.
strategy: string_rewrite
configuration:
  - rewrite_value: str that will replace input values
  - format_preservation (optional): Dict with the following keys:
    - suffix: str that specifies a suffix to append to the masked value
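A minimal local sketch of the string rewrite behavior described above, including the optional suffix. The string_rewrite function and the "MASKED" and "@masked.com" values are hypothetical illustrations, not the fidesops implementation.

```python
from typing import List, Optional


def string_rewrite(values: List[str], rewrite_value: str, suffix: Optional[str] = None) -> List[str]:
    """Replace every input value with rewrite_value, plus an optional suffix."""
    masked = rewrite_value + (suffix or "")
    return [masked for _ in values]


print(string_rewrite(["test@example.com", "hello@example.com"], "MASKED", "@masked.com"))
# ['MASKED@masked.com', 'MASKED@masked.com']
```

Because every input collapses to the same string, this strategy is a poor fit for columns that must stay unique or serve as keys, such as the foreign keys discussed earlier.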
Hash
Masks the data by hashing the input before returning it. The hash is deterministic such that the same input will return the same output within the context of the same privacy request. This is not the case when the masking service is called as a standalone service, outside the context of a privacy request.
strategy: hash
configuration:
  - algorithm (optional): str that specifies the hash masking algorithm. Options include SHA-512 or SHA-256. Default = SHA-256
  - format_preservation (optional): Dict with the following keys:
    - suffix: str that specifies a suffix to append to the masked value
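To make "deterministic within a privacy request" concrete, here is a minimal sketch that assumes determinism comes from a salt generated once per privacy request and appended to each value before hashing. The salt mechanism, the hash_mask helper, and the "SHA-256"/"SHA-512" option spellings are assumptions for illustration, not the fidesops implementation.

```python
import hashlib
import secrets

# Assumed option names, mapped to hashlib constructors.
ALGORITHMS = {"SHA-256": hashlib.sha256, "SHA-512": hashlib.sha512}


def hash_mask(values, salt, algorithm="SHA-256", suffix=None):
    """Hash each value with a salt; assumption: one salt per privacy request."""
    hasher = ALGORITHMS[algorithm]
    return [hasher((v + salt).encode("utf-8")).hexdigest() + (suffix or "") for v in values]


# Hypothetical per-request salts.
request_a_salt = secrets.token_hex(16)
request_b_salt = secrets.token_hex(16)

same_request = hash_mask(["test@example.com"] * 2, request_a_salt)
other_request = hash_mask(["test@example.com"], request_b_salt)
print(same_request[0] == same_request[1])  # True: same salt, same output
print(same_request[0] == other_request[0])  # False: the salts differ
```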
Random string rewrite
Masks the input value with a random string of a specified length.
strategy: random_string_rewrite
configuration:
  - length (optional): int that specifies the length of the randomly generated string. Default = 30
  - format_preservation (optional): Dict with the following keys:
    - suffix: str that specifies a suffix to append to the masked value
AES encrypt
Masks the data using AES encryption before returning it. The AES encryption strategy is deterministic such that the same input will return the same output within the context of the same privacy request. This is not the case when the masking service is called as a standalone service, outside the context of a privacy request.
strategy: aes_encrypt
configuration:
  - mode (optional): str that specifies the AES encryption mode. The only supported option is GCM. Default = GCM
  - format_preservation (optional): Dict with the following keys:
    - suffix: str that specifies a suffix to append to the masked value
HMAC
Masks the data using HMAC before returning it. The HMAC strategy is deterministic such that the same input will return the same output within the context of the same privacy request. This is not the case when the masking service is called as a standalone service, outside the context of a privacy request.
strategy: hmac
configuration:
  - algorithm (optional): str that specifies the HMAC masking algorithm. Options include SHA-512 or SHA-256. Default = SHA-256
  - format_preservation (optional): Dict with the following keys:
    - suffix: str that specifies a suffix to append to the masked value
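A minimal HMAC sketch using Python's standard hmac module. It assumes a secret key and a salt are generated once per privacy request, which would account for the determinism described above; the hmac_mask helper, the key/salt handling, and the "SHA-256"/"SHA-512" option spellings are assumptions rather than the fidesops implementation.

```python
import hashlib
import hmac
import secrets

# Assumed option names, mapped to hashlib digest constructors.
DIGESTS = {"SHA-256": hashlib.sha256, "SHA-512": hashlib.sha512}


def hmac_mask(values, key: bytes, salt: str, algorithm="SHA-256", suffix=None):
    """HMAC each salted value with the secret key; suffix is optional."""
    digestmod = DIGESTS[algorithm]
    return [
        hmac.new(key, (v + salt).encode("utf-8"), digestmod).hexdigest() + (suffix or "")
        for v in values
    ]


key = secrets.token_bytes(32)  # hypothetical per-request secret key
salt = secrets.token_hex(16)  # hypothetical per-request salt
first, second = hmac_mask(["test@example.com", "test@example.com"], key, salt, "SHA-512")
print(first == second)  # True
print(len(first))  # 128 hex characters for SHA-512
```

Compared with a plain salted hash, the output cannot be recomputed without the secret key, so an attacker who knows the salt still cannot test candidate emails against the masked values.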
See the Policy guide for more detailed instructions on creating Policies and Rules.
Getting masking options
Issue a GET request to /api/v1/masking/strategy to preview the available masking strategies, along with their configuration options.
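A small standard-library sketch of that GET request. The localhost host and port are hypothetical, authentication (if your deployment requires it) is not shown, and the helpers simply return whatever JSON the endpoint sends back without assuming its shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # hypothetical host and port


def build_strategy_request(base_url=BASE_URL):
    """GET request for the masking strategy listing."""
    return urllib.request.Request(f"{base_url}/masking/strategy", method="GET")


def list_masking_strategies(base_url=BASE_URL):
    """Fetch and decode the listing (needs a running server)."""
    with urllib.request.urlopen(build_strategy_request(base_url)) as response:
        return json.loads(response.read())
```

Calling list_masking_strategies() against a running instance returns the decoded JSON, which you can print to inspect each strategy and its options.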
Extensibility
All masking strategies in fidesops are built on top of an abstract base class, MaskingStrategy.
MaskingStrategy has four methods: mask, secrets_required, get_description, and data_type_supported. For more detail on these methods, see the class in the fidesops repository. Here we focus on the implementation of RandomStringRewriteMaskingStrategy.
The mask method is called with the list of values to be masked and returns the masked values. In this case, each supplied value is replaced with a random mix of ASCII lowercase letters and digits of the specified length. If format preservation is specified (for example, if we still want to know that an email was an email), an email-like suffix can be appended.
Note the argument to the init method: a configuration field of type RandomStringMaskingConfiguration. This is the configuration for the masking strategy. It holds the options specified by the client as well as any defaults that should apply in their absence. All configuration classes extend the MaskingConfiguration class.
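The strategy and configuration described above can be sketched with the standard library alone. This is a simplified, hypothetical stand-in: it uses dataclasses for the configuration classes, and the method signatures (for example, mask taking only a list of strings) and the strings-only data_type_supported rule are assumptions rather than the real MaskingStrategy interface.

```python
import secrets
import string
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MaskingConfiguration:
    """Simplified stand-in for the configuration base class."""


@dataclass
class RandomStringMaskingConfiguration(MaskingConfiguration):
    length: int = 30  # default documented for random_string_rewrite
    format_preservation: Optional[Dict[str, str]] = None  # e.g. {"suffix": "@masked.com"}


class MaskingStrategy(ABC):
    """Hypothetical base class exposing the four methods named in this guide."""

    @abstractmethod
    def mask(self, values: List[str]) -> List[str]: ...

    @abstractmethod
    def secrets_required(self) -> bool: ...

    @abstractmethod
    def get_description(self) -> str: ...

    @abstractmethod
    def data_type_supported(self, data_type: str) -> bool: ...


class RandomStringRewriteMaskingStrategy(MaskingStrategy):
    ALPHABET = string.ascii_lowercase + string.digits

    def __init__(self, configuration: RandomStringMaskingConfiguration):
        # Client-supplied options, with defaults filled in by the dataclass.
        self.length = configuration.length
        self.format_preservation = configuration.format_preservation or {}

    def mask(self, values: List[str]) -> List[str]:
        suffix = self.format_preservation.get("suffix", "")
        # One independent random string per input value, plus the optional suffix.
        return [
            "".join(secrets.choice(self.ALPHABET) for _ in range(self.length)) + suffix
            for _ in values
        ]

    def secrets_required(self) -> bool:
        return False  # random output needs no stored key or salt

    def get_description(self) -> str:
        return "Masks each value with a random string of the configured length"

    def data_type_supported(self, data_type: str) -> bool:
        return data_type == "string"  # assumption: strings only


strategy = RandomStringRewriteMaskingStrategy(
    RandomStringMaskingConfiguration(length=20, format_preservation={"suffix": "@masked.com"})
)
example = strategy.mask(["test@example.com"])
print(example)
```

Keeping defaults in the configuration class means the strategy itself never has to check whether the client supplied an option.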
Integrate the masking strategy factory
To use an implemented masking strategy, the MaskingStrategy subclass must be imported into the application runtime. The subclass must also define two class variables: name, the unique, registered name that callers put in their "masking_strategy"."strategy" field to invoke the strategy; and configuration_model, which references the configuration class used to parameterize the strategy.