Annotate Complex Fields
Fidesops can retrieve and mask data from complex objects and arrays in MongoDB, although this involves annotating your dataset files to let fidesops know about your complex data.
Declare an object field
To declare an object
field, you should define nested fields underneath that field. You can optionally
add the data_type: object
annotation, but the object type will be inferred by the presence of the nested fields. In the example below,
workplace_info
is an object field with two nested fields: employer
and position
.
Data categories cannot be specified at the object
level due to potential conflicts with nested fields. Instead,
annotate the scalar fields within the object field. Here, the workplace_info.position
field has data_category
user.job_title
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
|
Reference a nested field
To define a relationship between a field on one collection and a nested field on another collection, use dot notation
in the fidesops_meta
references for as many levels are necessary.
In the example below, we might add a separate customer
collection that references
the nested field workplace_info.id
field in the customer_details
collection.
Under references, this field is denoted by <collection_name>.<field_name>.<sub_field>
name, or
customer_details.workplace_info.id
.
If we preferred, we could instead define this relationship on the customer_details.workplace_info.id
field itself,
with a direction of from
, with field mydatabase.customer.workplace_id
, and dataset mydatabase
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
Declare an array field
There is not an official array
type per se, since arrays can store scalar values or objects. Instead, an array is represented
by a []
flag on a field.
Declare an array of scalar values
In this example, our mydatabase:customer
collection has a travel_identifiers
field that is an array of strings,
described by data_type: string[]
. An array of integers would be described by data_type: integer[]
.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Declare a nested array
In this example, our mydatabase:customer
collection has a nested workplace_info.direct_reports
array, that is an
array of strings. In other words, we have a workplace_info
object field, with sub-fields employer
, postion
, and direct_reports
,
where direct_reports
is an array.
We define direct_reports
as a subfield under workplace_info
, as well as add the data_type string[]
to direct_reports
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
|
Declare an array of objects
In this example, our mydatabase:customer
collection has an emergency_contacts
field which is an array of objects, or
embedded documents, denoted by data_type: object[]
. Each object in the emergency_contacts
array can contain a
name
, relationship
, and phone
field.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
|
Reference an array
Generally, reference an array field as if it is any other field. You cannot currently reference a specific index in an array field, but you can point a field to an array field, and we would search for matches within that array.
In this example, mydatabase:flights.plane
is an integer field that will be used to lookup records that match an integer
in the mydatabase:aircraft.planes
array.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
|
In this more complicated example, a field in an array of objects is used to look up a different field in an array of
objects in another collection. Potentially multiple values from mydatabase:customer.comments.comment_id
can be used
to query for corresponding values in mydatabase:conversations.thread.comment
. Because this field is in an array of objects,
multiple matches may be found.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
|
Query an array
There are some assumptions made with array querying that may or may not fit with how your data is structured. If an array is an entrypoint into a collection (in other words, one collection references its array field), there is ambiguity around how the queries should be built - for example, AND versus OR, and whether only the matched indices or matched embedded documents within arrays should be considered.
Assumptions
1) If an array is the entry point into a node, we will search for corresponding matches across the entire array. You cannot specify a certain index.
2) Everything is basically an "OR" query. Data returned from multiple array fields will be flattened before being passed into the next collection.
1) For example, say Collection A returned values [1, 2, 3] and Collection B returned values [4, 5, 6]. Collection C has an array field that depends on both Collection A and Collection B. We search Collection C's array field to return any record that contains one of the values [1, 2, 3, 4, 5, 6] in the array.
3. By default, if an array field is an entry point to a node, only matching indices in that array are considered, both for access and erasures, as well as for subsequent queries on dependent collections where applicable.
1. For example, a query on Collection A only matched indices 0 and 1 in an array. Only the data located at indices 0 and 1 will be returned, and used to query data on dependent collection C.
2. This can be overridden by specifying return_all_elements: true
on an entrypoint array field, in which case, the query will return the entire array and/or mask the entire array.
4. Individual array elements are masked, not the entire array, e.g. ["MASKED", "MASKED", "MASKED"]
Example query traversal
This is an example traversal created from our test postgres_example
and mongo_test
datasets.
Multiple collections are point to or from complex objects and arrays. See the mongo_example_test_dataset.yml
for more information.