Query Execution
Graphs and Traversals
Fidesops uses your Datasets to generate a graph of the resources. Based on the identity data you provide, fidesops then generates a specific traversal, which is the order of steps that will be taken to fulfill a specific request.
The graph supports both directed and non-directed edges using the optional direction
parameter on the relation (non-directional edges may be traversed in either direction). You can preview the queries that will be generated or manually control the order of operations by making relations explicitly directional and with the after
Collection parameters. If you specify a Collection that can't be reached, fidesops generates an error.
An example graph
In this example there are three databases: a mysql database that stores users and their comments, a postgres DB that stores purchase information, and a mongoDB that stores user accounts. Each of them may have related data that we'd like to retrieve.
The Dataset specification looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
|
We trigger a retrieval with identity data, such as an email address or user ID, that's provided by the user. What we do is...
- Identify the collections that contain the identity data that belong to the user.
- Find all related records.
- Use the data to find all connected data.
- Continue until we've found all related data.
For the first step, we use the concept of an identity
. In the fidesops Dataset specification, any field may be marked with an identity
notation:
1 2 3 4 5 6 |
|
What this means is that we will initiate the data retrieval process with provided data that looks like
{"email": "user@example.com", "username": "someone"}
by looking for values in the collection users
where email == user@example.com
. Note that the names of the provided starter data do not need to match the field names we're going to use this data to search. Also note that in this case, since we're providing two pieces of data, we can also choose to start a search using the username provided value. In the above diagram, this means we have enough data to search in both postgres_1.users.email
and mongo_1.users.user_name
.
How does fidesops execute queries?
The next step is to follow any links provided in field relationship information. In the abbreviated dataset declarations below, you can see that since we know that mongo_1.accounts
data contains data related to mongo_1.users
, we can retrieve data from mongo_1.accounts
by generating this set of queries:
1 2 3 4 5 6 7 8 9 10 11 |
|
Logically, we are creating a linked graph using the connections you've specified between your collections to retrieve your data.
Notes about Dataset traversals
-
You can define multiple links between collections, which will generate OR queries like
SELECT a,b,c from TABLE_1 where name in (values from TABLE\_2) OR email in (values from TABLE\_3)
. -
It's an error to specify a collection in your Dataset can't be reached through the relations you've specified.
-
Fidesops uses your Datasets and your input data to "solve" the graph of your collections and how it is traversed. If your Dataset has multiple identity values, you can create a situation where the query behavior depends on the values you provide. In the example above, starting the graph traversal with
{"email": "value1", "username":" value2"}
is valid, but starting with{"email": "value1"}
fails becausemongo_1.users
is no longer reachable. -
As shown in the example, you can create queries between Datasets.