SaaS Configuration
What is a SaaS configuration schema?
A SaaS connector is defined in two parts: the Dataset and the SaaS config. The Dataset describes the data that is available from the connector, and the SaaS config describes how to connect to the service and retrieve or update that data. With a database connector, the ways to retrieve and update data conform to a specification (such as SQL) and are consistent. With APIs, each application, or even different endpoints within the same application, can follow different patterns, so a flexible configuration is needed to define the different access and update patterns. Keep in mind that SaaS configs only apply to SaaS connectors, not database connectors.
In short, you can think of the Dataset as the "what" (what data is available from this API) and the SaaS config as the "how" (how to access and update the data).
An example SaaS config
This guide uses a SaaS config that connects to Mailchimp. The config defines:
- The domain and authentication requirements for an HTTP client to Mailchimp
- A test request for verifying the connection was set up correctly
- Endpoints to the following resources within the Mailchimp API:
  - `GET` and `PUT` for the members resource
  - `GET` for the conversations resource
  - `GET` for the messages resource
The following is an example SaaS config for Mailchimp:
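As a rough orientation, the overall shape of such a config can be pictured as a nested mapping. The Python dictionary below is only a sketch: the field names follow this guide, while the concrete values (the name, description, version, the `/3.0/lists` test path, and the param names `domain`, `username`, and `api_key`) are illustrative assumptions rather than the shipped Mailchimp config.

```python
# Illustrative skeleton only: values are assumptions, not the shipped Mailchimp config.
saas_config = {
    "fides_key": "mailchimp_connector_example",
    "name": "Mailchimp SaaS Config",
    "type": "mailchimp",
    "description": "Example Mailchimp connector",
    "version": "0.0.1",
    "connector_params": [{"name": "domain"}, {"name": "username"}, {"name": "api_key"}],
    "client_config": {},  # left empty here; covered under "Client config"
    "test_request": {"method": "GET", "path": "/3.0/lists"},
    "endpoints": [
        {"name": "member", "requests": {"read": {}, "update": {}}},
        {"name": "conversations", "requests": {"read": {}}},
        {"name": "messages", "requests": {"read": {}}},
    ],
}


def request_types(config):
    """Map each endpoint name to the request types it defines."""
    return {endpoint["name"]: sorted(endpoint["requests"]) for endpoint in config["endpoints"]}
```

Calling `request_types(saas_config)` lists a read and an update for `member` and a read for the other two endpoints, matching the resources listed above.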
A SaaS config schema contains the following metadata fields:
- `fides_key`: Used to uniquely identify the connector. This field is used to link a SaaS config to a dataset.
- `name`: A human-readable name for the connector.
- `type`: Type of SaaS connector. Choose from `hubspot`, `mailchimp`, `outreach`, `segment`, `sentry`, `stripe`, `zendesk`, or use `custom` for other types.
- `description`: Used to add a useful description.
- `version`: Used to track different versions of the SaaS config.
And the following complex fields which we will cover in detail below:
- `connector_params`
- `client_config`
- `test_request`
- `endpoints`
- `data_protection_request`
Connector params
The `connector_params` field describes a list of settings that a user must configure as part of the setup. A `default_value` can also be used to include values such as a standard base domain for an API or a recommended page size for pagination. Do not include confidential values such as passwords or API keys here; these values are added as part of the ConnectionConfig secrets. When configuring a connector's secrets for the first time, the default values are used if a value is not provided.
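The default-value behavior can be sketched in a few lines of Python. This is a minimal illustration of the rule described above, not fidesops code; the function name `resolve_secrets` and the example params and defaults are hypothetical.

```python
def resolve_secrets(connector_params, provided):
    """Return the values to store: provided values win, defaults fill the gaps."""
    resolved = {}
    for param in connector_params:
        name = param["name"]
        if name in provided:
            resolved[name] = provided[name]
        elif "default_value" in param:
            resolved[name] = param["default_value"]
    return resolved


connector_params = [
    {"name": "domain", "default_value": "api.example.com"},  # hypothetical default
    {"name": "page_size", "default_value": 100},  # hypothetical default
    {"name": "api_key"},  # confidential: no default, supplied as a secret
]
secrets = resolve_secrets(connector_params, {"api_key": "not-a-real-key"})
```

Here only `api_key` is supplied by the user, so `domain` and `page_size` fall back to their defaults.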
Client config
The `client_config` describes the information needed to create a base HTTP client. Notice that the values for host, username, and password are not defined here. The config only holds references in the form of a `connector_param`, which fidesops uses to insert the actual value from the stored secrets.
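To make the reference mechanism concrete, here is a small Python sketch that swaps `connector_param` references for stored secrets. The dictionary layout (`protocol`, `host`, `authentication`, `strategy`, `configuration`) is an assumption based on the fields named in this guide, and `resolve` is a hypothetical helper rather than a fidesops function.

```python
def resolve(value, secrets):
    """Swap {"connector_param": name} references for the stored secret."""
    if isinstance(value, dict) and set(value) == {"connector_param"}:
        return secrets[value["connector_param"]]
    if isinstance(value, dict):
        return {key: resolve(item, secrets) for key, item in value.items()}
    return value


# Assumed layout; only references appear here, never the secret values themselves.
client_config = {
    "protocol": "https",
    "host": {"connector_param": "domain"},
    "authentication": {
        "strategy": "basic",
        "configuration": {
            "username": {"connector_param": "username"},
            "password": {"connector_param": "api_key"},
        },
    },
}
secrets = {"domain": "us1.api.example.com", "username": "user", "api_key": "not-a-real-key"}

resolved = resolve(client_config, secrets)
base_url = f'{resolved["protocol"]}://{resolved["host"]}'
```

Keeping only references in the config means the config itself can be shared or version-controlled without exposing credentials.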
The authentication strategies are swappable. This example uses the `basic` authentication strategy, which takes a `username` and `password` in its configuration. An alternative is `bearer` authentication, which looks like this:
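The practical difference between the two strategies is the `Authorization` header the client ends up sending. The sketch below uses the standard HTTP Basic and Bearer schemes; the `token` configuration key for the bearer strategy and the function name `auth_header` are assumptions for illustration.

```python
import base64


def auth_header(strategy, configuration):
    """Build an Authorization header value for the two strategies named above."""
    if strategy == "basic":
        raw = f'{configuration["username"]}:{configuration["password"]}'
        return "Basic " + base64.b64encode(raw.encode("utf-8")).decode("ascii")
    if strategy == "bearer":
        return "Bearer " + configuration["token"]  # "token" key is an assumption
    raise ValueError(f"unsupported strategy: {strategy}")


basic = auth_header("basic", {"username": "user", "password": "pass"})
bearer = auth_header("bearer", {"token": "not-a-real-token"})
```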
Test request
Once the base client is defined, we can use a `test_request` to verify our hostname and credentials. This takes the form of an idempotent request (usually a read). The testing approach is the same for any ConnectionConfig test.
Data protection request
If your third-party integration supports something like a GDPR delete endpoint, it can be configured as a `data_protection_request`. It has similar attributes to the test request or endpoint requests, but it is generally a single endpoint that removes all user PII in one go.
Endpoints
This is where we define how we are going to access and update each collection in the corresponding Dataset. The endpoint section contains the following members:
- `name`: This name corresponds to a Collection in the corresponding Dataset.
- `after`: Configures whether this endpoint should run after other endpoints or collections. This should be a list of collection addresses. For example, `after: [ mailchimp_connector_example.member ]` would cause the current endpoint to run after the member endpoint.
- `requests`: A map of `read`, `update`, and `delete` requests for this collection. Each collection can define a way to read and a way to update the data. Each request is made up of:
  - `method`: The HTTP method used for the endpoint.
  - `path`: A static or dynamic resource path. The dynamic portions of the path are enclosed within angle brackets (`<dynamic_value>`) and are replaced with values from `param_values`.
  - `headers` and `query_params`: The HTTP headers and query parameters to include in the request.
    - `name`: The value to use for the header or query param name.
    - `value`: Can be a static value, one or more `<dynamic_value>` placeholders, or a mix of static and dynamic values (`prefix<value>`), which will be replaced with the value sourced from the `param_value` with a matching name.
  - `body`: (optional) A static or dynamic request body, with dynamic portions enclosed in angle brackets, just like `path`. These dynamic values will be replaced with values from `param_values`.
  - `param_values`:
    - `name`: Used as the key to reference this value from dynamic values in the path, headers, query, or body params.
    - `references`: These are the same as `references` in the Dataset schema. It is used to define the source of the value for the given param_value.
    - `identity`: Used to access the identity values passed into the privacy request such as email or phone number.
    - `connector_param`: Used to access the user-configured secrets for the connection.
  - `ignore_errors`: A boolean. If true, we will ignore non-200 status codes.
  - `data_path`: The expression used to access the collection information from the raw JSON response.
  - `postprocessors`: An optional list of response post-processing strategies. We will ignore this for the example scenarios below, but an in-depth explanation can be found under SaaS Post-Processors.
  - `pagination`: An optional strategy used to get the next set of results from APIs with resources spanning multiple pages. Details can be found under SaaS Pagination.
  - `grouped_inputs`: An optional list of reference fields whose inputs are dependent upon one another. For example, an endpoint may need both an `organization_id` and a `project_id` from another endpoint. These aren't independent values, as a `project_id` belongs to an `organization_id`. You would specify this as `["organization_id", "project_id"]`.
  - `client_config`: Specify optional embedded Client Configs if an individual request needs a different protocol, host, or authentication strategy from the base Client Config.
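To show how these members fit together, the sketch below models a `messages` endpoint as a Python dictionary and applies its `data_path` to a decoded response. The reference layout (`dataset`, `field`, `direction`), the `conversation_messages` response key, and the dotted `data_path` syntax are assumptions for illustration; `extract_rows` is a hypothetical helper.

```python
messages_endpoint = {
    "name": "messages",
    "requests": {
        "read": {
            "method": "GET",
            "path": "/3.0/conversations/<conversation_id>/messages",
            "param_values": [
                {
                    "name": "conversation_id",
                    "references": [
                        {
                            "dataset": "mailchimp_connector_example",
                            "field": "conversations.id",
                            "direction": "from",
                        }
                    ],
                }
            ],
            "data_path": "conversation_messages",  # assumed response key
        }
    },
}


def extract_rows(response_json, data_path):
    """Walk a dotted data_path (assumed syntax) into a decoded JSON response."""
    current = response_json
    for key in data_path.split("."):
        current = current[key]
    return current


read = messages_endpoint["requests"]["read"]
rows = extract_rows({"conversation_messages": [{"id": "m1"}]}, read["data_path"])
```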
Param values in more detail
The `param_values` list provides the values for the various placeholders in the path, headers, query params, and body. Values can be `identities` such as email or phone number, `references` to fields in other collections, or `connector_params`, which are defined as part of configuring a SaaS connector. Whenever a placeholder is encountered, the placeholder name is looked up in the list of `param_values` and the corresponding value is used instead. Here is an example of placeholders being used in various locations:
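The lookup-and-replace step can be sketched with a regular expression. This is a minimal illustration, assuming the `param_values` have already been resolved into a plain name-to-value mapping; the `fill` helper, the templates, and the values are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"<([^<>]+)>")


def fill(template, values):
    """Replace each <name> with values[name]; raises KeyError for an unknown name."""
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), template)


# Hypothetical values already resolved from an identity, a reference, and a connector_param.
values = {"email": "name@email.com", "list_id": "123", "api_version": "3.0"}

path = fill("/<api_version>/lists/<list_id>/members", values)
header = fill("Contact <email>", values)  # static prefix mixed with a placeholder
body = fill('{"email_address": "<email>"}', values)
```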
How are requests generated?
The following HTTP request properties are generated for each request based on the endpoint configuration:
- method
- path
- headers
- query params
- body
Method
This is a required field since a read, update, or delete endpoint might use any of the HTTP methods to perform the given action.
Path
This can be a static value or use placeholders. If the placeholders to build the path are not found at request-time, the request will fail.
Headers and query params
These can also be static or use placeholders. If a placeholder is missing, the request will continue and omit the given header or query param in the request.
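The difference between the two rules (a missing path placeholder fails the request, a missing header or query param placeholder just drops that param) can be sketched as follows. The helper names and the example params are hypothetical, and static values are converted to strings for simplicity.

```python
import re

PLACEHOLDER = re.compile(r"<([^<>]+)>")


def fill(template, values):
    """Return the filled template, or None if any placeholder has no value."""
    text = str(template)
    if any(name not in values for name in PLACEHOLDER.findall(text)):
        return None
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), text)


def build_path(path, values):
    """A path with an unresolved placeholder cannot be requested, so fail."""
    filled = fill(path, values)
    if filled is None:
        raise ValueError(f"missing placeholder value for path {path}")
    return filled


def build_params(params, values):
    """Keep only the headers or query params whose placeholders all resolve."""
    built = {}
    for param in params:
        filled = fill(param["value"], values)
        if filled is not None:
            built[param["name"]] = filled
    return built


query = build_params(
    [{"name": "count", "value": 1000}, {"name": "query", "value": "<email>"}],
    values={},  # no email available at request time
)
```

With no `email` value available, `query` contains only the static `count` param.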
If reference values are used for the placeholders, each value will be processed independently unless the `grouped_inputs` field is set. The following examples use query params, but this applies to headers as well.
With ungrouped inputs (default)
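One way to read "processed independently" is that every incoming value produces its own request, with the other placeholder missing and therefore omitted. The sketch below illustrates that interpretation only; it is not a statement of fidesops internals, and the param names and values are hypothetical.

```python
def ungrouped_requests(inputs):
    """One request per individual value; the other params are omitted."""
    return [{name: value} for name, values in inputs.items() for value in values]


# Hypothetical reference values for two query params.
inputs = {"organization": ["1", "2", "3"], "project": ["a", "b", "c"]}
requests = ungrouped_requests(inputs)
```

Under this reading, three organization values and three project values yield six requests, each carrying a single query param.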
With grouped inputs
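With `grouped_inputs`, values that belong together stay together. A minimal sketch, assuming the upstream collection returns rows that contain both fields; the `grouped_requests` helper and the rows are hypothetical.

```python
def grouped_requests(rows, grouped_inputs):
    """One request per upstream row, keeping the grouped values together."""
    return [{name: row[name] for name in grouped_inputs} for row in rows]


# Hypothetical upstream rows: each project belongs to exactly one organization.
rows = [
    {"organization_id": "1", "project_id": "a", "title": "first"},
    {"organization_id": "2", "project_id": "b", "title": "second"},
    {"organization_id": "3", "project_id": "c", "title": "third"},
]
requests = grouped_requests(rows, ["organization_id", "project_id"])
```

Each of the three requests carries a matching `organization_id` and `project_id` pair, so a project is never queried under the wrong organization.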
Body
The body can be static or use placeholders. If the placeholders to build the body are not found at request-time, the request will fail.
Placeholder options for updates
The following placeholders can be included in the body of an update:
- `<masked_object_fields>` - any masked fields, along with their masked value
- `<all_object_fields>` - all object fields, including the masked fields and values
Fidesops will automatically fill in the value of these placeholders with the appropriate contents.
Example
An access request returned the following row:
With the `name` field masked, the value of each placeholder would be:
Placeholder | Value |
---|---|
`<masked_object_fields>` | `"name":"MASKED"` |
`<all_object_fields>` | `"id":123,"name":"MASKED","address":"Arlen TX"` |
`<all_object_fields>` should be used if non-masked fields are required as part of the update payload.
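The two placeholder values can be reproduced with a short Python sketch. It assumes the replacement string is a JSON object serialized without its enclosing braces; the helper names are hypothetical and the original `name` value in the row is made up. The optional `read_only` argument anticipates the read-only flag described in the next section.

```python
import json


def as_fragment(fields):
    """Serialize fields as compact JSON without the enclosing braces."""
    return json.dumps(fields, separators=(",", ":"))[1:-1]


def placeholder_values(row, masked, read_only=()):
    """Return the values for <masked_object_fields> and <all_object_fields>."""
    merged = {key: value for key, value in {**row, **masked}.items() if key not in read_only}
    return {
        "masked_object_fields": as_fragment(masked),
        "all_object_fields": as_fragment(merged),
    }


row = {"id": 123, "name": "Example Name", "address": "Arlen TX"}  # name value is hypothetical
values = placeholder_values(row, masked={"name": "MASKED"})
without_id = placeholder_values(row, masked={"name": "MASKED"}, read_only={"id"})
```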
Read-Only fields
A field can be flagged as `read-only` in the dataset to exclude it from the value of `<all_object_fields>` (for example, if including the `id` would cause an error).
With `id` flagged as read-only, it is removed from the result:
Placeholder | Value |
---|---|
`<all_object_fields>` | `"name":"MASKED","address":"Arlen TX"` |
Example scenarios
Dynamic path with dataset references
In this example, we define `/3.0/conversations/<conversation_id>/messages` as the resource path for messages and define the path param of `conversation_id` as coming from the `id` field of the `conversations` collection. A separate GET HTTP request will be issued for each `conversations.id` value.
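The fan-out from one upstream collection to several requests can be sketched as below. The conversation IDs are hypothetical and `message_paths` is an illustrative helper.

```python
def message_paths(conversation_ids):
    """Build one GET path per upstream conversations.id value."""
    template = "/3.0/conversations/<conversation_id>/messages"
    return [template.replace("<conversation_id>", str(cid)) for cid in conversation_ids]


paths = message_paths(["abc", "def"])  # hypothetical IDs from the conversations collection
```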
Identity as a query param
In this example, the placeholder in the `query` query param would be replaced with the value of the `param_value` with a name of `email`, which is the `email` identity. The result would look like this:
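As a sketch of the resulting request line, the snippet below fills the `query` param from the email identity using the standard library. Note that `urlencode` percent-encodes the `@` sign, as an HTTP client typically would when sending the request; the `search_members_url` helper and the address are illustrative.

```python
from urllib.parse import urlencode


def search_members_url(identity):
    """Fill the query param from the email identity and URL-encode it."""
    return "/3.0/search-members?" + urlencode({"query": identity["email"]})


url = search_members_url({"email": "name@email.com"})
```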
Data update with a dynamic path
This example uses two dynamic path variables, one from `member.id` and one from `member.list_id`. Since both of these are references to the `member` collection, we must first issue a data retrieval (which will happen automatically if the `read` request is defined). If a call to `GET /3.0/search-members` returned the following `member` object:
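The update step can be sketched by preparing (not sending) a PUT request from a retrieved row. The path template, the host, and the member row are assumptions for illustration, and the body here is simply the row serialized as JSON.

```python
import json
from urllib.request import Request


def build_update(base_url, member):
    """Prepare (but do not send) the PUT request for one member row."""
    path = f'/3.0/lists/{member["list_id"]}/members/{member["id"]}'  # assumed path template
    return Request(
        base_url + path,
        data=json.dumps(member).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


member = {"list_id": "123", "id": "456", "email_address": "name@email.com"}  # hypothetical row
request = build_update("https://us1.api.example.com", member)
```

Both path values come from the same retrieved row, which is why the read must happen before the update.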
Data update with a dynamic HTTP body
Sometimes, the update request needs a different body structure than what we obtain from the read request. In this example, we use a custom HTTP body that contains our masked object fields.
Fidesops will replace the `<masked_object_fields>` placeholder with the result of the policy-driven masking service, for example `'company': None, 'email': None`. Note that neither enclosing curly brackets (`{` `}`) nor a trailing comma (`,`) is included as part of the replacement string generated by fidesops.
This results in the following update request:
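Because the replacement string carries no braces, the body template has to supply them. The sketch below shows this with a hypothetical `merge_fields` template; it emits JSON (`null`) rather than the Python-style `None` shown above, and `render_body` is an illustrative helper.

```python
import json


def render_body(template, masked_fields):
    """Insert masked fields into the body template as a brace-less JSON fragment."""
    fragment = json.dumps(masked_fields, separators=(",", ":"))[1:-1]
    return template.replace("<masked_object_fields>", fragment)


# Hypothetical body template: the braces come from the template, not the placeholder.
template = '{"merge_fields": {<masked_object_fields>}}'
body = render_body(template, {"company": None, "email": None})
```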
How does this relate to graph traversal?
Fidesops uses the available Datasets to generate a graph of all reachable data and the dependencies between Datasets. For SaaS connectors, all the references and identities are stored in the `param_values`, so we must merge the SaaS config and the Dataset to provide a complete picture for the graph traversal. Using Mailchimp as an example, the Dataset collection and SaaS config endpoint for `messages` look like this:
An example of the augmented Dataset with the SaaS Config references would look like this:
The `conversation_id` field is updated with a reference from `mailchimp_connector_example.conversations.id`. This means that the `conversations` collection must be retrieved first to forward the conversation IDs to the messages collection for further processing.
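The merge can be pictured as copying each `param_value` reference onto the Dataset field with the same name. The sketch below is a simplification: the real Dataset schema may nest references differently, and `merge_references`, the field list, and the reference layout are assumptions for illustration.

```python
import copy


def merge_references(dataset_collection, endpoint):
    """Copy references from the read request's param_values onto matching fields."""
    merged = copy.deepcopy(dataset_collection)
    fields = {field["name"]: field for field in merged["fields"]}
    for param in endpoint["requests"]["read"].get("param_values", []):
        if "references" in param and param["name"] in fields:
            fields[param["name"]]["references"] = param["references"]
    return merged


collection = {"name": "messages", "fields": [{"name": "conversation_id"}, {"name": "message"}]}
endpoint = {
    "name": "messages",
    "requests": {
        "read": {
            "param_values": [
                {
                    "name": "conversation_id",
                    "references": [
                        {
                            "dataset": "mailchimp_connector_example",
                            "field": "conversations.id",
                            "direction": "from",
                        }
                    ],
                }
            ]
        }
    },
}
augmented = merge_references(collection, endpoint)
```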
What if a collection has no dependencies?
In the Mailchimp example, you might have noticed the `placeholder` request param.
Some endpoints might not have any dependencies on `identity` or Dataset `reference` values. The fidesops graph traversal interprets such an endpoint as an unreachable collection. At this time, the way to mark it as reachable is to include a `param_value` with an identity or a reference. The `param_value` name is not relevant; we chose `placeholder` for this example. In the future, we plan to treat collections like these as reachable even without this placeholder.
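The reachability rule can be expressed as a simple check over a request's `param_values`. This is a sketch of the rule as described here, not of the traversal code; `is_reachable` is a hypothetical helper and the `/3.0/conversations` path is an assumption.

```python
def is_reachable(read_request):
    """A collection is reachable if any param_value has an identity or references."""
    return any(
        "identity" in param or "references" in param
        for param in read_request.get("param_values", [])
    )


conversations_read = {
    "method": "GET",
    "path": "/3.0/conversations",  # assumed path
    "param_values": [{"name": "placeholder", "identity": "email"}],
}
without_placeholder = {"method": "GET", "path": "/3.0/conversations"}
```

Only the request that declares the `placeholder` identity counts as reachable; the otherwise identical request without it does not.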