Aggregating Results
Results from content and context classification can be aggregated to provide the highest-scoring labels for a given column's data.
In production environments, where input data is not always clean, aggregation increases confidence in category suggestions by combining multiple classification perspectives.
Content Aggregation
Content aggregation provides PII category suggestions for an individual object or column cell, which are then compiled to arrive at a single category label for the column itself.
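The source does not say how Fidescls compiles per-cell suggestions into a column label. As one plausible approach, this minimal sketch averages each label's score across all cells and ranks the labels. The function name aggregate_column, the (label, score) input shape, and the labels themselves are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical input shape: one list of (label, score) suggestions per column cell.
CellSuggestions = list[tuple[str, float]]


def aggregate_column(cells: list[CellSuggestions]) -> list[tuple[str, float]]:
    """Average each label's score across all cells and rank labels, highest first.

    A cell that did not suggest a label contributes 0 for that label.
    This is an illustrative scheme, not Fidescls's documented algorithm.
    """
    totals: dict[str, float] = defaultdict(float)
    for suggestions in cells:
        for label, score in suggestions:
            totals[label] += score
    n = len(cells) or 1  # avoid division by zero for an empty column
    averaged = {label: total / n for label, total in totals.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Illustrative per-cell suggestions for a three-cell column.
cells = [
    [("user.contact.email", 0.9), ("user.name", 0.1)],
    [("user.contact.email", 0.8)],
    [("user.name", 0.6)],
]
ranking = aggregate_column(cells)
column_label = ranking[0][0]
```

Averaging over all cells, rather than only the cells that mention a label, rewards labels that appear consistently throughout the column.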
To aggregate a series of content entries, send a POST request to the /classify endpoint with a list of the data to be aggregated:
POST /text/classify
Sample Response
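The original request body for content aggregation is not reproduced here. The following sketch builds, but does not send, a POST request to /text/classify using only the Python standard library. The base URL, the function name build_content_request, and the {"content": [...]} body layout are assumptions for illustration, not the documented Fidescls schema.

```python
import json
import urllib.request

# Hypothetical base URL for a locally running Fidescls server.
BASE_URL = "http://localhost:8000"


def build_content_request(values: list[str]) -> urllib.request.Request:
    """Build (without sending) a POST request that asks for content aggregation.

    The {"content": [...]} body shape is an assumption; check the request
    schema of your Fidescls version before relying on it.
    """
    body = json.dumps({"content": values}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/text/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_content_request(["jane@example.com", "john@example.com"])
# To actually send it against a running server: urllib.request.urlopen(req)
```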
System Aggregation
Fidescls can aggregate the results of both content and context classification to provide a system-level suggestion for a given column, based on its contents and metadata.
To aggregate system data, send a POST request to the /classify endpoint with both content and context data.
POST /text/classify
Sample Response
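The system aggregation request body is not reproduced here either. This sketch assembles a JSON body that carries content data, context data, and the aggregation settings named in this section (data_aggregation, method_params, content_weight, context_weight, top_n). The nesting of those fields, the function name build_system_payload, and the example values are hypothetical; the 0.4/0.6 weights are illustrative, not the documented defaults.

```python
import json


def build_system_payload(
    content: list[str],
    context: list[str],
    content_weight: float,
    context_weight: float,
    top_n: int,
) -> str:
    """Return a JSON body combining content and context data for system aggregation.

    Hypothetical layout: the weights sit under data_aggregation.method_params
    and top_n sits beside them. Verify the real schema before use.
    """
    payload = {
        "content": content,  # cell values sampled from the column
        "context": context,  # column metadata, such as the column name
        "data_aggregation": {
            "method_params": {
                "content_weight": content_weight,
                "context_weight": context_weight,
            },
            "top_n": top_n,
        },
    }
    return json.dumps(payload)


body = build_system_payload(
    content=["jane@example.com", "john@example.com"],
    context=["customer_email"],
    content_weight=0.4,
    context_weight=0.6,
    top_n=3,
)
```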
The data_aggregation object uses a weighted scale to compile content and context results. These weights can be adjusted with the context_weight and content_weight fields.
You can specify the number of results to return with top_n.
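One way to read "a weighted scale" together with top_n is a weighted sum of each label's content and context scores, followed by keeping the highest-ranked labels. The following is a minimal sketch under that assumption; the function name combine_scores, the per-label score dictionaries, and the example weights are hypothetical.

```python
def combine_scores(
    content_scores: dict[str, float],
    context_scores: dict[str, float],
    content_weight: float,
    context_weight: float,
    top_n: int,
) -> list[tuple[str, float]]:
    """Weight and sum per-label scores from both methods, then keep the top_n labels."""
    labels = content_scores.keys() | context_scores.keys()
    combined = {
        # A label missing from one method contributes 0 for that method.
        label: content_weight * content_scores.get(label, 0.0)
        + context_weight * context_scores.get(label, 0.0)
        for label in labels
    }
    # Highest score first; ties are broken alphabetically so the order is deterministic.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


content_scores = {"user.contact.email": 0.5, "user.name": 0.3}
context_scores = {"user.contact.email": 0.9}
top = combine_scores(content_scores, context_scores, 0.4, 0.6, top_n=1)
```

Here top holds a single entry for "user.contact.email" with a combined score of roughly 0.74 (0.4 × 0.5 + 0.6 × 0.9).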
Classifier Weights
For system aggregation, Fidescls uses a weighted scale to account for differences in how content and context classification measure their results.
The weights are fractions that must sum to 1. They can be adjusted via the method_params field, and each weight acts as a multiplication factor applied to the corresponding method's classification results.
By default, context is weighted more heavily than content.
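The rule that the weights must add to 1 is easy to check on the client side before sending a request. This sketch validates a hypothetical method_params dictionary with content_weight and context_weight keys; the function name validate_weights is illustrative. The documentation does not give the default values, so none are assumed here.

```python
import math


def validate_weights(method_params: dict[str, float]) -> None:
    """Raise ValueError unless both weights are in [0, 1] and sum to 1."""
    weights = [method_params["content_weight"], method_params["context_weight"]]
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("each weight must be between 0 and 1")
    # Compare with a tolerance: decimal fractions such as 0.1 are not exact in binary.
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1, got {sum(weights)}")


validate_weights({"content_weight": 0.3, "context_weight": 0.7})  # passes
```

Using math.isclose instead of == avoids rejecting valid pairs like 0.1 and 0.9 because of floating-point rounding.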