Using the Fidescls Module

Installing fidescls from PyPI lets you import both the Content and Context classification systems into your own projects:

from fidescls.cls import content, context

Classification Methods

Content Classification

Calling content.classify() with a string, list, or dictionary returns potential PII classifications for the given input.

Content Classification
content_test_string = "example@email.com"
content_test_list = ["email@example.com", "example@email.com"]
content_test_dict = {
    'email': content_test_list,
    'name': ["John Smith", "Jane Smith"]
}

## Classify a string, list, and dictionary
content_cls_string = content.classify(content_test_string)
content_cls_list = content.classify(content_test_list)
content_cls_dict = content.classify(content_test_dict)

Reviewing the results shows the classification suggestions for each input, as well as a certainty score for each suggestion, expressed as a decimal (for example, 0.85):

Results
## String Results
[ClassifyOutput(input='example@email.com', labels=
[MethodOutput(label='EMAIL_ADDRESS', score=1.0, position_start=0, position_end=17)])]

## List Results
[ClassifyOutput(input='email@example.com', labels=
[MethodOutput(label='EMAIL_ADDRESS', score=1.0, position_start=0, position_end=17)]),

ClassifyOutput(input='example@email.com', labels=
[MethodOutput(label='EMAIL_ADDRESS', score=1.0, position_start=0, position_end=17)])]

## Dictionary Results
{'email':
[ClassifyOutput(input='email@example.com', labels=
[MethodOutput(label='EMAIL_ADDRESS', score=1.0, position_start=0, position_end=17)]),
ClassifyOutput(input='example@email.com', labels=
[MethodOutput(label='EMAIL_ADDRESS', score=1.0, position_start=0, position_end=17)])],

'name':
[ClassifyOutput(input='John Smith', labels=
[MethodOutput(label='PERSON', score=0.85, position_start=0, position_end=10)]),
ClassifyOutput(input='Jane Smith', labels=
[MethodOutput(label='PERSON', score=0.85, position_start=0, position_end=10)])]}
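
To work with these results programmatically, a small helper can pick the highest-scoring label for each input. The sketch below is a minimal illustration that uses stand-in dataclasses mirroring the fields printed above (input, labels, label, score, position_start, position_end); the real ClassifyOutput and MethodOutput classes in fidescls may differ, and top_label and top_label_name are hypothetical helpers, not part of the library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Stand-ins that mirror the fields shown in the printed results above.
# Assumption: the real fidescls classes expose the same attribute names.
@dataclass
class MethodOutput:
    label: str
    score: float
    position_start: Optional[int] = None
    position_end: Optional[int] = None


@dataclass
class ClassifyOutput:
    input: str
    labels: List[MethodOutput] = field(default_factory=list)


def top_label(output: ClassifyOutput) -> Optional[MethodOutput]:
    """Return the highest-scoring label for one input, or None if it has none."""
    if not output.labels:
        return None
    return max(output.labels, key=lambda m: m.score)


def top_label_name(output: ClassifyOutput) -> Optional[str]:
    """Return only the name of the highest-scoring label, or None."""
    best = top_label(output)
    return best.label if best is not None else None


# Hand-built sample data shaped like the dictionary results above.
sample_results = {
    "email": [
        ClassifyOutput("email@example.com", [MethodOutput("EMAIL_ADDRESS", 1.0, 0, 17)]),
    ],
    "name": [
        ClassifyOutput("John Smith", [MethodOutput("PERSON", 0.85, 0, 10)]),
    ],
}

# Map each dictionary key to the best label name of each of its values.
summary = {
    key: [top_label_name(out) for out in outputs]
    for key, outputs in sample_results.items()
}
print(summary)
```

Keeping the full MethodOutput in top_label, rather than only the label name, leaves the score and character positions available for later filtering.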

Context Classification

Classifying by context requires a set of data categories to compare your input against. The fideslang taxonomy lets you classify your systems by common privacy definitions and standards, and is imported by default to work with fidescls' classification systems.

Calling context.classify() with a column name, or a list of column names, returns possible classification labels for each column's contents. You can also provide an integer value for top_n in method_params to set the number of potential categories returned for each input.

Context Classification
first_name_column = 'first_name'
phone_column = 'phone_num'

data_categories = [
    "user.payment.financial_account_number",
    "user.financial.account_number",
    "user.contact.phone_number",
    "user.name",
    "user.contact.address.street",
    "user.contact.address.city",
    "user.contact.address.state",
    "user.contact.address.country",
    "user.contact.address.postal_code",
    "user.location"
]

context_cls = context.classify(
    [first_name_column, phone_column],
    method_name='similarity',
    method_params={
        'possible_targets': data_categories,
        'top_n': 1
    }
)

Reviewing the results shows the classification suggestions for each input, as well as a certainty score for each suggestion, expressed as a decimal (for example, 0.6663):

Results
[ClassifyOutput(input='first_name', labels=
[MethodOutput(label='user.name', score=0.6663, position_start=None, position_end=None)]), ClassifyOutput(input='phone_num', labels=
[MethodOutput(label='user.contact.phone_number', score=0.6606, position_start=None, position_end=None)])]
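
To build intuition for what top_n controls, the sketch below ranks candidate categories by a similarity score and keeps the best top_n. It is not the fidescls implementation: it swaps in difflib.SequenceMatcher from the Python standard library as a stand-in, character-level similarity measure, so its scores will not match the ones above, and rank_categories is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def rank_categories(
    column_name: str,
    possible_targets: List[str],
    top_n: int = 1,
) -> List[Tuple[str, float]]:
    """Score every target against the column name and keep the top_n best."""
    scored = [
        (target, SequenceMatcher(None, column_name, target).ratio())
        for target in possible_targets
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# A subset of the data categories used in the example above.
targets = [
    "user.contact.phone_number",
    "user.name",
    "user.contact.address.city",
    "user.location",
]

for label, score in rank_categories("phone_num", targets, top_n=2):
    print(f"{label}: {score:.4f}")
```

Raising top_n returns more candidates per column, which can be useful when the best match is ambiguous and a reviewer should see the alternatives.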

Aggregation Methods

To take advantage of fidescls' data aggregation methods, import the aggregation module into your own projects:

from fidescls.cls import aggregation

Content Aggregation

Providing content.classify() with an aggregation_method alongside the data to be classified (represented here as content_data) will aggregate the results into a higher-level classification recommendation.

from fidescls.cls import content

content_cls = content.classify(
    content_data,
    aggregation_method='mean'
)
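
As a rough illustration of what a 'mean' aggregation could compute, the sketch below averages each label's score across all values in a column. This is an assumption about the general idea, not the fidescls implementation; mean_scores is a hypothetical helper, the sample data is hand-made, and each value's labels are represented as plain (label, score) tuples.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One entry per column value; each entry lists the (label, score) pairs found in it.
ColumnLabels = List[List[Tuple[str, float]]]


def mean_scores(column: ColumnLabels) -> Dict[str, float]:
    """Average each label's score over every value in the column.

    Values where a label was not detected contribute 0 to that label's mean.
    """
    if not column:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for value_labels in column:
        for label, score in value_labels:
            totals[label] += score
    return {label: total / len(column) for label, total in totals.items()}


# Hand-made sample: four values from a hypothetical "name" column.
name_column = [
    [("PERSON", 0.85)],
    [("PERSON", 0.85)],
    [],  # a value with no detected labels
    [("PERSON", 0.5), ("LOCATION", 0.5)],
]

print(mean_scores(name_column))
```

Dividing by the number of values, rather than by the number of detections, means a label that appears in only a few rows ends up with a low column-level score.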

Context Aggregation

Calling aggregation.aggregate_system() with both content and context classification results will aggregate their suggestions into a single, compiled result.

from fidescls.cls import aggregation

sys_agg = aggregation.aggregate_system(
    context_cls,
    content_cls,
    aggregation_method='weighted',
    aggregation_params={
        'context_weight': 0.5,
        'content_weight': 0.5,
        'top_n': 5
    }
)

In the example above, context_cls and content_cls represent the results of context.classify() and content.classify(), respectively.

Here the aggregation_method is weighted, which applies a multiplier, or weight, to the Content and Context results before they are combined.

The optional aggregation_params allow you to override the default Context and Content weights, as well as the number of results to return (top_n).
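
The weighting described above can be sketched as a simple linear combination. The code below is a minimal illustration, not the fidescls implementation: weighted_aggregate is a hypothetical helper, its default weights are only illustrative, and it assumes both inputs have already been reduced to one score per data category in the same label space.

```python
from typing import Dict, List, Tuple


def weighted_aggregate(
    context_scores: Dict[str, float],
    content_scores: Dict[str, float],
    context_weight: float = 0.5,
    content_weight: float = 0.5,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Combine two score maps with per-source weights and keep the top_n labels."""
    labels = set(context_scores) | set(content_scores)
    combined = {
        label: context_weight * context_scores.get(label, 0.0)
        + content_weight * content_scores.get(label, 0.0)
        for label in labels
    }
    # Sort by score (descending), then by label so ties are deterministic.
    ranked = sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


# Hand-made sample scores for a single column.
context_scores = {"user.name": 0.5, "user.contact.phone_number": 0.25}
content_scores = {"user.name": 1.0}

print(weighted_aggregate(context_scores, content_scores, top_n=1))
```

A label missing from one source is treated as a score of 0 from that source, so categories supported by both the column name and the column contents rank highest.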

Next Steps

Learn more about content and context classifiers, and how to aggregate and interpret classification results.