anonymity.metrics package

Submodules

anonymity.metrics.data_utility_metrics module

anonymity.metrics.data_utility_metrics.avr_equiv_class_size(og_table: DataFrame, new_table: DataFrame, qi: List | ndarray) → float

Measures how closely the equivalence classes (EQs) approach the best case, in which every record is generalized into an EQ of exactly k records.

Parameters:
  • og_table (pandas dataframe) – dataframe with the original data under study.

  • new_table (pandas dataframe) – dataframe with the anonymized data under study.

  • qi (list of strings) – list with the names of the dataframe columns that are quasi-identifiers.

Returns:

Measure of how well the creation of the EQs approaches the best case.

Return type:

float
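
As a rough illustration of what this metric captures, the sketch below computes the average equivalence class size in its usual form, (number of records / number of EQs) / k, on plain Python rows instead of DataFrames. The helper name `average_eq_size_sketch` and the choice to take k as the smallest EQ size when it is not supplied are assumptions made for illustration; the library function may derive k differently and may use og_table to account for suppressed records.

```python
from collections import Counter


def average_eq_size_sketch(rows, qi, k=None):
    """Hypothetical helper: (records / number of EQs) / k for a list of dict rows."""
    # Group records by their quasi-identifier values; each group is one EQ.
    eq_sizes = Counter(tuple(row[col] for col in qi) for row in rows)
    # Assumption: when k is not given, use the size of the smallest EQ.
    if k is None:
        k = min(eq_sizes.values())
    return len(rows) / (len(eq_sizes) * k)


anonymized = [
    {"age": "[20, 30)", "zip": "281**", "disease": "flu"},
    {"age": "[20, 30)", "zip": "281**", "disease": "cold"},
    {"age": "[30, 40)", "zip": "282**", "disease": "flu"},
    {"age": "[30, 40)", "zip": "282**", "disease": "asthma"},
    {"age": "[30, 40)", "zip": "282**", "disease": "cold"},
]

score = average_eq_size_sketch(anonymized, qi=["age", "zip"])
print(score)  # 1.25: five records, two EQs, k = 2
```

Under this definition a value of 1.0 is the best case (every EQ has exactly k records), and larger values indicate EQs that are bigger than strictly necessary.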

anonymity.metrics.data_utility_metrics.create_vgh(hierarchy: dict) → List | ndarray

Creates the auxiliary hierarchies to facilitate the measuring of the information loss function.

Parameters:
  • hierarchy (dictionary) – hierarchies for generalization of string columns.

  • og_table (pandas dataframe) – dataframe with the original data under study.

  • new_table (pandas dataframe) – dataframe with the anonymized data under study.

  • numeric_hie (dictionary) – steps for the intervals of numeric columns.

Returns:

An array with both the auxiliary hierarchies and the number of occurrences of each element in both the original table and the anonymized table.

Return type:

array of dictionaries

anonymity.metrics.data_utility_metrics.discernibility(og_table: DataFrame, new_table: DataFrame, qi: List | ndarray) → float

Measures how indistinguishable a record is from others, by assigning a penalty to each record, equal to the size of the EQ to which it belongs.

Parameters:
  • og_table (pandas dataframe) – dataframe with the original data under study.

  • new_table (pandas dataframe) – dataframe with the anonymized data under study.

  • qi (list of strings) – list with the names of the dataframe columns that are quasi-identifiers.

Returns:

Measure of how indistinguishable the table is.

Return type:

float
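
The penalty described above can be sketched as follows: every record in an EQ of size s is charged s, so each EQ contributes s squared. The sketch also charges each suppressed record the size of the original table, which is the common textbook form of the discernibility metric; whether the library handles suppression this way is an assumption. The helper name `discernibility_sketch` and the use of lists of dicts instead of DataFrames are illustrative only.

```python
from collections import Counter


def discernibility_sketch(original_rows, anonymized_rows, qi):
    """Hypothetical helper: sum of squared EQ sizes plus a suppression penalty."""
    # Each record is penalized by the size of its EQ, so an EQ of size s adds s * s.
    eq_sizes = Counter(tuple(row[col] for col in qi) for row in anonymized_rows)
    penalty = sum(size ** 2 for size in eq_sizes.values())
    # Assumption: records missing from the anonymized table were suppressed,
    # and each one is penalized by the size of the original table.
    suppressed = len(original_rows) - len(anonymized_rows)
    return penalty + suppressed * len(original_rows)


original = [
    {"age": 23, "zip": "28101"},
    {"age": 27, "zip": "28104"},
    {"age": 31, "zip": "28203"},
    {"age": 35, "zip": "28207"},
    {"age": 38, "zip": "28209"},
    {"age": 71, "zip": "39001"},  # outlier, suppressed during anonymization
]
anonymized = [
    {"age": "[20, 30)", "zip": "281**"},
    {"age": "[20, 30)", "zip": "281**"},
    {"age": "[30, 40)", "zip": "282**"},
    {"age": "[30, 40)", "zip": "282**"},
    {"age": "[30, 40)", "zip": "282**"},
]

dm = discernibility_sketch(original, anonymized, qi=["age", "zip"])
print(dm)  # 19 = 2*2 + 3*3 + 1 suppressed record * 6
```

Lower values mean smaller EQs and therefore records that remain easier to tell apart, which is better for utility.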

anonymity.metrics.data_utility_metrics.generalized_information_loss(hierarchy: dict, og_table: DataFrame, new_table: DataFrame, qi: List | ndarray) → float

Captures the penalty incurred when generalizing a table, by quantifying the fraction of the domain values that have been generalized for each specific attribute.

Parameters:
  • hierarchy (dictionary) – hierarchies for generalization of string columns.

  • og_table (pandas dataframe) – dataframe with the original data under study.

  • new_table (pandas dataframe) – dataframe with the anonymized data under study.

  • numeric_hie (dictionary) – steps for the intervals of numeric columns.

  • qi (list of strings) – list with the names of the dataframe columns that are quasi-identifiers.

Returns:

The penalty incurred when generalizing a table.

Return type:

float
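
To make the "fraction of the domain that has been generalized" concrete, the sketch below handles numeric attributes only: each cell contributes the width of its generalized interval divided by the width of the attribute's full domain, and the result is averaged over all cells. The helper name `gen_iloss_sketch`, its input layout, and the omission of categorical hierarchies are simplifying assumptions; this is not the library's implementation.

```python
def gen_iloss_sketch(generalized, domains):
    """Hypothetical helper for numeric attributes only.

    generalized: {column: [(lower, upper), ...]}, one interval per record.
    domains:     {column: (domain_min, domain_max)} taken from the original data.
    """
    total = 0.0
    n_cells = 0
    for col, intervals in generalized.items():
        dom_lo, dom_hi = domains[col]
        for lower, upper in intervals:
            # Fraction of the attribute's domain covered by this generalized cell.
            total += (upper - lower) / (dom_hi - dom_lo)
            n_cells += 1
    # Average over all cells (number of records times number of attributes).
    return total / n_cells


generalized = {
    "age": [(20, 40), (20, 40), (40, 60), (20, 60)],
    "weight": [(50, 70), (50, 70), (70, 90), (50, 90)],
}
domains = {"age": (20, 60), "weight": (50, 90)}

loss = gen_iloss_sketch(generalized, domains)
print(loss)  # 0.625
```

In this form the result ranges from 0 (no cell generalized) to 1 (every cell generalized to the full domain).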

anonymity.metrics.data_utility_metrics.get_level_generalization(name: str, level: int)

Updates the global variable that holds the generalization level of each column of the table, for the function being monitored.

Parameters:
  • name (string) – Name of the column whose level is to be saved.

  • level (int) – Level of generalization of a given column.

anonymity.metrics.data_utility_metrics.start_level()

Resets the global variable that holds the generalization level of each column of the table, for the function being monitored.

anonymity.metrics.data_utility_metrics.string_to_interval(column: List | ndarray) → List | ndarray

Converts intervals stored as strings into an actual interval type, so that values can be compared.

Parameters:

column (list of strings) – List of intervals as strings.

Returns:

List containing the intervals converted to the proper data type.

Return type:

list of intervals
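
A minimal sketch of this kind of conversion is shown below. It assumes the strings look like "[20, 30)" (bracket style marks closed or open ends) and parses them into a small stand-in class; both the string format and the `SimpleInterval` and `parse_interval` names are assumptions for illustration, and the library may return a different interval type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleInterval:
    """Hypothetical stand-in for an interval type."""
    left: float
    right: float
    closed_left: bool
    closed_right: bool

    def __contains__(self, x):
        above = x >= self.left if self.closed_left else x > self.left
        below = x <= self.right if self.closed_right else x < self.right
        return above and below


def parse_interval(text):
    """Parse a string such as '[20, 30)' into a SimpleInterval."""
    text = text.strip()
    if text[0] not in "[(" or text[-1] not in "])":
        raise ValueError(f"not an interval string: {text!r}")
    left, right = (float(part) for part in text[1:-1].split(","))
    return SimpleInterval(left, right, text[0] == "[", text[-1] == "]")


column = ["[20, 30)", "[30, 40)"]
intervals = [parse_interval(value) for value in column]
print(25 in intervals[0])  # True
print(30 in intervals[0])  # False, the right end is open
```

Once the values are real interval objects, membership and width checks no longer depend on string formatting.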

anonymity.metrics.efficiency_metrics module

anonymity.metrics.efficiency_metrics.end_monitor_time()

Updates the global variable containing the end time of the execution and prints the execution time.

anonymity.metrics.efficiency_metrics.monitor_cost(type_of: str)

Prints the cost metric for the specified algorithm.

Parameters:

type_of (string) – Name of the algorithm you want to monitor.

anonymity.metrics.efficiency_metrics.monitor_cost_add(type_of: str)

Updates the cost metric for the specified algorithm.

Parameters:

type_of (string) – Name of the algorithm you want to monitor.

anonymity.metrics.efficiency_metrics.monitor_cost_init(type_of: str)

Resets the cost metric for the specified algorithm.

Parameters:

type_of (string) – Name of the algorithm you want to monitor.
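
The reset, update, and print pattern of the three cost functions can be sketched with a module-level dictionary keyed by algorithm name. What the library counts as "cost" is not specified here, so the sketch simply assumes a counter incremented by one per call; the names `cost_init`, `cost_add`, and `cost_report` are hypothetical and do not belong to the package.

```python
# Hypothetical module-level state: one counter per algorithm name.
_costs = {}


def cost_init(name):
    """Reset the counter for the given algorithm."""
    _costs[name] = 0


def cost_add(name, amount=1):
    """Increase the counter for the given algorithm (assumed unit step)."""
    _costs[name] = _costs.get(name, 0) + amount


def cost_report(name):
    """Print the counter and return it."""
    print(f"{name} cost: {_costs[name]}")
    return _costs[name]


cost_init("example_algorithm")
for _ in range(3):
    cost_add("example_algorithm")  # e.g. once per generalization step
total = cost_report("example_algorithm")  # prints "example_algorithm cost: 3"
```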

anonymity.metrics.efficiency_metrics.monitor_memory_consumption_start()

Starts monitoring the memory consumption of the function.

anonymity.metrics.efficiency_metrics.monitor_memory_consumption_stop()

Finishes monitoring the memory consumption of the function and prints it.
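
A start/stop pair like this can be sketched with Python's standard `tracemalloc` module. Whether the library uses `tracemalloc` internally is an assumption, and the helper names `memory_start` and `memory_stop` are hypothetical.

```python
import tracemalloc


def memory_start():
    """Begin tracing Python memory allocations."""
    tracemalloc.start()


def memory_stop():
    """Stop tracing, print current and peak usage in bytes, and return them."""
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"current: {current} bytes, peak: {peak} bytes")
    return current, peak


memory_start()
data = [0] * 100_000  # stand-in for the anonymization work being measured
current, peak = memory_stop()
```

Only allocations made between the two calls are counted, so the start call belongs immediately before the code under study.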

anonymity.metrics.efficiency_metrics.monitor_time()

Prints the execution time of the function.

anonymity.metrics.efficiency_metrics.start_monitor_time()

Updates the global variable containing the starting time of the execution.
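
The time-monitoring functions in this module follow a start/end pattern built on global variables. A minimal sketch of that pattern, using `time.perf_counter` from the standard library, is given below; the clock choice and the names `start_timer` and `end_timer` are assumptions for illustration, not the package's own code.

```python
import time

# Hypothetical module-level state holding the two timestamps.
_start_time = None
_end_time = None


def start_timer():
    """Record the starting time of the execution."""
    global _start_time
    _start_time = time.perf_counter()


def end_timer():
    """Record the end time, print the elapsed seconds, and return them."""
    global _end_time
    _end_time = time.perf_counter()
    elapsed = _end_time - _start_time
    print(f"execution time: {elapsed:.6f} s")
    return elapsed


start_timer()
sum(range(100_000))  # stand-in for the function being monitored
elapsed = end_timer()
```

Because the timestamps live in module-level variables, overlapping measurements would overwrite each other; the pattern suits timing one function at a time.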

Module contents