anonymity.metrics package
Submodules
anonymity.metrics.data_utility_metrics module
- anonymity.metrics.data_utility_metrics.avr_equiv_class_size(og_table: DataFrame, new_table: DataFrame, qi: List | ndarray) → float
Measures how closely the equivalence classes (EQs) approach the best case, in which every record is generalized into an EQ of exactly k records.
- Parameters:
og_table (pandas dataframe) – dataframe with the original data under study.
new_table (pandas dataframe) – dataframe with the anonymized data under study.
qi (list of strings) – list with the names of the dataframe columns that are quasi-identifiers.
- Returns:
Measure of how well the creation of the EQs approaches the best case.
- Return type:
float
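The metric described above can be illustrated with a minimal standalone sketch. It is not the library's implementation: the helper name `average_eq_size_sketch` is hypothetical, and k is assumed to be the size of the smallest equivalence class found in the anonymized table. A value of 1.0 corresponds to the best case, where every EQ holds exactly k records.

```python
import pandas as pd


def average_eq_size_sketch(og_table: pd.DataFrame, new_table: pd.DataFrame, qi: list) -> float:
    """Normalized average equivalence class size (hypothetical helper)."""
    # Records that share the same quasi-identifier values form one EQ.
    eq_sizes = new_table.groupby(qi).size()
    # Assumption: k is the size of the smallest EQ in the anonymized table.
    k = int(eq_sizes.min())
    # Average EQ size divided by k; 1.0 means every EQ has exactly k records.
    return len(og_table) / (len(eq_sizes) * k)


og = pd.DataFrame({"age": [21, 24, 33, 38],
                   "zip": ["4711", "4712", "4801", "4802"]})
anon = pd.DataFrame({"age": ["[20, 30)", "[20, 30)", "[30, 40)", "[30, 40)"],
                     "zip": ["47**", "47**", "48**", "48**"]})
score = average_eq_size_sketch(og, anon, ["age", "zip"])
print(score)
```

Larger values indicate that records were grouped into EQs bigger than k requires, which usually means more generalization than necessary.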
- anonymity.metrics.data_utility_metrics.create_vgh(hierarchy: dict) → List | ndarray
Creates the auxiliary hierarchies to facilitate the measuring of the information loss function.
- Parameters:
hierarchy (dictionary) – hierarchies for generalization of string columns.
og_table (pandas dataframe) – dataframe with the original data under study.
new_table (pandas dataframe) – dataframe with the anonymized data under study.
numeric_hie (dictionary) – steps for the intervals of numeric columns.
- Returns:
An array with both the auxiliary hierarchies and the number of occurrences of each element in both the original table and the anonymized table.
- Return type:
array of dictionaries
- anonymity.metrics.data_utility_metrics.discernibility(og_table: DataFrame, new_table: DataFrame, qi: List | ndarray) → float
Measures how indistinguishable a record is from others, by assigning a penalty to each record, equal to the size of the EQ to which it belongs.
- Parameters:
og_table (pandas dataframe) – dataframe with the original data under study.
new_table (pandas dataframe) – dataframe with the anonymized data under study.
qi (list of strings) – list with the names of the dataframe columns that are quasi-identifiers.
- Returns:
Measure of how indistinguishable the table is.
- Return type:
float
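The penalty scheme described above can be sketched as follows. This is a standalone illustration with a hypothetical helper name, not the library's code. Each record is penalized by the size of its EQ, so an EQ of size s contributes s * s. The treatment of suppressed records (each penalized by the size of the original table) follows a common formulation of the metric and is an assumption here.

```python
import pandas as pd


def discernibility_sketch(og_table: pd.DataFrame, new_table: pd.DataFrame, qi: list) -> float:
    """Discernibility penalty of an anonymized table (hypothetical helper)."""
    # Size of every equivalence class in the anonymized table.
    eq_sizes = new_table.groupby(qi).size()
    # Each record is penalized by the size of its EQ: s records * s penalty.
    penalty = float((eq_sizes ** 2).sum())
    # Assumption: rows missing from the anonymized table were suppressed,
    # and each suppressed record is penalized by the original table size.
    suppressed = len(og_table) - len(new_table)
    penalty += suppressed * len(og_table)
    return penalty


og = pd.DataFrame({"age": [21, 24, 33, 38, 52]})
anon = pd.DataFrame({"age": ["[20, 30)", "[20, 30)", "[30, 40)", "[30, 40)"]})
dm = discernibility_sketch(og, anon, ["age"])
print(dm)  # two EQs of size 2 (4 + 4) plus one suppressed record (1 * 5)
```

Lower values mean records remain easier to tell apart, which generally preserves more utility.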
- anonymity.metrics.data_utility_metrics.generalized_information_loss(hierarchy: dict, og_table: DataFrame, new_table: DataFrame, qi: List | ndarray) → float
Captures the penalty incurred when generalizing a table, by quantifying the fraction of the domain values that have been generalized for each specific attribute.
- Parameters:
hierarchy (dictionary) – hierarchies for generalization of string columns.
og_table (pandas dataframe) – dataframe with the original data under study.
new_table (pandas dataframe) – dataframe with the anonymized data under study.
numeric_hie (dictionary) – steps for the intervals of numeric columns.
qi (list of strings) – list with the names of the dataframe columns that are quasi-identifiers.
- Returns:
The penalty incurred when generalizing a table.
- Return type:
float
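To make the idea of "fraction of the domain that has been generalized" concrete, here is a minimal sketch restricted to numeric quasi-identifiers. It assumes the anonymized values are already `pandas.Interval` objects and that the attribute domain is the range observed in the original table. The helper name is hypothetical, and categorical columns, which would need the hierarchy, are deliberately left out.

```python
import pandas as pd


def gen_iloss_numeric_sketch(og_table: pd.DataFrame, new_table: pd.DataFrame, qi: list) -> float:
    """Generalized information loss for numeric columns only (hypothetical helper)."""
    total = 0.0
    for col in qi:
        # Assumption: the attribute domain is the range seen in the original data.
        domain = og_table[col].max() - og_table[col].min()
        for interval in new_table[col]:
            # Fraction of the domain covered by the generalized value.
            total += (interval.right - interval.left) / domain
    # Average over all generalized cells (records x quasi-identifiers).
    return total / (len(new_table) * len(qi))


og = pd.DataFrame({"age": [20, 25, 30, 40]})
anon = pd.DataFrame({"age": [pd.Interval(20, 30), pd.Interval(20, 30),
                             pd.Interval(30, 40), pd.Interval(30, 40)]})
loss = gen_iloss_numeric_sketch(og, anon, ["age"])
print(loss)
```

A result of 0 would mean no generalization at all, while 1 would mean every value was generalized to the full domain.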
- anonymity.metrics.data_utility_metrics.get_level_generalization(name: str, level: int)
Updates the global variable that holds the generalization level of each table column for the function being monitored.
- Parameters:
name (string) – Name of the column whose level is to be saved.
level (int) – Level of generalization of a given column.
- anonymity.metrics.data_utility_metrics.start_level()
Resets the global variable that holds the generalization level of each table column for the function being monitored.
- anonymity.metrics.data_utility_metrics.string_to_interval(column: List | ndarray) → List | ndarray
Converts intervals written as strings to an actual interval type, to make the values easier to compare.
- Parameters:
column (list of strings) – List of intervals as strings.
- Returns:
List containing the intervals converted to the proper data type.
- Return type:
list of intervals
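A conversion of this kind might look like the sketch below. The string format is an assumption (bracket notation such as `"[20, 30)"`), as is the use of `pandas.Interval` as the target type; the helper name `parse_intervals_sketch` is hypothetical and the library's actual parsing rules may differ.

```python
import pandas as pd


def parse_intervals_sketch(column):
    """Convert strings such as "[20, 30)" to pandas.Interval (hypothetical helper)."""
    intervals = []
    for text in column:
        text = text.strip()
        # The bracket style decides which side of the interval is closed.
        left_closed = text[0] == "["
        right_closed = text[-1] == "]"
        low, high = (float(part) for part in text[1:-1].split(","))
        if left_closed and right_closed:
            closed = "both"
        elif left_closed:
            closed = "left"
        elif right_closed:
            closed = "right"
        else:
            closed = "neither"
        intervals.append(pd.Interval(low, high, closed=closed))
    return intervals


parsed = parse_intervals_sketch(["[20, 30)", "[30, 40)"])
print(25 in parsed[0], 30 in parsed[0])
```

Once the values are real interval objects, membership and overlap checks no longer depend on string comparison.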
anonymity.metrics.efficiency_metrics module
- anonymity.metrics.efficiency_metrics.end_monitor_time()
Updates the global variable containing the end time of the execution and prints the execution time.
- anonymity.metrics.efficiency_metrics.monitor_cost(type_of: str)
Prints the cost metric for the specified algorithm.
- Parameters:
type_of (string) – Name of the algorithm you want to monitor.
- anonymity.metrics.efficiency_metrics.monitor_cost_add(type_of: str)
Updates the cost metric for the specified algorithm.
- Parameters:
type_of (string) – Name of the algorithm you want to monitor.
- anonymity.metrics.efficiency_metrics.monitor_cost_init(type_of: str)
Resets the cost metric for the specified algorithm.
- Parameters:
type_of (string) – Name of the algorithm you want to monitor.
- anonymity.metrics.efficiency_metrics.monitor_memory_consumption_start()
Starts monitoring the memory consumption of the function.
- anonymity.metrics.efficiency_metrics.monitor_memory_consumption_stop()
Finishes monitoring the memory consumption of the function and prints it.
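A start/stop pair for memory monitoring can be sketched with the standard library's `tracemalloc` module. Whether the package uses `tracemalloc` internally is not stated here, so treat this as an illustration of the pattern only; both function names are hypothetical.

```python
import tracemalloc


def memory_start_sketch():
    """Begin tracing Python memory allocations (hypothetical helper)."""
    tracemalloc.start()


def memory_stop_sketch():
    """Stop tracing and report current and peak traced memory in bytes."""
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"Current: {current} bytes, peak: {peak} bytes")
    return current, peak


memory_start_sketch()
data = [i * i for i in range(100_000)]  # workload being measured
current, peak = memory_stop_sketch()
```

The monitored code sits between the two calls, which mirrors how the start and stop functions documented above are meant to bracket an algorithm run.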
- anonymity.metrics.efficiency_metrics.monitor_time()
Prints the execution time of the function.
- anonymity.metrics.efficiency_metrics.start_monitor_time()
Updates the global variable containing the starting time of the execution.
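The timing functions above describe a pattern in which module-level variables store the start and end times. A minimal sketch of that pattern, using `time.perf_counter` and hypothetical names (the package's own variables and clock source are not documented here), could look like this:

```python
import time

# Module-level state, mirroring the "global variable" approach described above.
_start_time = None
_end_time = None


def start_timer_sketch():
    """Record the starting time of the execution (hypothetical helper)."""
    global _start_time
    _start_time = time.perf_counter()


def end_timer_sketch():
    """Record the end time, print the elapsed time, and return it."""
    global _end_time
    _end_time = time.perf_counter()
    elapsed = _end_time - _start_time
    print(f"Execution time: {elapsed:.6f} s")
    return elapsed


start_timer_sketch()
total = sum(range(1_000_000))  # workload being timed
elapsed = end_timer_sketch()
```

Because the state is global, only one measurement can be in progress at a time; nested or concurrent timings would overwrite each other.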