trasgodp.metrics package

Module contents

Privacy-utility metrics.

trasgodp.metrics.correlation_loss(df_original: DataFrame, df_dp: DataFrame, features: List[str] | None = None, method: str = 'pearson', new_column: bool = False) → float

Compute the utility loss (%) based on how well the correlations between features are preserved.

Parameters:
  • df_original (pandas dataframe) – dataframe with the original data.

  • df_dp (pandas dataframe) – dataframe with the data privatized using DP.

  • features (list) – list of features for calculating the correlation.

  • method (string) – method for calculating the correlation. Defaults to 'pearson'.

  • new_column (boolean) – defaults to False. If True, the names of the DP columns start with dp. Otherwise, the DP columns have the same names as in the original dataset.

Returns:

utility loss (%), based on the difference between the correlations in the original and DP data.

Return type:

float
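
The exact formula behind this percentage is not stated here. As a rough illustration of the idea only, the sketch below compares the correlation matrices of the original and DP dataframes. The function name correlation_loss_sketch, the relative Frobenius-distance formula, and the synthetic data are all assumptions, not the trasgodp implementation. It also assumes both dataframes use the same column names (the new_column=False case).

```python
import numpy as np
import pandas as pd


def correlation_loss_sketch(df_original, df_dp, features=None, method="pearson"):
    """Hypothetical stand-in, NOT trasgodp.metrics.correlation_loss."""
    if features is None:
        # Assumption: fall back to every numeric column of the original data.
        features = list(df_original.select_dtypes(include="number").columns)
    corr_orig = df_original[features].corr(method=method).to_numpy()
    corr_dp = df_dp[features].corr(method=method).to_numpy()
    # Assumed formula: relative Frobenius distance between the two
    # correlation matrices, expressed as a percentage.
    diff = np.linalg.norm(corr_orig - corr_dp)
    return float(100 * diff / np.linalg.norm(corr_orig))


# Synthetic data: two correlated features, then Laplace noise as a crude
# stand-in for a DP mechanism.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
df_original = pd.DataFrame({"a": x, "b": 2 * x + rng.normal(size=500)})
df_dp = df_original + rng.laplace(scale=1.0, size=df_original.shape)

loss_same = correlation_loss_sketch(df_original, df_original)
loss_dp = correlation_loss_sketch(df_original, df_dp)
print(f"identical data: {loss_same:.2f}%, noisy data: {loss_dp:.2f}%")
```

Under this assumed formula, identical dataframes give a loss of 0, and the loss grows as the added noise weakens the correlation between the features.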

trasgodp.metrics.divergence_distributions(df_original: DataFrame, df_dp: DataFrame, column: str, new_column: bool = False) → dict

Divergence between the distribution of a column in the original and DP datasets.

Parameters:
  • df_original (pandas dataframe) – dataframe with the original data.

  • df_dp (pandas dataframe) – dataframe with the data privatized using DP.

  • column (string) – column to which DP has been applied.

  • new_column (boolean) – defaults to False. If True, the name of the DP column starts with dp.

Returns:

dictionary with the divergence metrics (TVD, JS, KL).

Return type:

dict
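
For readers unfamiliar with the three metrics, the sketch below shows one common way to compute total variation distance (TVD), Jensen-Shannon divergence (JS), and Kullback-Leibler divergence (KL) between the histograms of a column. The function name divergence_sketch, the bins and eps parameters, the dictionary keys, and the synthetic data are assumptions for illustration. The library may bin, smooth, or name its outputs differently. The sketch assumes the column has the same name in both dataframes (the new_column=False case).

```python
import numpy as np
import pandas as pd


def divergence_sketch(df_original, df_dp, column, bins=20, eps=1e-12):
    """Hypothetical stand-in, NOT trasgodp.metrics.divergence_distributions."""
    orig = df_original[column].to_numpy(dtype=float)
    dp = df_dp[column].to_numpy(dtype=float)
    # Shared bin edges so both histograms cover the same support.
    edges = np.histogram_bin_edges(np.concatenate([orig, dp]), bins=bins)
    p = np.histogram(orig, bins=edges)[0] / len(orig)
    q = np.histogram(dp, bins=edges)[0] / len(dp)

    def kl(a, b):
        # Sum only where a > 0; clip b away from zero to keep the log finite.
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / np.maximum(b[mask], eps))))

    m = 0.5 * (p + q)  # mixture distribution used by Jensen-Shannon
    return {
        "TVD": float(0.5 * np.abs(p - q).sum()),
        "JS": 0.5 * kl(p, m) + 0.5 * kl(q, m),
        "KL": kl(p, q),
    }


# Synthetic data: Laplace noise as a crude stand-in for a DP mechanism.
rng = np.random.default_rng(0)
df_original = pd.DataFrame({"age": rng.normal(40, 10, size=1000)})
df_dp = pd.DataFrame({"age": df_original["age"] + rng.laplace(scale=5.0, size=1000)})

same = divergence_sketch(df_original, df_original, "age")
noisy = divergence_sketch(df_original, df_dp, "age")
print(same)
print(noisy)
```

With natural logarithms, TVD lies in [0, 1] and JS in [0, ln 2]. KL is unbounded and asymmetric, which is why the clipping constant eps matters when a bin is empty in the DP data.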