trasgodp.metrics package
Module contents
Privacy-utility metrics.
- trasgodp.metrics.correlation_loss(df_original: DataFrame, df_dp: DataFrame, features: List[str] | None = None, method: str = 'pearson', new_column: bool = False) → float
Compute the utility loss (%) based on how well the correlations between features are preserved after applying DP.
- Parameters:
df_original (pandas dataframe) – dataframe with the original data.
df_dp (pandas dataframe) – dataframe with the data privatized using DP.
features (list) – list of features for calculating the correlation.
method (string) – method for calculating the correlation. Defaults to 'pearson'.
new_column (boolean) – defaults to False. If True, the names of the columns with DP start with dp. Otherwise, the names of the columns with DP are the same as in the original dataset.
- Returns:
utility loss (%), based on the difference between the correlations in the original and DP data.
- Return type:
float
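To make the idea of a correlation-based utility loss concrete, here is a minimal sketch using pandas. It is not the implementation used by trasgodp: the function name correlation_loss_sketch and the formula (sum of absolute differences between pairwise correlations, relative to the original correlations, as a percentage) are assumptions chosen for illustration, and the new_column behaviour is not reproduced.

```python
import numpy as np
import pandas as pd


def correlation_loss_sketch(df_original, df_dp, features=None, method="pearson"):
    """Hypothetical helper: relative change (%) in pairwise correlations."""
    if features is None:
        features = list(df_original.columns)
    corr_orig = df_original[features].corr(method=method).to_numpy()
    corr_dp = df_dp[features].corr(method=method).to_numpy()
    # Compare only the strictly upper triangle: the matrices are symmetric
    # and the diagonal is always 1.
    upper = np.triu_indices(len(features), k=1)
    diff = np.abs(corr_orig[upper] - corr_dp[upper]).sum()
    base = np.abs(corr_orig[upper]).sum()
    if base == 0:
        # No correlation to preserve in the original data.
        return 0.0 if diff == 0 else float("inf")
    return float(100.0 * diff / base)


# Synthetic data: two correlated columns, then Laplace noise as a stand-in
# for a DP mechanism (column names and noise scale are made up).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
df_original = pd.DataFrame({"age": x, "income": 2 * x + rng.normal(size=500)})
df_dp = df_original + rng.laplace(scale=1.0, size=df_original.shape)

loss = correlation_loss_sketch(df_original, df_dp)
print(f"utility loss: {loss:.1f}%")
```

Under this assumed definition, a loss of 0 means the pairwise correlations are unchanged, and larger values mean the privatized data preserves them less well.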
- trasgodp.metrics.divergence_distributions(df_original: DataFrame, df_dp: DataFrame, column: str, new_column: bool = False) → dict
Compute the divergence between the distributions of a column in the original and DP datasets.
- Parameters:
df_original (pandas dataframe) – dataframe with the original data.
df_dp (pandas dataframe) – dataframe with the data privatized using DP.
column (string) – column to which DP has been applied.
new_column (boolean) – defaults to False. If True, the name of the column with DP starts with dp. Otherwise, the column with DP has the same name as in the original dataset.
- Returns:
dictionary with the divergence metrics: total variation distance (TVD), Jensen-Shannon divergence (JS) and Kullback-Leibler divergence (KL).
- Return type:
dict
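The three metrics can be illustrated with a short, standard-library-only sketch for a categorical column. This is not the trasgodp implementation: the function name divergence_sketch, the dictionary keys, the use of natural logarithms, the eps guard against zero probabilities, and the choice to return the JS divergence rather than its square root are all assumptions made for this example.

```python
import math
from collections import Counter


def divergence_sketch(original, private, eps=1e-12):
    """Hypothetical helper: TVD, JS and KL between two categorical samples."""
    categories = sorted(set(original) | set(private))
    count_p, count_q = Counter(original), Counter(private)
    # Empirical probability of each category in each sample.
    p = [count_p[c] / len(original) for c in categories]
    q = [count_q[c] / len(private) for c in categories]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; eps guards against b_i == 0.
        return sum(ai * math.log(ai / max(bi, eps)) for ai, bi in zip(a, b) if ai > 0)

    # Mixture distribution used by the Jensen-Shannon divergence.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return {
        "TVD": 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q)),
        "JS": 0.5 * kl(p, m) + 0.5 * kl(q, m),
        "KL": kl(p, q),
    }


# Made-up values of one column before and after privatization.
original = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
private = ["a"] * 40 + ["b"] * 35 + ["c"] * 25

metrics = divergence_sketch(original, private)
print(metrics)
```

All three values are 0 when the two distributions coincide. TVD is bounded by 1 and, with natural logarithms, JS is bounded by ln 2, while KL is unbounded and asymmetric in its arguments.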