Script 529: Anomaly by Campaign Type

Purpose

Analyze campaign data to identify anomalies and calculate anomaly scores for each product and account.

To Elaborate

The Python script identifies, for each product, the accounts with anomalies and the campaigns that contributed to them. It computes an anomaly score from the deviation between actual and predicted values, weighted by the metric being analyzed (e.g., cost, impressions, CTR), and then lists the accounts and campaigns with outlier scores for each product and account.

The key business rules of the script are as follows:

  • For each product, identify accounts with anomalies for the previous day.
  • For each account with anomalies, identify the campaigns that contributed to the anomalies.
  • Calculate the anomaly score by multiplying the deviation from predicted values by the metric being analyzed.
  • For ratio metrics (e.g., CTR), scale the anomaly score using an appropriate factor (e.g., impressions).
  • Sort the accounts and campaigns by their anomaly scores in descending order.
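As a concrete illustration of the scoring rule above (hypothetical numbers, not taken from the report's data):

```python
import numpy as np

# Hypothetical account-level figures for a single day
predicted_impr = 1000.0   # same-weekday average over the prior 5 weeks
actual_impr = 1300.0

deviation_pct = np.round((actual_impr - predicted_impr) / predicted_impr * 100, 1)  # 30.0
anomaly_score = np.abs(np.round(predicted_impr * deviation_pct, 1))                 # 30000.0

# For a ratio metric such as CTR, the deviation is scaled by predicted
# impressions rather than by the metric itself, so high-volume accounts rank higher.
predicted_ctr, actual_ctr = 2.0, 2.6
ctr_deviation_pct = np.round((actual_ctr - predicted_ctr) / predicted_ctr * 100, 1)  # 30.0
ctr_score = np.abs(np.round(predicted_impr * ctr_deviation_pct, 1))                  # 30000.0
```

Because the score takes the absolute value, over- and under-delivery both rank as anomalies; the sign of the deviation is kept separately in the deviation columns.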

The script uses various data columns, such as campaign type, date, account, campaign, campaign status, cost, impressions, clicks, and CTR, to perform the analysis.

Walking Through the Code

  1. The script defines column constants and imports necessary libraries.
  2. It defines a function named getPredictions that computes predictions as the same-weekday average over the previous 5 weeks of data.
  3. Another function named getAnomalies is defined to calculate anomaly scores and identify anomalies based on a given metric and threshold.
  4. The script defines a function named getOutlier to filter and return outliers from a given dataset.
  5. The script prints the tail of the input dataframe.
  6. It reduces the input dataframe by dropping the campaign status column.
  7. The script performs some data cleaning by replacing certain characters in the account column.
  8. It defines a dictionary agg_func_selection to specify the aggregation functions for different metrics.
  9. The script groups the reduced input dataframe by campaign type, account, and date, and applies the aggregation functions to calculate the sum of metrics.
  10. It applies the getPredictions function to the grouped dataframe to get account-level predictions.
  11. The script calculates the CTR metric for account-level predictions and actual data.
  12. It groups the reduced input dataframe by account, campaign type, campaign, and applies the getPredictions function to get campaign-level predictions.
  13. The script calculates the CTR metric for campaign-level predictions and actual data.
  14. It calls the getAnomalies function to identify anomalies for the impressions metric at the account level.
  15. The script filters the account-level anomalies based on a threshold and stores them in a dataframe.
  16. It calls the getAnomalies function again to identify anomalies for the impressions metric at the campaign level.
  17. The script filters the campaign-level anomalies based on a threshold and the account-level anomalies.
  18. It sorts the campaign-level anomalies and applies the getOutlier function to get the top outliers for each campaign type and account.
  19. The script concatenates the campaign-level and account-level anomalies into a single dataframe.
  20. It repeats steps 14-19 for the CTR and cost metrics.
  21. The script sorts the final dataframe by metric, campaign type, and account.
  22. It rounds all float values in the dataframe to one decimal place.
  23. The script prints the final dataframe in a tabular format.
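The prediction step (items 2, 10, and 12 above) uses a same-weekday baseline: it averages the rows 8, 15, 22, 29, and 36 positions back, i.e. the same weekday in each of the previous five weeks. A minimal standalone sketch of that function, run on synthetic data (not the script's actual input):

```python
import numpy as np
import pandas as pd

def getPredictions(data):
    """Mean of the same weekday over the prior 5 weeks (rows -8, -15, -22, -29, -36)."""
    if len(data) >= 36:
        return np.round(np.mean(data.iloc[[-8, -15, -22, -29, -36]], axis=0), 0)
    return None  # not enough history to form a baseline

# 37 days of synthetic daily impressions, one row per date
dates = pd.date_range("2023-07-01", periods=37, freq="D")
df = pd.DataFrame({"Impr.": np.arange(37, dtype=float)}, index=dates)

print(getPredictions(df))  # averages rows at positions 1, 8, 15, 22, 29 -> 15.0
```

With at least 36 rows the function returns a Series of per-metric predictions; with fewer it returns None, which is why the script prints a "not enough data" notice for short series.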

Vitals

  • Script ID : 529
  • Client ID / Customer ID: 1306925141 / 60269255
  • Action Type: Email Report
  • Item Changed: None
  • Output Columns:
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
  • Created by ascott@marinsoftware.com on 2023-11-14 18:37
  • Last Updated by ascott@marinsoftware.com on 2023-12-06 04:01

Python Code

# # Scripts - Campaign Anomaly Detection by Product -  Confused.com 2710
# 
# * For each Product, identify Accounts with anomaly for yesterday
# * For each Account with anomaly, identify contributing Campaigns
# * Calc Anomaly Score (Deviation times Metric), and list account/campaign with outlier scores for each Product/Account
#   * for Ratio Metrics, use another scaling factor that scales with spend like Impressions, Clicks, Cost
# 
# Initial Scope
# * Cost, Impressions, CTR
#
#
# Author: Michael S Huang
# Created: 2023-08-08
#
#

import numpy as np
import pandas as pd

RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN_TYPE = 'Campaign Type'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_IMPR = 'Impr.'
RPT_COL_CLICKS = 'Clicks'
RPT_COL_CTR = 'CTR %'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PRODUCTTEST = 'Status'


# Function to get prediction from previous 5 weeks
def getPredictions(data):
    if len(data) >= 36: 
        prediction = np.round(np.mean(data.iloc[[-8, -15, -22, -29, -36]], axis=0), 0)
        return prediction
    else:
        print("not enough data. skipping: ", data.index)
    
    return None




def getAnomalies(df_predict, df_actual, metric, threshold):
    PREDICT = '_predict'
    DEVIATION = 'deviation'
    DEVIATION_PCT = 'deviation_pct'
    SCORE = 'anomaly_score'
    
    df_predict_renamed = df_predict.add_suffix(PREDICT)
    
    df_combined = pd.concat([df_predict_renamed, df_actual], axis=1)
    
    df_combined['metric'] = metric
    
    # negative deviation when less than predicted
    df_combined[DEVIATION] = np.round(df_combined[metric] - df_combined[metric+PREDICT], 1)
    df_combined[DEVIATION_PCT] = np.round((df_combined[metric] - df_combined[metric+PREDICT]) / df_combined[metric+PREDICT] * 100, 1)
 
    # compute anomaly score based on metric type and deviation pct
    # metric that are ratios should use another appropriate factor to scale
    if metric == RPT_COL_CTR:
        # use impressions to scale
        df_combined[SCORE] = np.abs(np.round(df_combined[RPT_COL_IMPR+PREDICT] * df_combined[DEVIATION_PCT], 1))
            
    else:
        # in general, anomaly score is proportional to metric value and deviation pct
        df_combined[SCORE] = np.abs(np.round(df_combined[metric+PREDICT] * df_combined[DEVIATION_PCT], 1))
    
    df_anomaly = df_combined.loc[ df_combined[DEVIATION_PCT].abs() >= threshold ]
    
    if not df_anomaly.empty:
        df_anomaly = df_anomaly.sort_values(by=SCORE, ascending=False)
    
    return df_anomaly



# returns rows whose anomaly score sits above the IQR upper bound
# for very short lists, returns the top 1; if no obvious outliers, returns the top 2
def getOutlier(data, col='anomaly_score', thresh=2):

    # for short lists, just return top 1
    if len(data) < 3:
        return data.iloc[[0]]

    # to reduce noise, could skip low anomaly score rows
    #data = data.loc[data[col] > 5000]

    IQR = data[col].quantile(0.75) - data[col].quantile(0.25)
    upper_bound = data[col].quantile(0.75) + (thresh * IQR)
    lower_bound = data[col].quantile(0.25) - (thresh * IQR)
    anomalies_mask_over = data[col] > upper_bound
    anomalies_mask_under = data[col] < lower_bound

    if sum(anomalies_mask_over) > 0:
        # since anomaly score is calculated using abs(), higher is worse; only the upper bound matters
        return data.loc[ anomalies_mask_over ]
    else:
        # return top 2
        return data.iloc[[0,1]]


print("inputDf", inputDf.tail(20).to_string())

inputDf_reduced = inputDf.drop([RPT_COL_CAMPAIGN_STATUS], axis=1)

print("inputDf_reduced shape", inputDf_reduced.shape)


# Hack around FEND-17759 bug
inputDf_reduced[RPT_COL_ACCOUNT] = inputDf_reduced[RPT_COL_ACCOUNT].str.replace('���','–')
inputDf_reduced[RPT_COL_ACCOUNT] = inputDf_reduced[RPT_COL_ACCOUNT].str.replace('��','–')
inputDf_reduced[RPT_COL_ACCOUNT] = inputDf_reduced[RPT_COL_ACCOUNT].str.replace('�','–')


# ## Prepare Timeseries for each Product and Account
agg_func_selection = {
    RPT_COL_IMPR: ['sum'],
    RPT_COL_CLICKS: ['sum'],
    RPT_COL_PUB_COST: ['sum'],
}

df_agg_product_account_date = inputDf_reduced.groupby([RPT_COL_CAMPAIGN_TYPE,RPT_COL_ACCOUNT,RPT_COL_DATE]) \
                                .agg(agg_func_selection) \
                                .droplevel(1, axis=1)

# Account-level prediction 
df_predict = df_agg_product_account_date.groupby([RPT_COL_CAMPAIGN_TYPE,RPT_COL_ACCOUNT])\
                                         .apply(getPredictions)

df_predict[RPT_COL_CTR] = np.round(df_predict[RPT_COL_CLICKS] / df_predict[RPT_COL_IMPR], 3) * 100

# Account-level actual
df_actual = df_agg_product_account_date.reset_index() \
                                       .groupby([RPT_COL_CAMPAIGN_TYPE,RPT_COL_ACCOUNT])\
                                       .apply(lambda x: x.iloc[-1]) \
                                       .reset_index(drop=True) \
                                       .set_index([RPT_COL_CAMPAIGN_TYPE,RPT_COL_ACCOUNT])

df_actual[RPT_COL_CTR] = np.round(df_actual[RPT_COL_CLICKS] / df_actual[RPT_COL_IMPR], 3) * 100


# Campaign-level prediction


df_predict_campaign = inputDf_reduced \
                             .set_index([RPT_COL_ACCOUNT,RPT_COL_CAMPAIGN_TYPE,RPT_COL_CAMPAIGN,RPT_COL_DATE]) \
                             .groupby([RPT_COL_ACCOUNT,RPT_COL_CAMPAIGN_TYPE,RPT_COL_CAMPAIGN]) \
                             .apply(getPredictions)

df_predict_campaign[RPT_COL_CTR] = np.round(df_predict_campaign[RPT_COL_CLICKS] / df_predict_campaign[RPT_COL_IMPR], 3) * 100


# Campaign-level actual
df_actual_campaign = inputDf_reduced.groupby([RPT_COL_ACCOUNT,RPT_COL_CAMPAIGN_TYPE,RPT_COL_CAMPAIGN])\
                                    .apply(lambda x: x.iloc[-1]) \
                                    .reset_index(drop=True) \
                                    .set_index([RPT_COL_ACCOUNT,RPT_COL_CAMPAIGN_TYPE,RPT_COL_CAMPAIGN])
df_actual_campaign[RPT_COL_CTR] = np.round(df_actual_campaign[RPT_COL_CLICKS] / df_actual_campaign[RPT_COL_IMPR], 3) * 100




# ## Build the entire Anomaly Report

# IMPR
df_account_anomaly_all = getAnomalies(df_predict, df_actual, RPT_COL_IMPR, 20)

df_account_anomaly_output = df_account_anomaly_all.loc[ df_account_anomaly_all[RPT_COL_IMPR+'_predict'] > 1000]


df_campaign_anomaly_all = getAnomalies(df_predict_campaign, df_actual_campaign, RPT_COL_IMPR, 0)

df_campaign_anomaly_output = df_campaign_anomaly_all.loc[df_account_anomaly_output.index.get_level_values(1)] \
                       .loc[ df_campaign_anomaly_all[RPT_COL_IMPR+'_predict'] > 100] \
                       .reset_index() \
                       .sort_index() \
                       .groupby([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT]) \
                       .apply(getOutlier) \
                       .reset_index(drop=True) \
                       .set_index([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT])


df_csv_output = pd.concat([df_campaign_anomaly_output, df_account_anomaly_output], axis=0)

# CTR

df_account_anomaly_all = getAnomalies(df_predict, df_actual, RPT_COL_CTR, 20)

df_account_anomaly_output = df_account_anomaly_all.loc[ df_account_anomaly_all[RPT_COL_IMPR+'_predict'] > 1000]


df_campaign_anomaly_all = getAnomalies(df_predict_campaign, df_actual_campaign, RPT_COL_CTR, 0)

df_campaign_anomaly_output = df_campaign_anomaly_all.loc[df_account_anomaly_output.index.get_level_values(1)] \
                       .loc[ df_campaign_anomaly_all[RPT_COL_IMPR+'_predict'] > 100] \
                       .reset_index() \
                       .sort_index() \
                       .groupby([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT]) \
                       .apply(getOutlier) \
                       .reset_index(drop=True) \
                       .set_index([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT])

df_csv_output = pd.concat([df_csv_output, df_campaign_anomaly_output, df_account_anomaly_output], axis=0)


# Cost

df_account_anomaly_all = getAnomalies(df_predict, df_actual, RPT_COL_PUB_COST, 20)

df_account_anomaly_output = df_account_anomaly_all.loc[ df_account_anomaly_all[RPT_COL_IMPR+'_predict'] > 1000]


df_campaign_anomaly_all = getAnomalies(df_predict_campaign, df_actual_campaign, RPT_COL_PUB_COST, 0)

df_campaign_anomaly_output = df_campaign_anomaly_all.loc[df_account_anomaly_output.index.get_level_values(1)] \
                       .loc[ df_campaign_anomaly_all[RPT_COL_IMPR+'_predict'] > 100] \
                       .reset_index() \
                       .sort_index() \
                       .groupby([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT]) \
                       .apply(getOutlier) \
                       .reset_index(drop=True) \
                       .set_index([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT])


df_csv_output = pd.concat([df_csv_output, df_campaign_anomaly_output, df_account_anomaly_output], axis=0)


# sort CSV output by Product and Account
outputDf = df_csv_output.reset_index().set_index(['metric',RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT]).sort_index().reset_index()

# round all floats with 1 DP
outputDf = round(outputDf, 1)

print("Download CSV", tableize(outputDf))

Post generated on 2024-05-15 07:44:05 GMT
