Script 531: Anomaly by Campaign Type

Purpose

The Python script detects anomalies in campaign performance metrics for different campaign types and accounts.

To Elaborate

The script identifies anomalies in advertising campaign performance by analyzing cost, impressions, and click-through rate (CTR) for each campaign type and account. For every metric, it builds a prediction from historical data by averaging the same weekday across the previous five weeks, then compares that prediction with the most recent actual value to compute a deviation percentage. An anomaly score, scaled by the size of the predicted metric (or by predicted impressions for a ratio metric such as CTR), ranks the accounts and campaigns with the largest discrepancies. The aim is to surface unusual performance patterns that may require investigation or action, so that marketing efforts stay optimized and issues are addressed promptly.
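
For intuition, here is a minimal sketch of the scoring logic used by the script's getAnomalies function; the numbers are hypothetical and serve only to illustrate the arithmetic:

import numpy as np

# Hypothetical values for a single account on a single day (illustration only)
predicted_cost, actual_cost = 500.0, 800.0

# Deviation percentage is negative when the actual value falls below the prediction
deviation_pct = np.round((actual_cost - predicted_cost) / predicted_cost * 100, 1)   # 60.0

# For volume metrics (cost, impressions), the score scales with the predicted value
anomaly_score = np.abs(np.round(predicted_cost * deviation_pct, 1))                  # 30000.0

# For a ratio metric like CTR, predicted impressions act as the scale factor instead,
# so low-volume rows do not dominate the report
predicted_impr, predicted_ctr, actual_ctr = 120000, 2.0, 1.2
ctr_deviation_pct = np.round((actual_ctr - predicted_ctr) / predicted_ctr * 100, 1)  # -40.0
ctr_anomaly_score = np.abs(np.round(predicted_impr * ctr_deviation_pct, 1))          # 4800000.0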

Walking Through the Code

  1. Data Preparation:
    • The script begins by cleaning the input data, dropping the unneeded Campaign Status column and repairing encoding artifacts in account names.
    • It then aggregates the data by campaign type, account, and date, summing impressions, clicks, and cost.
  2. Prediction Calculation:
    • For each account and campaign type, the script predicts each metric as the average of the same weekday across the previous five weeks (see the sketch after this list).
    • Predicted CTR is computed by dividing predicted clicks by predicted impressions.
  3. Anomaly Detection:
    • A helper function compares predicted and actual values, computing a deviation percentage and an anomaly score for each row.
    • Rows whose absolute deviation percentage meets a threshold are flagged as significant anomalies.
  4. Outlier Identification:
    • Another helper identifies outliers among the anomaly scores using an interquartile-range (IQR) upper bound, keeping only scores well above it (or the top entries when no clear outliers exist).
  5. Anomaly Reporting:
    • The script builds anomaly reports for impressions, CTR, and cost, filtering and sorting the results to surface the most significant anomalies.
    • Account-level and campaign-level anomalies are combined into the final output, with numeric values rounded for readability.
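
To make the prediction step concrete, the sketch below shows how the index offsets used in getPredictions (-8, -15, -22, -29 and -36) each land on the same weekday in one of the previous five weeks; the date range and values are hypothetical placeholders:

import numpy as np
import pandas as pd

# Hypothetical daily series whose last row represents "yesterday"; values are random placeholders
dates = pd.date_range(end="2023-11-13", periods=40, freq="D")
data = pd.DataFrame({"Impr.": np.random.randint(1000, 5000, size=40)}, index=dates)

# Counting back from the last row, -8 is one week earlier, -15 two weeks earlier, and so on,
# so all five rows share the same weekday and the prediction is a same-weekday average
same_weekday_rows = data.iloc[[-8, -15, -22, -29, -36]]
prediction = np.round(np.mean(same_weekday_rows, axis=0), 0)

print(same_weekday_rows.index.day_name().unique())  # prints a single weekday name
print(prediction)                                   # rounded average of the five sampled rows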

Vitals

  • Script ID: 531
  • Client ID / Customer ID: 1306924501 / 60269325
  • Action Type: Email Report
  • Item Changed: None
  • Output Columns:
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
  • Created by ascott@marinsoftware.com on 2023-11-14 19:12
  • Last Updated by Aaron Thomas on 2023-12-06 04:01

Python Code

# Scripts - Campaign Anomaly Detection by Campaign Type - Fisher Investments 68428
# 
# * For each Campaign Type, find Accounts with anomaly for yesterday
# * For each Account with anomaly, identify contributing Campaigns
# * Calc Anomaly Score (Deviation times Metric), and list account/campaign with outlier scores for each CampaignType/Account
#   * for Ratio Metrics, use another scaling factor that scales with spend like Impressions, Clicks, Cost
# 
# Initial Scope
# * Cost, Impressions, CTR
#
#
# Authors: Michael S Huang, Adam Scott
# Created: 2023-11-14
#
#

RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN_TYPE = 'Campaign Type'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_IMPR = 'Impr.'
RPT_COL_CLICKS = 'Clicks'
RPT_COL_CTR = 'CTR %'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PRODUCTTEST = 'Status'


# Function to get prediction from previous 5 weeks
def getPredictions(data):
    if len(data) >= 36: 
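        # offsets -8, -15, -22, -29 and -36 pick out the same weekday in each of the
        # previous five weeks; the prediction is their rounded average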
        prediction = np.round(np.mean(data.iloc[[-8, -15, -22, -29, -36]], axis=0), 0)
        return prediction
    else:
        print("not enough data. skipping: ", data.index)
    
    return None




def getAnomalies(df_predict, df_actual, metric, threshold):
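    # Compare predicted vs. actual values for one metric, compute the deviation
    # percentage and an anomaly score per row, and keep rows whose absolute
    # deviation percentage meets the threshold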
    PREDICT = '_predict'
    DEVIATION = 'deviation'
    DEVIATION_PCT = 'deviation_pct'
    SCORE = 'anomaly_score'
    
    df_predict_renamed = df_predict.add_suffix(PREDICT)
    df_combined = pd.concat([df_predict_renamed, df_actual], axis=1)
    
    df_combined['metric'] = metric
    
    # negative deviation when less than predicted
    df_combined[DEVIATION] = np.round(df_combined[metric] - df_combined[metric+PREDICT], 1)
    df_combined[DEVIATION_PCT] = np.round((df_combined[metric] - df_combined[metric+PREDICT]) / df_combined[metric+PREDICT] * 100, 1)
 
    # compute anomaly score based on metric type and deviation pct
    # metrics that are ratios should be scaled by another appropriate factor
    if metric == RPT_COL_CTR:
        # use impressions to scale
        df_combined[SCORE] = np.abs(np.round(df_combined[RPT_COL_IMPR+PREDICT] * df_combined[DEVIATION_PCT], 1))
            
    else:
        # in general, anomaly score is proportional to metric value and deviation pct
        df_combined[SCORE] = np.abs(np.round(df_combined[metric+PREDICT] * df_combined[DEVIATION_PCT], 1))
    
    df_anomaly = df_combined.loc[ df_combined[DEVIATION_PCT].abs() >= threshold ]
    
    if not df_anomaly.empty:
        df_anomaly = df_anomaly.sort_values(by=SCORE, ascending=False)
    
    return df_anomaly
    


# returns outliers
# if no obvious outliers, just return top 2
def getOutlier(data, col='anomaly_score', thresh=2):
    
    # for short lists, just return top 1
    if len(data) < 3:
        return data.iloc[[0]]
    
    # to reduce noise, skip low anomaly score rows
    #data = data.loc[data[col] > 5000]

    IQR = data[col].quantile(0.75) - data[col].quantile(0.25)
    upper_bound = data[col].quantile(0.75) + (thresh * IQR)
    lower_bound = data[col].quantile(0.25) - (thresh * IQR)
    anomalies_mask_over = data[col] > upper_bound
    anomalies_mask_under = data[col] < lower_bound

    if sum(anomalies_mask_over) > 0:
        # since the anomaly score is calculated using abs(), higher is worse; only scores over the upper bound matter
        return data.loc[ anomalies_mask_over ]
    else:
        # return top 2
        return data.iloc[[0,1]]


print("inputDf", inputDf.tail(20).to_string())

inputDf_reduced = inputDf.drop([RPT_COL_CAMPAIGN_STATUS], axis=1)

print("inputDf_reduced shape", inputDf_reduced.shape)


# Hack around FEND-17759 bug
inputDf_reduced[RPT_COL_ACCOUNT] = inputDf_reduced[RPT_COL_ACCOUNT].str.replace('���','–')
inputDf_reduced[RPT_COL_ACCOUNT] = inputDf_reduced[RPT_COL_ACCOUNT].str.replace('��','–')
inputDf_reduced[RPT_COL_ACCOUNT] = inputDf_reduced[RPT_COL_ACCOUNT].str.replace('�','–')


# ## Prepare Timeseries for each Campaign Type and Account
agg_func_selection = {
    RPT_COL_IMPR: ['sum'],
    RPT_COL_CLICKS: ['sum'],
    RPT_COL_PUB_COST: ['sum'],
}

df_agg_campaigntype_account_date = inputDf_reduced.groupby([RPT_COL_CAMPAIGN_TYPE,RPT_COL_ACCOUNT,RPT_COL_DATE]) \
                                .agg(agg_func_selection) \
                                .droplevel(1, axis=1)

# Account-level prediction 
df_predict = df_agg_campaigntype_account_date.groupby([RPT_COL_CAMPAIGN_TYPE,RPT_COL_ACCOUNT])\
                                         .apply(getPredictions)

df_predict[RPT_COL_CTR] = np.round(df_predict[RPT_COL_CLICKS] / df_predict[RPT_COL_IMPR], 3) * 100

# Account-level actual
df_actual = df_agg_campaigntype_account_date.reset_index() \
                                       .groupby([RPT_COL_CAMPAIGN_TYPE,RPT_COL_ACCOUNT])\
                                       .apply(lambda x: x.iloc[-1]) \
                                       .reset_index(drop=True) \
                                       .set_index([RPT_COL_CAMPAIGN_TYPE,RPT_COL_ACCOUNT])

df_actual[RPT_COL_CTR] = np.round(df_actual[RPT_COL_CLICKS] / df_actual[RPT_COL_IMPR], 3) * 100


# Campaign-level prediction


df_predict_campaign = inputDf_reduced \
                             .set_index([RPT_COL_ACCOUNT,RPT_COL_CAMPAIGN_TYPE,RPT_COL_CAMPAIGN,RPT_COL_DATE]) \
                             .groupby([RPT_COL_ACCOUNT,RPT_COL_CAMPAIGN_TYPE,RPT_COL_CAMPAIGN]) \
                             .apply(getPredictions)

df_predict_campaign[RPT_COL_CTR] = np.round(df_predict_campaign[RPT_COL_CLICKS] / df_predict_campaign[RPT_COL_IMPR], 3) * 100


# Campaign-level actual
df_actual_campaign = inputDf_reduced.groupby([RPT_COL_ACCOUNT,RPT_COL_CAMPAIGN_TYPE,RPT_COL_CAMPAIGN])\
                                    .apply(lambda x: x.iloc[-1]) \
                                    .reset_index(drop=True) \
                                    .set_index([RPT_COL_ACCOUNT,RPT_COL_CAMPAIGN_TYPE,RPT_COL_CAMPAIGN])
df_actual_campaign[RPT_COL_CTR] = np.round(df_actual_campaign[RPT_COL_CLICKS] / df_actual_campaign[RPT_COL_IMPR], 3) * 100




# ## Build the entire Anomaly Report
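# For each metric below, flag account-level anomalies (deviation of at least 20% with a
# predicted impression base above 1,000), then surface the outlier campaigns within those
# flagged accounts and append both levels to the CSV output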

# IMPR
df_account_anomaly_all = getAnomalies(df_predict, df_actual, RPT_COL_IMPR, 20)

df_account_anomaly_output = df_account_anomaly_all.loc[ df_account_anomaly_all[RPT_COL_IMPR+'_predict'] > 1000]


df_campaign_anomaly_all = getAnomalies(df_predict_campaign, df_actual_campaign, RPT_COL_IMPR, 0)

df_campaign_anomaly_output = df_campaign_anomaly_all.loc[df_account_anomaly_output.index.get_level_values(1)] \
                       .loc[ df_campaign_anomaly_all[RPT_COL_IMPR+'_predict'] > 100] \
                       .reset_index() \
                       .sort_index() \
                       .groupby([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT]) \
                       .apply(getOutlier) \
                       .reset_index(drop=True) \
                       .set_index([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT])


df_csv_output = pd.concat([df_campaign_anomaly_output, df_account_anomaly_output], axis=0)

# CTR

df_account_anomaly_all = getAnomalies(df_predict, df_actual, RPT_COL_CTR, 20)

df_account_anomaly_output = df_account_anomaly_all.loc[ df_account_anomaly_all[RPT_COL_IMPR+'_predict'] > 1000]


df_campaign_anomaly_all = getAnomalies(df_predict_campaign, df_actual_campaign, RPT_COL_CTR, 0)

df_campaign_anomaly_output = df_campaign_anomaly_all.loc[df_account_anomaly_output.index.get_level_values(1)] \
                       .loc[ df_campaign_anomaly_all[RPT_COL_IMPR+'_predict'] > 100] \
                       .reset_index() \
                       .sort_index() \
                       .groupby([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT]) \
                       .apply(getOutlier) \
                       .reset_index(drop=True) \
                       .set_index([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT])

df_csv_output = pd.concat([df_csv_output, df_campaign_anomaly_output, df_account_anomaly_output], axis=0)


# Cost

df_account_anomaly_all = getAnomalies(df_predict, df_actual, RPT_COL_PUB_COST, 20)

df_account_anomaly_output = df_account_anomaly_all.loc[ df_account_anomaly_all[RPT_COL_IMPR+'_predict'] > 1000]


df_campaign_anomaly_all = getAnomalies(df_predict_campaign, df_actual_campaign, RPT_COL_PUB_COST, 0)

df_campaign_anomaly_output = df_campaign_anomaly_all.loc[df_account_anomaly_output.index.get_level_values(1)] \
                       .loc[ df_campaign_anomaly_all[RPT_COL_IMPR+'_predict'] > 100] \
                       .reset_index() \
                       .sort_index() \
                       .groupby([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT]) \
                       .apply(getOutlier) \
                       .reset_index(drop=True) \
                       .set_index([RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT])


df_csv_output = pd.concat([df_csv_output, df_campaign_anomaly_output, df_account_anomaly_output], axis=0)


# sort CSV output by metric, Campaign Type, and Account
outputDf = df_csv_output.reset_index().set_index(['metric',RPT_COL_CAMPAIGN_TYPE, RPT_COL_ACCOUNT]).sort_index().reset_index()

# round all floats to 1 decimal place
outputDf = round(outputDf, 1)

print("Email CSV", tableize(outputDf.head()))
