Script 513: AdGroup Cost Conv $ Outlier Tagging

Purpose

This Python script identifies and tags AdGroups whose Cost/Conversion is abnormally low relative to the other AdGroups in the same campaign over a specified lookback period.

To Elaborate

The script detects and tags AdGroups whose Cost/Conversion is unusually low compared with the other AdGroups in the same campaign. It analyzes a 30-day window that excludes the most recent three days, to allow for conversion lag. For each AdGroup it sums publisher cost, conversions, revenue, and clicks, drops AdGroups with no cost, and then derives Cost/Conversion, ROAS, conversion rate, and average CPC. Outliers are identified with the Interquartile Range (IQR) method, applied to Cost/Conversion only: an AdGroup is flagged when its Cost/Conversion falls below the campaign's first quartile minus 0.8 times the IQR, and only if its spend is above the campaign median. Campaigns with fewer than three AdGroups are skipped. Flagged AdGroups are tagged in the AUTOMATION - INFO column so they can be reviewed and adjusted if needed.

Walking Through the Code

Data Preparation
- The script first filters the input data to include only records from the past 30 days, excluding the most recent three days.
- It reduces the dataset to the essential columns and sums publisher cost, conversions, revenue, and clicks by AdGroup within each campaign, dropping AdGroups with no cost.
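
The filtering and aggregation steps above can be sketched on a few made-up rows. The helper names `lookback_window` and `aggregate_by_group` and the sample data are illustrative assumptions, not part of the script; the column names follow the report columns the script reads.

```python
import datetime

import pandas as pd

METRICS = ["Pub. Cost $", "Conv.", "Revenue $", "Clicks"]


def lookback_window(today, lookback_days=30, lag_days=3):
    """Hypothetical helper: window ending lag_days before today (both ends inclusive)."""
    end = pd.Timestamp(today) - pd.Timedelta(days=lag_days)
    start = end - pd.Timedelta(days=lookback_days)
    return start, end


def aggregate_by_group(df, start, end):
    """Hypothetical helper: sum metrics per AdGroup in the window, dropping zero-cost groups."""
    in_window = (df["Date"] >= start) & (df["Date"] <= end)
    totals = df.loc[in_window].groupby(["Account", "Campaign", "Group"])[METRICS].sum()
    return totals[totals["Pub. Cost $"] > 0]


# Made-up sample rows for illustration only.
rows = pd.DataFrame(
    {
        "Account": ["A", "A", "A", "A"],
        "Campaign": ["C1", "C1", "C1", "C1"],
        "Group": ["G1", "G1", "G1", "G2"],
        "Date": pd.to_datetime(["2023-11-10", "2023-12-01", "2023-12-07", "2023-11-20"]),
        "Pub. Cost $": [10.0, 20.0, 99.0, 0.0],
        "Conv.": [1.0, 2.0, 9.0, 0.0],
        "Revenue $": [50.0, 80.0, 500.0, 0.0],
        "Clicks": [5.0, 10.0, 40.0, 0.0],
    }
)

start, end = lookback_window(datetime.date(2023, 12, 8))
totals = aggregate_by_group(rows, start, end)
print(totals)
```

In this sample the 2023-12-07 row falls inside the three-day lag and is ignored, and G2 is dropped because it has no cost, so only G1 remains with a summed cost of 30.
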
Feature Calculation
- It calculates performance indicators such as Cost/Conversion, ROAS, conversion rate, and average CPC for each AdGroup.
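
As an illustration of the indicator formulas, the sketch below derives them from made-up totals and shows what happens when an AdGroup has spend but no conversions: pandas yields `inf` for Cost/Conversion instead of raising an error. Replacing `inf` with `NaN` afterwards is an optional step assumed here for illustration; the script itself does not do it.

```python
import numpy as np
import pandas as pd

# Made-up per-AdGroup totals for illustration only.
perf = pd.DataFrame(
    {
        "Group": ["G1", "G2"],
        "Pub. Cost $": [100.0, 50.0],
        "Conv.": [4.0, 0.0],
        "Revenue $": [300.0, 0.0],
        "Clicks": [40.0, 25.0],
    }
)

perf["Cost/Conv. $"] = perf["Pub. Cost $"] / perf["Conv."]
perf["ROAS"] = perf["Revenue $"] / perf["Pub. Cost $"]
perf["Conv. Rate %"] = perf["Conv."] / perf["Clicks"]  # a fraction, not multiplied by 100
perf["Avg. CPC $"] = perf["Pub. Cost $"] / perf["Clicks"]

# Optional cleanup (an assumption, not in the script): treat undefined ratios as missing.
perf_clean = perf.replace([np.inf, -np.inf], np.nan)
print(perf_clean)
```
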
Anomaly Detection Functions
- The script defines functions to detect anomalies using the IQR method. It identifies outliers by comparing each AdGroup’s performance against calculated thresholds.
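
A minimal sketch of the IQR rule, assuming pandas' default linear interpolation for quantiles. The helper name `iqr_bounds` and the sample values are made up for illustration; the 0.8 multiplier matches the threshold the script passes for Cost/Conversion.

```python
import pandas as pd


def iqr_bounds(values, thresh):
    """Hypothetical helper: (lower, upper) fences at thresh * IQR beyond the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - thresh * iqr, q3 + thresh * iqr


# Made-up Cost/Conversion values for six AdGroups in one campaign.
cost_per_conv = pd.Series([20.0, 22.0, 24.0, 26.0, 28.0, 5.0])

lower, upper = iqr_bounds(cost_per_conv, thresh=0.8)
is_under = cost_per_conv < lower  # candidates for "abnormally low"
is_over = cost_per_conv > upper   # computed by the script but not used for tagging
print(lower, upper, is_under.tolist())
```

Here the quartiles are 20.5 and 25.5, so the IQR is 5 and the fences sit at 16.5 and 29.5. Only the value 5.0 falls below the lower fence.
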
Anomaly Identification
- For each campaign with at least three AdGroups, the script flags any AdGroup whose Cost/Conversion falls below the IQR lower bound (first quartile minus 0.8 times the IQR), considering only AdGroups whose spend exceeds the campaign median.
- It tags these AdGroups as anomalies and prepares a summary of the findings.
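
The per-campaign check above combines two conditions: being a low-side IQR outlier and spending more than the campaign median. The sketch below shows that combination on made-up data; `flag_low_cost_per_conv` is a hypothetical helper, not a function from the script.

```python
import pandas as pd


def flag_low_cost_per_conv(campaign, thresh=0.8):
    """Hypothetical helper: low-side Cost/Conv. outliers with above-median spend."""
    if len(campaign) < 3:
        return None  # too few peers to compare against
    q1 = campaign["Cost/Conv. $"].quantile(0.25)
    q3 = campaign["Cost/Conv. $"].quantile(0.75)
    lower = q1 - thresh * (q3 - q1)
    is_under = campaign["Cost/Conv. $"] < lower
    has_spend = campaign["Pub. Cost $"] > campaign["Pub. Cost $"].median()
    return is_under & has_spend


# Made-up campaign for illustration only.
campaign = pd.DataFrame(
    {
        "Group": ["G1", "G2", "G3", "G4", "G5", "G6"],
        "Pub. Cost $": [100.0, 80.0, 60.0, 40.0, 30.0, 90.0],
        "Cost/Conv. $": [20.0, 22.0, 24.0, 26.0, 28.0, 5.0],
    }
)
flagged = campaign.loc[flag_low_cost_per_conv(campaign), "Group"].tolist()

# Same outlier, but with below-median spend: it is no longer flagged.
low_spend = campaign.assign(**{"Pub. Cost $": [100.0, 80.0, 60.0, 40.0, 30.0, 10.0]})
flagged_low_spend = low_spend.loc[flag_low_cost_per_conv(low_spend), "Group"].tolist()

print(flagged, flagged_low_spend)
```

The spend gate keeps small AdGroups, whose ratios are noisy, from being tagged on the strength of a handful of conversions.
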
Output Preparation
- If anomalies are found, the script compiles the results into a DataFrame for further analysis or reporting. If no anomalies are detected, it outputs an empty DataFrame.
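
The output step can be sketched as follows. The helper `build_output`, the `campaign_avg` column name, and the message wording are assumptions made for this example; only the four output column names come from the script's Vitals.

```python
import pandas as pd

OUTPUT_COLUMNS = ["Account", "Campaign", "Group", "AUTOMATION - INFO"]


def build_output(anomalies):
    """Hypothetical helper: annotate flagged AdGroups, or return an empty frame."""
    if anomalies.empty:
        return pd.DataFrame(columns=OUTPUT_COLUMNS)
    out = anomalies.copy()
    out["AUTOMATION - INFO"] = out.apply(
        lambda row: "Cost/Conv. {:,.2f} is much lower than campaign avg {:,.2f}".format(
            row["Cost/Conv. $"], row["campaign_avg"]
        ),
        axis=1,
    )
    return out[OUTPUT_COLUMNS]


# Made-up flagged AdGroup for illustration only.
anomalies = pd.DataFrame(
    {
        "Account": ["A"],
        "Campaign": ["C1"],
        "Group": ["G6"],
        "Cost/Conv. $": [5.0],
        "campaign_avg": [20.0],
    }
)

tagged = build_output(anomalies)
empty = build_output(anomalies.iloc[0:0])
print(tagged.to_string(index=False))
```

Returning a frame with the expected columns even when nothing is flagged keeps the bulk upload shape consistent between runs.
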

Vitals

Script ID : 513
Client ID / Customer ID: 367770165 / 16077
Action Type: Bulk Upload
Item Changed: AdGroup
Output Columns: Account, Campaign, Group, AUTOMATION - INFO
Linked Datasource: M1 Report
Reference Datasource: None
Owner: Jesus Garza (jgarza@marinsoftware.com)
Created by Jesus Garza on 2023-11-07 22:15
Last Updated by Autumn Archibald on 2023-12-08 17:37

Python Code

#
# Tag AdGroup if Cost/Conv. is abnormally low within Campaign
#
#
# Author: Michael S. Huang
# Date: 2023-03-24

# inputDf, outputDf and tableize are assumed to be supplied by the script runtime
import datetime

import numpy as np
import pandas as pd

RPT_COL_GROUP = 'Group'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_GROUP_ID = 'Group ID'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_COST_PER_CONV = 'Cost/Conv. $'
RPT_COL_ROAS = 'ROAS'
RPT_COL_CONV_RATE = 'Conv. Rate %'
RPT_COL_AVG_CPC = 'Avg. CPC $'
RPT_COL_CLICKS = 'Clicks'
RPT_COL_CONV = 'Conv.'
RPT_COL_REVENUE = 'Revenue $'
RPT_COL_IMPR = 'Impr.'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_AUTOMATION_INFO = 'AUTOMATION - INFO'

outputDf[BULK_COL_AUTOMATION_INFO] = np.nan

## Data Prep

print(inputDf[RPT_COL_DATE].min(), inputDf[RPT_COL_DATE].max())

# 30-day lookback without most recent 3 days due to conversion lag
start_date = pd.to_datetime(datetime.date.today() - datetime.timedelta(days=33))
end_date = pd.to_datetime(datetime.date.today() - datetime.timedelta(days=3))

df_reduced = inputDf[ (inputDf[RPT_COL_DATE] >= start_date) & (inputDf[RPT_COL_DATE] <= end_date) ]

if (df_reduced.shape[0] > 0):
    print("reduced dates\n", min(df_reduced[RPT_COL_DATE]), max(df_reduced[RPT_COL_DATE]))
else:
    print("no more input to process")

# reduce to needed columns
df_reduced = df_reduced[[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_GROUP, RPT_COL_DATE, RPT_COL_PUB_COST, RPT_COL_CONV, RPT_COL_REVENUE, RPT_COL_CLICKS]].copy()

# sum metrics across dates
df_group_perf = df_reduced.groupby([RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_GROUP])[[RPT_COL_PUB_COST, RPT_COL_CONV, RPT_COL_REVENUE, RPT_COL_CLICKS]].sum()
# remove rows without cost 
df_group_perf = df_group_perf[(df_group_perf[RPT_COL_PUB_COST] > 0)]

# index by campaign
df_group_perf = df_group_perf.reset_index().set_index([RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN]).sort_index()

# calculate features
df_group_perf[RPT_COL_COST_PER_CONV] = (df_group_perf[RPT_COL_PUB_COST] / df_group_perf[RPT_COL_CONV])
df_group_perf[RPT_COL_ROAS] = df_group_perf[RPT_COL_REVENUE] / df_group_perf[RPT_COL_PUB_COST]
df_group_perf[RPT_COL_CONV_RATE] = df_group_perf[RPT_COL_CONV] / df_group_perf[RPT_COL_CLICKS]
df_group_perf[RPT_COL_AVG_CPC] = (df_group_perf[RPT_COL_PUB_COST] / df_group_perf[RPT_COL_CLICKS])


## Define Anomaly Functions

# Finds anomalies using a given detection function (e.g. sigma rule, IQR etc.)
# data: DataFrame
#     Dataset with features
# func: func
#     Function to use to find anomalies
# features: list
#     Feature list
# thresh: float
#     Threshold value (e.g. 2/3 * sigma, 2/3 * IQR)
# Returns: tuple
#     (anomalies_summary, outliers_over, outliers_under)
def get_feature_anomalies(data, func, features=None, thresh=3):

    if features:
        features_to_check = features
    else:
        features_to_check = data.columns 
        
    outliers_over = pd.Series(data=[False] * data.shape[0], index=data[features_to_check].index, name='is_outlier')
    outliers_under = pd.Series(data=[False] * data.shape[0], index=data[features_to_check].index, name='is_outlier')

    anomalies_summary = {}
    for feature in features_to_check:
        anomalies_mask_over, anomalies_mask_under, upper_bound, lower_bound = func(data, feature, thresh=thresh)
        anomalies_mask_combined = pd.concat([anomalies_mask_over, anomalies_mask_under], axis=1).any(axis=1)
        anomalies_summary[feature] = [upper_bound, lower_bound, sum(anomalies_mask_combined), 100*sum(anomalies_mask_combined)/len(anomalies_mask_combined)]
        outliers_over[anomalies_mask_over[anomalies_mask_over].index] = True
        outliers_under[anomalies_mask_under[anomalies_mask_under].index] = True
        
        
    anomalies_summary = pd.DataFrame(anomalies_summary).T
    anomalies_summary.columns=['upper_bound', 'lower_bound', 'anomalies_count', 'anomalies_percentage']
    
    anomalies_ration = round(anomalies_summary['anomalies_percentage'].sum(), 2)
    
    return anomalies_summary, outliers_over, outliers_under

# Finds outliers/anomalies using IQR
# data: DataFrame
# col: str
# thresh: float
#     Number of IQRs to apply beyond the quartiles
# Returns: tuple
#     (mask of values above upper bound, mask of values below lower bound, upper_bound, lower_bound)
def is_anomaly_irq(data, col, thresh):

    IRQ = data[col].quantile(0.75) - data[col].quantile(0.25)
    upper_bound = data[col].quantile(0.75) + (thresh * IRQ)
    lower_bound = data[col].quantile(0.25) - (thresh * IRQ)
    anomalies_mask_over = data[col] > upper_bound
    anomalies_mask_under = data[col] < lower_bound
#     print("Anomalies mask: ", (anomalies_mask_over, anomalies_mask_under))
    
    return anomalies_mask_over, anomalies_mask_under, upper_bound, lower_bound

def find_peer_anomaly(df_slice, features, irq_threshold=1.8, outliers_desired=(True, True)):
    
    (want_outliers_over, want_outliers_under) = outliers_desired
   
    if (df_slice.shape[0] < 3):
        return
    
    idx = df_slice.index.unique()
    
    df_slice.reset_index(inplace=True)
    
    anomalies_summary_irq, outlier_over_irq, outlier_under_irq = get_feature_anomalies( \
                df_slice, \
                func=is_anomaly_irq, \
                features=features, \
                thresh=irq_threshold)
    
    median_cost = df_slice[RPT_COL_PUB_COST].median()
    
    
    # include over/under outliers as desired
    is_outlier_irq = np.logical_or(
                        np.logical_and(want_outliers_over, outlier_over_irq),
                        np.logical_and(want_outliers_under, outlier_under_irq)
    )
    
    



    # ignore anomalies from low-spend adgroups: keep only those whose spend exceeds the campaign median
    is_outlier_irq = np.logical_and(is_outlier_irq, df_slice[RPT_COL_PUB_COST] > median_cost)
    
    if sum(is_outlier_irq) > 0:
        print(">>> ANOMALY", idx)
        print(anomalies_summary_irq)
        cols = [RPT_COL_GROUP, RPT_COL_PUB_COST, RPT_COL_CONV, RPT_COL_REVENUE] + features
        print(df_slice.loc[is_outlier_irq, cols])
        
    return is_outlier_irq

## Find Cost/Conv. Anomalies

print("input shape:", df_group_perf.shape)
df_anomalies = pd.DataFrame()

# annotate via Marin Dimensions
def rowFunc(row):
    return 'Cost/Conv. {:,.2f} is much lower than campaign avg {:,.2f}'.format(
        row[RPT_COL_COST_PER_CONV],
        row[RPT_COL_COST_PER_CONV + '_median']
    )

# dump data used for anomaly detection
print("df_group_perf\n\n", df_group_perf.to_string())





for campaign_idx in df_group_perf.index.unique():
    df_campaign = df_group_perf.loc[[campaign_idx]].copy()
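    # note: despite the '_median' suffix, this column holds the campaign mean (reported as "campaign avg")
    df_campaign[RPT_COL_COST_PER_CONV + '_median'] = df_campaign[RPT_COL_COST_PER_CONV].mean()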
    df_campaign[BULK_COL_AUTOMATION_INFO] = np.nan
    outliers = find_peer_anomaly(df_campaign, [RPT_COL_COST_PER_CONV], irq_threshold=0.8, outliers_desired=(False,True))

    if outliers is not None and sum(outliers) > 0:
        df_outliers = df_campaign.loc[outliers].copy()
        df_outliers[BULK_COL_AUTOMATION_INFO] = df_outliers.apply(rowFunc, axis=1)
        print(df_outliers)
        df_anomalies = pd.concat([df_anomalies, df_outliers], axis=0)

## Prepare Output
if not df_anomalies.empty:
    print(tableize(df_anomalies))
    outputDf = df_anomalies[[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_GROUP, BULK_COL_AUTOMATION_INFO]]
else:
    print("No anomalies found!")
    outputDf = outputDf.iloc[0:0]

Post generated on 2024-11-27 06:58:46 GMT

07 Nov 2023
