Script 31: AdGroup ROAS Outlier Tagging

Purpose:

This Python script identifies and tags AdGroups whose ROAS is abnormally low compared with the other AdGroups in the same campaign over a fixed lookback period.

To Elaborate

The script is designed to identify AdGroups within advertising campaigns that exhibit unusually low Return on Advertising Spend (ROAS). It analyzes roughly 30 days of historical data, excluding the most recent three days to allow for conversion lag. For each AdGroup it calculates cost per conversion, ROAS, conversion rate, and average CPC, but only ROAS is used for detection: within each campaign, an AdGroup is flagged as an outlier when its ROAS falls below an interquartile-range (IQR) lower bound computed from its peer AdGroups and its spend is above the campaign median. This helps advertisers quickly find underperforming AdGroups that may require attention or adjustment.

Walking Through the Code

Data Preparation
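- The script begins by defining a lookback window that runs from 33 days before today through 3 days before today (both ends inclusive), i.e. roughly 30 days with the most recent three days excluded.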
- It filters the input data to include only the relevant date range and necessary columns for analysis.
- The data is grouped by account, campaign, and AdGroup, summing cost, conversions, revenue, and clicks; AdGroups with no spend in the window are dropped.
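The windowing and aggregation steps above can be sketched in pandas as follows. This is a minimal sketch, not the script itself: `aggregate_window` is a hypothetical helper, the column names are the report columns the script uses, and the sample rows are invented for illustration. It selects the metric columns before summing so the `Date` column is never aggregated.

```python
import datetime

import pandas as pd

GROUP_KEYS = ["Account", "Campaign", "Group"]
METRICS = ["Pub. Cost $", "Conv.", "Revenue $", "Clicks"]


def aggregate_window(df, today, lookback_days=30, lag_days=3):
    """Sum metrics per AdGroup between (today - lookback - lag) and (today - lag).

    Both ends are inclusive, matching the >= / <= filter in the script.
    AdGroups with no spend in the window are dropped.
    """
    end = pd.to_datetime(today - datetime.timedelta(days=lag_days))
    start = pd.to_datetime(today - datetime.timedelta(days=lookback_days + lag_days))
    in_window = df[(df["Date"] >= start) & (df["Date"] <= end)]
    # Select only the metric columns so the datetime column is never summed.
    perf = in_window.groupby(GROUP_KEYS)[METRICS].sum()
    return perf[perf["Pub. Cost $"] > 0]


# Hypothetical sample rows: g1 has one row inside the window and one inside
# the 3-day lag period; g3 has no spend.
rows = pd.DataFrame({
    "Account": ["A"] * 4,
    "Campaign": ["C1"] * 4,
    "Group": ["g1", "g1", "g2", "g3"],
    "Date": pd.to_datetime(["2023-11-20", "2023-12-05", "2023-11-21", "2023-11-22"]),
    "Pub. Cost $": [10.0, 99.0, 5.0, 0.0],
    "Conv.": [1, 9, 0, 0],
    "Revenue $": [40.0, 500.0, 0.0, 0.0],
    "Clicks": [20, 90, 10, 3],
})
perf = aggregate_window(rows, today=datetime.date(2023, 12, 6))
print(perf)
```

Only g1 (its 2023-11-20 row) and g2 survive: the 2023-12-05 row falls inside the lag period and g3 has zero cost.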
Feature Calculation
- Various performance metrics are calculated, including cost per conversion, ROAS, conversion rate, and average CPC.
- Only ROAS is used for anomaly detection; the other metrics are printed alongside it as a diagnostic dump to help explain flagged AdGroups.
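A minimal sketch of the feature step, assuming a per-AdGroup table that already holds the summed columns (`add_features` and the sample values are hypothetical). Like the script, it divides directly, so an AdGroup with zero conversions gets an infinite cost per conversion rather than an error; note also that "Conv. Rate %" is stored as a fraction, not multiplied by 100.

```python
import numpy as np
import pandas as pd


def add_features(perf):
    """Add the derived metrics the script computes for each AdGroup."""
    out = perf.copy()
    out["Cost/Conv. $"] = out["Pub. Cost $"] / out["Conv."]
    out["ROAS"] = out["Revenue $"] / out["Pub. Cost $"]
    # Stored as a fraction (conversions per click), despite the "%" in the name.
    out["Conv. Rate %"] = out["Conv."] / out["Clicks"]
    out["Avg. CPC $"] = out["Pub. Cost $"] / out["Clicks"]
    return out


# Hypothetical per-AdGroup totals.
perf = pd.DataFrame(
    {
        "Pub. Cost $": [10.0, 5.0],
        "Conv.": [1, 0],
        "Revenue $": [40.0, 0.0],
        "Clicks": [20, 10],
    },
    index=["g1", "g2"],
)
features = add_features(perf)
print(features)
# g2 has zero conversions, so its cost per conversion is inf rather than an error.
print(np.isinf(features.loc["g2", "Cost/Conv. $"]))
```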
Anomaly Detection
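- The script defines helper functions that detect anomalies with the interquartile range (IQR) method: values below Q1 - k * IQR or above Q3 + k * IQR are treated as outliers.
- Only the low side is used, with k = 0.8: an AdGroup is flagged when its ROAS falls below its campaign's lower bound and its spend is above the campaign median, so low-spend AdGroups are ignored. Campaigns with fewer than three AdGroups with spend are skipped.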
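The low-side IQR rule with the median-spend gate can be sketched for a single campaign as follows. This is a simplified stand-in for the script's `get_feature_anomalies` / `is_anomaly_irq` / `find_peer_anomaly` chain, not a copy of it: `low_roas_outliers` is a hypothetical name, `k=0.8` mirrors the `irq_threshold` the script passes, and the quantiles use pandas' default linear interpolation, as the script does. The sample data is invented.

```python
import pandas as pd


def low_roas_outliers(campaign, k=0.8, min_groups=3):
    """Flag AdGroups whose ROAS is a low-side IQR outlier within one campaign.

    `campaign` has one row per AdGroup with "ROAS" and "Pub. Cost $" columns.
    Returns a boolean Series, or None when there are too few AdGroups to compare.
    """
    if len(campaign) < min_groups:
        return None
    q1 = campaign["ROAS"].quantile(0.25)
    q3 = campaign["ROAS"].quantile(0.75)
    lower_bound = q1 - k * (q3 - q1)
    is_low = campaign["ROAS"] < lower_bound
    # Spend gate: only AdGroups spending more than the campaign median count.
    is_big_spender = campaign["Pub. Cost $"] > campaign["Pub. Cost $"].median()
    return is_low & is_big_spender


# Hypothetical campaign: g7 and g8 both have low ROAS, but only g7 spends
# more than the campaign median.
campaign = pd.DataFrame(
    {
        "ROAS": [4.0, 4.2, 3.8, 4.1, 3.9, 4.0, 0.5, 0.6],
        "Pub. Cost $": [10, 20, 15, 12, 18, 11, 50, 5],
    },
    index=["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"],
)
flags = low_roas_outliers(campaign)
print(flags[flags].index.tolist())
```

Here Q1 = 3.0 and Q3 = 4.025, so the lower bound is about 2.18. g8 is below it too, but it spends less than the campaign median (13.5), so the spend gate leaves it untagged.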
Tagging Outliers
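- For each campaign, the script checks for low-ROAS outliers and tags each flagged AdGroup with a message of the form "ROAS X is much lower than campaign avg Y", where Y is the mean ROAS of the campaign's AdGroups.
- Flagged rows are collected into a DataFrame. If any exist, they are printed and written to outputDf with the Account, Campaign, Group, and AUTOMATION - INFO columns; otherwise outputDf is emptied.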
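Putting the per-campaign loop and tagging together, the condensed sketch below (a hypothetical `tag_low_roas` helper with invented sample data) produces rows with the Account, Campaign, Group, and AUTOMATION - INFO output columns. The message wording copies the script's `rowFunc`, and, as in the script, the "campaign avg" is the plain mean of the campaign's AdGroup ROAS values.

```python
import pandas as pd

OUTPUT_COLS = ["Account", "Campaign", "Group", "AUTOMATION - INFO"]


def tag_low_roas(perf, k=0.8, min_groups=3):
    """Return one output row per flagged AdGroup, with a descriptive message."""
    tagged = []
    for _, camp in perf.groupby(["Account", "Campaign"]):
        if len(camp) < min_groups:
            continue  # too few peer AdGroups for a meaningful IQR
        q1, q3 = camp["ROAS"].quantile([0.25, 0.75])
        flags = (camp["ROAS"] < q1 - k * (q3 - q1)) & (
            camp["Pub. Cost $"] > camp["Pub. Cost $"].median()
        )
        if not flags.any():
            continue
        avg = camp["ROAS"].mean()  # plain mean of AdGroup ROAS, as in the script
        hits = camp.loc[flags, ["Account", "Campaign", "Group"]].copy()
        hits["AUTOMATION - INFO"] = [
            "ROAS {:,.2f} is much lower than campaign avg {:,.2f}".format(r, avg)
            for r in camp.loc[flags, "ROAS"]
        ]
        tagged.append(hits)
    if not tagged:
        return pd.DataFrame(columns=OUTPUT_COLS)
    return pd.concat(tagged, ignore_index=True)


# Hypothetical AdGroup-level data: C2 has only two AdGroups, so it is skipped.
perf = pd.DataFrame(
    {
        "Account": ["A"] * 6,
        "Campaign": ["C1", "C1", "C1", "C1", "C2", "C2"],
        "Group": ["a", "b", "c", "d", "e", "f"],
        "Pub. Cost $": [10.0, 30.0, 12.0, 40.0, 5.0, 50.0],
        "ROAS": [3.0, 3.1, 2.9, 0.6, 1.0, 0.1],
    }
)
out = tag_low_roas(perf)
print(out.to_string(index=False))
```

Because the bounds, the spend median, and the three-AdGroup minimum are all computed per campaign, one campaign's ROAS distribution never affects another campaign's tags.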

Vitals

Script ID: 31
Client ID / Customer ID: 1306920543 / 60268855
Action Type: Bulk Upload
Item Changed: AdGroup
Output Columns: Account, Campaign, Group, AUTOMATION - INFO
Linked Datasource: M1 Report
Reference Datasource: None
Owner: Michael Huang (mhuang@marinsoftware.com)
Created by Michael Huang on 2023-03-24 02:26
Last Updated by Kent Pearce on 2023-12-06 04:01


Python Code

#
# Tag AdGroup if ROAS performance is abnormally low within Campaign
#
#
# Author: Michael S. Huang
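# Date: 2023-03-24

import datetime

import numpy as np
import pandas as pd

# inputDf, outputDf and tableize() are not defined in this script; they are
# expected to be supplied by the MarinOne scripting environment at run time.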

RPT_COL_GROUP = 'Group'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_GROUP_ID = 'Group ID'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_COST_PER_CONV = 'Cost/Conv. $'
RPT_COL_ROAS = 'ROAS'
RPT_COL_CONV_RATE = 'Conv. Rate %'
RPT_COL_AVG_CPC = 'Avg. CPC $'
RPT_COL_CLICKS = 'Clicks'
RPT_COL_CONV = 'Conv.'
RPT_COL_REVENUE = 'Revenue $'
RPT_COL_IMPR = 'Impr.'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_AUTOMATION_INFO = 'AUTOMATION - INFO'

outputDf[BULK_COL_AUTOMATION_INFO] = np.nan

## Data Prep

print(inputDf[RPT_COL_DATE].min(), inputDf[RPT_COL_DATE].max())

# 30-day lookback without most recent 3 days due to conversion lag
start_date = pd.to_datetime(datetime.date.today() - datetime.timedelta(days=33))
end_date = pd.to_datetime(datetime.date.today() - datetime.timedelta(days=3))

df_reduced = inputDf[ (inputDf[RPT_COL_DATE] >= start_date) & (inputDf[RPT_COL_DATE] <= end_date) ]

if (df_reduced.shape[0] > 0):
    print("reduced dates\n", min(df_reduced[RPT_COL_DATE]), max(df_reduced[RPT_COL_DATE]))
else:
    print("no more input to process")

# reduce to needed columns
df_reduced = df_reduced[[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_GROUP, RPT_COL_DATE, RPT_COL_PUB_COST, RPT_COL_CONV, RPT_COL_REVENUE, RPT_COL_CLICKS]].copy()

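# sum metrics across dates (select the metric columns explicitly so the
# datetime Date column is not summed, which raises a TypeError in pandas 2.x)
df_group_perf = df_reduced.groupby([RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_GROUP])[[RPT_COL_PUB_COST, RPT_COL_CONV, RPT_COL_REVENUE, RPT_COL_CLICKS]].sum()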

# remove rows without cost 
df_group_perf = df_group_perf[(df_group_perf[RPT_COL_PUB_COST] > 0)]

# index by campaign
df_group_perf = df_group_perf.reset_index().set_index([RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN]).sort_index()

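# calculate features (dividing by zero yields inf or NaN rather than raising;
# Conv. Rate is stored as a fraction, not multiplied by 100)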
df_group_perf[RPT_COL_COST_PER_CONV] = (df_group_perf[RPT_COL_PUB_COST] / df_group_perf[RPT_COL_CONV])
df_group_perf[RPT_COL_ROAS] = df_group_perf[RPT_COL_REVENUE] / df_group_perf[RPT_COL_PUB_COST]
df_group_perf[RPT_COL_CONV_RATE] = df_group_perf[RPT_COL_CONV] / df_group_perf[RPT_COL_CLICKS]
df_group_perf[RPT_COL_AVG_CPC] = (df_group_perf[RPT_COL_PUB_COST] / df_group_perf[RPT_COL_CLICKS])


## Define Anomaly Functions

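# Finds anomalies using a given detector function (e.g. sigma rule, IQR)
# data: DataFrame
#     Dataset with features
# func: func
#     Detector; must return (mask_over, mask_under, upper_bound, lower_bound)
# features: list
#     Features (columns) to check; defaults to all columns
# thresh: float
#     Threshold multiplier (e.g. 2 or 3 * sigma, 2 or 3 * IQR)
# Returns: tuple
#     (anomalies_summary DataFrame, outliers_over mask, outliers_under mask)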
def get_feature_anomalies(data, func, features=None, thresh=3):

    if features:
        features_to_check = features
    else:
        features_to_check = data.columns 
        
    outliers_over = pd.Series(data=[False] * data.shape[0], index=data[features_to_check].index, name='is_outlier')
    outliers_under = pd.Series(data=[False] * data.shape[0], index=data[features_to_check].index, name='is_outlier')

    anomalies_summary = {}
    for feature in features_to_check:
        anomalies_mask_over, anomalies_mask_under, upper_bound, lower_bound = func(data, feature, thresh=thresh)
        anomalies_mask_combined = pd.concat([anomalies_mask_over, anomalies_mask_under], axis=1).any(axis=1)
        anomalies_summary[feature] = [upper_bound, lower_bound, sum(anomalies_mask_combined), 100*sum(anomalies_mask_combined)/len(anomalies_mask_combined)]
        outliers_over[anomalies_mask_over[anomalies_mask_over].index] = True
        outliers_under[anomalies_mask_under[anomalies_mask_under].index] = True
        
        
    anomalies_summary = pd.DataFrame(anomalies_summary).T
    anomalies_summary.columns=['upper_bound', 'lower_bound', 'anomalies_count', 'anomalies_percentage']
    
    
    return anomalies_summary, outliers_over, outliers_under

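# Finds outliers/anomalies using the interquartile range (IQR)
# data: DataFrame
# col: str
# thresh: float
#     Multiple of the IQR to extend below Q1 and above Q3
# Returns: tuple
#     (mask_over, mask_under, upper_bound, lower_bound), where the masks are
#     boolean Series flagging values above upper_bound / below lower_bound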
def is_anomaly_irq(data, col, thresh):

    IRQ = data[col].quantile(0.75) - data[col].quantile(0.25)
    upper_bound = data[col].quantile(0.75) + (thresh * IRQ)
    lower_bound = data[col].quantile(0.25) - (thresh * IRQ)
    anomalies_mask_over = data[col] > upper_bound
    anomalies_mask_under = data[col] < lower_bound
#     print("Anomalies mask: ", (anomalies_mask_over, anomalies_mask_under))
    
    return anomalies_mask_over, anomalies_mask_under, upper_bound, lower_bound

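# Flags IQR outliers among peer rows (the AdGroups of one campaign).
# Returns None when there are fewer than 3 rows; otherwise a boolean Series
# aligned to df_slice after reset_index(). Note: df_slice is reset in place,
# which moves Account/Campaign back into columns for the caller.
def find_peer_anomaly(df_slice, features, irq_threshold=1.8, outliers_desired=(True, True)):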
    
    (want_outliers_over, want_outliers_under) = outliers_desired
   
    if (df_slice.shape[0] < 3):
        return
    
    idx = df_slice.index.unique()
    
    df_slice.reset_index(inplace=True)
    
    anomalies_summary_irq, outlier_over_irq, outlier_under_irq = get_feature_anomalies(
        df_slice,
        func=is_anomaly_irq,
        features=features,
        thresh=irq_threshold)
    
    median_cost = df_slice[RPT_COL_PUB_COST].median()
    
    
    # include over/under outliers as desired
    is_outlier_irq = np.logical_or(
                        np.logical_and(want_outliers_over, outlier_over_irq),
                        np.logical_and(want_outliers_under, outlier_under_irq)
    )
    
    
    # ignore anomalies from low-spend AdGroups: keep only outliers whose cost is above the campaign median
    is_outlier_irq = np.logical_and(is_outlier_irq, df_slice[RPT_COL_PUB_COST] > median_cost)
    
    if sum(is_outlier_irq) > 0:
        print(">>> ANOMALY", idx)
        print(anomalies_summary_irq)
        cols = [RPT_COL_GROUP, RPT_COL_PUB_COST, RPT_COL_CONV, RPT_COL_REVENUE] + features
        print(df_slice.loc[is_outlier_irq, cols])
        
    return is_outlier_irq

## Find ROAS Anomalies

print("input shape:", df_group_perf.shape)
df_anomalies = pd.DataFrame()

# annotate via Marin Dimensions
def rowFunc(row):
    return 'ROAS {:,.2f} is much lower than campaign avg {:,.2f}'.format(
        row[RPT_COL_ROAS], \
        row[RPT_COL_ROAS + '_median']
    )

# dump data used for anomaly detection
print("df_group_perf\n\n", df_group_perf.to_string())


for campaign_idx in df_group_perf.index.unique():
    df_campaign = df_group_perf.loc[[campaign_idx]].copy()
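    # NOTE: despite the '_median' suffix, this column holds the mean ROAS of the
    # campaign's AdGroups, which is what the "campaign avg" message reports
    df_campaign[RPT_COL_ROAS + '_median'] = df_campaign[RPT_COL_ROAS].mean()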
    df_campaign[BULK_COL_AUTOMATION_INFO] = np.nan
    outliers = find_peer_anomaly(df_campaign, [RPT_COL_ROAS], irq_threshold=0.8, outliers_desired=(False,True))

    if outliers is not None and sum(outliers) > 0:
        df_outliers = df_campaign.loc[outliers].copy()
        df_outliers[BULK_COL_AUTOMATION_INFO] = df_outliers.apply(rowFunc, axis=1)
        print(df_outliers)
        df_anomalies = pd.concat([df_anomalies, df_outliers], axis=0)

## Prepare Output
if not df_anomalies.empty:
    print(tableize(df_anomalies))
    outputDf = df_anomalies[[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_GROUP, BULK_COL_AUTOMATION_INFO]]
else:
    print("No anomalies found!")
    outputDf = outputDf.iloc[0:0]

Post generated on 2025-03-11 01:25:51 GMT

24 Mar 2023


MarinOne Scripts Creator's Corner