Script 1285: Strategy Benchmarking

Purpose:

The Python script calculates and tags campaigns with strategy-level benchmark scores for metrics such as MTD Gross Lead, CPL, CPL Trend, and Interview Rate.

To Elaborate

The Python script named “Strategy Benchmarking” evaluates and tags marketing campaigns based on several performance metrics. It calculates strategy-level benchmark scores for Month-to-Date (MTD) Gross Lead, Cost Per Lead (CPL), CPL Trend, and Interview Rate. The script reads data from the linked datasource (M1 Report), applies business rules to calculate these benchmarks per strategy, and tags every campaign with the scores of its strategy. Each score runs from 1 to 7, with lower being better, and the four scores are summed into a PPC Score. These scores help assess the effectiveness of different strategies and highlight areas that may need optimization. The script can run both locally and on a server, with specific configuration for loading data in each case.

Walking Through the Code

Initialization and Configuration:
- The script begins by setting up configurations for local execution, including paths for data files and necessary imports.
- It checks whether the code is running on a server or locally, and loads data accordingly using a pickle file if running locally.
Data Preparation:
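- The script defines lookback windows to analyze data over specific periods: the most recent 7, 14, and 30 days, the 30 days before the recent 7, the 90 days before the recent 14, and the current month. For each campaign it calculates the span of days with data in each window.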
- It groups data by campaign and calculates distributions to understand the completeness of data within these windows.
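
The lookback windows can be sketched as half-open date ranges, with the start inclusive and the end exclusive. The example below is a minimal illustration using only the standard library rather than pandas; the report dates are made up, and `dates_in_window` is a hypothetical helper that does not appear in the script.

```python
from datetime import date, timedelta

# Toy report dates for one campaign: 2024-08-01 through 2024-08-14
report_dates = [date(2024, 8, 1) + timedelta(days=i) for i in range(14)]

# Treat "today" as the day after the latest date in the report
today = max(report_dates) + timedelta(days=1)

# Each window is (start inclusive, end exclusive)
windows = {
    "recent_7_days": (today - timedelta(days=7), today),
    "recent_14_days": (today - timedelta(days=14), today),
    "30_days_prior_to_recent_7_days": (today - timedelta(days=37), today - timedelta(days=7)),
}

def dates_in_window(dates, start, end):
    """Hypothetical helper: keep dates with start <= d < end."""
    return [d for d in dates if start <= d < end]

counts = {
    name: len(dates_in_window(report_dates, start, end))
    for name, (start, end) in windows.items()
}
print(counts)
```

In this toy data the 30-day prior window holds only 7 days of data, which is the kind of incomplete coverage the completeness check is meant to surface.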
Benchmark Calculations:
- Functions are defined to calculate CPL, interview rates, and leads for specified lookback periods.
- The script aggregates CPL trends and interview benchmarks by strategy, applying custom aggregation functions to calculate performance scores.
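
As a rough illustration of these calculations, the sketch below computes CPL and interview rate from plain totals. The numbers are made up, and `cpl` and `interview_rate` are illustrative standalone functions rather than the script's own; the zero-lead fallbacks follow the rules in the code listed under Python Code.

```python
import math

def cpl(total_cost, total_leads):
    """Cost per lead; falls back to the total cost when there are no leads."""
    value = total_cost / total_leads if total_leads != 0 else float(total_cost)
    return round(value, 2)

def interview_rate(total_interviews, total_leads):
    """Interviews per lead; NaN when there are no leads to divide by."""
    if total_leads == 0:
        return math.nan
    return round(total_interviews / total_leads, 2)

# Made-up totals for one strategy
cpl_30_day = cpl(total_cost=500.0, total_leads=20)      # 500 / 20
cpl_no_leads = cpl(total_cost=120.0, total_leads=0)     # fallback: the cost itself
rate_trailing = interview_rate(total_interviews=6, total_leads=24)

# Expected interviews: trailing interview rate applied to recent 7-day leads (12 here)
expected_interviews = round(rate_trailing * 12, 2)

print(cpl_30_day, cpl_no_leads, rate_trailing, expected_interviews)
```

With the fallback, a strategy that spent money without generating leads reports its full spend as CPL instead of an undefined value.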
Performance Scoring:
- Performance scores are assigned based on calculated benchmarks, using customer-defined scoring rules where lower scores indicate better performance.
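- CPL Trend performance compares the recent 14-day CPL with the CPL of the 90 days before it. CPL performance compares the 30-day CPL with the customer-defined CPL benchmark, and Interview Rate performance compares the trailing 7-37 day interview rate with the customer-defined benchmark.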
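
The CPL scoring rule can be sketched as a percent change mapped onto score bands. This is a condensed illustration, assuming the bands from the code listed under Python Code; `cpl_score` is a hypothetical name that folds the script's two scoring functions into one.

```python
def cpl_score(cpl_recent, cpl_baseline):
    """Map the percent change of recent CPL vs. baseline to a 1-7 score (lower is better)."""
    if cpl_recent == 0 or cpl_baseline == 0:
        return 7  # no usable data: worst score
    change = (cpl_recent - cpl_baseline) / cpl_baseline * 100.0
    if change < -50:
        return 1
    if change < -25:
        return 2
    if change < -7:
        return 3
    if change <= 7:
        return 4
    if change <= 33:
        return 5
    if change <= 100:
        return 6
    return 7  # more than 100% worse, or NaN (all comparisons above are False)

# Made-up CPL values against a baseline of 25
for recent in (10, 20, 25, 40, 0):
    print(recent, cpl_score(recent, 25))
```

Interview Rate uses the same bands with the scores reversed, since a higher interview rate is better while a higher CPL is worse.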
Output Generation:
- The script merges calculated benchmarks with the original campaign data to produce a final output dataset.
- Performance scores are formatted as strings to allow for blanks, and the output is prepared for further analysis or reporting.
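
The script builds its output with pandas merges. The sketch below shows the same idea with plain dictionaries: a left join from campaigns to strategy-level scores, with scores rendered as integer strings or blanks. Strategy names, campaign names, and values are made up, and `format_score` is a hypothetical helper.

```python
import math

def format_score(value):
    """Render a score as an integer string, or a blank when it is missing."""
    if value is None or value == "":
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(int(value))

# Strategy-level results (made-up values)
strategy_scores = {
    "Brand": {"CPL - Performance": 3.0, "PPC Score": 15.0},
    "Generic": {"CPL - Performance": math.nan, "PPC Score": 22.0},
}

# Campaign-to-strategy mapping (made-up names)
campaigns = [
    {"Campaign": "Brand - Exact", "Strategy": "Brand"},
    {"Campaign": "Generic - Broad", "Strategy": "Generic"},
    {"Campaign": "Untagged", "Strategy": None},
]

# Left join: every campaign is kept and inherits its strategy's scores
output_rows = []
for campaign in campaigns:
    scores = strategy_scores.get(campaign["Strategy"], {})
    output_rows.append({
        "Campaign": campaign["Campaign"],
        "CPL - Performance": format_score(scores.get("CPL - Performance")),
        "PPC Score": format_score(scores.get("PPC Score")),
    })

for row in output_rows:
    print(row)
```

Keeping the scores as strings lets a campaign without a matching strategy carry blank values instead of a misleading number.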
Local Debugging:
- If running locally, the script writes the output and debug data to CSV files for inspection and validation.
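
One rule the steps above do not spell out is the proration of the monthly Gross Lead benchmarks: the High, Medium, and Low benchmarks are scaled by the share of the month that has elapsed before MTD leads are scored. The sketch below is a simplified standard-library version with made-up benchmark values; `prorated_ratio` and `leads_score` are illustrative names.

```python
from datetime import date

def prorated_ratio(today, month_start, next_month_start):
    """Share of the month elapsed before `today` (start inclusive, end exclusive)."""
    if today < month_start:
        return 0.0
    if today >= next_month_start:
        return 1.0
    total_days = (next_month_start - month_start).days
    elapsed_days = (today - month_start).days
    return round(elapsed_days / total_days, 4) if total_days else 0.0

def leads_score(mtd_leads, high, medium, low):
    """Score MTD leads against prorated benchmarks (lower is better)."""
    if mtd_leads > high:
        return 1
    if mtd_leads > medium:
        return 3
    if mtd_leads > low:
        return 5
    return 7  # at or below the Low benchmark, or no leads

# 15 of June's 30 days have elapsed as of 2024-06-16
ratio = prorated_ratio(date(2024, 6, 16), date(2024, 6, 1), date(2024, 7, 1))

# Made-up monthly benchmarks of 300 / 200 / 100 leads, prorated to date
high, medium, low = (round(benchmark * ratio, 2) for benchmark in (300, 200, 100))

score = leads_score(130, high, medium, low)
print(ratio, high, medium, low, score)
```

In the script, this lead score is summed with the CPL, CPL Trend, and Interview Rate scores to give the PPC Score, so a PPC Score can range from 4 (best) to 28 (worst).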

Vitals

Script ID : 1285
Client ID / Customer ID: 1306926629 / 60270083
Action Type: Bulk Upload
Item Changed: Campaign
Output Columns: Account, Campaign, MTD Gross Lead, Gross Lead - Benchmark - High - Prorated, Gross Lead - Benchmark - Medium - Prorated, Gross Lead - Benchmark - Low - Prorated, MTD Gross Lead - Performance, 14 Day CPL - Benchmark, 30 Day CPL, 90 Day CPL - Benchmark, CPL Trend - Performance, CPL - Performance, Intv Rate (Trailing 7-37), Intv Rate - Performance, Expected Interviews, PPC Score, Campaign ID
Linked Datasource: M1 Report
Reference Datasource: None
Owner: Michael Huang (mhuang@marinsoftware.com)
Created by Michael Huang on 2024-07-22 04:27
Last Updated by ascott@marinsoftware.com on 2024-09-26 18:45


Python Code

##
## name: Strategy Benchmarking
## description:
##  Tag campaigns with calculated Strategy-level Benchmark Scores for:
##  
##  - MTD Gross Lead
##  - CPL 
##  - CPL Trend
##  - Interview Rate
## 
## author: Michael Huang
## created: 2024-08-12
## 


########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/allcampus_strategy_benchmarking_20240815_all_strategies.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # dataSourceDict is provided by the server runtime; it is undefined when running locally
        dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
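    # use a context manager so the file handle is closed after loading
    with open(pickle_path, 'rb') as pickle_file:
        dataSourceDict = pickle.load(pickle_file)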

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
    # Chicago is UTC-5 during daylight saving time (CDT) and UTC-6 otherwise (CST). Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))

    # import pandas
    import pandas as pd
    import numpy as np

    # other imports
    import re
    import urllib

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed

    # pandas settings
    pd.set_option('display.max_columns', None)  # Display all columns
    pd.set_option('display.max_colwidth', None)  # Display full content of each column

else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Config ###########

today = datetime.datetime.now(CLIENT_TIMEZONE).date()


# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_GROSS_LEAD_CONV = 'Gross Lead Conv.'
RPT_COL_INTERVIEW_CONV = 'Interview Conv.'
RPT_COL_APPLICATION_IN_PROGRESS_CONV = 'Application In Progress Conv.'
RPT_COL_CPL__BENCHMARK = 'CPL - Benchmark'
RPT_COL_GROSS_LEAD__BENCHMARK__HIGH = 'Gross Lead - Benchmark - High'
RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM = 'Gross Lead - Benchmark - Medium'
RPT_COL_GROSS_LEAD__BENCHMARK__LOW = 'Gross Lead - Benchmark - Low'
RPT_COL_INTV_RATE__BENCHMARK = 'Intv Rate - Benchmark'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_14_DAY_CPL__BENCHMARK = '14 Day CPL - Benchmark'
BULK_COL_30_DAY_CPL = '30 Day CPL'
BULK_COL_90_DAY_CPL__BENCHMARK = '90 Day CPL - Benchmark'
BULK_COL_CPL__PERFORMANCE = 'CPL - Performance'
BULK_COL_CPL_TREND__PERFORMANCE = 'CPL Trend - Performance'
BULK_COL_EXPECTED_INTERVIEWS = 'Expected Interviews'
BULK_COL_GROSS_LEAD__BENCHMARK__HIGH__PRORATED = 'Gross Lead - Benchmark - High - Prorated'
BULK_COL_GROSS_LEAD__BENCHMARK__LOW__PRORATED = 'Gross Lead - Benchmark - Low - Prorated'
BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM__PRORATED = 'Gross Lead - Benchmark - Medium - Prorated'
BULK_COL_INTV_RATE_TRAILING_737 = 'Intv Rate (Trailing 7-37)'
BULK_COL_INTV_RATE__PERFORMANCE = 'Intv Rate - Performance'
BULK_COL_MTD_GROSS_LEAD = 'MTD Gross Lead'
BULK_COL_MTD_GROSS_LEAD__PERFORMANCE = 'MTD Gross Lead - Performance'
BULK_COL_PPC_SCORE = 'PPC Score'
outputDf[BULK_COL_14_DAY_CPL__BENCHMARK] = "<<YOUR VALUE>>"
outputDf[BULK_COL_30_DAY_CPL] = "<<YOUR VALUE>>"
outputDf[BULK_COL_90_DAY_CPL__BENCHMARK] = "<<YOUR VALUE>>"
outputDf[BULK_COL_CPL__PERFORMANCE] = "<<YOUR VALUE>>"
outputDf[BULK_COL_CPL_TREND__PERFORMANCE] = "<<YOUR VALUE>>"
outputDf[BULK_COL_EXPECTED_INTERVIEWS] = "<<YOUR VALUE>>"
outputDf[BULK_COL_GROSS_LEAD__BENCHMARK__HIGH__PRORATED] = "<<YOUR VALUE>>"
outputDf[BULK_COL_GROSS_LEAD__BENCHMARK__LOW__PRORATED] = "<<YOUR VALUE>>"
outputDf[BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM__PRORATED] = "<<YOUR VALUE>>"
outputDf[BULK_COL_INTV_RATE_TRAILING_737] = "<<YOUR VALUE>>"
outputDf[BULK_COL_INTV_RATE__PERFORMANCE] = "<<YOUR VALUE>>"
outputDf[BULK_COL_MTD_GROSS_LEAD] = "<<YOUR VALUE>>"
outputDf[BULK_COL_MTD_GROSS_LEAD__PERFORMANCE] = "<<YOUR VALUE>>"
outputDf[BULK_COL_PPC_SCORE] = "<<YOUR VALUE>>"

# use Gross Lead for CPL calc
COL_CONV = RPT_COL_GROSS_LEAD_CONV
# recent leads for calculating expected interviews
COL_7_DAY_LEADS = "7 Day Leads"
# Prorated ratio
COL_PRORATED_RATIO = "Prorated Ratio"

## user code starts here

print("inputDf.shape", inputDf.shape)
print("inputDf.dtypes\n", inputDf.dtypes)
print("inputDf sample\n", inputDf.head())

# in order to correctly run old reports, need to pretend today is the day after the latest date in report
today_pd = inputDf[RPT_COL_DATE].max() + pd.Timedelta(days=1)
print(f"looking at report, inferred today = {today_pd.date()}")

# define lookback start dates
lookback_7_start = today_pd - pd.Timedelta(days=7)
lookback_14_start = today_pd - pd.Timedelta(days=14)
lookback_30_start = today_pd - pd.Timedelta(days=30)
lookback_37_start = today_pd - pd.Timedelta(days=37)
lookback_104_start = today_pd - pd.Timedelta(days=104)
# Define the start and end of the current month
current_month_start = today_pd.replace(day=1)
next_month_start = current_month_start + pd.DateOffset(months=1)


# Define lookback windows with their respective start and end dates
# start date is inclusive, but end date is EXCLUSIVE
lookback_windows = {
    'recent_7_days': (lookback_7_start, today_pd),
    'recent_14_days': (lookback_14_start, today_pd),
    'recent_30_days': (lookback_30_start, today_pd),
    '30_days_prior_to_recent_7_days': (lookback_37_start, lookback_7_start),
    '90_days_prior_to_recent_14_days': (lookback_104_start, lookback_14_start),
    'current_month': (current_month_start, next_month_start),
}

# Print out lookback windows for debugging
for window_name, (start_date, end_date) in lookback_windows.items():
    print(f"Lookback Window: {window_name}, Start Date (inclusive): {start_date.date()}, End Date (exclusive): {end_date.date()}")

## get idea of how many campaigns satisfy the various lookback windows

# Calculate the span of days, from the first to the last date with data (inclusive), within a date range
def count_trafficking_days_in_date_range(df, start_date, end_date):
    period_data = df[(df[RPT_COL_DATE] >= start_date) & (df[RPT_COL_DATE] < end_date)]
    min_date = period_data[RPT_COL_DATE].min()
    max_date = period_data[RPT_COL_DATE].max()
    num_days = (max_date - min_date).days + 1
    return num_days

# Define a function to count the number of days with data for each lookback window
def count_days_with_data_for_windows(df):
    return pd.Series({
        window: count_trafficking_days_in_date_range(df, start_date, end_date)
        for window, (start_date, end_date) in lookback_windows.items()
    })

# Group by Campaign and count the number of days with data
days_with_data = inputDf.groupby([RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN]).apply(count_days_with_data_for_windows).reset_index()

# Calculate the distribution of campaigns with data for each lookback window
distributions = {window: days_with_data[window].value_counts(normalize=True).sort_index() for window in lookback_windows}

# Print the percentage of campaigns with full data for each lookback window
for window, distribution in distributions.items():
    if not distribution.empty:
        last_value = distribution.index[-1]
        last_percentage = distribution.iloc[-1]
        print(f"Data in Lookback Window: {window}, Actual Days: {last_value}, Campaigns with full data: {last_percentage:.2%}")

### Common Functions

# Calculate CPL for a given lookback window
def calculate_cpl(df, start_date, end_date):
    data = df[(df[RPT_COL_DATE] >= start_date) & (df[RPT_COL_DATE] < end_date)]
    total_cost = data[RPT_COL_PUB_COST].sum()
    total_conv = data[COL_CONV].sum()
    # when there are no conversions, CPL falls back to the total cost
    cpl = total_cost / total_conv if total_conv != 0 else float(total_cost)

    # print(f"CPL from {start_date.date()} to {end_date.date()}: {total_cost} / {total_conv} => {cpl}")

    return np.round(cpl, 2)

# Calculate interview rate for a given lookback window
def calculate_interview_rate(df, start_date, end_date):   
    lookback = df[(df[RPT_COL_DATE] >= start_date) & (df[RPT_COL_DATE] < end_date)]
    total_leads = lookback[RPT_COL_GROSS_LEAD_CONV].sum()
    total_interviews = lookback[RPT_COL_INTERVIEW_CONV].sum()
    rate = total_interviews / total_leads if total_leads != 0 else np.nan
    return np.round(rate, 2)

# Calculate leads for a given lookback window
def calculate_leads(df, start_date, end_date):   
    lookback = df[(df[RPT_COL_DATE] >= start_date) & (df[RPT_COL_DATE] < end_date)]
    total_leads = lookback[RPT_COL_GROSS_LEAD_CONV].sum()
    # int() also works when sum() returns a plain Python number instead of a numpy scalar
    return int(total_leads)

def calculate_prorated_ratio(today, start_date, end_date):
    """
    Calculate the prorated ratio of the period from start_date to end_date that has elapsed as of today.

    Args:
    today (pd.Timestamp): The current date.
    start_date (pd.Timestamp): The start date of the period.
    end_date (pd.Timestamp): The end date of the period.

    Returns:
    float: The prorated ratio of the elapsed period.
    """
    if today < start_date:
        return 0.0
    if today >= end_date:
        return 1.0

    total_days = (end_date - start_date).days
    elapsed_days = (today - start_date).days

    proration = elapsed_days / total_days if total_days != 0 else 0.0

    return np.round(proration, 4)


### CPL Benchmarks

def aggregate_cpl_trends(x):
    recent_14_days_start, recent_14_days_end = lookback_windows['recent_14_days']
    recent_30_days_start, recent_30_days_end = lookback_windows['recent_30_days']
    prior_90_days_start, prior_90_days_end = lookback_windows['90_days_prior_to_recent_14_days']
    
    return pd.Series({
        BULK_COL_14_DAY_CPL__BENCHMARK: calculate_cpl(x, recent_14_days_start, recent_14_days_end),
        BULK_COL_30_DAY_CPL: calculate_cpl(x, recent_30_days_start, recent_30_days_end),
        BULK_COL_90_DAY_CPL__BENCHMARK: calculate_cpl(x, prior_90_days_start, prior_90_days_end),
        RPT_COL_CPL__BENCHMARK: x[RPT_COL_CPL__BENCHMARK].iloc[-1]  # Take the last value of CPL__BENCHMARK
    })

# Group by Strategy and apply custom aggregation functions
strategy_cpl_trends = inputDf.groupby([RPT_COL_STRATEGY]).apply(aggregate_cpl_trends).reset_index()

print("strategy_cpl_trends.dtypes\n", strategy_cpl_trends.dtypes)

def assign_cpl_performance(cpl_recent, cpl_baseline):
    if cpl_recent == 0 or (cpl_baseline == 0 or cpl_baseline == ""):
        return 7
    
    performance_ratio = (cpl_recent - cpl_baseline) / cpl_baseline * 100.0
    
    score = calc_cpl_performance_score(performance_ratio)

    return score

# customer-defined performance score; lower is better
def calc_cpl_performance_score(cpl_performance):
    score = 7
    if cpl_performance < -50:
        score = 1
    elif -50 <= cpl_performance < -25:
        score = 2
    elif -25 <= cpl_performance < -7:
        score = 3
    elif -7 <= cpl_performance <= 7:
        score = 4
    elif 7 < cpl_performance <= 33:
        score = 5
    elif 33 < cpl_performance <= 100:
        score = 6
    elif cpl_performance > 100:
        score = 7
    return int(score) if not np.isnan(score) else score


# CPL Trend Benchmark Performance compares recent 14-day CPL with 90-day period before that
def calculate_cpl_trend_performance(row):
    return assign_cpl_performance(row[BULK_COL_14_DAY_CPL__BENCHMARK], row[BULK_COL_90_DAY_CPL__BENCHMARK])

strategy_cpl_trends[BULK_COL_CPL_TREND__PERFORMANCE] = strategy_cpl_trends.apply(calculate_cpl_trend_performance, axis=1)

# CPL Benchmark Performance compares recent 30-day CPL with customer-defined Benchmark
def calculate_cpl_performance(row):
    return assign_cpl_performance(row[BULK_COL_30_DAY_CPL], row[RPT_COL_CPL__BENCHMARK])

strategy_cpl_trends[BULK_COL_CPL__PERFORMANCE] = strategy_cpl_trends.apply(calculate_cpl_performance, axis=1)

### Interview Rate Benchmarks

def calculate_strategy_interview_benchmarks(group):
    lookback_30_days_start, lookback_30_days_end = lookback_windows['30_days_prior_to_recent_7_days']
    lookback_7_days_start, lookback_7_days_end = lookback_windows['recent_7_days']
    
    return pd.Series({
        BULK_COL_INTV_RATE_TRAILING_737: calculate_interview_rate(group, lookback_30_days_start, lookback_30_days_end),
        COL_7_DAY_LEADS: calculate_leads(group, lookback_7_days_start, lookback_7_days_end),
        RPT_COL_INTV_RATE__BENCHMARK: group[RPT_COL_INTV_RATE__BENCHMARK].iloc[-1]  # Take the last value of INTV_RATE__BENCHMARK
    })

strategy_interview_benchmarks = inputDf.groupby([RPT_COL_STRATEGY]).apply(calculate_strategy_interview_benchmarks).reset_index()

strategy_interview_benchmarks[BULK_COL_EXPECTED_INTERVIEWS] = np.round(strategy_interview_benchmarks[BULK_COL_INTV_RATE_TRAILING_737] * strategy_interview_benchmarks[COL_7_DAY_LEADS], 2)

print("strategy_interview_benchmarks.dtypes\n", strategy_interview_benchmarks.dtypes)

def assign_interview_rate_performance(intv_rate_recent, intv_rate_baseline):
    if intv_rate_recent == 0 or intv_rate_baseline == 0:
        return 7
    
    performance = (intv_rate_recent - intv_rate_baseline) / intv_rate_baseline * 100.0
    
    score = calc_interview_rate_performance_score(performance) 

    return score

# customer-defined performance score; lower is better
def calc_interview_rate_performance_score(intv_rate):
    if intv_rate < -50:
        score = 7
    elif -50 <= intv_rate < -25:
        score = 6
    elif -25 <= intv_rate < -7:
        score = 5
    elif -7 <= intv_rate <= 7:
        score = 4
    elif 7 < intv_rate <= 33:
        score = 3
    elif 33 < intv_rate <= 100:
        score = 2
    elif intv_rate > 100:
        score = 1
    else:
        score = 7
    return int(score) if not np.isnan(score) else score


def calculate_interview_rate_performance(row):
    return assign_interview_rate_performance(row[BULK_COL_INTV_RATE_TRAILING_737], row[RPT_COL_INTV_RATE__BENCHMARK])

strategy_interview_benchmarks[BULK_COL_INTV_RATE__PERFORMANCE] = strategy_interview_benchmarks.apply(
    calculate_interview_rate_performance,
    axis=1)

### Gross Leads Benchmarks

def calculate_leads_benchmarks(x):
    month_start, month_end = lookback_windows['current_month']
    return pd.Series({
        BULK_COL_MTD_GROSS_LEAD: calculate_leads(x, month_start, month_end),
        COL_PRORATED_RATIO: calculate_prorated_ratio(today_pd, month_start, month_end),
        RPT_COL_GROSS_LEAD__BENCHMARK__HIGH: x[RPT_COL_GROSS_LEAD__BENCHMARK__HIGH].iloc[-1],
        RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM: x[RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM].iloc[-1],
        RPT_COL_GROSS_LEAD__BENCHMARK__LOW: x[RPT_COL_GROSS_LEAD__BENCHMARK__LOW].iloc[-1],
    })

strategy_leads_benchmarks = inputDf.groupby([RPT_COL_STRATEGY]).apply(calculate_leads_benchmarks).reset_index()

# calc prorated benchmarks
strategy_leads_benchmarks[BULK_COL_GROSS_LEAD__BENCHMARK__HIGH__PRORATED] = np.round(strategy_leads_benchmarks[RPT_COL_GROSS_LEAD__BENCHMARK__HIGH] *  strategy_leads_benchmarks[COL_PRORATED_RATIO], 2)
strategy_leads_benchmarks[BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM__PRORATED] = np.round(strategy_leads_benchmarks[RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM] *  strategy_leads_benchmarks[COL_PRORATED_RATIO], 2)
strategy_leads_benchmarks[BULK_COL_GROSS_LEAD__BENCHMARK__LOW__PRORATED] = np.round(strategy_leads_benchmarks[RPT_COL_GROSS_LEAD__BENCHMARK__LOW] *  strategy_leads_benchmarks[COL_PRORATED_RATIO], 2)

print("strategy_leads_benchmarks.dtypes\n", strategy_leads_benchmarks.dtypes)

def calc_leads_performance_score(mtd_gross_lead, high_prorated, medium_prorated, low_prorated):
    if mtd_gross_lead > high_prorated:
        score = 1
    elif high_prorated >= mtd_gross_lead > medium_prorated:
        score = 3
    elif medium_prorated >= mtd_gross_lead > low_prorated:
        score = 5
    elif (mtd_gross_lead <= low_prorated) or (mtd_gross_lead == 0):
        score = 7
    else:
        score = 7
    return int(score) if not np.isnan(score) else score

def calculate_performance(row):
    return calc_leads_performance_score(
        row[BULK_COL_MTD_GROSS_LEAD],
        row[BULK_COL_GROSS_LEAD__BENCHMARK__HIGH__PRORATED],
        row[BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM__PRORATED],
        row[BULK_COL_GROSS_LEAD__BENCHMARK__LOW__PRORATED]
    )

strategy_leads_benchmarks[BULK_COL_MTD_GROSS_LEAD__PERFORMANCE] = strategy_leads_benchmarks.apply(
    calculate_performance,
    axis=1
)

### Build output

# Combined Strategy Benchmarks
strategy_benchmarks = strategy_cpl_trends.merge(strategy_interview_benchmarks, on=RPT_COL_STRATEGY, how='outer') \
                                         .merge(strategy_leads_benchmarks, on=RPT_COL_STRATEGY, how='outer')

# Calculate PPC Score by summing the relevant performance columns
strategy_benchmarks[BULK_COL_PPC_SCORE] = strategy_benchmarks[
    [
        BULK_COL_MTD_GROSS_LEAD__PERFORMANCE,
        BULK_COL_CPL__PERFORMANCE,
        BULK_COL_CPL_TREND__PERFORMANCE,
        BULK_COL_INTV_RATE__PERFORMANCE
    ]
].sum(axis=1)

# force Performance scores to be string, so blanks are allowed; scores are integers without decimal
def format_performance_score(x):
    return str(int(x)) if x != '' else ''

performance_columns = [
    BULK_COL_MTD_GROSS_LEAD__PERFORMANCE,
    BULK_COL_CPL__PERFORMANCE,
    BULK_COL_CPL_TREND__PERFORMANCE,
    BULK_COL_INTV_RATE__PERFORMANCE,
    BULK_COL_PPC_SCORE
]

for col in performance_columns:
    strategy_benchmarks[col] = strategy_benchmarks[col].fillna('').apply(format_performance_score)

debugDf = strategy_benchmarks

# Join inputDf and strategy_benchmarks via Strategy column
unique_campaigns_df = inputDf[[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_CAMPAIGN_ID, RPT_COL_STRATEGY]].drop_duplicates()
mergedDf = unique_campaigns_df.merge(strategy_benchmarks, on=RPT_COL_STRATEGY, how='left')
                              
print("mergedDf.dtypes\n", mergedDf.dtypes)

output_cols = [
    RPT_COL_ACCOUNT, 
    RPT_COL_CAMPAIGN,
    RPT_COL_CAMPAIGN_ID,
    BULK_COL_MTD_GROSS_LEAD,
    BULK_COL_GROSS_LEAD__BENCHMARK__HIGH__PRORATED,
    BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM__PRORATED,
    BULK_COL_GROSS_LEAD__BENCHMARK__LOW__PRORATED,
    BULK_COL_MTD_GROSS_LEAD__PERFORMANCE,
    BULK_COL_14_DAY_CPL__BENCHMARK, 
    BULK_COL_30_DAY_CPL, 
    BULK_COL_90_DAY_CPL__BENCHMARK, 
    BULK_COL_CPL_TREND__PERFORMANCE,
    BULK_COL_CPL__PERFORMANCE,
    BULK_COL_INTV_RATE_TRAILING_737,
    BULK_COL_INTV_RATE__PERFORMANCE,
    BULK_COL_EXPECTED_INTERVIEWS,
    BULK_COL_PPC_SCORE,
]

outputDf = mergedDf.loc[:, output_cols].copy()

print("outputDf.shape", outputDf.shape)
print("outputDf.dtypes\n", outputDf.dtypes)

### local debug

if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")

Post generated on 2025-03-11 01:25:51 GMT

22 Jul 2024
