Script 1061: SBA Campaign Pacing

Purpose

SBA Campaign Budget Pacing - Minimize Lost IS (Budget)

To Elaborate

The Python script solves the problem of allocating budgets to different campaign strategies in order to minimize lost impression share (budget). It takes into account factors such as remaining budget for each strategy group, remaining weekdays in the month, historical spend and spend potential, campaigns with spend in the lookback period, and a minimum daily budget.

Walking Through the Code

The script starts by defining a configurable parameter for the minimum daily budget.
It then checks if the code is running on a server or locally and sets up the necessary environment accordingly.
The script loads the input data from a pickled file and sets the output dataframe as a copy of the input dataframe.
It imports the required libraries and defines column constants.
The script performs some data preprocessing, such as converting date columns to the appropriate data type and rounding allocation values.
It calculates the full-potential spend by adjusting the historical spend based on the lost impression share due to budget.
The script calculates the MTD spend by summing the spend for each campaign within the current month.
It filters out inactive or expired campaigns and calculates the budget allocation ratio for each campaign based on the full-potential spend and total spend within the same strategy budget group.
The script calculates the remaining budget for each strategy budget group by subtracting the MTD spend from the SBA monthly budget.
It allocates the remaining budget to each campaign according to the allocation ratio.
The script calculates the next day’s daily budget by dividing the allocated budget by the number of business days left in the current month.
It applies a minimum rule to bump up the allocated budget above the minimum daily budget if necessary.
The script handles traffic budgets separately by copying over the recommended daily budget and then blanking out the budget for non-traffic campaigns.
It calculates the SBA strategy-level pacing compliance percentage by dividing the MTD spend by the prorated budget target.
The script generates the output dataframe by selecting the changed campaigns and renaming the columns.
Finally, it outputs the result to a CSV file if running in local development mode.

Vitals

Script ID : 1061
Client ID / Customer ID: 1306927889 / 60270375
Action Type: Bulk Upload
Item Changed: Campaign
Output Columns: Account, Campaign, Daily Budget, Rec. Daily Budget, SBA Allocation, SBA Budget Pacing
Linked Datasource: M1 Report
Reference Datasource: None
Owner: Grégory Pantaine (gpantaine@marinsoftware.com)
Created by Grégory Pantaine on 2024-05-10 13:23
Last Updated by Michael Huang on 2024-05-15 03:47

> See it in Action

Python Code

#
# SBA Campaign Budget Pacing - Minimize Lost IS (Budget)
#
# Allocates according to:
# * Remaining budget for each Strategy Group
# * Remaining weekdays in month
# * Historical spend and spend potential
# * Campaigns with spend in lookback period
# * Minimum daily budget
#
# Author: Michael S. Huang, Adam Scott
#
# Created: 2023-09-30
# Updated: 2024-02-10
#

##### Configurable Param #####

MINIMUM_DAILY_BUDGET = 1

##############################


########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/gatorworks_sba_budget_pacing_20240515.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
    # LA Timezone is GMT-8. Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-8))

    # import pandas
    import pandas as pd
    import numpy as np

    # other imports
    import re
    import urllib

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########


RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_DAILY_BUDGET = 'Daily Budget'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_CLICKS = 'Clicks'
RPT_COL_CONV = 'Conv.'
RPT_COL_IMPR_SHARE = 'Impr. share %'
RPT_COL_LOST_IMPR_SHARE_BUDGET = 'Lost Impr. Share (Budget) %'
RPT_COL_LOST_IMPR_SHARE_RANK = 'Lost Impr. Share (Rank) %'
RPT_COL_SBA_STRATEGY = 'SBA Strategy'
RPT_COL_SBA_CAMPAIGN_BUDGET = 'SBA Campaign Budget'
RPT_COL_SBA_ALLOCATION = 'SBA Allocation'
RPT_COL_REC_DAILY_BUDGET = 'Rec. Daily Budget'
RPT_COL_SBA_BUDGET_PACING = 'SBA Budget Pacing'
RPT_COL_SBA_TRAFFIC = 'SBA Traffic'
RPT_COL_PROGRAM_END_Date = 'Program End Date'

BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_DAILY_BUDGET = 'Daily Budget'
BULK_COL_SBA_ALLOCATION = 'SBA Allocation'
BULK_COL_SBA_BUDGET_PACING = 'SBA Budget Pacing'
BULK_COL_REC_DAILY_BUDGET = 'Rec. Daily Budget'

COL_SPEND_FULL_POTENTIAL = 'spend_lookback_full_potential'
COL_SPEND_FULL_POTENTIAL_CAPPED = 'spend_lookback_full_potential_capped'
COL_SPEND_MTD = 'spend_mtd'
COL_SBA_ALLOCATION_NEW_FLOAT = RPT_COL_SBA_ALLOCATION + '_new_float'
COL_SBA_ALLOCATION_NEW = RPT_COL_SBA_ALLOCATION + '_new'
COL_SBA_STRATEGY_BUDGET_REMAINING = 'SBA_Campaign_budget_remaining'
COL_SBA_BUDGET_PACING_NEW = RPT_COL_SBA_BUDGET_PACING + '_new'
COL_BUDGET_REMAINING = 'budget_remaining'
COL_DAILY_BUDGET_NEW = RPT_COL_DAILY_BUDGET + '_new'
COL_REC_DAILY_BUDGET_NEW = RPT_COL_REC_DAILY_BUDGET + '_new'
COL_DAYS_REMAINING= 'weekdays_remaining'
COL_DAYS_TOTAL= 'weekdays_total'
COL_PACING_CALC = 'pacing_calc'

outputDf[BULK_COL_DAILY_BUDGET] = "<<YOUR VALUE>>"

today = datetime.datetime.now(CLIENT_TIMEZONE).date()
print("inputDf shape", inputDf.shape)
print("inputDf dtypes", inputDf.dtypes)


### User Code Starts Here

### DEBUGGING. Comment out in PROD run
# set arbitrary date
# today = datetime.date(2024, 5, 1)

# change back to percent string
if inputDf[RPT_COL_SBA_ALLOCATION].dtype == "float":
    inputDf[RPT_COL_SBA_ALLOCATION] = round(inputDf[RPT_COL_SBA_ALLOCATION] * 100.0, 0).astype(str) + '%'
if inputDf[BULK_COL_SBA_BUDGET_PACING].dtype == "float":
    inputDf[BULK_COL_SBA_BUDGET_PACING] = round(inputDf[BULK_COL_SBA_BUDGET_PACING] * 100.0, 0).astype(str) + '%'

# coerce Program End Date into Date type
inputDf[RPT_COL_PROGRAM_END_Date] = pd.to_datetime(inputDf[RPT_COL_PROGRAM_END_Date], errors='coerce')

inputDf = inputDf.set_index([RPT_COL_SBA_STRATEGY])
group_by_strategy = inputDf.groupby(RPT_COL_SBA_STRATEGY)

# ## Calculate Full-Potential Spend
# * Adjust Historical Spend by _Lost Impression Share due to Budget
# 

adj_ratio = 1 + (inputDf[RPT_COL_LOST_IMPR_SHARE_BUDGET] / (1 - inputDf[RPT_COL_LOST_IMPR_SHARE_BUDGET]))

inputDf[COL_SPEND_FULL_POTENTIAL] = round(inputDf[RPT_COL_PUB_COST] * adj_ratio, 2)

# ## Remove Date Segmentation
# * Calculate MTD Spend

# SUM Series with Date index and only includes current month
def current_month_sum(x):
    x = x.sort_index()
    mtd = x[ (x.index.month == today.month) & (x.values > 0)]
    return mtd.sum()

groupby_cols = [ \
    RPT_COL_SBA_STRATEGY, \
    RPT_COL_STRATEGY, \
    RPT_COL_PUBLISHER_NAME, \
    RPT_COL_ACCOUNT, \
    RPT_COL_CAMPAIGN, \
]


agg_spec = {
    RPT_COL_CAMPAIGN_STATUS: 'last', \
    RPT_COL_DAILY_BUDGET: 'last', \
    RPT_COL_SBA_CAMPAIGN_BUDGET: 'last', \
    RPT_COL_SBA_ALLOCATION: 'last', \
    RPT_COL_REC_DAILY_BUDGET: 'last', \
    RPT_COL_SBA_BUDGET_PACING: 'last', \
    RPT_COL_SBA_TRAFFIC: 'last', \
    RPT_COL_CLICKS: 'sum', \
    RPT_COL_CONV: 'sum', \
    RPT_COL_PUB_COST: 'sum', \
    COL_SPEND_MTD: current_month_sum, \
    COL_SPEND_FULL_POTENTIAL: 'sum', \
    RPT_COL_PROGRAM_END_Date: 'last', \
}

inputDf[COL_SPEND_MTD] = inputDf[RPT_COL_PUB_COST]

df_campaign_agg = inputDf.reset_index() \
                         .set_index(RPT_COL_DATE) \
                         .groupby(groupby_cols) \
                         .agg(agg_spec) \
                         .reset_index() \
                         .set_index(RPT_COL_SBA_STRATEGY)


# ## Only allocate budget for recently trafficking campaigns
# * Exclude Campaigns that are:
# ** not ACTIVE 
# ** without spend in lookback period
# ** Program Date is in the past

inactive_campaigns = (df_campaign_agg[RPT_COL_CAMPAIGN_STATUS] != 'Active') & (df_campaign_agg[RPT_COL_PUB_COST] == 0)
expired_campaigns = df_campaign_agg[RPT_COL_PROGRAM_END_Date].notnull() & (df_campaign_agg[RPT_COL_PROGRAM_END_Date] < pd.to_datetime(today))
df_campaign_agg = df_campaign_agg.loc[ ~(inactive_campaigns | expired_campaigns) ]


# ## Calculate Budget Allocation Ratio 
# * Cap full potential spend at 2X (don't spend twice as much as before)
# * Compare full potential spend for each campaign to total spend within same Strategy budget group

df_campaign_agg[COL_SPEND_FULL_POTENTIAL_CAPPED] = df_campaign_agg \
    .apply(lambda row: min(row[COL_SPEND_FULL_POTENTIAL], 2 * row[RPT_COL_PUB_COST]), axis=1)


# use transform to calculate sum for each Stratgey and make it available on every row
# note: no need to build aggregate DataFrame and JOIN back to original

df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT] = 100.0 * \
        df_campaign_agg[COL_SPEND_FULL_POTENTIAL_CAPPED] / \
        df_campaign_agg.groupby(RPT_COL_SBA_STRATEGY)[COL_SPEND_FULL_POTENTIAL_CAPPED].transform('sum')

df_campaign_agg[COL_SBA_ALLOCATION_NEW] = round(df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT],0).astype(str) + '%'


# 
# ## Calculate Remaining Budget
# * For each Straetgy budget group, calculate how much Budget is left by substracting SBA Monthly budget from MTD Strategy spend

df_campaign_agg[COL_SBA_STRATEGY_BUDGET_REMAINING] =  \
        df_campaign_agg[RPT_COL_SBA_CAMPAIGN_BUDGET] - \
        df_campaign_agg.groupby(by=[RPT_COL_SBA_STRATEGY])[COL_SPEND_MTD].sum()


# ## Allocate Budget
# * Allocate remaining budget to each campaign according to ratio calculated above

df_campaign_agg[COL_BUDGET_REMAINING] = round(df_campaign_agg[COL_SBA_STRATEGY_BUDGET_REMAINING] * df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT] / 100.0, 1)


# ## Calculate SBA Daily Budget
# 
# * Calcualte next day Daily Budget by dividing allocated budget by number of Business Days left in the current month


today_numpy = pd.to_datetime(today).to_numpy().astype('datetime64[D]')
next_month_start = (today_numpy + pd.offsets.MonthBegin()).to_numpy().astype('datetime64[D]')

# assumes this Script runs in the morning, and today is included in days_left
# use max() to make sure there is at least 1 day left
days_left = max(1, (next_month_start - today_numpy).astype('timedelta64[D]').astype(int))

df_campaign_agg[COL_DAYS_REMAINING] = days_left
df_campaign_agg[COL_REC_DAILY_BUDGET_NEW] = round(df_campaign_agg[COL_BUDGET_REMAINING] / days_left, 0)

# ### Apply Minimum Rule
# * Bump allocated budget above minimum

allocated_below_min = (df_campaign_agg[COL_REC_DAILY_BUDGET_NEW] < MINIMUM_DAILY_BUDGET)
df_campaign_agg.loc[allocated_below_min, COL_REC_DAILY_BUDGET_NEW] = MINIMUM_DAILY_BUDGET


# ### Traffic Budget
df_campaign_agg[COL_DAILY_BUDGET_NEW] = np.nan

# campaigns to traffic
to_traffic = df_campaign_agg[RPT_COL_SBA_TRAFFIC].notnull() & \
            (df_campaign_agg[RPT_COL_SBA_TRAFFIC].astype(str).str.lower() == 'traffic')
print("Count of campaigns enabled for Traffic: ", to_traffic.sum())

# copy budgets over
df_campaign_agg[COL_DAILY_BUDGET_NEW] = df_campaign_agg[COL_REC_DAILY_BUDGET_NEW]
# then blank out budget for non-traffic campaigns
df_campaign_agg.loc[~to_traffic, COL_DAILY_BUDGET_NEW] = np.nan


### Calculate SBA Strategy-level Pacing Compliance percentage. 

# Use remaining days in current month to prorate budget target. Ideally should be 100% each day.

current_month_start = pd.to_datetime(today.replace(day=1)).to_numpy().astype('datetime64[D]')
total_days_in_month = (next_month_start - current_month_start).astype('timedelta64[D]').astype(int)
df_campaign_agg[COL_DAYS_TOTAL] = total_days_in_month
# could be 0 on first day of month
prorated_ratio = (total_days_in_month - days_left) / total_days_in_month

print("today: ", today)
print("current_month_start: ", current_month_start)
print("next_month_start: ", next_month_start)
print("total_days_in_month: ", total_days_in_month)
print("days_left: ", days_left)
print("prorated_ratio: ", prorated_ratio)

# divide SBA Strategy-level MTD spend by prorated budget target
mask = df_campaign_agg[RPT_COL_SBA_CAMPAIGN_BUDGET] > 0
if (total_days_in_month == days_left):
    # pacing is zero on first day of month
    df_campaign_agg[COL_PACING_CALC] = '0%'
else:
    df_campaign_agg[COL_PACING_CALC] = round(100.0 * \
                                        df_campaign_agg.groupby(by=[RPT_COL_SBA_STRATEGY])[COL_SPEND_MTD].sum() / \
                                        (prorated_ratio * df_campaign_agg[RPT_COL_SBA_CAMPAIGN_BUDGET]), \
                                        0).astype(str) + '%'
    
df_campaign_agg.loc[mask, COL_SBA_BUDGET_PACING_NEW] = df_campaign_agg.loc[mask, COL_PACING_CALC]



# Print a sample of strategy budgets
df_strategy_budget = group_by_strategy[[RPT_COL_SBA_CAMPAIGN_BUDGET]].transform('max').dropna().drop_duplicates()
print("Sample of Strategy budgets", df_strategy_budget.head(10).to_string())


# ## Generate outputDf

# Check for changes
changed = df_campaign_agg[COL_REC_DAILY_BUDGET_NEW].notnull() & \
    ( \
       (df_campaign_agg[RPT_COL_REC_DAILY_BUDGET] != df_campaign_agg[COL_REC_DAILY_BUDGET_NEW]) | \
       (df_campaign_agg[RPT_COL_DAILY_BUDGET] != df_campaign_agg[COL_DAILY_BUDGET_NEW]) | \
       (df_campaign_agg[RPT_COL_SBA_ALLOCATION] != df_campaign_agg[COL_SBA_ALLOCATION_NEW]) | \
       (df_campaign_agg[RPT_COL_SBA_BUDGET_PACING] != df_campaign_agg[COL_SBA_BUDGET_PACING_NEW]) \
    )

print("Changed rows:", changed.sum())

# Debug
debugDf = df_campaign_agg.loc[changed] \
      .reset_index() \
      .sort_values(by=[RPT_COL_SBA_STRATEGY, COL_DAILY_BUDGET_NEW, COL_REC_DAILY_BUDGET_NEW], ascending=False) 

# print("debugDf", tableize(debugDf))

# Only emit output for changed campaigns
if changed.sum() > 0:

    # construct outputDf
    outputDf = df_campaign_agg.loc[changed, [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, COL_DAILY_BUDGET_NEW, COL_REC_DAILY_BUDGET_NEW, COL_SBA_BUDGET_PACING_NEW, COL_SBA_ALLOCATION_NEW]] \
                      .copy() \
                      .rename(columns={ \
                            COL_DAILY_BUDGET_NEW: BULK_COL_DAILY_BUDGET, \
                            COL_REC_DAILY_BUDGET_NEW: BULK_COL_REC_DAILY_BUDGET, \
                            COL_SBA_BUDGET_PACING_NEW: BULK_COL_SBA_BUDGET_PACING, \
                            COL_SBA_ALLOCATION_NEW: BULK_COL_SBA_ALLOCATION, \
                        }) \
                      .reset_index() \
                      .sort_values(by=[RPT_COL_SBA_STRATEGY, BULK_COL_DAILY_BUDGET, BULK_COL_REC_DAILY_BUDGET], ascending=False) \
                      .drop(RPT_COL_SBA_STRATEGY, axis=1)

    print("outputDf shape", outputDf.shape)
    print("outputDf", tableize(outputDf.head()))
else:
    print("No changes detected, returning an empty dataframe")
    outputDf = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_DAILY_BUDGET, BULK_COL_REC_DAILY_BUDGET, BULK_COL_SBA_BUDGET_PACING, BULK_COL_SBA_ALLOCATION])

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")

Post generated on 2024-05-15 07:44:05 GMT

10 May 2024

« Script 1059: SBA Transcription Script 1063: Budget Staging for Strategies via GSheets #2 »

MarinOne Scripts Creator's Corner