Script 615: SBA Campaign Pacing
Purpose
SBA Campaign Budget Pacing - Minimize Lost IS (Budget)
To Elaborate
The Python script solves the problem of allocating budgets to different campaign strategies in order to minimize lost impression share due to budget constraints. It takes into account various factors such as remaining budget, remaining weekdays in the month, historical spend and spend potential, campaigns with spend in the lookback period, and a minimum daily budget. The goal is to optimize the allocation of budgets to maximize the performance of the campaigns while staying within the budget constraints.
Walking Through the Code
- The script starts by defining a configurable parameter for the minimum daily budget.
- It then checks if the script is running on a server or locally, and loads the necessary data accordingly.
- The script sets up the required libraries and imports.
- It defines column constants for the input and output dataframes.
- The script initializes the input dataframe and sets the output dataframe as a copy of the input dataframe.
- It sets up the client timezone and imports the required libraries.
- The script performs some data preprocessing, such as converting date columns to the appropriate format and rounding allocation percentages.
- It sets the input dataframe index to the SBA strategy column and groups the data by strategy.
- The script calculates the full-potential spend by adjusting the historical spend based on the lost impression share due to budget.
- It calculates the MTD spend by summing the spend for the current month.
- The script filters out inactive or expired campaigns and calculates the budget allocation ratio for each campaign.
- It calculates the remaining budget for each strategy group by subtracting the MTD spend from the SBA monthly budget.
- The script allocates the remaining budget to each campaign based on the allocation ratio.
- It calculates the next day’s daily budget by dividing the allocated budget by the number of business days left in the current month.
- The script applies a minimum rule to bump up the allocated budget above the minimum daily budget if necessary.
- It handles the traffic budget separately by copying the recommended daily budget and blanking out the budget for non-traffic campaigns.
- The script calculates the SBA strategy-level pacing compliance percentage based on the remaining days in the current month.
- It checks for changes in the recommended daily budget, daily budget, allocation, and budget pacing.
- The script constructs the output dataframe with the changed campaigns.
- If running in local development mode, it writes the output and debug dataframes to CSV files.
Vitals
- Script ID : 615
- Client ID / Customer ID: 309909744 / 14196
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Daily Budget, SBA Allocation, SBA Budget Pacing, Rec. Daily Budget
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
- Created by ascott@marinsoftware.com on 2023-12-19 20:24
- Last Updated by Michael Huang on 2024-05-15 03:52
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
#
# SBA Campaign Budget Pacing - Minimize Lost IS (Budget)
#
# Allocates according to:
# * Remaining budget for each Strategy Group
# * Remaining weekdays in month
# * Historical spend and spend potential
# * Campaigns with spend in lookback period
# * Minimum daily budget
#
# Author: Michael S. Huang, Adam Scott
#
# Created: 2023-09-30
# Updated: 2024-02-10
#
##### Configurable Param #####
MINIMUM_DAILY_BUDGET = 1
##############################
########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/gatorworks_sba_budget_pacing_20240515.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict
# determine if code is running on server or locally
def is_executing_on_server():
try:
# Attempt to access a known restricted builtin
dict_items = dataSourceDict.items()
return True
except NameError:
# NameError: dataSourceDict object is missing (indicating not on server)
return False
local_dev = False
if is_executing_on_server():
print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
print("Code is NOT executing on server. Doing init.")
local_dev = True
# load dataSourceDict via pickled file
import pickle
dataSourceDict = pickle.load(open(pickle_path, 'rb'))
# print shape and first 5 rows for each entry in dataSourceDict
for key, value in dataSourceDict.items():
print(f"Shape of dataSourceDict[{key}]: {value.shape}")
# print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")
# set outputDf same as inputDf
inputDf = dataSourceDict["1"]
outputDf = inputDf.copy()
# setup timezone
import datetime
# LA Timezone is GMT-8. Adjust as needed.
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-8))
# import pandas
import pandas as pd
import numpy as np
# other imports
import re
import urllib
# import Marin util functions
from marin_scripts_utils import tableize, select_changed
else:
print("Running locally but no pickle path defined. dataSourceDict not loaded.")
exit(1)
########### END - Local Mode Setup ###########
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_DAILY_BUDGET = 'Daily Budget'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_CLICKS = 'Clicks'
RPT_COL_CONV = 'Conv.'
RPT_COL_IMPR_SHARE = 'Impr. share %'
RPT_COL_LOST_IMPR_SHARE_BUDGET = 'Lost Impr. Share (Budget) %'
RPT_COL_LOST_IMPR_SHARE_RANK = 'Lost Impr. Share (Rank) %'
RPT_COL_SBA_STRATEGY = 'SBA Strategy'
RPT_COL_SBA_CAMPAIGN_BUDGET = 'SBA Campaign Budget'
RPT_COL_SBA_ALLOCATION = 'SBA Allocation'
RPT_COL_REC_DAILY_BUDGET = 'Rec. Daily Budget'
RPT_COL_SBA_BUDGET_PACING = 'SBA Budget Pacing'
RPT_COL_SBA_TRAFFIC = 'SBA Traffic'
RPT_COL_PROGRAM_END_Date = 'Program End Date'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_DAILY_BUDGET = 'Daily Budget'
BULK_COL_SBA_ALLOCATION = 'SBA Allocation'
BULK_COL_SBA_BUDGET_PACING = 'SBA Budget Pacing'
BULK_COL_REC_DAILY_BUDGET = 'Rec. Daily Budget'
COL_SPEND_FULL_POTENTIAL = 'spend_lookback_full_potential'
COL_SPEND_FULL_POTENTIAL_CAPPED = 'spend_lookback_full_potential_capped'
COL_SPEND_MTD = 'spend_mtd'
COL_SBA_ALLOCATION_NEW_FLOAT = RPT_COL_SBA_ALLOCATION + '_new_float'
COL_SBA_ALLOCATION_NEW = RPT_COL_SBA_ALLOCATION + '_new'
COL_SBA_STRATEGY_BUDGET_REMAINING = 'SBA_Campaign_budget_remaining'
COL_SBA_BUDGET_PACING_NEW = RPT_COL_SBA_BUDGET_PACING + '_new'
COL_BUDGET_REMAINING = 'budget_remaining'
COL_DAILY_BUDGET_NEW = RPT_COL_DAILY_BUDGET + '_new'
COL_REC_DAILY_BUDGET_NEW = RPT_COL_REC_DAILY_BUDGET + '_new'
COL_DAYS_REMAINING= 'weekdays_remaining'
COL_DAYS_TOTAL= 'weekdays_total'
COL_PACING_CALC = 'pacing_calc'
outputDf[BULK_COL_DAILY_BUDGET] = "<<YOUR VALUE>>"
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
print("inputDf shape", inputDf.shape)
print("inputDf dtypes", inputDf.dtypes)
### User Code Starts Here
### DEBUGGING. Comment out in PROD run
# set arbitrary date
# today = datetime.date(2024, 5, 1)
# change back to percent string
if inputDf[RPT_COL_SBA_ALLOCATION].dtype == "float":
inputDf[RPT_COL_SBA_ALLOCATION] = round(inputDf[RPT_COL_SBA_ALLOCATION] * 100.0, 0).astype(str) + '%'
if inputDf[BULK_COL_SBA_BUDGET_PACING].dtype == "float":
inputDf[BULK_COL_SBA_BUDGET_PACING] = round(inputDf[BULK_COL_SBA_BUDGET_PACING] * 100.0, 0).astype(str) + '%'
# coerce Program End Date into Date type
inputDf[RPT_COL_PROGRAM_END_Date] = pd.to_datetime(inputDf[RPT_COL_PROGRAM_END_Date], errors='coerce')
inputDf = inputDf.set_index([RPT_COL_SBA_STRATEGY])
group_by_strategy = inputDf.groupby(RPT_COL_SBA_STRATEGY)
# ## Calculate Full-Potential Spend
# * Adjust Historical Spend by _Lost Impression Share due to Budget
#
adj_ratio = 1 + (inputDf[RPT_COL_LOST_IMPR_SHARE_BUDGET] / (1 - inputDf[RPT_COL_LOST_IMPR_SHARE_BUDGET]))
inputDf[COL_SPEND_FULL_POTENTIAL] = round(inputDf[RPT_COL_PUB_COST] * adj_ratio, 2)
# ## Remove Date Segmentation
# * Calculate MTD Spend
# SUM Series with Date index and only includes current month
def current_month_sum(x):
x = x.sort_index()
mtd = x[ (x.index.month == today.month) & (x.values > 0)]
return mtd.sum()
groupby_cols = [ \
RPT_COL_SBA_STRATEGY, \
RPT_COL_STRATEGY, \
RPT_COL_PUBLISHER_NAME, \
RPT_COL_ACCOUNT, \
RPT_COL_CAMPAIGN, \
]
agg_spec = {
RPT_COL_CAMPAIGN_STATUS: 'last', \
RPT_COL_DAILY_BUDGET: 'last', \
RPT_COL_SBA_CAMPAIGN_BUDGET: 'last', \
RPT_COL_SBA_ALLOCATION: 'last', \
RPT_COL_REC_DAILY_BUDGET: 'last', \
RPT_COL_SBA_BUDGET_PACING: 'last', \
RPT_COL_SBA_TRAFFIC: 'last', \
RPT_COL_CLICKS: 'sum', \
RPT_COL_CONV: 'sum', \
RPT_COL_PUB_COST: 'sum', \
COL_SPEND_MTD: current_month_sum, \
COL_SPEND_FULL_POTENTIAL: 'sum', \
RPT_COL_PROGRAM_END_Date: 'last', \
}
inputDf[COL_SPEND_MTD] = inputDf[RPT_COL_PUB_COST]
df_campaign_agg = inputDf.reset_index() \
.set_index(RPT_COL_DATE) \
.groupby(groupby_cols) \
.agg(agg_spec) \
.reset_index() \
.set_index(RPT_COL_SBA_STRATEGY)
# ## Only allocate budget for recently trafficking campaigns
# * Exclude Campaigns that are:
# ** not ACTIVE
# ** without spend in lookback period
# ** Program Date is in the past
inactive_campaigns = (df_campaign_agg[RPT_COL_CAMPAIGN_STATUS] != 'Active') & (df_campaign_agg[RPT_COL_PUB_COST] == 0)
expired_campaigns = df_campaign_agg[RPT_COL_PROGRAM_END_Date].notnull() & (df_campaign_agg[RPT_COL_PROGRAM_END_Date] < pd.to_datetime(today))
df_campaign_agg = df_campaign_agg.loc[ ~(inactive_campaigns | expired_campaigns) ]
# ## Calculate Budget Allocation Ratio
# * Cap full potential spend at 2X (don't spend twice as much as before)
# * Compare full potential spend for each campaign to total spend within same Strategy budget group
df_campaign_agg[COL_SPEND_FULL_POTENTIAL_CAPPED] = df_campaign_agg \
.apply(lambda row: min(row[COL_SPEND_FULL_POTENTIAL], 2 * row[RPT_COL_PUB_COST]), axis=1)
# use transform to calculate sum for each Stratgey and make it available on every row
# note: no need to build aggregate DataFrame and JOIN back to original
df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT] = 100.0 * \
df_campaign_agg[COL_SPEND_FULL_POTENTIAL_CAPPED] / \
df_campaign_agg.groupby(RPT_COL_SBA_STRATEGY)[COL_SPEND_FULL_POTENTIAL_CAPPED].transform('sum')
df_campaign_agg[COL_SBA_ALLOCATION_NEW] = round(df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT],0).astype(str) + '%'
#
# ## Calculate Remaining Budget
# * For each Straetgy budget group, calculate how much Budget is left by substracting SBA Monthly budget from MTD Strategy spend
df_campaign_agg[COL_SBA_STRATEGY_BUDGET_REMAINING] = \
df_campaign_agg[RPT_COL_SBA_CAMPAIGN_BUDGET] - \
df_campaign_agg.groupby(by=[RPT_COL_SBA_STRATEGY])[COL_SPEND_MTD].sum()
# ## Allocate Budget
# * Allocate remaining budget to each campaign according to ratio calculated above
df_campaign_agg[COL_BUDGET_REMAINING] = round(df_campaign_agg[COL_SBA_STRATEGY_BUDGET_REMAINING] * df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT] / 100.0, 1)
# ## Calculate SBA Daily Budget
#
# * Calcualte next day Daily Budget by dividing allocated budget by number of Business Days left in the current month
today_numpy = pd.to_datetime(today).to_numpy().astype('datetime64[D]')
next_month_start = (today_numpy + pd.offsets.MonthBegin()).to_numpy().astype('datetime64[D]')
# assumes this Script runs in the morning, and today is included in days_left
# use max() to make sure there is at least 1 day left
days_left = max(1, (next_month_start - today_numpy).astype('timedelta64[D]').astype(int))
df_campaign_agg[COL_DAYS_REMAINING] = days_left
df_campaign_agg[COL_REC_DAILY_BUDGET_NEW] = round(df_campaign_agg[COL_BUDGET_REMAINING] / days_left, 0)
# ### Apply Minimum Rule
# * Bump allocated budget above minimum
allocated_below_min = (df_campaign_agg[COL_REC_DAILY_BUDGET_NEW] < MINIMUM_DAILY_BUDGET)
df_campaign_agg.loc[allocated_below_min, COL_REC_DAILY_BUDGET_NEW] = MINIMUM_DAILY_BUDGET
# ### Traffic Budget
df_campaign_agg[COL_DAILY_BUDGET_NEW] = np.nan
# campaigns to traffic
to_traffic = df_campaign_agg[RPT_COL_SBA_TRAFFIC].notnull() & \
(df_campaign_agg[RPT_COL_SBA_TRAFFIC].astype(str).str.lower() == 'traffic')
print("Count of campaigns enabled for Traffic: ", to_traffic.sum())
# copy budgets over
df_campaign_agg[COL_DAILY_BUDGET_NEW] = df_campaign_agg[COL_REC_DAILY_BUDGET_NEW]
# then blank out budget for non-traffic campaigns
df_campaign_agg.loc[~to_traffic, COL_DAILY_BUDGET_NEW] = np.nan
### Calculate SBA Strategy-level Pacing Compliance percentage.
# Use remaining days in current month to prorate budget target. Ideally should be 100% each day.
current_month_start = pd.to_datetime(today.replace(day=1)).to_numpy().astype('datetime64[D]')
total_days_in_month = (next_month_start - current_month_start).astype('timedelta64[D]').astype(int)
df_campaign_agg[COL_DAYS_TOTAL] = total_days_in_month
# could be 0 on first day of month
prorated_ratio = (total_days_in_month - days_left) / total_days_in_month
print("today: ", today)
print("current_month_start: ", current_month_start)
print("next_month_start: ", next_month_start)
print("total_days_in_month: ", total_days_in_month)
print("days_left: ", days_left)
print("prorated_ratio: ", prorated_ratio)
# divide SBA Strategy-level MTD spend by prorated budget target
mask = df_campaign_agg[RPT_COL_SBA_CAMPAIGN_BUDGET] > 0
if (total_days_in_month == days_left):
# pacing is zero on first day of month
df_campaign_agg[COL_PACING_CALC] = '0%'
else:
df_campaign_agg[COL_PACING_CALC] = round(100.0 * \
df_campaign_agg.groupby(by=[RPT_COL_SBA_STRATEGY])[COL_SPEND_MTD].sum() / \
(prorated_ratio * df_campaign_agg[RPT_COL_SBA_CAMPAIGN_BUDGET]), \
0).astype(str) + '%'
df_campaign_agg.loc[mask, COL_SBA_BUDGET_PACING_NEW] = df_campaign_agg.loc[mask, COL_PACING_CALC]
# Print a sample of strategy budgets
df_strategy_budget = group_by_strategy[[RPT_COL_SBA_CAMPAIGN_BUDGET]].transform('max').dropna().drop_duplicates()
print("Sample of Strategy budgets", df_strategy_budget.head(10).to_string())
# ## Generate outputDf
# Check for changes
changed = df_campaign_agg[COL_REC_DAILY_BUDGET_NEW].notnull() & \
( \
(df_campaign_agg[RPT_COL_REC_DAILY_BUDGET] != df_campaign_agg[COL_REC_DAILY_BUDGET_NEW]) | \
(df_campaign_agg[RPT_COL_DAILY_BUDGET] != df_campaign_agg[COL_DAILY_BUDGET_NEW]) | \
(df_campaign_agg[RPT_COL_SBA_ALLOCATION] != df_campaign_agg[COL_SBA_ALLOCATION_NEW]) | \
(df_campaign_agg[RPT_COL_SBA_BUDGET_PACING] != df_campaign_agg[COL_SBA_BUDGET_PACING_NEW]) \
)
print("Changed rows:", changed.sum())
# Debug
debugDf = df_campaign_agg.loc[changed] \
.reset_index() \
.sort_values(by=[RPT_COL_SBA_STRATEGY, COL_DAILY_BUDGET_NEW, COL_REC_DAILY_BUDGET_NEW], ascending=False)
# print("debugDf", tableize(debugDf))
# Only emit output for changed campaigns
if changed.sum() > 0:
# construct outputDf
outputDf = df_campaign_agg.loc[changed, [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, COL_DAILY_BUDGET_NEW, COL_REC_DAILY_BUDGET_NEW, COL_SBA_BUDGET_PACING_NEW, COL_SBA_ALLOCATION_NEW]] \
.copy() \
.rename(columns={ \
COL_DAILY_BUDGET_NEW: BULK_COL_DAILY_BUDGET, \
COL_REC_DAILY_BUDGET_NEW: BULK_COL_REC_DAILY_BUDGET, \
COL_SBA_BUDGET_PACING_NEW: BULK_COL_SBA_BUDGET_PACING, \
COL_SBA_ALLOCATION_NEW: BULK_COL_SBA_ALLOCATION, \
}) \
.reset_index() \
.sort_values(by=[RPT_COL_SBA_STRATEGY, BULK_COL_DAILY_BUDGET, BULK_COL_REC_DAILY_BUDGET], ascending=False) \
.drop(RPT_COL_SBA_STRATEGY, axis=1)
print("outputDf shape", outputDf.shape)
print("outputDf", tableize(outputDf.head()))
else:
print("No changes detected, returning an empty dataframe")
outputDf = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_DAILY_BUDGET, BULK_COL_REC_DAILY_BUDGET, BULK_COL_SBA_BUDGET_PACING, BULK_COL_SBA_ALLOCATION])
## local debug
if local_dev:
output_filename = 'outputDf.csv'
outputDf.to_csv(output_filename, index=False)
print(f"Local Dev: Output written to: {output_filename}")
debug_filename = 'debugDf.csv'
debugDf.to_csv(debug_filename, index=False)
print(f"Local Dev: Debug written to: {debug_filename}")
Post generated on 2024-05-15 07:44:05 GMT