Script 809: SBA Auto Pause & Re enable Budget via Dim Tag
Purpose
Python script that pauses and re-enables campaigns based on their monthly spend and budget.
To Elaborate
This Python script automates the process of pausing and re-enabling campaigns based on their monthly spend and budget. The script takes into account a safety margin to compensate for system lag and non-linearity in intraday spend. The key business rules are as follows:
- Pause campaigns when the month-to-date (MTD) spend reaches the monthly budget stored in the dimensions.
- Pause campaigns with MTD spend over the monthly budget by a margin.
- Reactivate campaigns with MTD spend under the monthly budget by a margin, but only if the campaign has a pause date populated.
- Pause and re-enable campaigns based on their SBA Traffic status.
- Output the changes made to the campaigns in a bulk file.
The script uses the pandas and numpy libraries for data manipulation and analysis. It also utilizes the marin_scripts_utils module for utility functions.
Walking Through the Code
- The script starts by defining a configurable parameter, BUDGET_CAP_SAFETY_MARGIN, which determines how close the MTD spend can get to the monthly budget before campaigns are paused.
- The script checks if it is running on the server or locally by attempting to access a known restricted builtin. If running locally, it loads the dataSourceDict from a pickled file and sets the inputDf and outputDf variables.
- The script sets the client timezone and imports the necessary libraries and modules.
- The script converts the necessary columns in the inputDf to their expected types.
- The script calculates the MTD Budget Group Spend by grouping the inputDf by the SBA Budget Group column and summing the Pub. Cost column.
- The script identifies campaigns to pause based on whether they have a monthly group budget, and if their MTD Budget Group Spend is over the monthly budget by the specified safety margin.
- The script updates the SBA Recommended Status column for the campaigns to be paused.
- The script identifies campaigns to resume based on whether they have a monthly group budget, their MTD Budget Group Spend is under the monthly budget by the specified safety margin, and they have a pause date populated.
- The script updates the SBA Recommended Status column for the campaigns to be resumed.
- The script identifies campaigns to pause and resume based on their SBA Traffic status and the SBA Recommended Status column.
- The script updates the Campaign Status and SBA Pause Date columns for the campaigns to be paused and resumed.
- The script selects the changed rows from the inputDf and creates the outputDf and debugDf.
- The script renames the Campaign Status column in the outputDf to the Bulk column header.
- The script outputs the shape and a sample of the outputDf.
- If running locally, the script writes the outputDf and debugDf to CSV files for local debugging.
Vitals
- Script ID : 809
- Client ID / Customer ID: 1306927165 / 60270223
- Action Type: Bulk Upload (Preview)
- Item Changed: Campaign
- Output Columns: Account, Campaign, Status, SBA Pause Date, SBA Recommended Status
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
- Created by dwaidhas@marinsoftware.com on 2024-03-13 20:02
- Last Updated by dwaidhas@marinsoftware.com on 2024-03-13 20:02
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
##
## name: Intraday Budget Cap via Dimensions
## description:
## Pause campaigns when MTD spend reaches Monthly Budget (stored in Dimensions)
##
## author: Dana Waidhas
## created: 2024-03-13
##
##### Configurable Param #####
# Define how close MTD spend can get to Monthly Budget before being Paused
# - compensates for lag in system
# - compendates for non-linearity in intraday spend
BUDGET_CAP_SAFETY_MARGIN = 0.02 # set to 2%
##############################
########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
pickle_path = '/Users/mhuang/Downloads/pickle/generic_budget_cap_intraday_datasource_dict.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict
# determine if code is running on server or locally
def is_executing_on_server():
try:
# Attempt to access a known restricted builtin
dict_items = dataSourceDict.items()
return True
except NameError:
# NameError: dataSourceDict object is missing (indicating not on server)
return False
local_dev = False
if is_executing_on_server():
print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
print("Code is NOT executing on server. Doing init.")
local_dev = True
# load dataSourceDict via pickled file
import pickle
dataSourceDict = pickle.load(open(pickle_path, 'rb'))
# print shape and first 5 rows for each entry in dataSourceDict
for key, value in dataSourceDict.items():
print(f"Shape of dataSourceDict[{key}]: {value.shape}")
# print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")
# set outputDf same as inputDf
inputDf = dataSourceDict["1"]
outputDf = inputDf.copy()
# setup timezone
import datetime
# Chicago Timezone is GMT-5. Adjust as needed.
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))
# import pandas
import pandas as pd
import numpy as np
# Printing out the version of Python, Pandas and Numpy
# import sys
# python_version = sys.version
# pandas_version = pd.__version__
# numpy_version = np.__version__
# print(f"python version: {python_version}")
# print(f"pandas version: {pandas_version}")
# print(f"numpy version: {numpy_version}")
# other imports
import re
import urllib
# import Marin util functions
from marin_scripts_utils import tableize, select_changed
else:
print("Running locally but no pickle path defined. dataSourceDict not loaded.")
exit(1)
########### END - Local Mode Setup ###########
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUB_COST = 'Pub. Cost R$'
RPT_COL_SBA_TRAFFIC = 'SBA Traffic'
RPT_COL_SBA_BUDGET_GROUP = 'SBA Strategy'
RPT_COL_SBA_GROUP_MONTHLY_BUDGET = 'SBA Campaign Budget'
RPT_COL_SBA_PAUSE_DATE = 'SBA Pause Date'
RPT_COL_SBA_RECOMMENDED_STATUS = 'SBA Recommended Status'
# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_STATUS = 'Status'
BULK_COL_SBA_PAUSE_DATE = 'SBA Pause Date'
BULK_COL_SBA_RECOMMENDED_STATUS = 'SBA Recommended Status'
# Make inputDf a copy of original to keep dataSourceDict.pkl pristine
originalDf = dataSourceDict["1"]
inputDf = originalDf.copy()
# define some intermediate columns
COL_MTD_BUDGET_GROUP_SPEND = 'mtd_budget_group_spend'
# define Status values
VAL_STATUS_ACTIVE = 'Active'
VAL_STATUS_PAUSED = 'Paused'
VAL_BLANK = ''
print("inputDf shape", inputDf.shape)
print("inputDf info", inputDf.info())
## force expected types
# Convert RPT_COL_SBA_MONTHLY_BUDGET to numeric, coercing errors to NaN
inputDf[RPT_COL_SBA_GROUP_MONTHLY_BUDGET] = pd.to_numeric(inputDf[RPT_COL_SBA_GROUP_MONTHLY_BUDGET], errors='coerce')
# Replace NaN values with 0.0 if that's the desired behavior
inputDf[RPT_COL_SBA_GROUP_MONTHLY_BUDGET].fillna(0.0, inplace=True)
# Force RPT_COL_SBA_PAUSE_DATE to be Date type
inputDf[RPT_COL_SBA_PAUSE_DATE] = pd.to_datetime(inputDf[RPT_COL_SBA_PAUSE_DATE], errors='coerce').dt.date
# HACK: replace nan with empty strings so comparison doesn't fail
inputDf.fillna(VAL_BLANK, inplace=True)
# Calculate MTD Budget Group Spend
inputDf[COL_MTD_BUDGET_GROUP_SPEND] = inputDf.groupby(RPT_COL_SBA_BUDGET_GROUP)[RPT_COL_PUB_COST].transform('sum')
# Recommend to Pause camapigns with MTD Budget Group Spend over Monthly Budget (by a margin)
has_monthly_group_budget = inputDf[RPT_COL_SBA_GROUP_MONTHLY_BUDGET] > 0.0
over_spent_campaigns = inputDf[COL_MTD_BUDGET_GROUP_SPEND] >= inputDf[RPT_COL_SBA_GROUP_MONTHLY_BUDGET] * (1 - BUDGET_CAP_SAFETY_MARGIN)
campaigns_to_pause = has_monthly_group_budget & over_spent_campaigns
inputDf.loc[campaigns_to_pause, 'pause'] = 1
print(f"campaigns_to_pause count: {sum(campaigns_to_pause)}")
if campaigns_to_pause.any():
print("campaigns_to_pause campaigns", tableize(inputDf.loc[campaigns_to_pause].head()))
inputDf.loc[ campaigns_to_pause, \
RPT_COL_SBA_RECOMMENDED_STATUS \
] = VAL_STATUS_PAUSED
# Recommend to reactivate campaigns with MTD Budget Group Spend under Monthly Group Budget (by a margin)
# but limited to campaigns with SBA Pause Date populated 10 digit date
under_spent_campaigns = inputDf[COL_MTD_BUDGET_GROUP_SPEND] < inputDf[RPT_COL_SBA_GROUP_MONTHLY_BUDGET] * (1 - BUDGET_CAP_SAFETY_MARGIN)
sba_paused_campaigns = inputDf[RPT_COL_SBA_PAUSE_DATE].astype('str').str.len() >= 10
campaigns_to_resume = under_spent_campaigns & sba_paused_campaigns
inputDf.loc[campaigns_to_resume, 'resume'] = 1
print(f"campaigns_to_resume count: {sum(campaigns_to_resume)}")
if campaigns_to_resume.any():
print("campaigns_to_resume", tableize(inputDf.loc[campaigns_to_resume].head()))
inputDf.loc[ campaigns_to_resume, \
RPT_COL_SBA_RECOMMENDED_STATUS \
] = VAL_STATUS_ACTIVE
## Actually taffic PAUSE
should_traffic = inputDf[RPT_COL_SBA_TRAFFIC].astype(str).str.lower() == 'traffic'
should_traffic_pause = should_traffic & \
(inputDf[RPT_COL_SBA_RECOMMENDED_STATUS] == VAL_STATUS_PAUSED) & \
(inputDf[RPT_COL_SBA_RECOMMENDED_STATUS] != inputDf[RPT_COL_CAMPAIGN_STATUS])
inputDf.loc[should_traffic_pause, 'traffic_pause'] = 1
print(f"should_traffic_pause count: {sum(should_traffic_pause)}")
if should_traffic_pause.any():
print("should_traffic_pause campaigns", tableize(inputDf.loc[should_traffic_pause].head()))
inputDf.loc[should_traffic_pause, RPT_COL_CAMPAIGN_STATUS] = inputDf.loc[should_traffic_pause, RPT_COL_SBA_RECOMMENDED_STATUS]
inputDf.loc[should_traffic_pause, RPT_COL_SBA_PAUSE_DATE] = today.strftime('%Y-%m-%d')
## Actually taffic RESUME
should_traffic_resume = should_traffic & \
(inputDf[RPT_COL_SBA_RECOMMENDED_STATUS] == VAL_STATUS_ACTIVE) & \
(inputDf[RPT_COL_SBA_RECOMMENDED_STATUS] != inputDf[RPT_COL_CAMPAIGN_STATUS]) & \
sba_paused_campaigns
inputDf.loc[should_traffic_resume, 'traffic_resume'] = 1
print(f"should_traffic_resume count: {sum(should_traffic_resume)}")
if should_traffic_resume.any():
print("should_traffic_resume campaigns", tableize(inputDf.loc[should_traffic_resume].head()))
inputDf.loc[should_traffic_resume, RPT_COL_CAMPAIGN_STATUS] = inputDf.loc[should_traffic_resume, RPT_COL_SBA_RECOMMENDED_STATUS]
inputDf.loc[should_traffic_resume, RPT_COL_SBA_PAUSE_DATE] = VAL_BLANK
print(f"select_changed with inputDf shape {inputDf.shape} and originalDf shape {originalDf.shape}")
# only include changed rows in bulk file
(outputDf, debugDf) = select_changed(inputDf, \
originalDf, \
diff_cols = [ \
RPT_COL_CAMPAIGN_STATUS, \
RPT_COL_SBA_RECOMMENDED_STATUS, \
], \
select_cols = [ \
RPT_COL_ACCOUNT, \
RPT_COL_CAMPAIGN, \
RPT_COL_CAMPAIGN_STATUS, \
RPT_COL_SBA_RECOMMENDED_STATUS, \
RPT_COL_SBA_PAUSE_DATE, \
], \
merged_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN] \
)
changed = (debugDf[RPT_COL_CAMPAIGN_STATUS+'_new'] != debugDf[RPT_COL_CAMPAIGN_STATUS+'_orig']) | \
(debugDf[RPT_COL_SBA_RECOMMENDED_STATUS+'_new'] != debugDf[RPT_COL_SBA_RECOMMENDED_STATUS+'_orig'])
debugDf.loc[changed, 'changed'] = 1
print(f"changed count: {sum(changed)}")
if changed.any():
print("changed campaigns", tableize(debugDf.loc[changed].head()))
# remember to use Bulk column header for Status
outputDf = outputDf.rename(columns = { \
RPT_COL_CAMPAIGN_STATUS: BULK_COL_STATUS \
})
print("outputDf shape", outputDf.shape)
print("outputDf", tableize(outputDf.tail(5)))
## local debug
if local_dev:
output_filename = 'outputDf.csv'
outputDf.to_csv(output_filename, index=False)
print(f"Local Dev: Output written to: {output_filename}")
debug_filename = 'debugDf.csv'
debugDf.to_csv(debug_filename, index=False)
print(f"Local Dev: Debug written to: {debug_filename}")
Post generated on 2024-05-15 07:44:05 GMT