Script 701: SBA Auto Pause

Purpose

The Python script automatically pauses or resumes marketing campaigns based on their month-to-date (MTD) spending relative to a predefined budget cap.

To Elaborate

The script pauses campaigns whose spend has reached their budget cap and resumes campaigns it previously paused once their spend is back under the cap. It sums MTD publisher cost across all campaigns that share a Salesforce Item ID and compares that total against the SBA Campaign Budget reduced by a safety margin (2% by default), which compensates for system lag and non-linear intraday spend. Campaigns with a positive budget whose MTD spend is at or above this threshold are recommended for pausing; campaigns below the threshold that carry an SBA Pause Date are recommended for resumption. A recommendation is applied only when the campaign's SBA Traffic value is "traffic" and its current status differs from the recommended one. This automation maintains budget discipline without manual intervention.
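
As a rough illustration of the safety margin, the sketch below computes the spend level at which pausing would be recommended. The 2% margin is the script's default; the function names and the budget and spend figures are made up for this example and do not appear in the script.

```python
BUDGET_CAP_SAFETY_MARGIN = 0.02  # the script's default margin (2%)


def pause_threshold(monthly_budget, margin=BUDGET_CAP_SAFETY_MARGIN):
    """Spend level at which pausing is recommended (hypothetical helper)."""
    return monthly_budget * (1 - margin)


def should_pause(mtd_spend, monthly_budget, margin=BUDGET_CAP_SAFETY_MARGIN):
    """Mirror the script's rule: positive budget and spend at or above the threshold."""
    return monthly_budget > 0 and mtd_spend >= pause_threshold(monthly_budget, margin)


# Illustrative figures only: a 1000 budget with a 2% margin gives a threshold near 980.
print(pause_threshold(1000.0))
print(should_pause(985.0, 1000.0))  # spend is past the threshold
print(should_pause(50.0, 0.0))      # no budget set, so never paused
```

Lowering the margin lets spend run closer to the cap before a pause is recommended; raising it pauses earlier.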

Walking Through the Code

  1. Configuration and Setup
    • The script begins by defining a configurable parameter, BUDGET_CAP_SAFETY_MARGIN, which determines how close the MTD spend can get to the monthly budget before a campaign is paused.
    • It checks if the script is running on a server or locally, and loads necessary data from a pickle file if running locally.
  2. Data Preparation
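    • The script copies the input into a DataFrame, coerces SBA Campaign Budget to numeric (missing or invalid values become 0) and SBA Pause Date to a date, then replaces remaining missing values with empty strings so comparisons do not fail.
    • It calculates MTD spend by summing Pub. Cost $ across all rows that share a Salesforce Item ID and attaches that total to each campaign row.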
  3. Campaign Evaluation
    • Each campaign's MTD spend is compared against its SBA Campaign Budget reduced by the safety margin.
    • Campaigns with a positive budget whose spend is at or above that threshold are marked for pausing, while those below it that have an SBA Pause Date are marked for resumption.
  4. Status Update
    • The script updates the campaign status in the DataFrame based on the recommendations, acting only on campaigns whose SBA Traffic value is "traffic" and whose current status differs from the recommended one.
    • It records the pause date for campaigns that are paused and clears it for those that are resumed.
  5. Output Preparation
    • The script identifies changes in campaign status and prepares an output DataFrame with only the changed rows, renaming columns as necessary for bulk processing.
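
The steps above can be sketched on a toy DataFrame. This is a simplified illustration, not the script itself: the data is invented, the pause date is a fixed stand-in for today's date, and the intermediate SBA Recommended Status column and the select_changed helper are replaced by direct comparisons.

```python
import pandas as pd

MARGIN = 0.02  # mirrors BUDGET_CAP_SAFETY_MARGIN

# Toy data; all values are invented for illustration.
df = pd.DataFrame({
    "Campaign": ["A1", "A2", "B1", "C1"],
    "Salesforce Item Id": ["sf-1", "sf-1", "sf-2", "sf-3"],
    "Pub. Cost $": [600.0, 390.0, 100.0, 50.0],
    "SBA Campaign Budget": [1000.0, 1000.0, 500.0, 0.0],
    "SBA Pause Date": ["", "", "2024-01-10", ""],
    "SBA Traffic": ["Traffic", "Traffic", "Traffic", "Traffic"],
    "Campaign Status": ["Active", "Active", "Paused", "Active"],
})
original = df.copy()

# Data preparation: total spend per Salesforce Item Id, repeated on each row.
df["mtd_spend"] = df.groupby("Salesforce Item Id")["Pub. Cost $"].transform("sum")

# Campaign evaluation: compare spend against the budget less the margin.
threshold = df["SBA Campaign Budget"] * (1 - MARGIN)
to_pause = (df["SBA Campaign Budget"] > 0) & (df["mtd_spend"] >= threshold)
to_resume = (df["mtd_spend"] < threshold) & (df["SBA Pause Date"].str.len() >= 10)

# Status update: act only on rows flagged for traffic whose status would change.
traffic = df["SBA Traffic"].str.lower() == "traffic"
pause_now = traffic & to_pause & (df["Campaign Status"] != "Paused")
resume_now = traffic & to_resume & (df["Campaign Status"] != "Active")
df.loc[pause_now, "Campaign Status"] = "Paused"
df.loc[pause_now, "SBA Pause Date"] = "2024-01-17"  # stand-in for today's date
df.loc[resume_now, "Campaign Status"] = "Active"
df.loc[resume_now, "SBA Pause Date"] = ""

# Output preparation: keep changed rows and use the bulk column name for status.
changed = df["Campaign Status"] != original["Campaign Status"]
output = df.loc[changed, ["Campaign", "Campaign Status", "SBA Pause Date"]]
output = output.rename(columns={"Campaign Status": "Status"})
print(output)
```

In this toy data, A1 and A2 share one Salesforce Item ID whose combined spend (990) passes the 980 threshold, so both are paused. B1 is under its threshold and has a pause date, so it is resumed. C1 has no budget and is left alone.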

Vitals

  • Script ID : 701
  • Client ID / Customer ID: 1306926667 / 60270093
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Status, SBA Pause Date, SBA Recommended Status
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: ascott@marinsoftware.com
  • Created by ascott@marinsoftware.com on 2024-02-14 17:00
  • Last Updated by ascott@marinsoftware.com on 2024-02-14 17:54

Python Code

###
## name: Salesforce Item ID Budget Cap
## description:
##  Pause campaigns when MTD spend reaches SBA Campaign Budget Cap
## 
## author: Michael S. Huang
## created: 2023-10-31
## 

##### Configurable Param #####
# Define how close MTD spend can get to Monthly Budget before being Paused
#  - compensates for lag in system
#  - compensates for non-linearity in intraday spend
BUDGET_CAP_SAFETY_MARGIN = 0.02 # set to 2%
##############################


########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
pickle_path = '/Users/mhuang/Downloads/pickle/heartland_dental_cap_20240117_datasource_dict.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Referencing dataSourceDict raises NameError when it is not defined
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
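    # use a context manager so the file handle is closed after loading
    with open(pickle_path, 'rb') as pickle_file:
        dataSourceDict = pickle.load(pickle_file)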

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
    # LA Timezone is GMT-8. Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-8))

    # import pandas
    import pandas as pd
    import numpy as np

    # Printing out the version of Python, Pandas and Numpy
    # import sys
    # python_version = sys.version
    # pandas_version = pd.__version__
    # numpy_version = np.__version__

    # print(f"python version: {python_version}")
    # print(f"pandas version: {pandas_version}")
    # print(f"numpy version: {numpy_version}")

    # other imports
    import re
    import urllib

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Config ###########

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_SBA_TRAFFIC = 'SBA Traffic'
RPT_COL_SBA_RECOMMENDED_STATUS = 'SBA Recommended Status'
RPT_COL_SBA_PAUSE_DATE = 'SBA Pause Date'
RPT_COL_DAILY_BUDGET = 'Daily Budget'
RPT_COL_SBA_CALCULATED_BUDGET_DAILY = 'SBA Calculated Budget Daily'
RPT_COL_SBA_BUDGET_PACING = 'SBA Budget Pacing'
RPT_COL_SBA_ALLOCATION = 'SBA Allocation'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_SALESFORCE_ITEM_ID = 'Salesforce Item Id'
RPT_COL_SBA_CAMPAIGN_BUDGET= 'SBA Campaign Budget'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_STATUS = 'Status'
BULK_COL_SBA_PAUSE_DATE = 'SBA Pause Date'
BULK_COL_SBA_RECOMMENDED_STATUS = 'SBA Recommended Status'

# Make inputDf a copy of original to keep dataSourceDict.pkl pristine
originalDf = dataSourceDict["1"]
inputDf = originalDf.copy()

# define some intermediate columns
COL_MTD_SALESFORCE_ITEM_ID_SPEND = 'mtd_salesforce_item_id_spend'

# define Status values
VAL_STATUS_ACTIVE = 'Active'
VAL_STATUS_PAUSED = 'Paused'
VAL_BLANK = ''

## force expected types
# Convert RPT_COL_SBA_CAMPAIGN_BUDGET to numeric, coercing errors to NaN
inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET] = pd.to_numeric(inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET], errors='coerce')
# Replace NaN budgets with 0.0 so those rows are treated as having no budget
# (assign back instead of inplace on a column selection, which may not update inputDf)
inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET] = inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET].fillna(0.0)
# Force RPT_COL_SBA_PAUSE_DATE to be Date type
inputDf[RPT_COL_SBA_PAUSE_DATE] = pd.to_datetime(inputDf[RPT_COL_SBA_PAUSE_DATE], errors='coerce').dt.date

# HACK: replace nan with empty strings so comparison doesn't fail
inputDf.fillna(VAL_BLANK, inplace=True)

print("inputDf shape", inputDf.shape)
print("inputDf info")
inputDf.info()  # info() prints its summary directly and returns None



# Calculate MTD spend per Salesforce Item Id (called "Epicor" spend in the comments and names below)
inputDf[COL_MTD_SALESFORCE_ITEM_ID_SPEND] = inputDf.groupby(RPT_COL_SALESFORCE_ITEM_ID)[RPT_COL_PUB_COST].transform('sum')

# Recommend to Pause campaigns with MTD Epicor Spend at or over Monthly Epicor Budget less the safety margin
has_epicor_budget = inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET] > 0.0
over_spent_campaigns = inputDf[COL_MTD_SALESFORCE_ITEM_ID_SPEND] >= inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET] * (1 - BUDGET_CAP_SAFETY_MARGIN)
campaigns_to_pause = has_epicor_budget & over_spent_campaigns
# TODO: remove TEST line below
# campaigns_to_pause = over_spent_campaigns

inputDf.loc[campaigns_to_pause, 'pause'] = 1
print(f"campaigns_to_pause count: {sum(campaigns_to_pause)}")
if campaigns_to_pause.any():
    print("campaigns_to_pause campaigns", tableize(inputDf.loc[campaigns_to_pause].head()))

inputDf.loc[ campaigns_to_pause, \
             RPT_COL_SBA_RECOMMENDED_STATUS \
           ] = VAL_STATUS_PAUSED
 
# Recommend to reactivate campaigns with MTD Epicor Spend under Monthly Epicor Budget (by a margin)
# but limited to campaigns with SBA Pause Date populated 10 digit date
under_spent_campaigns = inputDf[COL_MTD_SALESFORCE_ITEM_ID_SPEND] < inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET] * (1 - BUDGET_CAP_SAFETY_MARGIN)
sba_paused_campaigns = inputDf[RPT_COL_SBA_PAUSE_DATE].astype('str').str.len() >= 10
campaigns_to_resume = under_spent_campaigns & sba_paused_campaigns

inputDf.loc[campaigns_to_resume, 'resume'] = 1
print(f"campaigns_to_resume count: {sum(campaigns_to_resume)}")
if campaigns_to_resume.any():
    print("campaigns_to_resume", tableize(inputDf.loc[campaigns_to_resume].head()))

inputDf.loc[ campaigns_to_resume, \
             RPT_COL_SBA_RECOMMENDED_STATUS  \
           ] = VAL_STATUS_ACTIVE

## Actually traffic PAUSE

should_traffic = inputDf[RPT_COL_SBA_TRAFFIC].astype(str).str.lower() == 'traffic'
# TODO: comment out below TEST line for PROD
# should_traffic = inputDf[RPT_COL_PUB_COST] > 1.0

# HACK: to find what should have been paused, pretend none of them are Paused yet
# inputDf.loc[campaigns_to_pause & should_traffic, RPT_COL_CAMPAIGN_STATUS] = VAL_STATUS_ACTIVE
# originalDf.loc[campaigns_to_pause & should_traffic, RPT_COL_CAMPAIGN_STATUS] = VAL_STATUS_ACTIVE


should_traffic_pause = should_traffic & \
                       (inputDf[RPT_COL_SBA_RECOMMENDED_STATUS] == VAL_STATUS_PAUSED) & \
                       (inputDf[RPT_COL_SBA_RECOMMENDED_STATUS] != inputDf[RPT_COL_CAMPAIGN_STATUS])


inputDf.loc[should_traffic_pause, 'traffic_pause'] = 1
print(f"should_traffic_pause count: {sum(should_traffic_pause)}")
if should_traffic_pause.any():
    print("should_traffic_pause campaigns", tableize(inputDf.loc[should_traffic_pause].head()))


inputDf.loc[should_traffic_pause, RPT_COL_CAMPAIGN_STATUS] = inputDf.loc[should_traffic_pause, RPT_COL_SBA_RECOMMENDED_STATUS]
inputDf.loc[should_traffic_pause, RPT_COL_SBA_PAUSE_DATE] = today.strftime('%Y-%m-%d')

## Actually traffic RESUME

should_traffic_resume = should_traffic & \
                       (inputDf[RPT_COL_SBA_RECOMMENDED_STATUS] == VAL_STATUS_ACTIVE) & \
                       (inputDf[RPT_COL_SBA_RECOMMENDED_STATUS] != inputDf[RPT_COL_CAMPAIGN_STATUS]) & \
                       sba_paused_campaigns

inputDf.loc[should_traffic_resume, 'traffic_resume'] = 1
print(f"should_traffic_resume count: {sum(should_traffic_resume)}")
if should_traffic_resume.any():
    print("should_traffic_resume campaigns", tableize(inputDf.loc[should_traffic_resume].head()))

inputDf.loc[should_traffic_resume, RPT_COL_CAMPAIGN_STATUS] = inputDf.loc[should_traffic_resume, RPT_COL_SBA_RECOMMENDED_STATUS]
inputDf.loc[should_traffic_resume, RPT_COL_SBA_PAUSE_DATE] = VAL_BLANK

print(f"select_changed with inputDf shape {inputDf.shape} and originalDf shape {originalDf.shape}")


# only include changed rows in bulk file
(outputDf, debugDf) = select_changed(inputDf, \
                                    originalDf, \
                                    diff_cols = [ \
                                        RPT_COL_CAMPAIGN_STATUS, \
                                        RPT_COL_SBA_RECOMMENDED_STATUS, \
                                    ], \
                                    select_cols = [ \
                                        RPT_COL_ACCOUNT, \
                                        RPT_COL_CAMPAIGN, \
                                        RPT_COL_CAMPAIGN_STATUS, \
                                        RPT_COL_SBA_RECOMMENDED_STATUS, \
                                        RPT_COL_SBA_PAUSE_DATE, \
                                    ], \
                                    merged_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN] \
                                    )


changed = (debugDf[RPT_COL_CAMPAIGN_STATUS+'_new'] != debugDf[RPT_COL_CAMPAIGN_STATUS+'_orig']) | \
          (debugDf[RPT_COL_SBA_RECOMMENDED_STATUS+'_new'] != debugDf[RPT_COL_SBA_RECOMMENDED_STATUS+'_orig'])

debugDf.loc[changed, 'changed'] = 1
print(f"changed count: {sum(changed)}")
if changed.any():
    print("changed campaigns", tableize(debugDf.loc[changed].head()))

# remember to use Bulk column header for Status
outputDf = outputDf.rename(columns = { \
                RPT_COL_CAMPAIGN_STATUS: BULK_COL_STATUS \
                })

print("outputDf shape", outputDf.shape)
print("outputDf", tableize(outputDf.tail(5)))
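
For readers running locally without access to marin_scripts_utils, the sketch below shows one way a changed-row selection could work. It is a hypothetical stand-in inferred only from how the script calls select_changed and reads the '_new' and '_orig' columns of debugDf; the real helper's behavior may differ. The function name select_changed_sketch and the sample data are assumptions, and the sketch assumes the key columns uniquely identify each row.

```python
import pandas as pd


def select_changed_sketch(new_df, orig_df, diff_cols, select_cols, merged_cols):
    """Hypothetical stand-in, NOT the marin_scripts_utils implementation.

    Returns (changed rows of new_df limited to select_cols, merged debug frame).
    """
    # Pair each new row with its original row; shared non-key columns get suffixes.
    merged = new_df.merge(orig_df, on=merged_cols, how="inner",
                          suffixes=("_new", "_orig"))
    # A row counts as changed if any of the diff columns differs.
    changed = pd.Series(False, index=merged.index)
    for col in diff_cols:
        changed = changed | (merged[f"{col}_new"] != merged[f"{col}_orig"])
    # Select the changed rows from new_df by key so column names stay unsuffixed.
    changed_keys = merged.loc[changed, merged_cols].drop_duplicates()
    out = new_df.merge(changed_keys, on=merged_cols, how="inner")[select_cols]
    return out, merged


# Invented sample data: only campaign c1 changes status.
orig = pd.DataFrame({"Account": ["a", "a"], "Campaign": ["c1", "c2"],
                     "Campaign Status": ["Active", "Active"]})
new = orig.copy()
new.loc[new["Campaign"] == "c1", "Campaign Status"] = "Paused"

out, debug = select_changed_sketch(
    new, orig,
    diff_cols=["Campaign Status"],
    select_cols=["Account", "Campaign", "Campaign Status"],
    merged_cols=["Account", "Campaign"],
)
print(out)
```

Note that this comparison treats NaN as different from NaN, so missing values should be filled consistently in both frames before calling it.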

Post generated on 2024-11-27 06:58:46 GMT
