Script 1323: Ascend Auto Cap

Purpose

This Python script automatically enables a campaign override flag when Ascend adjusted recommendations exceed predefined caps set in Google Sheets.

To Elaborate

The script manages campaign budget overrides by comparing each campaign's adjusted recommendations against floor and ceiling caps defined in Google Sheets. When an adjusted recommendation falls outside the acceptable range (below the floor or above the ceiling), the script enables the campaign's override flag; when it falls back within range, the flag is disabled again. This keeps campaigns within budget constraints while still allowing adjustments when needed. The script can run either locally or on a server, with a specific configuration for each environment.
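The core floor/ceiling check can be sketched as follows. The values and cap percentages here are hypothetical (the real caps come from the Google Sheet); note that the floor cap is expressed as a negative percentage, and rows with no usable baseline get no caps at all:

```python
import pandas as pd
import numpy as np

# Hypothetical baseline values and cap percentages (floor is a NEGATIVE percent)
baseline = pd.Series([100.0, 50.0, np.nan])
floor_cap_pct, ceiling_cap_pct = -25, 25  # allow -25% .. +25% around baseline
adjusted_rec = pd.Series([120.0, 70.0, 10.0])

floor = baseline * (1 + floor_cap_pct / 100)      # 75.0, 37.5, NaN
ceiling = baseline * (1 + ceiling_cap_pct / 100)  # 125.0, 62.5, NaN

# Override only where a cap exists and the adjusted recommendation falls outside it
should_override = (floor.notna() & (adjusted_rec < floor)) | \
                  (ceiling.notna() & (adjusted_rec > ceiling))
print(should_override.tolist())  # [False, True, False]
```

The second row trips the ceiling (70 > 62.5), while the third row has no baseline and so is never overridden.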

Walking Through the Code

  1. Initialization and Configuration:
    • The script begins by setting up configurations for local or server execution. It checks if the code is running on a server or locally and loads necessary data from a pickle file if running locally.
    • User-changeable parameters include the pickle_path for local data loading and the download_preview_input flag for previewing data sources.
  2. Data Preparation:
    • The script loads data from a dictionary (dataSourceDict) and prepares it for processing. It ensures that certain columns are converted to the correct data types, such as numeric or date types.
    • It retrieves cap rules and campaign settings from Google Sheets, renaming columns to match report column names.
  3. Cap Calculation and Comparison:
    • For each cap rule, the script calculates the floor and ceiling caps based on baseline values. It checks if the adjusted recommendations exceed these caps.
    • It merges the calculated caps back into the main data frame and determines whether the override flag should be enabled or disabled based on the comparison results.
  4. Override Flag Management:
    • The script collects override flags for each rule and determines which campaigns should have the override enabled or disabled.
    • It updates the override flag and date in the data frame based on the comparison results and cleans up any orphaned override dates.
  5. Output Preparation:
    • The script prepares the output data frame, ensuring only changed rows are included. It renames columns to match the expected output format and writes the results to CSV files if running in local development mode.
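The cap-merge and flag-management steps above can be sketched like this. The column names follow the report's conventions, but the frames and values are illustrative:

```python
import pandas as pd

# Illustrative campaign report and per-campaign caps (hypothetical values)
campaigns = pd.DataFrame({
    'Account': ['A', 'A'],
    'Campaign': ['C1', 'C2'],
    'Adj. Rec. Campaign Budget': [130.0, 90.0],
})
caps = pd.DataFrame({
    'Account': ['A', 'A'],
    'Campaign': ['C1', 'C2'],
    'floor': [75.0, 75.0],
    'ceiling': [125.0, 125.0],
})

# Merge caps into the main frame, then flag rows outside the allowed range
merged = campaigns.merge(caps, how='left', on=['Account', 'Campaign'])
exceeded = (merged['Adj. Rec. Campaign Budget'] < merged['floor']) | \
           (merged['Adj. Rec. Campaign Budget'] > merged['ceiling'])
merged.loc[exceeded, 'Campaign Override'] = 'On'
merged.loc[~exceeded, 'Campaign Override'] = 'Off'
print(merged['Campaign Override'].tolist())  # ['On', 'Off']
```

C1 exceeds its ceiling (130 > 125) and gets the override enabled; C2 stays within range.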

Vitals

  • Script ID : 1323
  • Client ID / Customer ID: 240065 / 191056
  • Action Type: Bulk Upload (Preview)
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Campaign Override, AUTO - Campaign Override Date
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: Michael Huang (mhuang@marinsoftware.com)
  • Created by Michael Huang on 2024-08-14 15:27
  • Last Updated by Michael Huang on 2024-11-14 02:33
> See it in Action

Python Code

##
## name: Ascend Auto Cap
## description:
##  Enable Campaign Override flag when Ascend Adj Recommendations exceed caps set in GSheets.
## 
## author: Michael S. Huang
## created: 2024-08-14
## 

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
pickle_path = '/Users/mhuang/Downloads/pickle/wex_photo_ascend_auto_cap_20240815.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access the server-provided dataSourceDict global
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
    # Chicago is GMT-6 (CST) / GMT-5 (CDT). Fixed offset used here; adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))

    # import pandas
    import pandas as pd
    import numpy as np

    # Printing out the version of Python, Pandas and Numpy
    # import sys
    # python_version = sys.version
    # pandas_version = pd.__version__
    # numpy_version = np.__version__

    # print(f"python version: {python_version}")
    # print(f"pandas version: {pandas_version}")
    # print(f"numpy version: {numpy_version}")

    # other imports
    import re
    import urllib

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed

    # pandas settings
    pd.set_option('display.max_columns', None)  # Display all columns
    pd.set_option('display.max_colwidth', None)  # Display full content of each column

else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Config ###########

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_BUDGET_OVERRIDE = 'Campaign Override'
RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE = 'AUTO - Campaign Override Date'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_STRATEGY_GOAL = 'Strategy Goal'
RPT_COL_STRATEGY_CONSTRAINT = 'Strategy Constraint'
RPT_COL_STRATEGY_TARGET = 'Strategy Target'
RPT_COL_PUBLISHER_BID_STRATEGY = 'Publisher Bid Strategy'
RPT_COL_PUBLISHER_TARGET_CPA = 'Publisher Target CPA'
RPT_COL_PUBLISHER_TARGET_ROAS = 'Publisher Target ROAS'
RPT_COL_REC_CAMPAIGN_TARGET = 'Rec. Campaign Target'
RPT_COL_ADJ_REC_CAMPAIGN_TARGET = 'Adj. Rec. Campaign Target'
RPT_COL_REC_CAMPAIGN_TARGET_DATE = 'Rec. Campaign Target Date'
RPT_COL_PUB_COST = 'Pub. Cost £'
RPT_COL_DAILY_BUDGET = 'Daily Budget'
RPT_COL_REC_CAMPAIGN_BUDGET = 'Rec. Campaign Budget'
RPT_COL_ADJ_REC_CAMPAIGN_BUDGET = 'Adj. Rec. Campaign Budget'
RPT_COL_REC_CAMPAIGN_BUDGET_DATE = 'Rec. Campaign Budget Date'
RPT_COL_RECOMMENDATION_ADJ = 'Recommendation Adj.'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_CAMPAIGN_OVERRIDE = 'Campaign Override'
BULK_COL_AUTO__CAMPAIGN_OVERRIDE_DATE = 'AUTO - Campaign Override Date'
outputDf[BULK_COL_CAMPAIGN_OVERRIDE] = "<<YOUR VALUE>>"
outputDf[BULK_COL_AUTO__CAMPAIGN_OVERRIDE_DATE] = "<<YOUR VALUE>>"

# internal constants
COL_BASELINE_COLUMN_KEY = 'baseline_col_key'
COL_BASELINE_COLUMN_NAME = 'baseline_col_name'
COL_FLOOR_CAP_PCT = 'floor_cap_pct'
COL_CEILING_CAP_PCT = 'ceiling_cap_pct'
COL_MONITOR_COLUMN_KEY = 'monitor_col_key'
COL_MONITOR_COLUMN_NAME = 'monitor_col_name'

VAL_OVERRIDE_ON = 'On'
VAL_OVERRIDE_OFF = 'Off'
VAL_BLANK = ''

#### user code starts here

print("inputDf shape", inputDf.shape)
print("inputDf dtypes\n", inputDf.dtypes)
print("inputDf sample\n", inputDf.head(5))

originalDf = dataSourceDict["1"]

# Make inputDf a copy of original to keep dataSourceDict pristine
inputDf = originalDf.copy()

## force expected types
# Convert RPT_COL_PUBLISHER_TARGET_ROAS and RPT_COL_PUBLISHER_TARGET_CPA to numeric, coercing errors to NaN
inputDf[RPT_COL_PUBLISHER_TARGET_ROAS] = pd.to_numeric(inputDf[RPT_COL_PUBLISHER_TARGET_ROAS], errors='coerce')
inputDf[RPT_COL_PUBLISHER_TARGET_CPA] = pd.to_numeric(inputDf[RPT_COL_PUBLISHER_TARGET_CPA], errors='coerce')
# Force RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE to be Date type
inputDf[RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE] = pd.to_datetime(inputDf[RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE], errors='coerce').dt.date

## prepare caps from GSheets

# sheet 1 contains list of cap rules, with actual cap values on sheet 2
df_cap_rules = dataSourceDict['2_1'].rename(columns={
    'A': COL_BASELINE_COLUMN_KEY,
    'B': COL_BASELINE_COLUMN_NAME,
    'C': COL_FLOOR_CAP_PCT,
    'D': COL_CEILING_CAP_PCT,
    'E': COL_MONITOR_COLUMN_KEY,
    'F': COL_MONITOR_COLUMN_NAME
})

### DEBUGGING, set ceiling to 2%
# df_cap_rules[COL_CEILING_CAP_PCT] = 2

print("cap rules\n", tableize(df_cap_rules))

# Create a dictionary to bring in baseline columns and rename to match report column name
column_mapping = dict(zip(df_cap_rules[COL_BASELINE_COLUMN_KEY], df_cap_rules[COL_BASELINE_COLUMN_NAME]))
column_mapping['A'] = RPT_COL_CAMPAIGN
column_mapping['C'] = RPT_COL_ACCOUNT

# sheet 2 contains snapshot of campaign settings to be used as baseline
df_campaign_caps = dataSourceDict['2_2'][list(column_mapping.keys())].rename(columns=column_mapping)


# calculate actual campaign-level caps from each rule and save all intermediate columns
df_merged = pd.DataFrame()

for index, rule in df_cap_rules.iterrows():
    baseline_col = rule[COL_BASELINE_COLUMN_NAME]
    monitor_col = rule[COL_MONITOR_COLUMN_NAME]
    floor_cap_pct = rule[COL_FLOOR_CAP_PCT]
    ceiling_cap_pct = rule[COL_CEILING_CAP_PCT]

    # define col names
    COL_BASELINE = f"{baseline_col}_baseline"
    COL_FLOOR = f"{baseline_col}_floor"
    COL_CEILING = f"{baseline_col}_ceiling"
    COL_OVERRIDE = f"{baseline_col}_override"

    print(f"applying rule for baseline '{baseline_col}'")

    # calculate lower and upper caps, but only if original targets are defined and non-zero
    # (note the parentheses: & binds tighter than >, so the comparison must be grouped)
    baseline_defined = (df_campaign_caps[baseline_col].notna() & (df_campaign_caps[baseline_col] > 0))
    
    print(f"baseline value defined for {sum(baseline_defined)} rows")
    
    # floor should be expressed as NEGATIVE percentage (eg -25%)
    floor = df_campaign_caps[baseline_col] * (1 + floor_cap_pct / 100)
    ceiling = df_campaign_caps[baseline_col] * (1 + ceiling_cap_pct / 100)
    # remove caps if baseline is ill defined
    floor = floor.where(baseline_defined, np.nan)
    ceiling = ceiling.where(baseline_defined, np.nan)

    # copy baseline to ease debugging
    df_campaign_caps[COL_BASELINE] = df_campaign_caps[baseline_col]
    df_campaign_caps[COL_FLOOR] = floor
    df_campaign_caps[COL_CEILING] = ceiling

    # print(f"df_campaign_caps\n", df_campaign_caps.loc[baseline_defined].head(5))

    # merge the campaign-level floor and ceiling caps back into inputDf
    df_comparison = inputDf.merge(df_campaign_caps[[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, COL_BASELINE, COL_FLOOR, COL_CEILING]], 
                            how='left', 
                            on=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN])
    
    print("df_comparison shape", df_comparison.shape)
    print("df_comparison dtypes\n", df_comparison.dtypes)

    # sanity check
    # comparables = df_comparison.loc[df_comparison[COL_CEILING].notna() & df_comparison[monitor_col].notna()]
    # print("df_comparison comparables sample\n", comparables.head(5))

    # check monitoring column (usually the Adjusted Recommendation column; e.g. what would be pushed to publisher)
    exceeded_floor = (df_comparison[COL_FLOOR].notna() & (df_comparison[monitor_col] < df_comparison[COL_FLOOR]))
    exceeded_ceiling = (df_comparison[COL_CEILING].notna() & (df_comparison[monitor_col] > df_comparison[COL_CEILING]))
    should_override = exceeded_floor | exceeded_ceiling
    # set Override flag for this baseline
    df_comparison.loc[should_override, COL_OVERRIDE] = VAL_OVERRIDE_ON
    print(f">> should enable Override for {sum(should_override)} campaigns")

    # Select specific columns to concatenate
    # Note: need to include RPT_COL_BUDGET_OVERRIDE to preserve original value prior to changing them
    columns_to_concatenate = [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_BUDGET_OVERRIDE, COL_BASELINE, COL_FLOOR, COL_CEILING, COL_OVERRIDE]
    df_comparison_selected = df_comparison[columns_to_concatenate]

    # Combine for output and debugging
    if df_merged.empty:
        df_merged = df_comparison_selected
    else:
        df_merged = df_merged.merge(df_comparison_selected, 
                                    on=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_BUDGET_OVERRIDE], 
                                    how='left', 
                                    suffixes=('', '_new'))


# collect override flags for each rule
override_cols = [f"{col}_override" for col in df_cap_rules[COL_BASELINE_COLUMN_NAME]]
# Build to_override array based on whether a row has any override_cols set to VAL_OVERRIDE_ON
to_override = df_merged[override_cols].apply(lambda row: any(row == VAL_OVERRIDE_ON), axis=1)
print(f">> After applying all rules, will ENABLE Override for {sum(to_override)} campaigns")

# actually enable Override
df_merged.loc[to_override, RPT_COL_BUDGET_OVERRIDE] = VAL_OVERRIDE_ON
df_merged.loc[to_override, RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE] = today.strftime('%Y-%m-%d')

# disable Override for those not in to_override, if Override Date is set
overridden_by_script = (df_merged[RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE].astype('str').str.len() >= 10)
to_disable_override = ~to_override & overridden_by_script
print(f">> Will also DISABLE Override for {sum(to_disable_override)} campaigns")

# actually disable Override
df_merged.loc[to_disable_override, RPT_COL_BUDGET_OVERRIDE] = VAL_OVERRIDE_OFF
df_merged.loc[to_disable_override, RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE] = VAL_BLANK

# cleanup RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE
# RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE is a marker to indicate this Script actioned the Override. 
# If not Override, for whatever reason, then a non-blank RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE causes confusion.
has_override_date = (df_merged[RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE].astype('str').str.len() >= 10)
override_date_orphans = (df_merged[RPT_COL_BUDGET_OVERRIDE] != VAL_OVERRIDE_ON) & has_override_date
df_merged.loc[override_date_orphans, RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE] = VAL_BLANK
print(f"Cleaned up {override_date_orphans.sum()} orphaned Override Dates")

print("df_merged shape", df_merged.shape)
print("df_merged dtypes\n", df_merged.dtypes)

debugDf = df_merged

## Prepare Output

# fill NaN with blank string since NaN values can't be compared
# (assign back rather than calling fillna(inplace=True) on a column slice,
#  which may silently operate on a copy under pandas copy-on-write)
df_merged[RPT_COL_BUDGET_OVERRIDE] = df_merged[RPT_COL_BUDGET_OVERRIDE].fillna('')
originalDf[RPT_COL_BUDGET_OVERRIDE] = originalDf[RPT_COL_BUDGET_OVERRIDE].fillna('')
df_merged[RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE] = df_merged[RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE].fillna('')
originalDf[RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE] = originalDf[RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE].fillna('')

# only include changed rows in bulk file
print(f"select_changed with df_merged shape {df_merged.shape} and originalDf shape {originalDf.shape}")
(outputDf, debugDf) = select_changed(df_merged, \
                                    originalDf, \
                                    diff_cols = [ \
                                        RPT_COL_BUDGET_OVERRIDE, \
                                    ], \
                                    select_cols = [ \
                                        RPT_COL_ACCOUNT, \
                                        RPT_COL_CAMPAIGN, \
                                        RPT_COL_BUDGET_OVERRIDE, \
                                        RPT_COL_AUTO__CAMPAIGN_OVERRIDE_DATE, \
                                    ], \
                                    merged_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN] \
                                    )


# remember to use Bulk column header for Override
outputDf = outputDf.rename(columns = { \
                RPT_COL_BUDGET_OVERRIDE: BULK_COL_CAMPAIGN_OVERRIDE \
                })

print("outputDf shape", outputDf.shape)
print("outputDf sample", tableize(outputDf.tail(5)))

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")
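select_changed is a Marin-internal helper, so its exact signature isn't reproduced here; a minimal illustrative equivalent of "keep only rows whose override flag actually changed" might look like:

```python
import pandas as pd

# Hypothetical new values vs. the original report (stand-in for select_changed)
new = pd.DataFrame({
    'Account': ['A', 'A'],
    'Campaign': ['C1', 'C2'],
    'Campaign Override': ['On', 'Off'],
})
orig = pd.DataFrame({
    'Account': ['A', 'A'],
    'Campaign': ['C1', 'C2'],
    'Campaign Override': ['Off', 'Off'],
})

# Join on the merge keys, then keep rows where the diff column differs
merged = new.merge(orig, on=['Account', 'Campaign'], suffixes=('', '_orig'))
changed = merged[merged['Campaign Override'] != merged['Campaign Override_orig']]
print(changed['Campaign'].tolist())  # ['C1']
```

Only C1 changed (Off to On), so only C1 would land in the bulk upload file.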
   

Post generated on 2024-11-27 06:58:46 GMT
