Script 1085: Update Strategy Spend Target via Master Sheet

Purpose

Lookup new month’s spend target and update via strategy bulk file

To Elaborate

This Python script solves the problem of updating the spend target for each strategy in a bulk file using the spend target values from a master sheet. The key business rules are as follows:

  • The script is limited to strategies that have a budget constraint.
  • The spend target values are looked up from the master sheet based on the current month.
  • The script combines data from multiple sheets in the master sheet based on configurable sheet indices.
  • The script cleans up the budget values from the master sheet by removing empty strategy rows and ensuring the strategy and target columns have the correct data types.
  • The script also cleans up the input data by converting the strategy target column to numeric and filtering out non-budget strategies.
  • The script compares the cleaned input data with the spend target values from the master sheet and selects the strategies that have a changed spend target.
  • The selected strategies are then updated in the output bulk file by replacing the spend target column values.

Walking Through the Code

  1. The script starts by defining configurable parameters, such as the sheet indices to import budgets from the master sheet.
  2. It then checks if the script is running on a server or locally and loads the necessary data accordingly.
  3. The current date is determined in the client’s timezone.
  4. The primary data source and columns are defined, including the input data from the master sheet and the output columns for the bulk file.
  5. The user code starts with a debugging section where the current date is set to an arbitrary value for testing purposes.
  6. The current month’s budgets are loaded from the master sheet based on the sheet indices and the current month’s column key.
  7. The loaded budgets are cleaned up by removing empty strategy rows and ensuring the correct data types for the strategy and target columns.
  8. The input data is cleaned up by converting the strategy target column to numeric and filtering out non-budget strategies.
  9. The input data is then filled with 0 for missing target values for later comparison.
  10. The input data is merged with the current month’s budgets, dropping rows with missing target values.
  11. The merged data is compared with the original input data to select the strategies that have a changed spend target.
  12. The selected strategies are updated in the output bulk file by replacing the spend target column values.
  13. If running in local development mode, the output and debug dataframes are saved as CSV files for further analysis.

Vitals

  • Script ID : 1085
  • Client ID / Customer ID: 1306927739 / 60270345
  • Action Type: Bulk Upload (Preview)
  • Item Changed: Strategy
  • Output Columns: Strategy, Spend Target, Goal
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: Michael Huang (mhuang@marinsoftware.com)
  • Created by Michael Huang on 2024-05-14 00:39
  • Last Updated by Michael Huang on 2024-05-14 21:28
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
##
## name: Update Strategy Spend Target from Master Sheet (GSheets)
## description:
##  - Looks up and apply pre-calculated Spend Target for each Strategy via GSheets
##  - Limited to Strategy using Budget constraint
## 
## author: Michael S. Huang
## created: 2024-05-14
## 

########### START: Configurable Parameters ###########
# GSheets sheet indices to import budgets
SHEET_INDEX_IMPORT = [
                      1,  # Branthaven
                      2,  # Starward Homes
                    #   3,  # Remington Homes
                    #   4,  # Olive & Fig
                    #   5,  # Dream Western
                    #   6,  # Maple Reinders
                    #   7,  # DREAM - GTA
                    #   8,  # DREAM - Canary House
                    #   9, # Zibi
                    #   10, # Dawn Victoria Homes
                    #   11, # Midtown Car Wash - Waterdown
                    #   12, # DEWALT - SBD 
                    #   13, # SBD CRAFTSMAN Canada
                     ]

########### END:   Configurable Parameters ###########

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/roc_media_update_strategy_target_20240514.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    outputDf = dataSourceDict["1"].copy()

    # setup timezone
    import datetime
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

    # import pandas
    import pandas as pd
    import numpy as np

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########

# today in client timezone
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_STRATEGY_TARGET = 'Target'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_GOAL = 'Goal'
RPT_COL_CONSTRAINT = 'Constraint'
RPT_COL_ROLL_OVER_STATUS = 'Roll Over Status'

# output columns and initial values
BULK_COL_GOAL = 'Goal'
BULK_COL_SPEND_TARGET = 'Spend Target'
outputDf[BULK_COL_GOAL] = "<<YOUR VALUE>>"
outputDf[BULK_COL_SPEND_TARGET] = "<<YOUR VALUE>>"

########### User Code Starts Here ###########

### DEBUGGING. Comment out in PROD run
# set arbitrary date
today = datetime.date(2024, 6, 1)


### Load Current Month Budgets from GSheets and rename

# For debugging, set arbitrary date
# today = datetime.date(2024, 4, 2)

# Construct column key by mapping current month to canonical column name
# Assume column D is January, H is February, H is March, AV is Dec (each month is 4 columns to the right of previous)
month_to_column_mapping = {
    1: 'D', 2: 'H', 3: 'L', 4: 'P',
    5: 'T', 6: 'X', 7: 'AB', 8: 'AF',
    9: 'AJ', 10: 'AN', 11: 'AR', 12: 'AV'
}
column_key = month_to_column_mapping[today.month]
print(f"GSheets column key for current month: {today.strftime(('%B'))} => {column_key}")

# Combine DataFrames from dataSourceDict based on SHEET_INDEX_IMPORT keys
combined_df_list = []
for sheet_index in SHEET_INDEX_IMPORT:
    key = f'2_{sheet_index}'
    if key in dataSourceDict:
        # load current month budget from sheet
        # - skip first 5 rows after header row via .loc[5:]
        # - 'Strategy' is column A
        budget = dataSourceDict[key] \
                                .loc[5:, ['A', column_key]] \
                                .rename(columns={ \
                                    'A' : RPT_COL_STRATEGY, \
                                    column_key : RPT_COL_STRATEGY_TARGET \
                                })
        combined_df_list.append(budget)
        print(f"Sheet {key} loaded.")

# Concatenate all DataFrames into a single DataFrame
current_month_budgets = pd.concat(combined_df_list, ignore_index=True)

print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.info", current_month_budgets.info())
print("current_month_budgets first 10 rows", current_month_budgets.head(10))

### cleanup Budget values from GSheets
# remove empty strategy rows
current_month_budgets = current_month_budgets.loc[current_month_budgets[RPT_COL_STRATEGY].notnull()]
# make sure Strategy is STR
current_month_budgets[RPT_COL_STRATEGY] = current_month_budgets[RPT_COL_STRATEGY].astype(str)
# make sure target is float; remove prefix if not
# note: can't check for `object` since not imported, so use `0` instead
if current_month_budgets[RPT_COL_STRATEGY_TARGET].dtype == 'O':
    current_month_budgets[RPT_COL_STRATEGY_TARGET] = current_month_budgets[RPT_COL_STRATEGY_TARGET] \
                                    .str.replace('US$', '', case=False, regex=False) \
                                    .str.replace('$', '', case=False, regex=False) \
                                    .str.replace(',', '') \
                                    .str.strip() \
                                    .astype(float) \
                                    .round(2)

# remove empty target rows
has_strategy = current_month_budgets[RPT_COL_STRATEGY].notnull() & \
                (current_month_budgets[RPT_COL_STRATEGY].str.len() > 3)
has_strategy_target = current_month_budgets[RPT_COL_STRATEGY_TARGET].notnull() & \
                (current_month_budgets[RPT_COL_STRATEGY_TARGET] > 0.5)
current_month_budgets = current_month_budgets.loc[has_strategy & has_strategy_target]

print("after cleanup gsheets")
print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.info", current_month_budgets.info())
print(current_month_budgets.head(10).to_string())


### Cleanup input

# Convert RPT_COL_STRATEGY_TARGET to numeric, coercing errors to NaN
inputDf[RPT_COL_STRATEGY_TARGET] = pd.to_numeric(inputDf[RPT_COL_STRATEGY_TARGET], errors='coerce')

# convert Constraint column into string
# NB: Bulk needs to include Goal column in order to change Spend Target, hence, 
#  remove non-Budget strategies just in case.
inputDf[RPT_COL_CONSTRAINT] = inputDf[RPT_COL_CONSTRAINT].astype(str)
budget_constraint = inputDf[RPT_COL_CONSTRAINT] == 'Budget'


# apply cleanup filters
inputDf = inputDf.loc[budget_constraint].reset_index()

print("after cleanup inputDf")
print("inputDf.shape", inputDf.shape)
print("inputDf.info", inputDf.info())
print(inputDf.head(10).to_string())


# fill in missing target as 0 for comparison later
inputDf = inputDf.fillna(value={RPT_COL_STRATEGY_TARGET: 0})

# make copy of input for use with select_changed
originalDf = inputDf.copy()

### Merge inputDf with current_month_budgets, and drop rows missing target
mergedDf = inputDf.merge(current_month_budgets, on=RPT_COL_STRATEGY, how='left', suffixes=('_old', '')) \
                  .dropna(subset=[RPT_COL_STRATEGY_TARGET]) \
                  .reset_index()

print("mergedDf shape", mergedDf.shape)
print("mergedDf", mergedDf.tail(5).to_string())

outputDf, debugDf = select_changed(mergedDf, 
                          originalDf,
                          diff_cols=[RPT_COL_STRATEGY_TARGET],
                          select_cols=[RPT_COL_STRATEGY, RPT_COL_GOAL, RPT_COL_STRATEGY_TARGET],
                          merged_cols=[RPT_COL_STRATEGY]
                          )

# Bulk column is 'Spend Target', whereas report is 'Target'
outputDf.rename(columns={RPT_COL_STRATEGY_TARGET: BULK_COL_SPEND_TARGET}, inplace=True)

print("outputDf.shape", outputDf.shape)
print("outputDf sample")
print(tableize(outputDf.tail(10)))

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")

Post generated on 2024-05-15 07:44:05 GMT

comments powered by Disqus