Script 1063: Budget Staging for Strategies via GSheets #2

Purpose:

The Python script updates strategy spend targets by copying program budgets from Google Sheets to a local data structure.

To Elaborate

The script is designed to update strategy spend targets by transferring budget data from a Google Sheets document into a local data structure. It matches the ‘Abbreviation’ column in the Google Sheets with the ‘Strategy’ column in the local data to ensure accurate data mapping. The script processes the data to ensure that only active programs with valid strategy names and budget constraints are considered. It also ensures that the strategy target meets a minimum threshold and rounds the values to two decimal places. The script is capable of running both on a server and locally, with specific configurations for each environment. It ultimately outputs the updated strategy spend targets, ready for further processing or analysis.

Walking Through the Code

  1. Local Mode Configuration:
    • The script begins by setting up configurations for local execution, including paths for downloading and loading data from a pickle file. This setup is crucial for initializing the data source dictionary when running the script locally.
  2. Environment Check:
    • A function checks whether the script is executing on a server or locally. If running locally, it loads the data source dictionary from a specified pickle file and prints the shape and first few rows of each data entry for verification.
  3. Data Preparation:
    • The script prepares the primary and reference data sources, extracting relevant columns for processing. It sets up default values and custom columns for further operations.
  4. Loading and Renaming Budgets:
    • It loads the current month’s budgets from the Google Sheets, mapping the current month to a specific column key. The script filters and renames columns to align with the expected data structure.
  5. Data Cleanup:
    • The script cleans up the budget values by removing empty strategy rows, ensuring data types are correct, and applying a minimum strategy target. It also filters out inactive programs and invalid strategy names.
  6. Input Data Cleanup:
    • Similar cleanup operations are performed on the input data, ensuring valid strategy names and budget constraints. Missing targets are filled with default values for consistency.
  7. Data Merging and Output Preparation:
    • The cleaned input data is merged with the current month’s budgets. The script identifies changes in the target values and prepares the output data frame with updated strategy spend targets.
  8. Local Development Output:
    • If running in local development mode, the script writes the output and debug data frames to CSV files for further analysis or debugging.

Vitals

  • Script ID : 1063
  • Client ID / Customer ID: 1306926629 / 60270083
  • Action Type: Bulk Upload
  • Item Changed: Strategy
  • Output Columns: Strategy, Goal, Target Spend
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
  • Created by dwaidhas@marinsoftware.com on 2024-05-10 15:02
  • Last Updated by Mingxia Wu on 2025-03-04 10:13
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
##
## name: Strategy Target Staging - GSheets - All Campus
## description:
##  Copy Program Budgets from staging GSheets to update Strategy Spend Targets
##  Use GSheets 'Abbrevation' column to match with 'Strategy'
## 
## author: Michael S. Huang, Dana Waidhas
## created: 2024-03-15
## 

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/allcampus_strategy_spend_target_20250104.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    outputDf = dataSourceDict["1"].copy()

    # setup timezone
    import datetime
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

    # import pandas
    import pandas as pd
    import numpy as np

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########

# dial forward to preview next month budgets
# CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

# today in client timezone
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_TARGET = 'Target'
RPT_COL_CONSTRAINT_TYPE = 'Constraint Type'
RPT_COL_GOAL = 'Goal'

# reference data source and columns
gSheetsDf = dataSourceDict["2_1"]  # gsheets dataframe (first sheet)
# To access 10th row of Column C, use gSheetsDf.loc[10, 'C']

# output columns and initial values
BULK_COL_STRATEGY = 'Strategy'
BULK_COL_GOAL = 'Goal'
BULK_COL_SPEND_TARGET = 'Target Spend'
outputDf[BULK_COL_GOAL] = "<<YOUR VALUE>>"
outputDf[BULK_COL_SPEND_TARGET] = "<<YOUR VALUE>>"

# default values
MINIMAL_STRATEGY_TARGET = 0.01

# custom columns
COL_ACTIVE_PROGRAM = 'active_program'

########### User Code Starts Here ###########

print("inputDf.shape", inputDf.shape)
print("gSheetsDf.shape", gSheetsDf.shape)
# print("dump raw gSheetsDf\n", gSheetsDf.to_string())

### Load Current Month Budgets from GSheets and rename

# For debugging, set arbitrary date
# today = datetime.date(2024, 4, 2)

# Construct column key by mapping current month to canonical column name
# Assume column W is January, X is February, Y is March, AH is Dec, etc. In terms of integer months, W=1, X=2, Y=3,.., AH=12.
month_to_column_mapping = {
    1: 'W', 2: 'X', 3: 'Y', 4: 'Z',
    5: 'AA', 6: 'AB', 7: 'AC', 8: 'AD',
    9: 'AE', 10: 'AF', 11: 'AG', 12: 'AH'
}
column_key = month_to_column_mapping[today.month]
print(f"GSheets column key for current month: {today.strftime('%B')} => {column_key}")

# load current month targets from Sheet "Budget Sheet", assumed to be the first Sheet
# - skip first row after header via .loc[1:]; note frozen header row automatically dropped
# - 'Active' is column C; only update active programs
# - 'Abbreviation' is column V
# - budgets from January to December are on columns W to AH
current_month_budgets = gSheetsDf \
                        .loc[1:, ['C', 'V', column_key]] \
                        .rename(columns={ \
                            'C': COL_ACTIVE_PROGRAM,
                            'V': RPT_COL_STRATEGY, \
                            column_key: RPT_COL_TARGET \
                        })
                        

print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.info", current_month_budgets.info())
print("current_month_budgets first 10 rows", current_month_budgets.head(10))

### cleanup Budget values from GSheets
# remove empty strategy rows
current_month_budgets = current_month_budgets.loc[current_month_budgets[RPT_COL_STRATEGY].notnull()]
# make sure Strategy is STR
current_month_budgets[RPT_COL_STRATEGY] = current_month_budgets[RPT_COL_STRATEGY].astype(str)
# make sure target is float; remove prefix if not
# note: can't check for `object` since not imported, so use `0` instead
if current_month_budgets[RPT_COL_TARGET].dtype == 'O':
    current_month_budgets[RPT_COL_TARGET] = current_month_budgets[RPT_COL_TARGET] \
                                    .str.replace('US$', '', case=False, regex=False) \
                                    .str.replace('$', '', case=False, regex=False) \
                                    .str.replace(',', '') \
                                    .str.strip() \
                                    .astype(float) \
                                    .round(2)

# Ensure that the strategy target is at least MINIMAL_STRATEGY_TARGET; and round to 2 decimals
current_month_budgets[RPT_COL_TARGET] = current_month_budgets[RPT_COL_TARGET].apply(
    lambda x: round(max(x, MINIMAL_STRATEGY_TARGET), 2)
)

# remove empty target rows
has_strategy = current_month_budgets[RPT_COL_STRATEGY].notnull() & \
                (current_month_budgets[RPT_COL_STRATEGY].str.len() > 3)
has_strategy_target = current_month_budgets[RPT_COL_TARGET].notnull() & \
                (current_month_budgets[RPT_COL_TARGET] > 0)
active_program = current_month_budgets[COL_ACTIVE_PROGRAM].str.lower() == 'yes'.lower()
current_month_budgets = current_month_budgets.loc[has_strategy & has_strategy_target & active_program]

print("after cleanup current monthly budget")
print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.info", current_month_budgets.info())
print("current_month_budgets last 10 rows\n", tableize(current_month_budgets.tail(10)))

### Cleanup input

# Convert RPT_COL_STRATEGY_TARGET to numeric, coercing errors to NaN
inputDf[RPT_COL_TARGET] = pd.to_numeric(inputDf[RPT_COL_TARGET], errors='coerce').round(2)
# Replace NaN values with default
inputDf[RPT_COL_TARGET].fillna(MINIMAL_STRATEGY_TARGET, inplace=True)

# convert Strategy column into string and remove blank or non-numeric rows
inputDf[RPT_COL_STRATEGY] = inputDf[RPT_COL_STRATEGY].astype(str)
valid_strategy_name = inputDf[RPT_COL_STRATEGY].str.len() > 3

# convert Constraint column into string
# NB: Bulk needs to include Goal column in order to change Spend Target, hence, 
#  remove non-Budget strategies just in case.
inputDf[RPT_COL_CONSTRAINT_TYPE] = inputDf[RPT_COL_CONSTRAINT_TYPE].astype(str)
budget_constraint = inputDf[RPT_COL_CONSTRAINT_TYPE] == 'Spend'

# apply cleanup filters
inputDf = inputDf.loc[valid_strategy_name & budget_constraint]

# fill in missing target with default for comparison later
inputDf = inputDf.fillna(value={RPT_COL_TARGET: MINIMAL_STRATEGY_TARGET})

print("after cleanup inputDf")
print("inputDf.shape", inputDf.shape)
print("inputDf.info", inputDf.info())
print("inputDf dump\n", tableize(inputDf.head()))

# make copy of input for use with select_changed
originalDf = inputDf.copy()

### Merge inputDf with current_month_budgets, and fill in missing target with default
mergedDf = inputDf.merge(current_month_budgets, on=RPT_COL_STRATEGY, how='left', suffixes=('_old', '')) \
                  .fillna(value={RPT_COL_TARGET: MINIMAL_STRATEGY_TARGET})

print("mergedDf shape", mergedDf.shape)
print("mergedDf", mergedDf.tail(5).to_string())

outputDf, debugDf = select_changed(mergedDf, 
                          originalDf,
                          diff_cols=[RPT_COL_TARGET],
                          select_cols=[RPT_COL_STRATEGY, RPT_COL_GOAL, RPT_COL_TARGET],
                          merged_cols=[RPT_COL_STRATEGY]
                          )

# Bulk column is 'Target Spend', whereas report is 'Target'
outputDf.rename(columns={RPT_COL_TARGET: BULK_COL_SPEND_TARGET}, inplace=True)


change_to_zero_target = (debugDf[RPT_COL_TARGET + '_new'] < 0.1) & \
                        (debugDf[RPT_COL_TARGET + '_orig'] > 0.1)
print("count of campaigns with target cleared", sum(change_to_zero_target))
print("campaigns with target cleared", tableize(debugDf.loc[change_to_zero_target].head()))

print("outputDf.shape", outputDf.shape)
print("outputDf sample")
print(tableize(outputDf.tail(10)))

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")

Post generated on 2025-03-11 01:25:51 GMT

comments powered by Disqus