Script 817: SBA Budget Staging via GSheets

Purpose

Python script that copies the SBA Budget from a staging Google Sheets and matches the “Abbreviation” column with the “SBA Strategy” column in Marin.

To Elaborate

This Python script solves the problem of copying the SBA Budget from a staging Google Sheets and matching the “Abbreviation” column with the “SBA Strategy” column in Marin. The script retrieves the current month’s budgets from the Google Sheets, cleans up the data, and merges it with the input data from Marin. It then selects the changed budgets and outputs the result.

Walking Through the Code

  1. The script starts by checking if it is running on the server or locally.
  2. If running on the server, it skips the initialization step. If running locally, it loads the data source dictionary from a pickled file.
  3. The script sets up the necessary configurations and imports the required libraries.
  4. It defines the primary data source and column constants.
  5. It retrieves the Google Sheets data and assigns it to the gSheetsDf variable.
  6. The script sets up the output columns and initial values.
  7. The user code starts here.
  8. The script loads the current month’s budgets from the Google Sheets and renames the columns.
  9. It cleans up the budget values from the Google Sheets by removing empty rows and ensuring the data types are correct.
  10. The script cleans up the input data by converting the budget column to numeric and removing blank or non-numeric rows.
  11. It fills in missing budgets as 0 for comparison later.
  12. The script merges the input data with the current month’s budgets and fills in missing budgets as 0.
  13. It selects the changed budgets using the select_changed function and assigns the result to the outputDf and debugDf variables.
  14. The script checks for campaigns with cleared budgets and outputs the count and details.
  15. Finally, it prints the shape and sample of the outputDf dataframe.
  16. If running locally, the script writes the outputDf and debugDf dataframes to CSV files for local debugging.

Vitals

  • Script ID : 817
  • Client ID / Customer ID: 1306926629 / 60270083
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, SBA Campaign Budget
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: Michael Huang (mhuang@marinsoftware.com)
  • Created by Michael Huang on 2024-03-14 21:17
  • Last Updated by dwaidhas@marinsoftware.com on 2024-03-18 20:18
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
##
## name: SBA Budget Staging - GSheets - All Campus
## description:
##  Copy SBA Budget from staging GSheets for SBA Campaign Budget Pacing
##  Use GSheets 'Abbrevation' column to match with 'SBA Strategy'
## 
## author: Michael S. Huang
## created: 2024-03-15
## 

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/allcampus_budget_staging_20240315.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    outputDf = dataSourceDict["1"].copy()

    # setup timezone
    import datetime
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

    # import pandas
    import pandas as pd
    import numpy as np

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########

# dial forward to preview next month budgets
# CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

# today in client timezone
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_SBA_STRATEGY = 'SBA Strategy'
RPT_COL_SBA_CAMPAIGN_BUDGET = 'SBA Campaign Budget' # monthly budget for budget group specified via SBA Strategy 

# reference data source and columns
gSheetsDf = dataSourceDict["2_1"]  # gsheets dataframe (first sheet)
# To access 10th row of Column C, use gSheetsDf.loc[10, 'C']

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_SBA_CAMPAIGN_BUDGET = 'SBA Campaign Budget'
outputDf[BULK_COL_SBA_CAMPAIGN_BUDGET] = "<<YOUR VALUE>>"

########### User Code Starts Here ###########

### Load Current Month Budgets from GSheets and rename

# For debugging, set arbitrary date
# today = datetime.date(2024, 4, 2)

# Construct column key by mapping current month to canonical column name
# Assume column F is January, G is February, H is March, Q is Dec, etc. In terms of integer months, F=1, G=2, H=3,.., Q=12.
column_key = chr(64 + today.month + 5)  
print(f"GSheets column key for current month: {today.strftime(('%B'))} => {column_key}")

# load current month
# - skip first 3 rows via .loc[2:]
# - 'Abbreviation' is column T
# - budgets from January to December are on columns F to Q
current_month_budgets = dataSourceDict['2_1'] \
                        .loc[2:, ['T', column_key]] \
                        .rename(columns={ \
                            'T' : RPT_COL_SBA_STRATEGY, \
                            column_key : RPT_COL_SBA_CAMPAIGN_BUDGET \
                        })

print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.info", current_month_budgets.info())
print("current_month_budgets first 10 rows", current_month_budgets.head(10))
### cleanup Budget values from GSheets
# remove empty epicorID rows
current_month_budgets = current_month_budgets.loc[current_month_budgets[RPT_COL_SBA_STRATEGY].notnull()]
# make sure SBA Strategy is STR
current_month_budgets[RPT_COL_SBA_STRATEGY] = current_month_budgets[RPT_COL_SBA_STRATEGY].astype(str)
# make sure budget is float; remove prefix if not
# note: can't check for `object` since not imported, so use `0` instead
if current_month_budgets[RPT_COL_SBA_CAMPAIGN_BUDGET].dtype == 'O':
    current_month_budgets[RPT_COL_SBA_CAMPAIGN_BUDGET] = current_month_budgets[RPT_COL_SBA_CAMPAIGN_BUDGET] \
                                    .str.replace('US$', '', case=False, regex=False) \
                                    .str.replace('$', '', case=False, regex=False) \
                                    .str.replace(',', '') \
                                    .str.strip() \
                                    .astype(float)

# remove empty budget rows
has_budget_group = current_month_budgets[RPT_COL_SBA_STRATEGY].notnull() & \
                (current_month_budgets[RPT_COL_SBA_STRATEGY].str.len() > 3)
has_budget_group_cap = current_month_budgets[RPT_COL_SBA_CAMPAIGN_BUDGET].notnull() & \
                (current_month_budgets[RPT_COL_SBA_CAMPAIGN_BUDGET] > 0.5)
current_month_budgets = current_month_budgets.loc[ has_budget_group & has_budget_group_cap]

print("after cleanup gsheets")
print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.info", current_month_budgets.info())
print(current_month_budgets.head().to_string())

### Cleanup input

# Convert RPT_COL_EPICORID_MONTHLY_BUDGET to numeric, coercing errors to NaN
inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET] = pd.to_numeric(inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET], errors='coerce')
# Replace NaN values with 0.0 if that's the desired behavior
inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET].fillna(0.0, inplace=True)

### convert SBA Strategy column into string and remove blank or non-numeric rows
inputDf[RPT_COL_SBA_STRATEGY] = inputDf[RPT_COL_SBA_STRATEGY].astype(str)
valid_abbrev = inputDf[RPT_COL_SBA_STRATEGY].str.len() > 3
inputDf = inputDf.loc[valid_abbrev]

print("after cleanup inputDf")
print("inputDf.shape", inputDf.shape)
print("intpuDf.info", inputDf.info())

# fill in missing budget as 0 for comparison later
inputDf = inputDf.fillna(value={RPT_COL_SBA_CAMPAIGN_BUDGET: 0})

# make copy of input for use with select_changed
originalDf = inputDf.copy()

### Merge inputDf with current_month_budgets, and fill in missing budget as 0
mergedDf = inputDf.merge(current_month_budgets, on=RPT_COL_SBA_STRATEGY, how='left', suffixes=('_old', '')) \
                  .fillna(value={RPT_COL_SBA_CAMPAIGN_BUDGET: 0})

print("mergedDf shape", mergedDf.shape)
print("mergedDf", mergedDf.tail(5).to_string())

outputDf, debugDf = select_changed(mergedDf, 
                          originalDf,
                          diff_cols=[RPT_COL_SBA_CAMPAIGN_BUDGET],
                          select_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_SBA_STRATEGY, RPT_COL_SBA_CAMPAIGN_BUDGET],
                          merged_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN]
                          )

change_to_zero_budget = (debugDf[RPT_COL_SBA_CAMPAIGN_BUDGET + '_new'] < 1) & \
                        (debugDf[RPT_COL_SBA_CAMPAIGN_BUDGET + '_orig'] > 1)
print("count of campaigns with budget cleared", sum(change_to_zero_budget))
print("campaigns with budget cleared", tableize(debugDf.loc[change_to_zero_budget].head()))

print("outputDf.shape", outputDf.shape)
print("outputDf sample")
print(tableize(outputDf.tail(10)))

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")

Post generated on 2024-05-15 07:44:05 GMT

comments powered by Disqus