Script 519: Epicor Budget Staging GSheets

Purpose:

The Python script automates the process of copying and updating Epicor monthly budgets from Google Sheets to a staging area at the start of each month.

To Elaborate

The script is designed to facilitate the transfer and update of monthly budget data from Google Sheets to a staging environment for Epicor, a business management software. It operates by first loading budget data from a specified Google Sheets document and then comparing it with existing data in the staging area. The script identifies campaigns with an EpicorID and updates their monthly budgets if there are discrepancies between the Google Sheets data and the current staging data. This ensures that the budget allocations are accurate and up-to-date at the beginning of each month, streamlining the budget management process for campaigns.

Walking Through the Code

  1. Local Mode Configuration:
    • The script begins with a configuration section for local execution, where it checks if the code is running on a server or locally.
    • If running locally, it loads a dataSourceDict from a pickle file specified by the pickle_path. This dictionary contains data frames used throughout the script.
  2. Data Preparation:
    • The script prepares the input data by converting budget columns to numeric types and handling missing values.
    • It loads the current month’s budget data from Google Sheets, ensuring that the EpicorID and budget values are correctly formatted and non-empty.
  3. Data Cleaning:
    • Both the input data and the Google Sheets data undergo cleaning processes to ensure that only valid, numeric EpicorIDs and non-zero budgets are considered.
  4. Data Merging and Comparison:
    • The cleaned input data is merged with the current month’s budget data from Google Sheets.
    • The script uses a utility function select_changed to identify and select rows where the budget has changed, preparing these for further processing.
  5. Output Preparation:
    • The script prepares the final output data frame, which includes campaigns with updated budgets, and prints relevant information for debugging and verification purposes.

Vitals

  • Script ID : 519
  • Client ID / Customer ID: 1306917127 / 60268084
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, epicorID - Monthly Budget
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: Michael Huang (mhuang@marinsoftware.com)
  • Created by Michael Huang on 2023-11-10 12:43
  • Last Updated by Michael Huang on 2024-02-01 06:32
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
##
## name: Epicor Budget Staging - GSheets
## description:
##  Copy Epicor budget from staging GSheets on month start
## 
## author: Michael S. Huang
## created: 2023-11-10
## 

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=True
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/heartland_dental_staging_20240131_datasource_dict.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    outputDf = dataSourceDict["1"].copy()

    # setup timezone
    import datetime
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

    # import pandas
    import pandas as pd
    import numpy as np

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########

# dial forward to preview next month budgets
# CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

# today in client timezone
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_DAILY_BUDGET = 'Daily Budget'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_EPICORID = 'epicorID'
RPT_COL_EPICORID_MONTHLY_BUDGET = 'epicorID - Monthly Budget'
RPT_COL_EPICORID_MONTHLY_BUDGET__STAGING = 'epicorID - Monthly Budget - Staging'
RPT_COL_EPICORID_MONTHLY_BUDGET__STAGING_MONTH = 'epicorID - Monthly Budget - Staging Month'

# reference data source and columns
gSheetsDf = dataSourceDict["2_1"]  # gsheets dataframe (first sheet)
# To access 10th row of Column C, use gSheetsDf.loc[10, 'C']

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_EPICORID__MONTHLY_BUDGET = 'epicorID - Monthly Budget'
outputDf[BULK_COL_EPICORID__MONTHLY_BUDGET] = "<<YOUR VALUE>>"

########### User Code Starts Here ###########

## force expected types
# Convert RPT_COL_EPICORID_MONTHLY_BUDGET to numeric, coercing errors to NaN
inputDf[RPT_COL_EPICORID_MONTHLY_BUDGET] = pd.to_numeric(inputDf[RPT_COL_EPICORID_MONTHLY_BUDGET], errors='coerce')
# Replace NaN values with 0.0 if that's the desired behavior
inputDf[RPT_COL_EPICORID_MONTHLY_BUDGET].fillna(0.0, inplace=True)

### Load current month Search Budget from GSheets 
key = f"2_{today.month}"

print(f"GSheets key for current month: {key}")
# get columns A and C from current month tab of GSheets
current_month_budgets = dataSourceDict[key] \
                        .loc[:, ['A', 'C']] \
                        .rename(columns={ \
                            'A' : RPT_COL_EPICORID, \
                            'C' : RPT_COL_EPICORID_MONTHLY_BUDGET \
                        })

print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.dtypes", current_month_budgets.dtypes)

### cleanup Budget values from GSheets
# remove empty epicorID rows
current_month_budgets = current_month_budgets.loc[current_month_budgets[RPT_COL_EPICORID].notnull()]
# make sure EpicorID is STR of Integers
current_month_budgets[RPT_COL_EPICORID] = current_month_budgets[RPT_COL_EPICORID].astype(str).str.extract('(\d+)')[0]
# make sure budget is float; remove prefix if not
# note: can't check for `object` since not imported, so use `0` instead
if current_month_budgets[RPT_COL_EPICORID_MONTHLY_BUDGET].dtype == 'O':
    current_month_budgets[RPT_COL_EPICORID_MONTHLY_BUDGET] = current_month_budgets[RPT_COL_EPICORID_MONTHLY_BUDGET] \
                                    .str.replace('US$', '', case=False, regex=False) \
                                    .str.replace(',', '') \
                                    .str.strip() \
                                    .astype(float)

# remove empty budget rows
has_epicor_id = current_month_budgets[RPT_COL_EPICORID].notnull() & \
                (current_month_budgets[RPT_COL_EPICORID].str.len() > 3)
has_epicor_budget = current_month_budgets[RPT_COL_EPICORID_MONTHLY_BUDGET].notnull() & \
                (current_month_budgets[RPT_COL_EPICORID_MONTHLY_BUDGET] > 0.5)
current_month_budgets = current_month_budgets.loc[ has_epicor_id & has_epicor_budget]

print("after cleanup gsheets")
print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.dtypes", current_month_budgets.dtypes)
print(current_month_budgets.head().to_string())

### Cleanup report

### Cleanup report

### convert EpicorID column into string and remove blank or non-numeric rows
inputDf[RPT_COL_EPICORID] = inputDf[RPT_COL_EPICORID].astype(str)
valid_epicor = inputDf[RPT_COL_EPICORID].str.isdigit() & inputDf[RPT_COL_EPICORID] != ''
inputDf = inputDf.loc[valid_epicor]

print("after cleanup inputDf")
print("inputDf.shape", inputDf.shape)
print("intpuDf.dtypes", inputDf.dtypes)

# fill in missing budget as 0 for comparison later
inputDf = inputDf.fillna(value={RPT_COL_EPICORID_MONTHLY_BUDGET: 0})

# make copy of input for use with select_changed
originalDf = inputDf.copy()

### Merge inputDf with current_month_budgets, and fill in missing budget as 0
mergedDf = inputDf.merge(current_month_budgets, on=RPT_COL_EPICORID, how='left', suffixes=('_old', '')) \
                  .fillna(value={RPT_COL_EPICORID_MONTHLY_BUDGET: 0})

print("mergedDf shape", mergedDf.shape)
print("mergedDf", mergedDf.tail(5).to_string())

outputDf, debugDf = select_changed(mergedDf, 
                          originalDf,
                          diff_cols=[RPT_COL_EPICORID_MONTHLY_BUDGET],
                          select_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_EPICORID, RPT_COL_EPICORID_MONTHLY_BUDGET],
                          # HACK: comment out next line on server
                          merged_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN]
                          )

change_to_zero_budget = (debugDf[RPT_COL_EPICORID_MONTHLY_BUDGET + '_new'] < 1) & \
                        (debugDf[RPT_COL_EPICORID_MONTHLY_BUDGET + '_orig'] > 1)
print("count of campaigns with budget cleared", sum(change_to_zero_budget))
print("campaigns with budget cleared", tableize(debugDf.loc[change_to_zero_budget].head()))

print("outputDf.shape", outputDf.shape)
print("outputDf head")
print(tableize(outputDf.head()))


Post generated on 2025-03-11 01:25:51 GMT

comments powered by Disqus