Script 519: Epicor Budget Staging GSheets

Purpose

The Python script automates the process of copying and updating Epicor monthly budgets from Google Sheets to a staging area on the first day of each month.

To Elaborate

The script is designed to facilitate the transfer and update of monthly budget data from Google Sheets to a staging environment for Epicor, a business management software. It specifically targets campaigns with an EpicorID and compares their budget data from Google Sheets with existing data. If discrepancies are found, these campaigns are included in a bulk update sheet. The script ensures that the budget data is correctly formatted and cleansed before merging it with existing data. This process is crucial for maintaining accurate budget records and ensuring that any changes in budget allocations are captured and updated in the system efficiently.

Walking Through the Code

  1. Initialization and Configuration:
    • The script begins by setting up a local mode configuration, which includes downloading a preview input and setting a path for a pickle file containing data source information.
    • It checks whether the script is running on a server or locally, and loads the dataSourceDict from a pickle file if running locally.
  2. Data Preparation:
    • The script imports necessary libraries such as pandas, numpy, and utility functions from marin_scripts_utils.
    • It defines the primary data source and reference data source, which are data frames containing budget information from Epicor and Google Sheets, respectively.
  3. Data Cleaning and Transformation:
    • The script converts budget columns to numeric types and fills missing values with zeros.
    • It extracts and cleans the current month’s budget data from Google Sheets, ensuring that EpicorIDs are strings and budgets are floats.
  4. Data Merging and Comparison:
    • The cleaned Google Sheets data is merged with the existing Epicor data.
    • The script identifies changes in budget allocations by comparing the merged data with the original data, focusing on campaigns with different budget values.
  5. Output Preparation:
    • The script prepares the output data frame, which includes campaigns with updated budget information.
    • It also identifies and logs campaigns where the budget has been cleared to zero, ensuring transparency in budget changes.

Vitals

  • Script ID : 519
  • Client ID / Customer ID: 1306917127 / 60268084
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, epicorID - Monthly Budget
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: Michael Huang (mhuang@marinsoftware.com)
  • Created by Michael Huang on 2023-11-10 12:43
  • Last Updated by Michael Huang on 2024-02-01 06:32
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
##
## name: Epicor Budget Staging - GSheets
## description:
##  Copy Epicor budget from staging GSheets on month start
## 
## author: Michael S. Huang
## created: 2023-11-10
## 

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=True
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/heartland_dental_staging_20240131_datasource_dict.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    outputDf = dataSourceDict["1"].copy()

    # setup timezone
    import datetime
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

    # import pandas
    import pandas as pd
    import numpy as np

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########

# dial forward to preview next month budgets
# CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

# today in client timezone
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_DAILY_BUDGET = 'Daily Budget'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_EPICORID = 'epicorID'
RPT_COL_EPICORID_MONTHLY_BUDGET = 'epicorID - Monthly Budget'
RPT_COL_EPICORID_MONTHLY_BUDGET__STAGING = 'epicorID - Monthly Budget - Staging'
RPT_COL_EPICORID_MONTHLY_BUDGET__STAGING_MONTH = 'epicorID - Monthly Budget - Staging Month'

# reference data source and columns
gSheetsDf = dataSourceDict["2_1"]  # gsheets dataframe (first sheet)
# To access 10th row of Column C, use gSheetsDf.loc[10, 'C']

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_EPICORID__MONTHLY_BUDGET = 'epicorID - Monthly Budget'
outputDf[BULK_COL_EPICORID__MONTHLY_BUDGET] = "<<YOUR VALUE>>"

########### User Code Starts Here ###########

## force expected types
# Convert RPT_COL_EPICORID_MONTHLY_BUDGET to numeric, coercing errors to NaN
inputDf[RPT_COL_EPICORID_MONTHLY_BUDGET] = pd.to_numeric(inputDf[RPT_COL_EPICORID_MONTHLY_BUDGET], errors='coerce')
# Replace NaN values with 0.0 if that's the desired behavior
inputDf[RPT_COL_EPICORID_MONTHLY_BUDGET].fillna(0.0, inplace=True)

### Load current month Search Budget from GSheets 
key = f"2_{today.month}"

print(f"GSheets key for current month: {key}")
# get columns A and C from current month tab of GSheets
current_month_budgets = dataSourceDict[key] \
                        .loc[:, ['A', 'C']] \
                        .rename(columns={ \
                            'A' : RPT_COL_EPICORID, \
                            'C' : RPT_COL_EPICORID_MONTHLY_BUDGET \
                        })

print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.dtypes", current_month_budgets.dtypes)

### cleanup Budget values from GSheets
# remove empty epicorID rows
current_month_budgets = current_month_budgets.loc[current_month_budgets[RPT_COL_EPICORID].notnull()]
# make sure EpicorID is STR of Integers
current_month_budgets[RPT_COL_EPICORID] = current_month_budgets[RPT_COL_EPICORID].astype(str).str.extract('(\d+)')[0]
# make sure budget is float; remove prefix if not
# note: can't check for `object` since not imported, so use `0` instead
if current_month_budgets[RPT_COL_EPICORID_MONTHLY_BUDGET].dtype == 'O':
    current_month_budgets[RPT_COL_EPICORID_MONTHLY_BUDGET] = current_month_budgets[RPT_COL_EPICORID_MONTHLY_BUDGET] \
                                    .str.replace('US$', '', case=False, regex=False) \
                                    .str.replace(',', '') \
                                    .str.strip() \
                                    .astype(float)

# remove empty budget rows
has_epicor_id = current_month_budgets[RPT_COL_EPICORID].notnull() & \
                (current_month_budgets[RPT_COL_EPICORID].str.len() > 3)
has_epicor_budget = current_month_budgets[RPT_COL_EPICORID_MONTHLY_BUDGET].notnull() & \
                (current_month_budgets[RPT_COL_EPICORID_MONTHLY_BUDGET] > 0.5)
current_month_budgets = current_month_budgets.loc[ has_epicor_id & has_epicor_budget]

print("after cleanup gsheets")
print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.dtypes", current_month_budgets.dtypes)
print(current_month_budgets.head().to_string())

### Cleanup report

### Cleanup report

### convert EpicorID column into string and remove blank or non-numeric rows
inputDf[RPT_COL_EPICORID] = inputDf[RPT_COL_EPICORID].astype(str)
valid_epicor = inputDf[RPT_COL_EPICORID].str.isdigit() & inputDf[RPT_COL_EPICORID] != ''
inputDf = inputDf.loc[valid_epicor]

print("after cleanup inputDf")
print("inputDf.shape", inputDf.shape)
print("intpuDf.dtypes", inputDf.dtypes)

# fill in missing budget as 0 for comparison later
inputDf = inputDf.fillna(value={RPT_COL_EPICORID_MONTHLY_BUDGET: 0})

# make copy of input for use with select_changed
originalDf = inputDf.copy()

### Merge inputDf with current_month_budgets, and fill in missing budget as 0
mergedDf = inputDf.merge(current_month_budgets, on=RPT_COL_EPICORID, how='left', suffixes=('_old', '')) \
                  .fillna(value={RPT_COL_EPICORID_MONTHLY_BUDGET: 0})

print("mergedDf shape", mergedDf.shape)
print("mergedDf", mergedDf.tail(5).to_string())

outputDf, debugDf = select_changed(mergedDf, 
                          originalDf,
                          diff_cols=[RPT_COL_EPICORID_MONTHLY_BUDGET],
                          select_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_EPICORID, RPT_COL_EPICORID_MONTHLY_BUDGET],
                          # HACK: comment out next line on server
                          merged_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN]
                          )

change_to_zero_budget = (debugDf[RPT_COL_EPICORID_MONTHLY_BUDGET + '_new'] < 1) & \
                        (debugDf[RPT_COL_EPICORID_MONTHLY_BUDGET + '_orig'] > 1)
print("count of campaigns with budget cleared", sum(change_to_zero_budget))
print("campaigns with budget cleared", tableize(debugDf.loc[change_to_zero_budget].head()))

print("outputDf.shape", outputDf.shape)
print("outputDf head")
print(tableize(outputDf.head()))


Post generated on 2024-11-27 06:58:46 GMT

comments powered by Disqus