Script 817: SBA Budget Staging via GSheets

Purpose:

The Python script automates the process of copying and updating monthly budget allocations from Google Sheets to a structured budget allocation system by matching campaign strategies.

To Elaborate

The script is designed to facilitate the transfer of monthly budget data from a Google Sheets document to a structured budget allocation (SBA) system. It achieves this by matching campaign strategies listed in the Google Sheets with those in the SBA system. The script processes the budget data by cleaning and formatting it, ensuring that only valid entries are considered. It then merges this data with existing budget information, updating the SBA system with any changes. This process helps maintain accurate and up-to-date budget allocations for various campaigns, ensuring that financial resources are allocated efficiently and in accordance with the latest strategic plans.

Walking Through the Code

Local Mode Configuration:
- The script begins by setting up configurations for local execution, including paths for data files and determining the execution environment (server or local).
- It checks if the script is running on a server or locally, and loads data from a pickle file if running locally.
Data Preparation:
- The script identifies the current month and constructs a column key to access the relevant budget data from Google Sheets.
- It loads the current month’s budget data, skipping initial rows and renaming columns for consistency.
Data Cleaning:
- The script cleans the budget data by removing rows with missing or invalid entries and ensuring that budget values are numeric.
- It also cleans the input data by converting relevant columns to strings and removing invalid entries.
Data Merging and Comparison:
- The cleaned input data is merged with the current month’s budget data based on the SBA strategy.
- The script identifies changes in budget allocations and prepares the updated data for output.
Output and Debugging:
- The script generates output data reflecting the updated budget allocations and provides debugging information if executed in a local development environment.

Vitals

Script ID : 817
Client ID / Customer ID: 1306926629 / 60270083
Action Type: Bulk Upload (Preview)
Item Changed: Campaign
Output Columns: Account, Campaign, SBA Campaign Budget
Linked Datasource: M1 Report
Reference Datasource: Google Sheets
Owner: Michael Huang (mhuang@marinsoftware.com)
Created by Michael Huang on 2024-03-14 21:17
Last Updated by dwaidhas@marinsoftware.com on 2024-05-30 13:18

> See it in Action

Python Code

##
## name: SBA Budget Staging - GSheets - All Campus
## description:
##  Copy SBA Budget from staging GSheets for SBA Campaign Budget Pacing
##  Use GSheets 'Abbrevation' column to match with 'SBA Strategy'
## 
## author: Michael S. Huang
## created: 2024-03-15
## 

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/allcampus_budget_staging_20240315.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    outputDf = dataSourceDict["1"].copy()

    # setup timezone
    import datetime
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

    # import pandas
    import pandas as pd
    import numpy as np

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########

# dial forward to preview next month budgets
# CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))

# today in client timezone
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_SBA_STRATEGY = 'SBA Strategy'
RPT_COL_SBA_CAMPAIGN_BUDGET = 'SBA Campaign Budget' # monthly budget for budget group specified via SBA Strategy 

# reference data source and columns
gSheetsDf = dataSourceDict["2_1"]  # gsheets dataframe (first sheet)
# To access 10th row of Column C, use gSheetsDf.loc[10, 'C']

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_SBA_CAMPAIGN_BUDGET = 'SBA Campaign Budget'
outputDf[BULK_COL_SBA_CAMPAIGN_BUDGET] = "<<YOUR VALUE>>"

########### User Code Starts Here ###########

### Load Current Month Budgets from GSheets and rename

# For debugging, set arbitrary date
# today = datetime.date(2024, 4, 2)

# Construct column key by mapping current month to canonical column name
# Assume column F is January, G is February, H is March, Q is Dec, etc. In terms of integer months, F=1, G=2, H=3,.., Q=12.
column_key = chr(64 + today.month + 5)  
print(f"GSheets column key for current month: {today.strftime(('%B'))} => {column_key}")

# load current month
# - skip first 3 rows via .loc[2:]
# - 'Abbreviation' is column T
# - budgets from January to December are on columns F to Q
current_month_budgets = dataSourceDict['2_1'] \
                        .loc[2:, ['T', column_key]] \
                        .rename(columns={ \
                            'T' : RPT_COL_SBA_STRATEGY, \
                            column_key : RPT_COL_SBA_CAMPAIGN_BUDGET \
                        })

print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.info", current_month_budgets.info())
print("current_month_budgets first 10 rows", current_month_budgets.head(10))
### cleanup Budget values from GSheets
# remove empty epicorID rows
current_month_budgets = current_month_budgets.loc[current_month_budgets[RPT_COL_SBA_STRATEGY].notnull()]
# make sure SBA Strategy is STR
current_month_budgets[RPT_COL_SBA_STRATEGY] = current_month_budgets[RPT_COL_SBA_STRATEGY].astype(str)
# make sure budget is float; remove prefix if not
# note: can't check for `object` since not imported, so use `0` instead
if current_month_budgets[RPT_COL_SBA_CAMPAIGN_BUDGET].dtype == 'O':
    current_month_budgets[RPT_COL_SBA_CAMPAIGN_BUDGET] = current_month_budgets[RPT_COL_SBA_CAMPAIGN_BUDGET] \
                                    .str.replace('US$', '', case=False, regex=False) \
                                    .str.replace('$', '', case=False, regex=False) \
                                    .str.replace(',', '') \
                                    .str.strip() \
                                    .astype(float)

# remove empty budget rows
has_budget_group = current_month_budgets[RPT_COL_SBA_STRATEGY].notnull() & \
                (current_month_budgets[RPT_COL_SBA_STRATEGY].str.len() > 3)
has_budget_group_cap = current_month_budgets[RPT_COL_SBA_CAMPAIGN_BUDGET].notnull() & \
                (current_month_budgets[RPT_COL_SBA_CAMPAIGN_BUDGET] > 0.5)
current_month_budgets = current_month_budgets.loc[ has_budget_group & has_budget_group_cap]

print("after cleanup gsheets")
print("current_month_budgets.shape", current_month_budgets.shape)
print("current_month_budgets.info", current_month_budgets.info())
print(current_month_budgets.head().to_string())

### Cleanup input

# Convert RPT_COL_EPICORID_MONTHLY_BUDGET to numeric, coercing errors to NaN
inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET] = pd.to_numeric(inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET], errors='coerce')
# Replace NaN values with 0.0 if that's the desired behavior
inputDf[RPT_COL_SBA_CAMPAIGN_BUDGET].fillna(0.0, inplace=True)

### convert SBA Strategy column into string and remove blank or non-numeric rows
inputDf[RPT_COL_SBA_STRATEGY] = inputDf[RPT_COL_SBA_STRATEGY].astype(str)
valid_abbrev = inputDf[RPT_COL_SBA_STRATEGY].str.len() > 3
inputDf = inputDf.loc[valid_abbrev]

print("after cleanup inputDf")
print("inputDf.shape", inputDf.shape)
print("intpuDf.info", inputDf.info())

# fill in missing budget as 0 for comparison later
inputDf = inputDf.fillna(value={RPT_COL_SBA_CAMPAIGN_BUDGET: 0})

# make copy of input for use with select_changed
originalDf = inputDf.copy()

### Merge inputDf with current_month_budgets, and fill in missing budget as 0
mergedDf = inputDf.merge(current_month_budgets, on=RPT_COL_SBA_STRATEGY, how='left', suffixes=('_old', '')) \
                  .fillna(value={RPT_COL_SBA_CAMPAIGN_BUDGET: 0})

print("mergedDf shape", mergedDf.shape)
print("mergedDf", mergedDf.tail(5).to_string())

outputDf, debugDf = select_changed(mergedDf, 
                          originalDf,
                          diff_cols=[RPT_COL_SBA_CAMPAIGN_BUDGET],
                          select_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_SBA_STRATEGY, RPT_COL_SBA_CAMPAIGN_BUDGET],
                          merged_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN]
                          )

change_to_zero_budget = (debugDf[RPT_COL_SBA_CAMPAIGN_BUDGET + '_new'] < 1) & \
                        (debugDf[RPT_COL_SBA_CAMPAIGN_BUDGET + '_orig'] > 1)
print("count of campaigns with budget cleared", sum(change_to_zero_budget))
print("campaigns with budget cleared", tableize(debugDf.loc[change_to_zero_budget].head()))

print("outputDf.shape", outputDf.shape)
print("outputDf sample")
print(tableize(outputDf.tail(10)))

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")

Post generated on 2025-03-11 01:25:51 GMT

14 Mar 2024

« Script 813: SBA Campaign Pacing with Rec. Budget and Current Budget Script 819: Campaign Anomalies Amazon Music Global AMU Signups »

MarinOne Scripts Creator's Corner