Script 327: Epicor Budget Pacing

Purpose:

The Python script allocates daily budgets for each Epicor Budget Group by considering remaining budgets, weekdays, historical spending, and campaign activity.

To Elaborate

The script is designed to optimize daily budget allocations for various Epicor Budget Groups by taking into account several factors such as the remaining budget, the number of weekdays left in the month, historical spending patterns, and the activity of campaigns within a specified lookback period. It ensures that each campaign receives an appropriate budget allocation based on its potential to spend, while also adhering to a minimum daily budget threshold. The script is particularly useful for managing and pacing budgets effectively to minimize lost impression share due to budget constraints, thereby maximizing the efficiency of advertising campaigns.

Walking Through the Code

Initialization and Configuration:
- The script begins by setting a configurable parameter for the minimum daily budget.
- It checks whether the code is running on a server or locally, and loads necessary data from a pickle file if running locally.
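
The server-versus-local check can be sketched as a small helper. This is a minimal illustration, not the script's own code: `load_data_sources` and the throwaway pickle are hypothetical, and the only name taken from the script is the injected `dataSourceDict`.

```python
import pickle
import tempfile
from pathlib import Path


def load_data_sources(namespace, pickle_path):
    """Return (data_source_dict, local_dev).

    `namespace` is a dict such as globals(); the injected name is assumed
    to be `dataSourceDict`, as in the script.
    """
    if "dataSourceDict" in namespace:
        # Server case: the platform already provided the data.
        return namespace["dataSourceDict"], False
    # Local case: read the pickle downloaded from the Preview logs.
    with open(pickle_path, "rb") as fh:
        return pickle.load(fh), True


# Demo with a throwaway pickle standing in for the downloaded file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "datasource_dict.pkl"
    demo_path.write_bytes(pickle.dumps({"1": ["row-a", "row-b"]}))
    local_data, local_dev = load_data_sources({}, demo_path)
    server_data, server_dev = load_data_sources({"dataSourceDict": {"1": []}}, demo_path)

print(local_dev, server_dev)
```

Only unpickle files you downloaded yourself, since loading a pickle can execute arbitrary code.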
Data Preparation:
- The script converts certain columns to numeric types and fills NaN values with zeros.
- It groups the data by EpicorID and calculates the full potential spend by adjusting historical spend based on lost impression share due to budget.
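
The full-potential adjustment multiplies historical spend by `1 + L / (1 - L)`, which simplifies to `1 / (1 - L)`, where `L` is the share of impressions lost to budget. A minimal sketch, assuming `L` arrives as a fraction between 0 and 1 (the helper name and the range guard are illustrative additions, not part of the script):

```python
def full_potential_spend(spend, lost_is_budget):
    """Estimate what spend would have been without a budget cap.

    Assumes `lost_is_budget` is a fraction in [0, 1), e.g. 0.2 for 20%.
    """
    if not 0 <= lost_is_budget < 1:
        raise ValueError("lost_is_budget must be in [0, 1)")
    # Same ratio as the script; algebraically equal to 1 / (1 - lost_is_budget).
    adj_ratio = 1 + lost_is_budget / (1 - lost_is_budget)
    return round(spend * adj_ratio, 2)


print(full_potential_spend(100.0, 0.2))  # 20% of impressions lost to budget
print(full_potential_spend(100.0, 0.0))  # nothing lost, so no adjustment
```

A campaign that spent 100 while losing 20% of impressions to budget is treated as able to spend 125.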
Aggregation and Filtering:
- The script aggregates data by various columns and calculates month-to-date (MTD) spend.
- It drops campaigns that are both not Active and have no spend in the lookback period; a paused campaign with recent spend is kept.
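
The aggregation and filter can be sketched on toy data. The column names mirror the report, but the rows, the fixed `today`, and the output names `status` and `spend_lookback` are made up for illustration. Unlike the script, this sketch also matches on the year when computing MTD spend.

```python
import datetime

import pandas as pd

today = datetime.date(2024, 1, 18)  # fixed date so the example is reproducible

rows = pd.DataFrame({
    "Campaign": ["A", "A", "B", "B", "C"],
    "Campaign Status": ["Active", "Active", "Paused", "Paused", "Paused"],
    "Date": pd.to_datetime(
        ["2023-12-30", "2024-01-05", "2023-12-31", "2024-01-02", "2024-01-03"]
    ),
    "Pub. Cost $": [20.0, 30.0, 5.0, 0.0, 0.0],
})

# MTD spend: keep cost only for rows dated in the current month.
in_month = (rows["Date"].dt.year == today.year) & (rows["Date"].dt.month == today.month)
rows["spend_mtd"] = rows["Pub. Cost $"].where(in_month, 0.0)

# Collapse the date segmentation to one row per campaign.
agg = rows.sort_values("Date").groupby("Campaign").agg(
    status=("Campaign Status", "last"),
    spend_lookback=("Pub. Cost $", "sum"),
    spend_mtd=("spend_mtd", "sum"),
)

# Drop only campaigns that are both not Active and without lookback spend.
drop = (agg["status"] != "Active") & (agg["spend_lookback"] == 0)
kept = agg.loc[~drop]
print(kept)
```

Campaign B is paused but spent in December, so it stays; campaign C is paused with no spend and is dropped.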
Budget Allocation Calculation:
- The script calculates a budget allocation ratio by capping each campaign's full potential spend at twice its historical spend and dividing it by the total capped potential spend within its EpicorID budget group.
- It calculates the remaining budget for each EpicorID by subtracting MTD spend from the monthly budget.
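
A minimal pandas sketch of the allocation step, using invented numbers and simplified column names (`spend`, `full_potential`, `monthly_budget` and the rest are placeholders, not the report's column names). It relies on `groupby(...).transform('sum')` to put each group's total on every row.

```python
import pandas as pd

df = pd.DataFrame({
    "epicorID": ["E1", "E1", "E2"],
    "Campaign": ["A", "B", "C"],
    "spend": [100.0, 50.0, 80.0],            # lookback spend
    "full_potential": [150.0, 150.0, 100.0],
    "spend_mtd": [90.0, 60.0, 40.0],
    "monthly_budget": [1000.0, 1000.0, 500.0],
})

# Cap potential spend at twice the historical spend.
df["cap"] = 2 * df["spend"]
df["capped"] = df[["full_potential", "cap"]].min(axis=1)

# Share of the group's capped potential that belongs to each campaign.
df["allocation"] = df["capped"] / df.groupby("epicorID")["capped"].transform("sum")

# Budget left in each group, then each campaign's slice of it.
group_mtd = df.groupby("epicorID")["spend_mtd"].transform("sum")
df["group_remaining"] = df["monthly_budget"] - group_mtd
df["budget_remaining"] = (df["group_remaining"] * df["allocation"]).round(1)

print(df[["epicorID", "Campaign", "allocation", "budget_remaining"]])
```

In group E1, campaign B's potential of 150 is capped at 100, so the split is 60% to A and 40% to B of the 850 remaining.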
Daily Budget Calculation:
- The script calculates the daily budget by dividing the allocated budget by the number of weekdays remaining in the month.
- It applies a minimum daily budget rule to ensure allocations do not fall below a specified threshold.
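
The script counts remaining weekdays with `np.busday_count`. The sketch below is a dependency-free stand-in that counts Monday to Friday from today through month end, with the same floor of 1, followed by the minimum-budget rule. The function names are illustrative, and holidays are ignored, matching the default behaviour of `np.busday_count`.

```python
import datetime

MINIMUM_DAILY_BUDGET = 10  # same default as the script's configurable parameter


def weekdays_remaining(today):
    """Count Mon-Fri days from `today` through the end of its month, inclusive."""
    count = 0
    day = today
    while day.month == today.month:
        if day.isoweekday() < 6:  # Mon=1 .. Fri=5
            count += 1
        day += datetime.timedelta(days=1)
    # Floor of 1 avoids dividing by zero when only weekend days are left.
    return max(1, count)


def daily_budget(budget_remaining, today, minimum=MINIMUM_DAILY_BUDGET):
    """Spread the remaining budget over remaining weekdays, then apply the floor."""
    per_day = round(budget_remaining / weekdays_remaining(today), 0)
    return max(per_day, minimum)


thursday = datetime.date(2024, 1, 18)
print(weekdays_remaining(thursday))
print(daily_budget(340.0, thursday))
print(daily_budget(50.0, thursday))
```

On 2024-01-18 there are 10 weekdays left, so 340 of remaining budget becomes 34 per day, while 50 would give 5 per day and is raised to the minimum of 10.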
Traffic Budget and Pacing Compliance:
- The script determines whether to traffic the budget based on the day of the week and campaign flags.
- It calculates pacing compliance percentages to ensure budgets are on track.
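
Both rules reduce to short functions. This is a sketch with hypothetical helper names; the `'traffic'` flag value and the prorated ratio `(weekdays_in_month - weekdays_left + 1) / weekdays_in_month` follow the script.

```python
import datetime


def should_traffic(today, flag):
    """Traffic only on weekdays and only when the campaign flag says 'traffic'."""
    is_weekday = today.isoweekday() < 6  # Mon=1 .. Sun=7
    flag_on = flag is not None and str(flag).lower() == "traffic"
    return is_weekday and flag_on


def pacing_percent(spend_mtd, monthly_budget, weekdays_in_month, weekdays_left):
    """MTD spend as a percentage of the budget prorated to today (100 = on pace)."""
    if monthly_budget <= 0:
        return None  # the script leaves pacing blank for groups without a budget
    prorated_ratio = (weekdays_in_month - weekdays_left + 1) / weekdays_in_month
    return round(100.0 * spend_mtd / (prorated_ratio * monthly_budget), 0)


print(should_traffic(datetime.date(2024, 1, 18), "Traffic"))  # Thursday
print(should_traffic(datetime.date(2024, 1, 20), "traffic"))  # Saturday
print(pacing_percent(300.0, 1000.0, 23, 10))
```

January 2024 has 23 weekdays. With 10 left on the 18th, 14/23 of the budget should be spent by end of day, so 300 spent against a 1000 budget paces at about 49%.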
Output Generation:
- The script checks for changes in budget allocations and generates an output DataFrame with updated values for campaigns that require changes.
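
The script detects changes with plain `!=` comparisons, under which a missing value never equals another missing value. The sketch below is a hypothetical variation, not the script's behaviour: it treats two missing values as equal, so rows whose blank budget stays blank are not reported. All helper names are invented for illustration.

```python
import math


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def differs(old, new):
    """True when two cell values differ; two missing values count as equal."""
    if is_missing(old) or is_missing(new):
        return is_missing(old) != is_missing(new)
    return old != new


def changed_rows(rows, column_pairs):
    """Keep rows where any (current, new) column pair differs."""
    return [
        row for row in rows
        if any(differs(row[cur], row[new]) for cur, new in column_pairs)
    ]


rows = [
    {"Campaign": "A", "Daily Budget": 34.0, "Daily Budget_new": 34.0},
    {"Campaign": "B", "Daily Budget": 12.0, "Daily Budget_new": 10.0},
    {"Campaign": "C", "Daily Budget": float("nan"), "Daily Budget_new": float("nan")},
]
changed = changed_rows(rows, [("Daily Budget", "Daily Budget_new")])
print([row["Campaign"] for row in changed])
```

Only campaign B is reported. With a plain `!=`, campaign C would also be flagged because NaN compares unequal to itself.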

Vitals

Script ID : 327
Client ID / Customer ID: 1306917127 / 60268084
Action Type: Bulk Upload
Item Changed: Campaign
Output Columns: Account, Campaign, SBA Allocation, SBA Budget Pacing, SBA Calculated Budget Daily, Daily Budget
Linked Datasource: M1 Report
Reference Datasource: None
Owner: Michael Huang (mhuang@marinsoftware.com)
Created by Michael Huang on 2023-09-30 13:18
Last Updated by Michael Huang on 2024-01-18 06:51


Python Code

#
# Heartland Dental - Budget Pacing - Minimize Lost IS (Budget)
#
# Allocates according to:
# * Remaining budget for each Epicor Budget Group
# * Remaining weekdays in month
# * Historical spend and spend potential
# * Campaigns with spend in lookback period
# * Minimum daily budget
#
# Author: Michael S. Huang
#
# Created: 2023-09-30
# Updated: 2023-11-06
#

##### Configurable Param #####

MINIMUM_DAILY_BUDGET = 10

##############################


########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
pickle_path = '/Users/mhuang/Downloads/pickle/heartland_dental_pacing_20240118_datasource_dict.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # dataSourceDict is injected by the server; referencing it locally raises NameError
        dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file (context manager closes the file handle)
    import pickle
    with open(pickle_path, 'rb') as pickle_file:
        dataSourceDict = pickle.load(pickle_file)

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
    # Chicago is UTC-5 during daylight saving time (CDT) and UTC-6 otherwise (CST). Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))

    # import pandas
    import pandas as pd
    import numpy as np

    # Printing out the version of Python, Pandas and Numpy
    # import sys
    # python_version = sys.version
    # pandas_version = pd.__version__
    # numpy_version = np.__version__

    # print(f"python version: {python_version}")
    # print(f"pandas version: {pandas_version}")
    # print(f"numpy version: {numpy_version}")


    # other imports
    import re
    import urllib

    # import Marin util functions
    # from marin_scripts_utils import tableize, select_changed

    # pandas settings
    pd.set_option('display.max_columns', None)  # Display all columns
    pd.set_option('display.max_colwidth', None)  # Display full content of each column

else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Config ###########

RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_DAILY_BUDGET = 'Daily Budget'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_CLICKS = 'Clicks'
RPT_COL_CONV = 'Conv.'
RPT_COL_IMPR_SHARE = 'Impr. share %'
RPT_COL_LOST_IMPR_SHARE_BUDGET = 'Lost Impr. Share (Budget) %'
RPT_COL_LOST_IMPR_SHARE_RANK = 'Lost Impr. Share (Rank) %'
RPT_COL_EPICORID = 'epicorID'
RPT_COL_EPICORID_MONTHLY_BUDGET = 'epicorID - Monthly Budget'
RPT_COL_SBA_ALLOCATION = 'SBA Allocation'
RPT_COL_SBA_CALCULATED_BUDGET_DAILY = 'SBA Calculated Budget Daily'
RPT_COL_SBA_BUDGET_PACING = 'SBA Budget Pacing'
RPT_COL_SBA_TRAFFIC = 'SBA Traffic'

BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_DAILY_BUDGET = 'Daily Budget'
BULK_COL_SBA_ALLOCATION = 'SBA Allocation'
BULK_COL_SBA_BUDGET_PACING = 'SBA Budget Pacing'
BULK_COL_SBA_CALCULATED_BUDGET_DAILY = 'SBA Calculated Budget Daily'

COL_SPEND_FULL_POTENTIAL = 'spend_lookback_full_potential'
COL_SPEND_FULL_POTENTIAL_CAPPED = 'spend_lookback_full_potential_capped'
COL_SPEND_MTD = 'spend_mtd'
COL_SBA_ALLOCATION_NEW_FLOAT = RPT_COL_SBA_ALLOCATION + '_new_float'
COL_SBA_ALLOCATION_NEW = RPT_COL_SBA_ALLOCATION + '_new'
COL_EPICOR_BUDGET_REMAINING = 'epicor_budget_remaining'
COL_SBA_BUDGET_PACING_NEW = RPT_COL_SBA_BUDGET_PACING + '_new'
COL_BUDGET_REMAINING = 'budget_remaining'
COL_DAILY_BUDGET_NEW = RPT_COL_DAILY_BUDGET + '_new'
COL_SBA_CALCULATED_BUDGET_DAILY_NEW = RPT_COL_SBA_CALCULATED_BUDGET_DAILY + '_new'
COL_WEEKDAYS_REMAINING= 'weekdays_remaining'
COL_WEEKDAYS_TOTAL= 'weekdays_total'
COL_PACING_CALC = 'pacing_calc'

outputDf[BULK_COL_DAILY_BUDGET] = "<<YOUR VALUE>>"

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# Convert RPT_COL_EPICORID_MONTHLY_BUDGET to numeric, coercing errors to NaN
inputDf[RPT_COL_EPICORID_MONTHLY_BUDGET] = pd.to_numeric(inputDf[RPT_COL_EPICORID_MONTHLY_BUDGET], errors='coerce')
# Replace NaN values with 0.0 (assign back; chained inplace fillna may not update inputDf)
inputDf[RPT_COL_EPICORID_MONTHLY_BUDGET] = inputDf[RPT_COL_EPICORID_MONTHLY_BUDGET].fillna(0.0)

# change back to percent string
if inputDf[RPT_COL_SBA_ALLOCATION].dtype == "float":
    inputDf[RPT_COL_SBA_ALLOCATION] = round(inputDf[RPT_COL_SBA_ALLOCATION] * 100.0, 0).astype(str) + '%'
if inputDf[BULK_COL_SBA_BUDGET_PACING].dtype == "float":
    inputDf[BULK_COL_SBA_BUDGET_PACING] = round(inputDf[BULK_COL_SBA_BUDGET_PACING] * 100.0, 0).astype(str) + '%'


print("inputDf shape", inputDf.shape)
print("inputDf info")
inputDf.info()  # info() prints its own report and returns None


inputDf = inputDf.set_index([RPT_COL_EPICORID])
group_by_epicor = inputDf.groupby(RPT_COL_EPICORID)

# ## Calculate Full-Potential Spend
# * Adjust Historical Spend by _Lost Impression Share due to Budget_ (see [Formula](https://docs.google.com/document/d/1EbCQ5z9Up8TZ6GISEeCaRSB3Fc15vCPCfeIydree23M/edit#bookmark=id.5fsx7jlseze6))
# 

adj_ratio = 1 + (inputDf[RPT_COL_LOST_IMPR_SHARE_BUDGET] / (1 - inputDf[RPT_COL_LOST_IMPR_SHARE_BUDGET]))

inputDf[COL_SPEND_FULL_POTENTIAL] = round(inputDf[RPT_COL_PUB_COST] * adj_ratio, 2)

# ## Remove Date Segmentation
# * Calculate MTD Spend

# SUM Series with Date index and only includes current month
def current_month_sum(x):
    x = x.sort_index()
    mtd = x[ (x.index.month == today.month) & (x.values > 0)]
    return mtd.sum()

groupby_cols = [
    RPT_COL_EPICORID,
    RPT_COL_STRATEGY,
    RPT_COL_PUBLISHER_NAME,
    RPT_COL_ACCOUNT,
    RPT_COL_CAMPAIGN,
]


agg_spec = {
    RPT_COL_CAMPAIGN_STATUS: 'last',
    RPT_COL_DAILY_BUDGET: 'last',
    RPT_COL_EPICORID_MONTHLY_BUDGET: 'last',
    RPT_COL_SBA_ALLOCATION: 'last',
    RPT_COL_SBA_CALCULATED_BUDGET_DAILY: 'last',
    RPT_COL_SBA_BUDGET_PACING: 'last',
    RPT_COL_SBA_TRAFFIC: 'last',
    RPT_COL_CLICKS: 'sum',
    RPT_COL_CONV: 'sum',
    RPT_COL_PUB_COST: 'sum',
    COL_SPEND_MTD: current_month_sum,
    COL_SPEND_FULL_POTENTIAL: 'sum',
}

inputDf[COL_SPEND_MTD] = inputDf[RPT_COL_PUB_COST]

df_campaign_agg = inputDf.reset_index() \
                         .set_index(RPT_COL_DATE) \
                         .groupby(groupby_cols) \
                         .agg(agg_spec) \
                         .reset_index() \
                         .set_index(RPT_COL_EPICORID)


# ## Only allocate budget for recently trafficking campaigns
# * Exclude Campaigns that are not ACTIVE and without spend in lookback period

inactive_campaigns = (df_campaign_agg[RPT_COL_CAMPAIGN_STATUS] != 'Active') & (df_campaign_agg[RPT_COL_PUB_COST] == 0)

df_campaign_agg = df_campaign_agg.loc[ ~inactive_campaigns ]


# ## Calculate Budget Allocation Ratio 
# * Cap full potential spend at 2X historical spend (don't budget for more than twice what was spent before)
# * Compare full potential spend for each campaign to total spend within same EpicorID budget group

df_campaign_agg[COL_SPEND_FULL_POTENTIAL_CAPPED] = df_campaign_agg \
    .apply(lambda row: min(row[COL_SPEND_FULL_POTENTIAL], 2 * row[RPT_COL_PUB_COST]), axis=1)


# use transform to calculate sum for each EpicorID and make it available on every row
# note: no need to build aggregate DataFrame and JOIN back to original

df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT] = 100.0 * \
        df_campaign_agg[COL_SPEND_FULL_POTENTIAL_CAPPED] / \
        df_campaign_agg.groupby(RPT_COL_EPICORID)[COL_SPEND_FULL_POTENTIAL_CAPPED].transform('sum')

df_campaign_agg[COL_SBA_ALLOCATION_NEW] = round(df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT],0).astype(str) + '%'


# 
# ## Calculate Remaining Budget
# * For each EpicorID budget group, calculate how much budget is left by subtracting MTD Epicor spend from the Epicor monthly budget

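# transform('sum') returns one value per row, so the subtraction does not depend on aligning a non-unique EpicorID index
df_campaign_agg[COL_EPICOR_BUDGET_REMAINING] =  \
        df_campaign_agg[RPT_COL_EPICORID_MONTHLY_BUDGET] - \
        df_campaign_agg.groupby(by=[RPT_COL_EPICORID])[COL_SPEND_MTD].transform('sum')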


# ## Allocate Budget
# * Allocate remaining budget to each campaign according to ratio calculated above

df_campaign_agg[COL_BUDGET_REMAINING] = round(df_campaign_agg[COL_EPICOR_BUDGET_REMAINING] * df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT] / 100.0, 1)


# ## Calculate SBA Daily Budget
# 
# * Calculate next day's Daily Budget by dividing allocated budget by the number of business days left in the current month


today_numpy = pd.to_datetime(today).to_numpy().astype('datetime64[D]')
next_month_start = (today_numpy + pd.offsets.BMonthBegin()).to_numpy().astype('datetime64[D]')

# for months ending on weekends, use max(1,x) to avoid dividing by zero
weekdays_left = max(1, np.busday_count(today_numpy, next_month_start))

df_campaign_agg[COL_WEEKDAYS_REMAINING] = weekdays_left
df_campaign_agg[COL_SBA_CALCULATED_BUDGET_DAILY_NEW] = round(df_campaign_agg[COL_BUDGET_REMAINING] / weekdays_left, 0)

# ### Apply Minimum Rule
# * Bump allocated budget above minimum

allocated_below_min = (df_campaign_agg[COL_SBA_CALCULATED_BUDGET_DAILY_NEW] < MINIMUM_DAILY_BUDGET)
df_campaign_agg.loc[allocated_below_min, COL_SBA_CALCULATED_BUDGET_DAILY_NEW] = MINIMUM_DAILY_BUDGET


# ### Traffic Budget
#  * only traffic budget when it's weekday, and only for campaigns with flag enabled
df_campaign_agg[COL_DAILY_BUDGET_NEW] = np.nan
# Mon=1..Sat=6, Sunday=7
if today.isoweekday() < 6:
    # campaigns to traffic
    to_traffic = df_campaign_agg[RPT_COL_SBA_TRAFFIC].notnull() & \
                (df_campaign_agg[RPT_COL_SBA_TRAFFIC].astype(str).str.lower() == 'traffic')
    print("Not weekend. Traffic count", to_traffic.sum())

    # copy budgets over
    df_campaign_agg[COL_DAILY_BUDGET_NEW] = df_campaign_agg[COL_SBA_CALCULATED_BUDGET_DAILY_NEW]
    # then blank out budget for non-traffic campaigns
    df_campaign_agg.loc[~to_traffic, COL_DAILY_BUDGET_NEW] = np.nan
else:
    print("Weekend. Not trafficking.")

# ## Calculate Epicor-level Pacing compliance percentage. Ideally should be 100% each day.

# number of elapsed workdays
current_month_start = pd.to_datetime(today.replace(day=1)).to_numpy().astype('datetime64[D]')
weekdays_in_month = np.busday_count(current_month_start, next_month_start)
df_campaign_agg[COL_WEEKDAYS_TOTAL] = weekdays_in_month
prorated_ratio = (weekdays_in_month - weekdays_left + 1) / weekdays_in_month

print("today", today)
print("current_month_start", current_month_start)
print("next_month_start", next_month_start)
print("weekdays_in_month", weekdays_in_month)
print("weekdays_left", weekdays_left)
print("prorated_ratio", prorated_ratio)

# divide MTD spend by prorated total budget
mask = df_campaign_agg[RPT_COL_EPICORID_MONTHLY_BUDGET] > 0
df_campaign_agg[COL_PACING_CALC] = round(100.0 * \
                                    df_campaign_agg.groupby(by=[RPT_COL_EPICORID])[COL_SPEND_MTD].transform('sum') / \
                                    (prorated_ratio * df_campaign_agg[RPT_COL_EPICORID_MONTHLY_BUDGET]), \
                                    0).astype(str) + '%'
df_campaign_agg.loc[mask, COL_SBA_BUDGET_PACING_NEW] = df_campaign_agg.loc[mask, COL_PACING_CALC]



# Debug DF with full details
df_epicor_budget = group_by_epicor[[RPT_COL_EPICORID_MONTHLY_BUDGET]].transform('max').dropna().drop_duplicates()
print("epicor budgets", df_epicor_budget.head(5))


# ## Generate outputDf

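# Check for changes
# Note: NaN != NaN evaluates to True, so a row whose new Daily Budget is blank always counts as changed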
changed = df_campaign_agg[COL_SBA_CALCULATED_BUDGET_DAILY_NEW].notnull() & \
    ( \
       (df_campaign_agg[RPT_COL_SBA_CALCULATED_BUDGET_DAILY] != df_campaign_agg[COL_SBA_CALCULATED_BUDGET_DAILY_NEW]) | \
       (df_campaign_agg[RPT_COL_DAILY_BUDGET] != df_campaign_agg[COL_DAILY_BUDGET_NEW]) | \
       (df_campaign_agg[RPT_COL_SBA_ALLOCATION] != df_campaign_agg[COL_SBA_ALLOCATION_NEW]) | \
       (df_campaign_agg[RPT_COL_SBA_BUDGET_PACING] != df_campaign_agg[COL_SBA_BUDGET_PACING_NEW]) \
    )

print("Changed rows:", changed.sum())

# Debug
debugDf = df_campaign_agg.loc[changed] \
      .reset_index() \
      .sort_values(by=[RPT_COL_EPICORID, COL_DAILY_BUDGET_NEW, COL_SBA_CALCULATED_BUDGET_DAILY_NEW], ascending=False) 

# print("debugDf", tableize(debugDf))

# Only emit output for changed campaigns
if changed.sum() > 0:

    # construct outputDf
    outputDf = df_campaign_agg.loc[changed, [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, COL_DAILY_BUDGET_NEW, COL_SBA_CALCULATED_BUDGET_DAILY_NEW, COL_SBA_BUDGET_PACING_NEW, COL_SBA_ALLOCATION_NEW]] \
                      .copy() \
                      .rename(columns={ \
                            COL_DAILY_BUDGET_NEW: BULK_COL_DAILY_BUDGET, \
                            COL_SBA_CALCULATED_BUDGET_DAILY_NEW: BULK_COL_SBA_CALCULATED_BUDGET_DAILY, \
                            COL_SBA_BUDGET_PACING_NEW: BULK_COL_SBA_BUDGET_PACING, \
                            COL_SBA_ALLOCATION_NEW: BULK_COL_SBA_ALLOCATION, \
                        }) \
                      .reset_index() \
                      .sort_values(by=[RPT_COL_EPICORID, BULK_COL_DAILY_BUDGET, BULK_COL_SBA_CALCULATED_BUDGET_DAILY], ascending=False) \
                      .drop(RPT_COL_EPICORID, axis=1)

    print("outputDf shape", outputDf.shape)

else:
    print("No changes detected, returning an empty dataframe")
    outputDf = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_DAILY_BUDGET, BULK_COL_SBA_CALCULATED_BUDGET_DAILY, BULK_COL_SBA_BUDGET_PACING, BULK_COL_SBA_ALLOCATION])

Post generated on 2025-03-11 01:25:51 GMT

30 Sep 2023
