Script 1767: Auto Pause Intraday MTD Based on Strategy Target

Purpose:

The Python script automates the pausing and resuming of advertising campaigns based on their monthly budget targets and current spending.

To Elaborate

The script manages advertising campaigns by automatically pausing them when their budget group's month-to-date (MTD) spend approaches or exceeds the monthly budget defined in the campaign's strategy, and by resuming previously paused campaigns once spend falls back below that threshold. A configurable safety margin compensates for reporting lag and for non-linear spend patterns during the day. The script aggregates spend by budget group, identifies campaigns to pause or resume, and updates their status accordingly. It can run on the MarinOne server or locally; in local mode it loads its input from a pickled dataSourceDict file. The overall goal is to keep each budget group from exceeding its monthly budget while still allowing flexible intraday spending.
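
To make the safety-margin check concrete, here is a minimal, hypothetical sketch of the pause condition (the names SAFETY_MARGIN, mtd_spend, and monthly_budget are illustrative; the actual script uses BUDGET_CAP_SAFETY_MARGIN and the report columns shown further below):

SAFETY_MARGIN = 0.02  # pause once MTD spend reaches 98% of the monthly budget

def should_pause(mtd_spend: float, monthly_budget: float) -> bool:
    # Only enforce the cap when a positive monthly budget is defined
    return monthly_budget > 0 and mtd_spend >= monthly_budget * (1 - SAFETY_MARGIN)

print(should_pause(980.0, 1000.0))  # True  -- within the 2% margin of the cap
print(should_pause(900.0, 1000.0))  # False -- still comfortably under budget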

Walking Through the Code

  1. Configuration and Setup
    • The script begins by defining a configurable parameter, BUDGET_CAP_SAFETY_MARGIN, which determines how close the MTD spend can get to the monthly budget before pausing campaigns.
    • It checks whether the code is running on a server or locally, and if local, it loads data from a specified pickle file.
  2. Data Preparation
    • The script loads campaign data into a DataFrame, ensuring data types are correct and filling any missing values.
    • It calculates month-to-date (MTD) spending for each budget group and identifies campaigns whose group has met or exceeded its monthly budget (a condensed sketch of this aggregation appears after this list).
  3. Campaign Status Management
    • Campaigns that exceed their budget are marked for pausing, while those under budget with a previous pause date are marked for resuming.
    • The script updates the campaign status based on these conditions, ensuring that only campaigns whose Auto Pause Status is set to ‘traffic’ are actually changed.
  4. Output Preparation
    • The script cleans up any orphaned pause dates and prepares the output DataFrame, which includes only the rows where changes have occurred.
    • If running locally, it writes the output and debug information to CSV files for further analysis.
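
Steps 2 and 3 boil down to a pandas groupby/transform that rolls campaign cost up to the budget-group level, followed by boolean masks. The sketch below illustrates the pattern with a tiny made-up DataFrame; the column names match the report columns used in the script, but the data and the simplified resume condition are illustrative only:

import pandas as pd

SAFETY_MARGIN = 0.02

df = pd.DataFrame({
    'Campaign': ['A', 'B', 'C'],
    'Strategy': ['Group1', 'Group1', 'Group2'],
    'Pub. Cost $': [600.0, 390.0, 100.0],
    'Strategy Target': [1000.0, 1000.0, 500.0],
})

# Carry each budget group's total MTD cost onto every campaign row in that group
df['mtd_group_spend'] = df.groupby('Strategy')['Pub. Cost $'].transform('sum')

threshold = df['Strategy Target'] * (1 - SAFETY_MARGIN)
to_pause = (df['Strategy Target'] > 0) & (df['mtd_group_spend'] >= threshold)
to_resume = ~to_pause  # the real script also requires a populated Auto Pause Date

print(df.assign(pause=to_pause, resume=to_resume))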

Vitals

  • Script ID : 1767
  • Client ID / Customer ID: 1306928641 / 60270613
  • Action Type: Bulk Upload (Preview)
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Auto Pause Date, Auto Pause Rec. Campaign Status, Status
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: ascott@marinsoftware.com
  • Created by ascott@marinsoftware.com on 2025-02-27 15:14
  • Last Updated by ascott@marinsoftware.com on 2025-02-28 16:36

Python Code

### name: Intraday Budget Cap via Strategy
## description:
##  Pause campaigns when MTD spend reaches Monthly Budget (stored in Strategy)
## 
## author: Adam Scott
## created: 2024-04-26
## 

##### Configurable Param #####
# Define how close MTD spend can get to Monthly Budget before being Paused
#  - compensates for lag in system
#  - compensates for non-linearity in intraday spend
BUDGET_CAP_SAFETY_MARGIN = 0.02 # set to 2%
##############################


########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
pickle_path = '/Users/mhuang/Downloads/pickle/allcampus_intraday_budget_cap_20241002_2.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access dataSourceDict, which only exists on the server
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
    # Chicago Timezone is GMT-5. Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))

    # import pandas
    import pandas as pd
    import numpy as np

    # Printing out the version of Python, Pandas and Numpy
    # import sys
    # python_version = sys.version
    # pandas_version = pd.__version__
    # numpy_version = np.__version__

    # print(f"python version: {python_version}")
    # print(f"pandas version: {pandas_version}")
    # print(f"numpy version: {numpy_version}")

    # other imports
    import re
    import urllib

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_AUTO_PAUSE_STATUS = 'Auto Pause Status'
RPT_COL_BUDGET_GROUP = 'Strategy'
RPT_COL_GROUP_MONTHLY_BUDGET = 'Strategy Target'
RPT_COL_PAUSE_DATE = 'Auto Pause Date'
RPT_COL_RECOMMENDED_STATUS = 'Auto Pause Rec. Campaign Status'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_STATUS = 'Status'
BULK_COL_SBA_PAUSE_DATE = 'Auto Pause Date'
BULK_COL_SBA_RECOMMENDED_STATUS = 'Auto Pause Rec. Campaign Status'


originalDf = dataSourceDict["1"]

# Workaround: remove duplicate Meta campaigns via Group By and sum Pub Cost
originalDf = originalDf.groupby([RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN], as_index=False).agg({
    RPT_COL_PUB_COST: 'sum',
    RPT_COL_CAMPAIGN_STATUS: 'first',
    RPT_COL_AUTO_PAUSE_STATUS: 'first',
    RPT_COL_BUDGET_GROUP: 'first',
    RPT_COL_GROUP_MONTHLY_BUDGET: 'first',
    RPT_COL_PAUSE_DATE: 'first',
    RPT_COL_RECOMMENDED_STATUS: 'first'
})

# Make inputDf a copy of original to keep dataSourceDict pristine
inputDf = originalDf.copy()

# define some intermediate columns
COL_MTD_BUDGET_GROUP_SPEND = 'mtd_budget_group_spend'

# define Status values
VAL_STATUS_ACTIVE = 'Active'
VAL_STATUS_PAUSED = 'Paused'
VAL_BLANK = ''

print("inputDf shape", inputDf.shape)
print("inputDf info", inputDf.info())

## force expected types
# Convert RPT_COL_GROUP_MONTHLY_BUDGET to numeric, coercing errors to NaN
inputDf[RPT_COL_GROUP_MONTHLY_BUDGET] = pd.to_numeric(inputDf[RPT_COL_GROUP_MONTHLY_BUDGET], errors='coerce')
# Replace NaN budgets with 0.0 so campaigns without a defined monthly target are never capped
inputDf[RPT_COL_GROUP_MONTHLY_BUDGET] = inputDf[RPT_COL_GROUP_MONTHLY_BUDGET].fillna(0.0)
# Force RPT_COL_PAUSE_DATE to be Date type
inputDf[RPT_COL_PAUSE_DATE] = pd.to_datetime(inputDf[RPT_COL_PAUSE_DATE], errors='coerce').dt.date

# HACK: replace nan with empty strings so comparison doesn't fail
inputDf.fillna(VAL_BLANK, inplace=True)

# Clear out old Rec Status so they don't get trafficked
inputDf[RPT_COL_RECOMMENDED_STATUS] = VAL_BLANK

# Calculate MTD Budget Group Spend
inputDf[COL_MTD_BUDGET_GROUP_SPEND] = inputDf.groupby(RPT_COL_BUDGET_GROUP)[RPT_COL_PUB_COST].transform('sum')

# Recommend to Pause campaigns with MTD Budget Group Spend over Monthly Budget (by a margin)
has_monthly_group_budget = inputDf[RPT_COL_GROUP_MONTHLY_BUDGET] > 0.0
over_spent_campaigns = inputDf[COL_MTD_BUDGET_GROUP_SPEND] >= inputDf[RPT_COL_GROUP_MONTHLY_BUDGET] * (1 - BUDGET_CAP_SAFETY_MARGIN)
campaigns_to_pause = has_monthly_group_budget & over_spent_campaigns

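# Flag pause candidates in a helper column for easier local inspection (not part of the bulk output)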
inputDf.loc[campaigns_to_pause, 'pause'] = 1
print(f"campaigns_to_pause count: {sum(campaigns_to_pause)}")
if campaigns_to_pause.any():
    print("campaigns_to_pause campaigns", tableize(inputDf.loc[campaigns_to_pause].head()))

inputDf.loc[campaigns_to_pause, RPT_COL_RECOMMENDED_STATUS] = VAL_STATUS_PAUSED

# Recommend to reactivate campaigns with MTD Budget Group Spend under Monthly Group Budget (by a margin)
# but limited to campaigns with a populated Auto Pause Date (10-character date string)
under_spent_campaigns = inputDf[COL_MTD_BUDGET_GROUP_SPEND] < inputDf[RPT_COL_GROUP_MONTHLY_BUDGET] * (1 - BUDGET_CAP_SAFETY_MARGIN)
sba_paused_campaigns = inputDf[RPT_COL_PAUSE_DATE].astype('str').str.len() >= 10
campaigns_to_resume = under_spent_campaigns & sba_paused_campaigns

inputDf.loc[campaigns_to_resume, 'resume'] = 1
print(f"campaigns_to_resume count: {sum(campaigns_to_resume)}")
if campaigns_to_resume.any():
    print("campaigns_to_resume", tableize(inputDf.loc[campaigns_to_resume].head()))

inputDf.loc[campaigns_to_resume, RPT_COL_RECOMMENDED_STATUS] = VAL_STATUS_ACTIVE

## Actually traffic PAUSE

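# Only campaigns explicitly opted in (Auto Pause Status set to 'traffic') are eligible for any change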
should_traffic = inputDf[RPT_COL_AUTO_PAUSE_STATUS].astype(str).str.lower() == 'traffic'

should_traffic_pause = should_traffic & \
                       campaigns_to_pause & \
                       (inputDf[RPT_COL_RECOMMENDED_STATUS] == VAL_STATUS_PAUSED) & \
                       (inputDf[RPT_COL_RECOMMENDED_STATUS] != inputDf[RPT_COL_CAMPAIGN_STATUS])


inputDf.loc[should_traffic_pause, 'traffic_pause'] = 1
print(f"should_traffic_pause count: {sum(should_traffic_pause)}")
if should_traffic_pause.any():
    print("should_traffic_pause campaigns", tableize(inputDf.loc[should_traffic_pause].head()))


inputDf.loc[should_traffic_pause, RPT_COL_CAMPAIGN_STATUS] = inputDf.loc[should_traffic_pause, RPT_COL_RECOMMENDED_STATUS]
inputDf.loc[should_traffic_pause, RPT_COL_PAUSE_DATE] = today.strftime('%Y-%m-%d')

## Actually traffic RESUME

should_traffic_resume = should_traffic & \
                        campaigns_to_resume & \
                        sba_paused_campaigns & \
                        (inputDf[RPT_COL_RECOMMENDED_STATUS] == VAL_STATUS_ACTIVE) & \
                        (inputDf[RPT_COL_RECOMMENDED_STATUS] != inputDf[RPT_COL_CAMPAIGN_STATUS])


inputDf.loc[should_traffic_resume, 'traffic_resume'] = 1
print(f"should_traffic_resume count: {sum(should_traffic_resume)}")
if should_traffic_resume.any():
    print("should_traffic_resume campaigns", tableize(inputDf.loc[should_traffic_resume].head()))

inputDf.loc[should_traffic_resume, RPT_COL_CAMPAIGN_STATUS] = inputDf.loc[should_traffic_resume, RPT_COL_RECOMMENDED_STATUS]
inputDf.loc[should_traffic_resume, RPT_COL_PAUSE_DATE] = VAL_BLANK

## Prepare Output

# Cleanup: RPT_COL_PAUSE_DATE marks that this script actioned the pause. If the campaign is Active anyway, a leftover non-blank RPT_COL_PAUSE_DATE causes confusion, so clear it.
orphan_pause_date = sba_paused_campaigns & (inputDf[RPT_COL_CAMPAIGN_STATUS] == VAL_STATUS_ACTIVE)
inputDf.loc[orphan_pause_date, RPT_COL_PAUSE_DATE] = VAL_BLANK
print(f"Cleaned up {orphan_pause_date.sum()} orphaned {RPT_COL_PAUSE_DATE}")

# only include changed rows in bulk file
print(f"select_changed with inputDf shape {inputDf.shape} and originalDf shape {originalDf.shape}")
(outputDf, debugDf) = select_changed(inputDf, \
                                    originalDf, \
                                    diff_cols = [ \
                                        RPT_COL_CAMPAIGN_STATUS, \
                                        RPT_COL_RECOMMENDED_STATUS, \
                                    ], \
                                    select_cols = [ \
                                        RPT_COL_ACCOUNT, \
                                        RPT_COL_CAMPAIGN, \
                                        RPT_COL_CAMPAIGN_STATUS, \
                                        RPT_COL_RECOMMENDED_STATUS, \
                                        RPT_COL_PAUSE_DATE, \
                                    ], \
                                    merged_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN] \
                                    )


changed = (debugDf[RPT_COL_CAMPAIGN_STATUS+'_new'] != debugDf[RPT_COL_CAMPAIGN_STATUS+'_orig']) | \
          (debugDf[RPT_COL_RECOMMENDED_STATUS+'_new'] != debugDf[RPT_COL_RECOMMENDED_STATUS+'_orig'])

debugDf.loc[changed, 'changed'] = 1
print(f"changed count: {sum(changed)}")
if changed.any():
    print("changed campaigns", tableize(debugDf.loc[changed].head()))

# remember to use Bulk column header for Status
outputDf = outputDf.rename(columns = { \
                RPT_COL_CAMPAIGN_STATUS: BULK_COL_STATUS \
                })

print("outputDf shape", outputDf.shape)
print("outputDf", tableize(outputDf.tail(5)))

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")
  
