Script 1299: Benchmark Assignments from Gsheet
Purpose
This Python script updates benchmark dimensions in Marin by copying benchmark values from a staging Google Sheet, using the sheet's 'Abbreviation' column to match rows to strategies.
To Elaborate
The script updates benchmark dimensions in Marin with data extracted from a staging Google Sheet, matching the sheet's 'Abbreviation' column against the 'Strategy' column in the data source. It runs either on the server or locally; local execution requires the user to configure a few parameters. For each strategy in the input data, the script looks up the matching row in the Google Sheet and copies the benchmark values into the output. Strategies without a matching benchmark are assigned NaN (or default values), so the benchmark data in Marin reflects the latest information from the Google Sheet.
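The matching step described above can be sketched with a pandas left join. This is an illustrative example only, with made-up data; the column names mirror the script, but the actual script iterates row by row rather than merging:

```python
import pandas as pd

# Hypothetical input rows (campaigns with a Strategy) and benchmark sheet rows
input_df = pd.DataFrame({
    'Campaign': ['Camp 1', 'Camp 2', 'Camp 3'],
    'Strategy': ['BRND', 'COMP', 'UNKNOWN'],
})
benchmarks = pd.DataFrame({
    'Strategy': ['BRND', 'COMP'],
    'CPL - Benchmark': [10.456, 20.111],
})

# A left join keeps every input row; strategies without a benchmark get NaN
merged = input_df.merge(benchmarks, on='Strategy', how='left')
merged['CPL - Benchmark'] = merged['CPL - Benchmark'].round(2)
print(merged)
```

Strategies with a match receive the rounded benchmark value; 'UNKNOWN' falls through to NaN, mirroring the script's fallback behavior.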
Walking Through the Code
- Local Mode Configuration:
  - The script begins with a configuration section for local execution, where the user can set a flag to download preview input and specify the path to a pickle file containing data source information.
  - It checks whether the script is running on a server or locally by attempting to access a specific object, dataSourceDict.
- Data Loading and Initialization:
  - If running locally, the script loads dataSourceDict from the specified pickle file and prints the shape and first few rows of each data entry.
  - The script sets up the output DataFrame (outputDf) as a copy of the input DataFrame (inputDf) and initializes the benchmark columns with placeholder values.
- Benchmark Data Processing:
  - The script extracts the relevant columns from the 'Benchmark Upload' sheet in the Google Sheet, renaming them to match the expected column names.
  - It converts these columns to numeric types, coercing any conversion errors to NaN.
- Benchmark Assignment:
  - For each row in the input DataFrame, the script matches the 'Strategy' with the corresponding row in the benchmark data.
  - If a match is found, it updates the output DataFrame with the benchmark values, rounded to two decimal places.
  - If no match is found, it logs a message and assigns NaN (or default values) to the benchmark columns.
- Output and Debugging:
  - The script prints the first few rows of the output DataFrame for verification.
  - If running in local development mode, it writes the output and debug data to CSV files for further inspection.
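The numeric-conversion step described above relies on pandas' `errors='coerce'` mode, which turns unparsable cells into NaN instead of raising. A small illustrative sketch with made-up sheet values:

```python
import pandas as pd

# Google Sheet cells typically arrive as strings; coerce any
# non-numeric entries (e.g. 'N/A', blanks) to NaN instead of raising
raw = pd.Series(['12.5', 'N/A', '', '7'])
nums = pd.to_numeric(raw, errors='coerce')
print(nums.tolist())
```

This is why downstream code must tolerate NaN in the benchmark columns: any cell that fails to parse silently becomes NaN.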
Vitals
- Script ID : 1299
- Client ID / Customer ID: 1306926629 / 60270083
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Gross Lead - Benchmark - High, Gross Lead - Benchmark - Medium, Gross Lead - Benchmark - Low, Intv Rate - Benchmark, CPL - Benchmark, Campaign ID
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
- Created by dwaidhas@marinsoftware.com on 2024-07-26 19:50
- Last Updated by dwaidhas@marinsoftware.com on 2024-08-20 15:40
> See it in Action
Python Code
##
## name: Benchmark Assignments from Gsheet
## description:
## Copy Benchmarks from staging GSheets to update Benchmark Dimensions in Marin.
## Use GSheets 'Abbreviation' column to match with 'Strategy'.
##
## author: Dana Waidhas
## created: 2024-07-26
##
########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/dwaidhas/Downloads/pickle/allcampus_'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict
# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # dataSourceDict is only defined on the server
        dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    with open(pickle_path, 'rb') as f:
        dataSourceDict = pickle.load(f)
    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")
    # set outputDf same as inputDf
    outputDf = dataSourceDict["1"].copy()
    # setup timezone
    import datetime
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))
    # import pandas and numpy
    import pandas as pd
    import numpy as np
    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########
# Primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_GROSS_LEAD__BENCHMARK__HIGH = 'Gross Lead - Benchmark - High'
RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM = 'Gross Lead - Benchmark - Medium'
RPT_COL_GROSS_LEAD__BENCHMARK__LOW = 'Gross Lead - Benchmark - Low'
RPT_COL_CPL__BENCHMARK = 'CPL - Benchmark'
RPT_COL_INTV_RATE__BENCHMARK = 'Intv Rate - Benchmark'
# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_CAMPAIGN_ID = 'Campaign ID'
BULK_COL_CPL__BENCHMARK = 'CPL - Benchmark'
BULK_COL_GROSS_LEAD__BENCHMARK__HIGH = 'Gross Lead - Benchmark - High'
BULK_COL_GROSS_LEAD__BENCHMARK__LOW = 'Gross Lead - Benchmark - Low'
BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM = 'Gross Lead - Benchmark - Medium'
BULK_COL_INTV_RATE__BENCHMARK = 'Intv Rate - Benchmark'
outputDf[BULK_COL_CPL__BENCHMARK] = "<<YOUR VALUE>>"
outputDf[BULK_COL_GROSS_LEAD__BENCHMARK__HIGH] = "<<YOUR VALUE>>"
outputDf[BULK_COL_GROSS_LEAD__BENCHMARK__LOW] = "<<YOUR VALUE>>"
outputDf[BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM] = "<<YOUR VALUE>>"
outputDf[BULK_COL_INTV_RATE__BENCHMARK] = "<<YOUR VALUE>>"
########### User Code Starts Here ###########
# "Benchmark Upload" is the fourth sheet, so datasource key is 2_4
# "DIM1" is column B, which holds Gross Lead - Benchmark - High
# "DIM2" is column C, which holds Gross Lead - Benchmark - Medium
# "DIM3" is column D, which holds Gross Lead - Benchmark - Low
# "DIM4" is column E, which holds CPL Target
# "DIM5" is column F, which holds Intv Rate Target
# "Digital Ref" is column A, which holds Strategy
Benchmark_Upload = dataSourceDict['2_4'] \
    .loc[0:, ['A', 'B', 'C', 'D', 'E', 'F']] \
    .rename(columns={
        'A': RPT_COL_STRATEGY,
        'B': RPT_COL_GROSS_LEAD__BENCHMARK__HIGH,
        'C': RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM,
        'D': RPT_COL_GROSS_LEAD__BENCHMARK__LOW,
        'E': RPT_COL_CPL__BENCHMARK,
        'F': RPT_COL_INTV_RATE__BENCHMARK
    })
# Convert columns to numeric, handling any errors
Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__HIGH] = pd.to_numeric(Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__HIGH], errors='coerce')
Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM] = pd.to_numeric(Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM], errors='coerce')
Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__LOW] = pd.to_numeric(Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__LOW], errors='coerce')
Benchmark_Upload[RPT_COL_CPL__BENCHMARK] = pd.to_numeric(Benchmark_Upload[RPT_COL_CPL__BENCHMARK], errors='coerce')
Benchmark_Upload[RPT_COL_INTV_RATE__BENCHMARK] = pd.to_numeric(Benchmark_Upload[RPT_COL_INTV_RATE__BENCHMARK], errors='coerce')
# Match 'Strategy' from inputDf with 'Strategy' in Benchmark_Upload and assign benchmarks
for index, row in inputDf.iterrows():
    strategy = row[RPT_COL_STRATEGY]
    matching_row = Benchmark_Upload[Benchmark_Upload[RPT_COL_STRATEGY] == strategy]
    if not matching_row.empty:
        outputDf.loc[index] = {
            BULK_COL_ACCOUNT: row[RPT_COL_ACCOUNT],
            BULK_COL_CAMPAIGN: row[RPT_COL_CAMPAIGN],
            BULK_COL_CAMPAIGN_ID: row[RPT_COL_CAMPAIGN_ID],
            BULK_COL_CPL__BENCHMARK: round(matching_row[RPT_COL_CPL__BENCHMARK].values[0], 2),
            BULK_COL_GROSS_LEAD__BENCHMARK__HIGH: round(matching_row[RPT_COL_GROSS_LEAD__BENCHMARK__HIGH].values[0], 2),
            BULK_COL_GROSS_LEAD__BENCHMARK__LOW: round(matching_row[RPT_COL_GROSS_LEAD__BENCHMARK__LOW].values[0], 2),
            BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM: round(matching_row[RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM].values[0], 2),
            BULK_COL_INTV_RATE__BENCHMARK: round(matching_row[RPT_COL_INTV_RATE__BENCHMARK].values[0], 2)
        }
    else:
        print(f"No matching benchmark found for strategy: {strategy}")
        # No benchmark available: assign NaN (substitute default values here if desired)
        outputDf.loc[index] = {
            BULK_COL_ACCOUNT: row[RPT_COL_ACCOUNT],
            BULK_COL_CAMPAIGN: row[RPT_COL_CAMPAIGN],
            BULK_COL_CAMPAIGN_ID: row[RPT_COL_CAMPAIGN_ID],
            BULK_COL_CPL__BENCHMARK: np.nan,
            BULK_COL_GROSS_LEAD__BENCHMARK__HIGH: np.nan,
            BULK_COL_GROSS_LEAD__BENCHMARK__LOW: np.nan,
            BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM: np.nan,
            BULK_COL_INTV_RATE__BENCHMARK: np.nan
        }
# Display the first few rows of the output DataFrame
print(outputDf.head())
## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")
    # debugDf is optional; only write it if this script defined one
    if 'debugDf' in locals():
        debug_filename = 'debugDf.csv'
        debugDf.to_csv(debug_filename, index=False)
        print(f"Local Dev: Debug written to: {debug_filename}")
Post generated on 2024-11-27 06:58:46 GMT