Script 1501: Sample Script Pause Poor performing Keywords with 0 Conv. and QS 5

Purpose:

The Python script identifies and recommends pausing keywords that are at least 30 days old, have zero conversions, and a quality score below 5.

To Elaborate

The script is designed to optimize keyword performance in advertising campaigns by identifying keywords that are underperforming. Specifically, it targets keywords that have been active for at least 30 days, have not generated any conversions, and have a quality score below 5. The script uses data from advertising platforms to evaluate each keyword’s performance based on these criteria. If a keyword meets these conditions, it is flagged for pausing, which can help improve the overall efficiency of the advertising campaign by reallocating resources to more effective keywords. This process is automated to ensure timely and consistent management of keyword performance.

Walking Through the Code

Initialization and Setup:
- The script begins by checking whether it is running on a server or locally. If running locally, it loads data from a specified pickle file containing keyword performance data.
- It sets up necessary libraries such as pandas and numpy for data manipulation and analysis.
Data Preparation:
- The script cleans the input data by removing unnecessary columns and ensuring date columns are in the correct format.
- It groups the data by account, campaign, group, keyword, and match type to facilitate analysis.
Intermediate Calculations:
- It calculates the first date with impressions for each keyword and adjusts the creation date accordingly.
- The script aggregates data to compute metrics like total impressions, conversions, and average historical quality score.
Keyword Evaluation:
- The script applies business rules to identify keywords that are at least 30 days old, have zero conversions, and meet specific quality score criteria.
- Keywords that meet these conditions are selected for pausing.
Output Preparation:
- The script prepares the output data, including the account, campaign, group, keyword, and match type for keywords recommended for pausing.
- It adds a note in the automation info column indicating the reason for pausing.
Local Development:
- If running locally, the script saves the output and debug information to CSV files for further analysis.

Vitals

Script ID : 1501
Client ID / Customer ID: 1306928469 / 60270543
Action Type: Bulk Upload (Preview)
Item Changed: Keyword
Output Columns: Account, Campaign, Group, Keyword, Match Type, AUTOMATION INFO
Linked Datasource: M1 Report
Reference Datasource: None
Owner: emerryfield@marinsoftware.com (emerryfield@marinsoftware.com)
Created by emerryfield@marinsoftware.com on 2024-11-08 23:18
Last Updated by emerryfield@marinsoftware.com on 2024-11-08 23:26

> See it in Action

Python Code

##
## name: Pause Poor performing Keywords with 0 Conv. and QS <5
## description:
##  Look at the creation Date and make sure the KW is at least 30 days old. Then, if Conversions are 0 and Quality score is <5 pause the KW
## 
## author: Dana Waidhas, Michael S. Huang
## created: 2024-04-02
## 


########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/roc_media_keyword_qs_20240410_no_conv_filter.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
  # Chicago Timezone is GMT-5. Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))

    # import pandas
    import pandas as pd
    import numpy as np

    # other imports
    import re
    import urllib

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed

    # pandas settings
    pd.set_option('display.max_columns', None)  # Display all columns
    pd.set_option('display.max_colwidth', None)  # Display full content of each column

else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########


MATCH_TYPE = {
  'EXACT': 'exact',
  'PHRASE': 'phrase',
  'BROAD': 'broad',
}

# Setup timezone
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-4))

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_DATE = 'Date'
RPT_COL_STATUS = 'Status'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_HIST_QS = 'Hist. QS'
RPT_COL_QUALITY_SCORE = 'Quality Score'
RPT_COL_IMPR = 'Impr.'
RPT_COL_CONV = 'Conv.'
RPT_COL_CREATION_DATE = 'Creation Date'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_KEYWORD = 'Keyword'
BULK_COL_MATCH_TYPE = 'Match Type'
BULK_COL_STATUS = 'Status'
BULK_COL_AUTOMATION_INFO = 'AUTOMATION INFO'
outputDf[BULK_COL_STATUS] = "<<YOUR VALUE>>"
outputDf[BULK_COL_AUTOMATION_INFO] = "<<YOUR VALUE>>"

VAL_STATUS_PAUSED = 'Paused'
VAL_STATUS_ACTIVE = 'Active'
VAL_PUBLISHER_GOOGLE = 'Google'

print("inputDf.shape", inputDf.shape)
print(inputDf.info())

### user code starts here

groupby_cols = [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_GROUP, RPT_COL_KEYWORD, RPT_COL_MATCH_TYPE]

## Data Cleanup
inputDf_reduced = inputDf.drop(columns=[RPT_COL_STATUS]) \
                         .set_index(groupby_cols)

# Ensure RPT_COL_CREATION_DATE is a Pandas DateTime type with the correct format
inputDf_reduced[RPT_COL_CREATION_DATE] = inputDf_reduced[RPT_COL_CREATION_DATE].str.replace('a.m.', 'AM').str.replace('p.m.', 'PM')
inputDf_reduced[RPT_COL_CREATION_DATE] = pd.to_datetime(inputDf_reduced[RPT_COL_CREATION_DATE])
inputDf_reduced[RPT_COL_DATE] = pd.to_datetime(inputDf_reduced[RPT_COL_DATE])

print(f"Report Date Range: {inputDf_reduced[RPT_COL_DATE].min().date()} - {inputDf_reduced[RPT_COL_DATE].max().date()}")

# DEBUG: pick a specific keyword. Format is (kw, matchtype)
# TEST_KWS = [('new townhouses for sale near me','Broad')]
# group_keys = list(inputDf.groupby(groupby_cols).groups.keys())
# test_keys = [key for key in group_keys if any((test_kw[0] in key[3]) & (test_kw[1] in key[4]) for test_kw in TEST_KWS)]
# inputDf_reduced = inputDf_reduced.loc[test_keys]

print("inputDf_reduced.shape", inputDf_reduced.shape)
print(inputDf_reduced.info())
print("inputDf example", inputDf.head())

## Intermediate calculations 
#  - first_traffic_date: first date with traffic in report
#  - effective_creation_date: Creation Date is date created in Marin, so adjust it if there is traffic prior to that

# Group inputDf by Account, Campaign, Group, Keyword, and Match Type
grouped = inputDf_reduced.groupby(groupby_cols)

# Iterate through each group to determine the earliest date with Impressions
# Add this date to a new column called first_traffic_date
def get_first_traffic_date(group):
    group_with_impressions = group[group[RPT_COL_IMPR] > 0]
    if not group_with_impressions.empty:
        return group_with_impressions[RPT_COL_DATE].min()
    else:
        return None

# Apply the function to each group to get first_traffic_date
first_traffic_date_series = grouped.apply(get_first_traffic_date)

# Custom mean function to ignore zeros in Hist QS
def mean_without_zeros(series):
    non_zero_values = series[series != 0]
    if len(non_zero_values) == 0:
        return 0  # or return None if you prefer
    return non_zero_values.mean()

agg_spec = {
    RPT_COL_PUBLISHER: 'last',
    RPT_COL_IMPR: 'sum',
    RPT_COL_CONV: 'sum',
    RPT_COL_HIST_QS: mean_without_zeros, # Hist QS is zero on days without impressions, ignore them when calc mean
    RPT_COL_QUALITY_SCORE: 'last',
    RPT_COL_CREATION_DATE: 'last'
}

# with first_traffic_date calculated, remove Date column by aggregating
df_keywords = grouped.agg(agg_spec)

# Map the first_traffic_date to the original DataFrame using the same groupby columns
df_keywords['first_traffic_date'] = df_keywords.index.map(first_traffic_date_series)

# Compare first_traffic_date and RPT_COL_CREATION_DATE, take the earlier date, and add to new column effective_creation_date
df_keywords['effective_creation_date'] = df_keywords.apply(
    lambda row: min(row['first_traffic_date'], row[RPT_COL_CREATION_DATE]).date() 
    if pd.notnull(row['first_traffic_date']) else row[RPT_COL_CREATION_DATE].date(), 
    axis=1
)


## Find low-performing keywords to pause
#  - at least 30-days old
#  - no Conversions
#  - and one of below:
#   - Hist QS between 0 and 5, non-inclusive
#   - Hist QS is 0, but Quality Score is not zero (to exclude publishers that don't support QS)
#   - both Hist QS and Quality Score are zero, but only for Google

thirty_days_ago = pd.Timestamp.today(CLIENT_TIMEZONE).date() - pd.Timedelta(days=30)
print(f"30 days ago: {thirty_days_ago}")
at_least_30_days_old = df_keywords['effective_creation_date'] < thirty_days_ago
no_conversions = df_keywords[RPT_COL_CONV] == 0
low_hist_qs = (df_keywords[RPT_COL_HIST_QS] > 0) & (df_keywords[RPT_COL_HIST_QS] < 5)
zero_hist_qs = (df_keywords[RPT_COL_HIST_QS] == 0) & (df_keywords[RPT_COL_QUALITY_SCORE] > 0)
zero_hist_qs_and_qs_google = (df_keywords[RPT_COL_HIST_QS] == 0) & (df_keywords[RPT_COL_QUALITY_SCORE] == 0) & (df_keywords[RPT_COL_PUBLISHER] == VAL_PUBLISHER_GOOGLE)

# apply business rules
df_pause = df_keywords.loc[
    at_least_30_days_old &
    no_conversions &
    (low_hist_qs | zero_hist_qs | zero_hist_qs_and_qs_google)
]

## Prepare output

debugDf = df_keywords.reset_index()
output_cols = [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_GROUP, RPT_COL_KEYWORD, RPT_COL_MATCH_TYPE]
outputDf = df_pause.reset_index()[output_cols].copy()

# outputDf[BULK_COL_STATUS] = VAL_STATUS_PAUSED
info_str = f"Paused by Scripts on {pd.Timestamp.today(CLIENT_TIMEZONE).date()} due to low QS"
outputDf[BULK_COL_AUTOMATION_INFO] = info_str

print("outputDf.shape", outputDf.shape)
print("outputDf example", outputDf.head())

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")
else:
    print("outputDf", outputDf.head(5))

Post generated on 2025-03-11 01:25:51 GMT

08 Nov 2024

« Script 1499: Auto Create Suggested Keywords with CPA Criteria Script 1503: Sample Scripts Duplicate Keywords last 180 days »

MarinOne Scripts Creator's Corner