Script 1503: Sample Scripts Duplicate Keywords last 180 days

Purpose:

The Python script identifies duplicate keywords in advertising campaigns and recommends which ones to pause based on performance metrics.

To Elaborate

The script helps manage and optimize advertising campaigns by identifying duplicate keywords (same keyword text and match type) within the same publisher and account. It considers only keywords whose keyword, group, and campaign statuses are all Active, and it compares keyword text case-insensitively. Within each group of duplicates, it ranks keywords by campaign classification, publisher cost, historical quality score, conversion rate, and impression share. It then recommends keeping the top-ranked keyword active and pausing the rest. It also recommends renaming and pausing campaigns that contain only losing keywords, which helps advertisers streamline their keyword strategy and improve campaign efficiency.

Walking Through the Code

Initialization and Setup:
- The script begins by determining whether it is running on a server or locally, based on whether dataSourceDict is already defined. If running locally, it loads the data from the pickle file named in pickle_path.
- In local mode it also imports pandas, numpy, and a few other libraries, and sets the client timezone.
Data Preparation:
- The script filters out inactive keywords, groups, and campaigns to focus only on active ones.
- It constructs a unique key for each keyword by combining publisher, account, keyword (in lowercase), match type, and status.
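The two preparation steps above can be sketched on a small table. This is a minimal illustration, not the script itself: the sample rows are hypothetical, and only the column names and the `__` separator are taken from the script.

```python
import pandas as pd

# Hypothetical sample rows; column names mirror the report columns used by the script.
df = pd.DataFrame({
    "Publisher": ["Google", "Google", "Google", "Google"],
    "Account": ["Acct A", "Acct A", "Acct A", "Acct A"],
    "Keyword": ["Hiking Boots", "hiking boots", "tent", "hiking boots"],
    "Match Type": ["EXACT", "EXACT", "EXACT", "EXACT"],
    "Status": ["Active", "Active", "Active", "Paused"],
    "Group Status": ["Active", "Active", "Active", "Active"],
    "Campaign Status": ["Active", "Active", "Active", "Active"],
})

# Keep rows where keyword, group, and campaign are all active.
is_active = (
    (df["Status"] == "Active")
    & (df["Group Status"] == "Active")
    & (df["Campaign Status"] == "Active")
)
active = df.loc[is_active].copy()

# Build the composite key; lowercasing the keyword makes the comparison case-insensitive.
active["kw_uniq_key"] = (
    active["Publisher"] + "__" + active["Account"] + "__"
    + active["Keyword"].str.lower() + "__"
    + active["Match Type"] + "__" + active["Status"]
)
print(active["kw_uniq_key"].tolist())
```

In this sample, "Hiking Boots" and "hiking boots" end up with the same key, while the paused row is dropped before keys are built.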
Duplicate Identification:
- The script calculates the count, total spend, and highest bid for each group of duplicate keywords.
- It sorts these duplicates based on performance metrics to identify the best-performing keyword in each group.
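As a rough sketch of the per-group statistics, `groupby(...).transform(...)` broadcasts each group's aggregate back onto every row of that group. The keys and numbers below are hypothetical, and the sort uses only two columns for brevity rather than the script's full list.

```python
import pandas as pd

# Hypothetical data: two rows share key "k1", one row has key "k2".
df = pd.DataFrame({
    "kw_uniq_key": ["k1", "k1", "k2"],
    "Pub. Cost $": [10.0, 25.5, 7.0],
    "Search Bid": [1.20, 0.80, 2.00],
})

# Each transform returns a Series aligned with df, one value per row.
df["dupe_count"] = df.groupby("kw_uniq_key")["kw_uniq_key"].transform("size")
df["dupe_spend"] = df.groupby("kw_uniq_key")["Pub. Cost $"].transform("sum").round(2)
df["dupe_high_bid"] = df.groupby("kw_uniq_key")["Search Bid"].transform("max").round(2)

# Keep only keys that occur more than once, highest-cost row first within each key.
dupes = df.loc[df["dupe_count"] > 1].sort_values(
    by=["kw_uniq_key", "Pub. Cost $"], ascending=[True, False]
)
print(dupes)
```

Because the aggregates live on every row, the later filter and sort need no extra join.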
Recommendation Process:
- The script marks the top-ranked keyword in each duplicate group as Active, assigns it the group's highest search bid as the recommended bid, and recommends pausing the others.
- It identifies campaigns whose duplicate keywords are all losers, and recommends prefixing their names with 'ZZ__' and pausing them.
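The recommendation step can be sketched as follows. The data and the short column names (`rec_status`, `rec_bid`, `rec_campaign_name`) are hypothetical. To find campaigns with only losing keywords, this sketch uses a `groupby(...).all()` check, which is a shortcut and not the count comparison the script performs.

```python
import pandas as pd

# Hypothetical duplicates, already sorted best-first within each key.
dupes = pd.DataFrame({
    "kw_uniq_key": ["k1", "k1", "k2", "k2"],
    "Campaign ID": [100, 200, 100, 200],
    "Campaign": ["Brand", "Legacy", "Brand", "Legacy"],
    "dupe_high_bid": [1.5, 1.5, 0.9, 0.9],
})

# The first row per key is the winner and inherits the group's highest bid.
winners = dupes.drop_duplicates(subset="kw_uniq_key").copy()
winners["rec_status"] = "Active"
winners["rec_bid"] = winners["dupe_high_bid"]

# Left-join on the index; rows without a match are the losing duplicates.
out = dupes.merge(winners[["rec_status", "rec_bid"]],
                  left_index=True, right_index=True, how="left")
out["rec_status"] = out["rec_status"].fillna("Paused")

# A campaign is "only losing" when every one of its rows is Paused.
all_paused = out["rec_status"].eq("Paused").groupby(out["Campaign ID"]).all()
only_losing_ids = all_paused[all_paused].index

# Prefix the names of those campaigns so they stand out in the report.
mask = out["Campaign ID"].isin(only_losing_ids)
out["rec_campaign_name"] = ""
out.loc[mask, "rec_campaign_name"] = "ZZ__" + out.loc[mask, "Campaign"]
print(out)
```

Here campaign 200 holds only losing rows, so it is the one flagged for renaming. Note that "only losing" is judged among the duplicate rows alone, since non-duplicate keywords were filtered out earlier.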
Output Generation:
- The processed data is sorted and returned, with recommendations for keyword status and campaign name changes.
- If running locally, the script saves the output and debug information to CSV files for further analysis.

Vitals

Script ID : 1503
Client ID / Customer ID: 1306928469 / 60270543
Action Type: Email Report
Item Changed: None
Output Columns:
Linked Datasource: M1 Report
Reference Datasource: None
Owner: emerryfield@marinsoftware.com (emerryfield@marinsoftware.com)
Created by emerryfield@marinsoftware.com on 2024-11-08 23:34
Last Updated by Michael Huang on 2024-11-12 08:06


Python Code

##
## name: Duplication Keyword Alert
## description:
##  Identify duplicate keywords and alert via email
##   * duplicated within same Publisher and Account
##   * effective status = Active
##   * ignore keyword case
##   * recommend winning keyword based on: 'New' campaign category, pub cost, etc
##   * recommend to change name of campaigns with only losing keywords
## 
## author: Michael S. Huang
## created: 2023-12-03
## updated: 2024-11-12
## 

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/outdoor_network_dup_kw_20241112.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # dataSourceDict is injected by the server runtime; referencing it locally raises NameError
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    with open(pickle_path, 'rb') as pickle_file:
        dataSourceDict = pickle.load(pickle_file)

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
    # Chicago is UTC-5 during daylight saving time (UTC-6 otherwise). Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))

    # import pandas
    import pandas as pd
    import numpy as np

    # other imports
    import re
    import urllib

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed

    # pandas settings
    pd.set_option('display.max_columns', None)  # Display all columns
    pd.set_option('display.max_colwidth', None)  # Display full content of each column

else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Config ###########


MATCH_TYPE = {
  'EXACT': 'exact',
  'PHRASE': 'phrase',
  'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_PUB_ID = 'Pub. ID'
RPT_COL_STATUS = 'Status'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_CAMPAIGN_CLASSIFICATION = 'Campaign Classification'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_GROUP = 'Group'
RPT_COL_GROUP_STATUS = 'Group Status'
RPT_COL_SEARCH_BID = 'Search Bid'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_CONV_RATE = 'Conv. Rate %'
RPT_COL_CTR = 'CTR %'
RPT_COL_AVG_CPC = 'Avg. CPC $'
RPT_COL_IMPR_SHARE = 'Impr. share %'
RPT_COL_HIST_QS = 'Hist. QS'

# user code start here

COL_KEYWORD_UNIQUE_KEY = 'kw_uniq_key'
COL_DUPE_COUNT = 'dupe_count'
COL_DUPE_SPEND = 'dupe_spend'
COL_DUPE_HIGH_BID = 'dupe_high_bid'
COL_RECOMMENDED_STATUS = 'Dedupe Recommended Status'
COL_RECOMMENDED_SEARCH_BID = 'Dedupe Recommended Search Bid'
COL_RECOMMENDED_CAMPAIGN_NAME = 'Dedupe Recommended Campaign Name'
COL_RECOMMENDED_CAMPAIGN_STATUS = 'Dedupe Recommended Campaign Status'

VAL_PAUSED = 'Paused'
VAL_ACTIVE = 'Active'


# main function that takes inputDf and returns (outputDf, debugDf)
def process(inputDf):
    print(">> inputDf.shape", inputDf.shape)
    # DataFrame.info() prints its summary itself and returns None, so call it directly
    print(">> inputDf.info:")
    inputDf.info()
    # print(tableize(inputDf.head()))

    # COL_RECOMMENDED_CAMPAIGN_NAME was set wrongly before, so clear and recalc
    inputDf[COL_RECOMMENDED_CAMPAIGN_NAME] = ''

    # Also recalc Rec Campaign Status
    inputDf[COL_RECOMMENDED_CAMPAIGN_STATUS] = ''

    ### Cleanup: exclude inactive keywords
    active_keywords = inputDf[RPT_COL_STATUS] == VAL_ACTIVE
    active_groups = inputDf[RPT_COL_GROUP_STATUS] == VAL_ACTIVE
    active_campaigns = inputDf[RPT_COL_CAMPAIGN_STATUS] == VAL_ACTIVE
    inactive_keywords = ~(active_keywords & active_groups & active_campaigns)

    # actually exclude them
    excluded_keywords = inactive_keywords
    print(">> total excluded keywords: ", excluded_keywords.sum())
    df_working = inputDf.loc[~excluded_keywords].copy()
    print(">> after cleanup. df_working.shape", df_working.shape)

    ### Find Duplicate Keywords within Account

    keys = df_working[RPT_COL_PUBLISHER] + '__' + df_working[RPT_COL_ACCOUNT] + '__' + df_working[RPT_COL_KEYWORD].str.lower() + '__' + df_working[RPT_COL_MATCH_TYPE] + '__' + df_working[RPT_COL_STATUS]
    df_working.loc[:, COL_KEYWORD_UNIQUE_KEY] = keys
    # calc count for each dupe group
    df_working[COL_DUPE_COUNT] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[COL_KEYWORD_UNIQUE_KEY].transform('size')
    # calc total spend for each dupe group
    df_working[COL_DUPE_SPEND] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[RPT_COL_PUB_COST].transform('sum').round(2)
    # find highest bid for each dupe group
    df_working[COL_DUPE_HIGH_BID] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[RPT_COL_SEARCH_BID].transform('max').round(2)

    # Find dupes and sort them according to performance
    # best-first ordering: the first row per unique key becomes the winner
    sort_cols = [
        RPT_COL_CAMPAIGN_CLASSIFICATION,
        COL_DUPE_SPEND,
        COL_KEYWORD_UNIQUE_KEY,
        RPT_COL_PUB_COST,
        RPT_COL_HIST_QS,
        RPT_COL_CONV_RATE,
        RPT_COL_IMPR_SHARE,
    ]
    df_working = df_working.loc[df_working[COL_DUPE_COUNT] > 1] \
                           .sort_values(by=sort_cols, ascending=False)

    print(">> keep dupes only. df_working.shape", df_working.shape)

    # dupe winners have best performance
    df_winners = df_working.copy().drop_duplicates(subset=COL_KEYWORD_UNIQUE_KEY)
    df_winners[COL_RECOMMENDED_STATUS] = VAL_ACTIVE
    df_winners[COL_RECOMMENDED_SEARCH_BID] = df_winners[COL_DUPE_HIGH_BID]

    print(">> df_winners.shape", df_winners.shape)

    # join back to df_working, and mark the remaining dupes as Paused
    outputDf = df_working.merge(df_winners[[COL_RECOMMENDED_STATUS, COL_RECOMMENDED_SEARCH_BID]], left_index=True, right_index=True, how='left').copy()
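    # assign the result back; in-place fillna on a column selection is deprecated in recent pandas
    outputDf[COL_RECOMMENDED_STATUS] = outputDf[COL_RECOMMENDED_STATUS].fillna(VAL_PAUSED)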

    # Identify campaigns that only have losing keywords
    losing_keywords = outputDf[COL_RECOMMENDED_STATUS] == VAL_PAUSED
    losing_campaigns = outputDf[losing_keywords].groupby(RPT_COL_CAMPAIGN_ID)[COL_RECOMMENDED_STATUS].count().reset_index(name='Count')
    all_campaigns = outputDf.groupby(RPT_COL_CAMPAIGN_ID)[COL_RECOMMENDED_STATUS].count().reset_index(name='Count')
    only_losing_campaigns = losing_campaigns.merge(all_campaigns, on=RPT_COL_CAMPAIGN_ID, suffixes=('_losing', '_all'))
    only_losing_campaigns = only_losing_campaigns[only_losing_campaigns['Count_losing'] == only_losing_campaigns['Count_all']]
    only_losing_campaign_ids = only_losing_campaigns[RPT_COL_CAMPAIGN_ID]

    # Recommend a name change for campaigns with only losing keywords
    only_losing_campaign_index = outputDf[RPT_COL_CAMPAIGN_ID].isin(only_losing_campaign_ids)
    outputDf.loc[only_losing_campaign_index, COL_RECOMMENDED_CAMPAIGN_NAME] = 'ZZ__' + outputDf.loc[only_losing_campaign_index, RPT_COL_CAMPAIGN]
    outputDf.loc[only_losing_campaign_index, COL_RECOMMENDED_CAMPAIGN_STATUS] = VAL_PAUSED
   
    outputDf = outputDf.sort_values(by=[COL_KEYWORD_UNIQUE_KEY, RPT_COL_CAMPAIGN_CLASSIFICATION], ascending=[True, False])

    return (outputDf, df_working)

# actually process data
(outputDf, debugDf) = process(inputDf)

print("outputDf.shape", outputDf.shape)

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")
else:
    print("outputDf", outputDf.head(5))

Post generated on 2025-03-11 01:25:51 GMT

08 Nov 2024
