Script 551: Duplication Keywords Report

Purpose:

The Python script identifies duplicate keywords within the same publisher and account, recommends which keywords to pause based on performance metrics, and alerts users via email.

To Elaborate

The script is designed to manage and optimize keyword usage in advertising campaigns by identifying duplicate keywords within the same publisher and account. It focuses on active keywords, ignoring case differences and placeholder keywords. The script evaluates duplicates based on performance metrics such as cost, quality score, conversion rate, and impression share, recommending the best-performing keyword to remain active. Keywords in campaigns classified as ‘New’ are given priority. Additionally, the script suggests renaming campaigns that contain only losing keywords, ensuring efficient keyword management and improved campaign performance.

Walking Through the Code

Initialization and Setup:
- The script begins by determining whether it is running locally or on a server. If running locally, it loads data from a pickle file specified by the user.
- It imports necessary libraries and sets up configurations for data handling.
Data Cleaning:
- The script filters out placeholder keywords and inactive keywords, ensuring only relevant data is processed.
- It identifies keywords that are active across campaigns, groups, and accounts.
Duplicate Identification:
- The script constructs a unique key for each keyword based on publisher, account, keyword text, match type, and status.
- It calculates the count, total spend, and highest bid for each group of duplicate keywords.
Performance Evaluation:
- Duplicate keywords are sorted based on campaign classification and performance metrics.
- The script identifies the best-performing keyword in each duplicate group and recommends it to remain active.
Campaign Recommendations:
- Campaigns with only losing keywords are identified, and a recommendation is made to rename these campaigns.
- The script updates the status and search bid recommendations for keywords based on their performance.
Output and Testing:
- The script includes unit tests to validate its functionality.
- It processes the data and generates output, which can be saved locally if running in a development environment.

Vitals

Script ID : 551
Client ID / Customer ID: 1306922797 / 60269073
Action Type: Email Report
Item Changed: None
Output Columns:
Linked Datasource: M1 Report
Reference Datasource: None
Owner: Michael Huang (mhuang@marinsoftware.com)
Created by Michael Huang on 2023-12-03 23:13
Last Updated by Grégory Pantaine on 2024-03-11 16:20

> See it in Action

Python Code

##
## name: Duplication Keyword Alert
## description:
##  Identify duplicate keywords and alert via email
##   * duplicated within same Publisher and Account
##   * effective status = Active
##   * ignore keyword case
##   * ignore placeholder keywords
##   * recommend winning keyword based on: 'New' campaign category, pub cost, etc
##   * recommend to change name of campaigns with only losing keywords
## 
## author: Michael S. Huang
## created: 2023-12-03
## updated: 2025-02-06
## 

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/yotel_keyword_20240206_datasource_dict.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
  # Chicago Timezone is GMT-5. Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))

    # import pandas
    import pandas as pd
    import numpy as np

    # other imports
    import re
    import urllib

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed

    # pandas settings
    pd.set_option('display.max_columns', None)  # Display all columns
    pd.set_option('display.max_colwidth', None)  # Display full content of each column

else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########


MATCH_TYPE = {
  'EXACT': 'exact',
  'PHRASE': 'phrase',
  'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_STATUS = 'Status'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_CAMPAIGN_CLASSIFICATION = 'Campaign Classification'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_GROUP = 'Group'
RPT_COL_GROUP_STATUS = 'Group Status'
RPT_COL_SEARCH_BID = 'Search Bid'
RPT_COL_PUB_COST = 'Pub. Cost £'
RPT_COL_CONV_RATE = 'Conv. Rate %'
RPT_COL_CTR = 'CTR %'
RPT_COL_AVG_CPC = 'Avg. CPC £'
RPT_COL_IMPR_SHARE = 'Impr. share %'
RPT_COL_HIST_QS = 'Hist. QS'

# user code start here

COL_KEYWORD_UNIQUE_KEY = 'kw_uniq_key'
COL_DUPE_COUNT = 'dupe_count'
COL_DUPE_SPEND = 'dupe_spend'
COL_DUPE_HIGH_BID = 'dupe_high_bid'
COL_RECOMMENDED_STATUS = 'Dedupe Recommended Status'
COL_RECOMMENDED_SEARCH_BID = 'Dedupe Recommended Search Bid'
COL_RECOMMENDED_CAMPAIGN_NAME = 'Dedupe Recommended Campaign Name'
COL_RECOMMENDED_CAMPAIGN_STATUS = 'Dedupe Recommended Campaign Status'

VAL_PAUSED = 'Paused'
VAL_ACTIVE = 'Active'


# main function that takes inputDf and returns (outputDf, debugDf)
def process(inputDf):
    print(">> inputDf.shape", inputDf.shape)
    print(">> inputDf.info", inputDf.info())
    # print(tableize(inputDf.head()))

    # COL_RECOMMENDED_CAMPAIGN_NAME was set wrongly before, so clear and recalc
    inputDf[COL_RECOMMENDED_CAMPAIGN_NAME] = ''

    # Also recalc Rec Campaign Status
    inputDf[COL_RECOMMENDED_CAMPAIGN_STATUS] = ''

    ### Change name of single-keyword campaigns
    # campaigns_with_single_keyword = inputDf.groupby(RPT_COL_CAMPAIGN).filter(lambda x: len(x) == 1)
    # print(f"There are {campaigns_with_single_keyword.shape[0]} campaigns with only a single keyword:")
    # print(campaigns_with_single_keyword[[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_CAMPAIGN_ID, RPT_COL_KEYWORD]].head())
    # inputDf.loc[campaigns_with_single_keyword.index, COL_RECOMMENDED_CAMPAIGN_NAME] = 'ZZ__' + inputDf.loc[campaigns_with_single_keyword.index, RPT_COL_CAMPAIGN]
    # inputDf[COL_RECOMMENDED_CAMPAIGN_NAME] = 'ZZ__' + inputDf[RPT_COL_CAMPAIGN]

    ### Cleanup: exclude placholder keywords and paused keywords

    # derbysoft accounts only contain placeholder keywords like `Google/GB`
    derbysoft_accounts = inputDf[RPT_COL_ACCOUNT].str.contains('Derbysoft', case=False, regex=False)
    # other accounts may contain placeholders like `(not set)/GB`
    notset_keywords = inputDf[RPT_COL_KEYWORD].str.contains('not set', regex=False)
    # both are considered placeholders
    placeholder_keywords = derbysoft_accounts | notset_keywords

    # get count of placeholder keywords by account
    placeholder_keyword_counts = inputDf.loc[placeholder_keywords, [RPT_COL_ACCOUNT]].value_counts()
    print(">> placeholder keyword distribution")
    print(placeholder_keyword_counts.head(50))

    ### Cleanup: exclude inactive keywords
    active_keywords = inputDf[RPT_COL_STATUS] == VAL_ACTIVE
    active_groups = inputDf[RPT_COL_GROUP_STATUS] == VAL_ACTIVE
    active_campaigns = inputDf[RPT_COL_CAMPAIGN_STATUS] == VAL_ACTIVE
    inactive_keywords = ~(active_keywords & active_groups & active_campaigns)

    # actually exclude them
    excluded_keywords = placeholder_keywords | inactive_keywords
    print(">> total excluded keywords: ", excluded_keywords.sum())
    df_working = inputDf.loc[~excluded_keywords].copy()
    print(">> after cleanup. df_working.shape", df_working.shape)

    ### Find Duplicate Keywords within Account

    keys = df_working[RPT_COL_PUBLISHER] + '__' + df_working[RPT_COL_ACCOUNT] + '__' + df_working[RPT_COL_KEYWORD].str.lower() + '__' + df_working[RPT_COL_MATCH_TYPE] + '__' + df_working[RPT_COL_STATUS]
    df_working.loc[:, COL_KEYWORD_UNIQUE_KEY] = keys
    # calc count for each dupe group
    df_working[COL_DUPE_COUNT] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[COL_KEYWORD_UNIQUE_KEY].transform('size')
    # calc total spend for each dupe group
    df_working[COL_DUPE_SPEND] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[RPT_COL_PUB_COST].transform('sum').round(2)
    # find highest bid for each dupe group
    df_working[COL_DUPE_HIGH_BID] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[RPT_COL_SEARCH_BID].transform('max').round(2)

    # Find dupes and sort them according to performance
    df_working = df_working.loc[df_working[COL_DUPE_COUNT] > 1] \
                            .sort_values(by=[ \
                            RPT_COL_CAMPAIGN_CLASSIFICATION, \
                            COL_DUPE_SPEND, \
                            COL_KEYWORD_UNIQUE_KEY, \
                            RPT_COL_PUB_COST, \
                            RPT_COL_HIST_QS, \
                            RPT_COL_CONV_RATE, \
                            RPT_COL_IMPR_SHARE], \
                            ascending=[False, False, False, False, False, False, False])

    print(">> keep dupes only. df_working.shape", df_working.shape)

    # dupe winners have best performance
    df_winners = df_working.copy().drop_duplicates(subset=COL_KEYWORD_UNIQUE_KEY)
    df_winners[COL_RECOMMENDED_STATUS] = VAL_ACTIVE
    df_winners[COL_RECOMMENDED_SEARCH_BID] = df_winners[COL_DUPE_HIGH_BID]

    print(">> df_winners.shape", df_winners.shape)

    # join back to df_working, and mark the remaining dupes as Paused
    outputDf = df_working.merge(df_winners[[COL_RECOMMENDED_STATUS, COL_RECOMMENDED_SEARCH_BID]], left_index=True, right_index=True, how='left').copy()
    outputDf[COL_RECOMMENDED_STATUS].fillna(VAL_PAUSED, inplace=True)

    # Identify campaigns that only have losing keywords
    losing_keywords = outputDf[COL_RECOMMENDED_STATUS] == VAL_PAUSED
    losing_campaigns = outputDf[losing_keywords].groupby(RPT_COL_CAMPAIGN_ID)[COL_RECOMMENDED_STATUS].count().reset_index(name='Count')
    all_campaigns = outputDf.groupby(RPT_COL_CAMPAIGN_ID)[COL_RECOMMENDED_STATUS].count().reset_index(name='Count')
    only_losing_campaigns = losing_campaigns.merge(all_campaigns, on=RPT_COL_CAMPAIGN_ID, suffixes=('_losing', '_all'))
    only_losing_campaigns = only_losing_campaigns[only_losing_campaigns['Count_losing'] == only_losing_campaigns['Count_all']]
    only_losing_campaign_ids = only_losing_campaigns['Campaign ID']

    # Recommend a name change for campaigns with only losing keywords
    only_losing_campaign_index = outputDf[RPT_COL_CAMPAIGN_ID].isin(only_losing_campaign_ids)
    outputDf.loc[only_losing_campaign_index, COL_RECOMMENDED_CAMPAIGN_NAME] = 'ZZ__' + outputDf.loc[only_losing_campaign_index, RPT_COL_CAMPAIGN]
    outputDf.loc[only_losing_campaign_index, COL_RECOMMENDED_CAMPAIGN_STATUS] = VAL_PAUSED
   
    outputDf = outputDf.sort_values(by=[COL_KEYWORD_UNIQUE_KEY, RPT_COL_CAMPAIGN_CLASSIFICATION], ascending=[True, False])

    return (outputDf, df_working)

# unit test
def test_process_general_case():
    print ("### Running Test: test_process_general_case ###")
    # from io import StringIO
    test_data_string = '''
[
    {
        "Keyword": "yotel hotel miami",
        "Status": "Active",
        "Match Type": "Exact",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 21993144,
        "Campaign": "YOMIAH_SRC_EN_Google_Brand_USA_Exact",
        "Campaign Status": "Paused",
        "Campaign Classification": "Legacy",
        "Group": "YOMIAH_SRC_EN_Google_Brand_USA_Exact",
        "Group Status": "Active",
        "Search Bid": 1.16,
        "Pub. Cost \\u00a3": 0.0,
        "Conv. Rate %": 0.0,
        "CTR %": 0.0,
        "Avg. CPC \\u00a3": 0.0,
        "Impr. share %": 0.0,
        "Hist. QS": 0.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOMIAH_SRC_EN_Google_Brand_USA_Exact"
    },
    {
        "Keyword": "YOTEL HOTEL MIAMI",
        "Status": "Active",
        "Match Type": "Exact",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22506165,
        "Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
        "Campaign Status": "Active",
        "Campaign Classification": "New",
        "Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
        "Group Status": "Active",
        "Search Bid": 0.48,
        "Pub. Cost \\u00a3": 3.66,
        "Conv. Rate %": 0.0,
        "CTR %": 0.1408,
        "Avg. CPC \\u00a3": 0.37,
        "Impr. share %": 0.5038,
        "Hist. QS": 8.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23"
    },
    {
        "Keyword": "yotel hotel miami",
        "Status": "Deleted",
        "Match Type": "Broad",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22506165,
        "Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
        "Campaign Status": "Active",
        "Campaign Classification": "New",
        "Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
        "Group Status": "Active",
        "Search Bid": 0.0,
        "Pub. Cost \\u00a3": 0.0,
        "Conv. Rate %": 0.0,
        "CTR %": 0.0,
        "Avg. CPC \\u00a3": 0.0,
        "Impr. share %": 0.0,
        "Hist. QS": 0.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23"
    },
    {
        "Keyword": "yotel hotel miami",
        "Status": "Active",
        "Match Type": "Exact",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22506165,
        "Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
        "Campaign Status": "Active",
        "Campaign Classification": "New",
        "Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
        "Group Status": "Active",
        "Search Bid": 0.54,
        "Pub. Cost \\u00a3": 1.11,
        "Conv. Rate %": 0.0,
        "CTR %": 0.0952,
        "Avg. CPC \\u00a3": 0.28,
        "Impr. share %": 0.3824,
        "Hist. QS": 8.12,
        "Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23"
    },
    {
        "Keyword": "yotel hotel miami",
        "Status": "Active",
        "Match Type": "Exact",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22551730,
        "Campaign": "YOMIAH_SRC_EN_Google_Brand_USA_Exact Landing Page Test",
        "Campaign Status": "Active",
        "Campaign Classification": "Legacy",
        "Group": "YOMIAH_SRC_EN_Google_Brand_USA_Exact",
        "Group Status": "Active",
        "Search Bid": 0.5,
        "Pub. Cost \\u00a3": 0.0,
        "Conv. Rate %": 0.0,
        "CTR %": 0.0,
        "Avg. CPC \\u00a3": 0.0,
        "Impr. share %": 0.0,
        "Hist. QS": 0.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOMIAH_SRC_EN_Google_Brand_USA_Exact Landing Page Test"
    },
    {
        "Keyword": "yotel hotel miami",
        "Status": "Active",
        "Match Type": "Exact",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22557952,
        "Campaign": "YOMIAH_SRC_EN_Google_Brand_USA_Exact Combined LP Test 2",
        "Campaign Status": "Active",
        "Campaign Classification": "Legacy",
        "Group": "YOMIAH_SRC_EN_Google_Brand_USA_Exact",
        "Group Status": "Active",
        "Search Bid": 2.25,
        "Pub. Cost \\u00a3": 0.0,
        "Conv. Rate %": 0.0,
        "CTR %": 0.0,
        "Avg. CPC \\u00a3": 0.0,
        "Impr. share %": 0.0,
        "Hist. QS": 0.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOMIAH_SRC_EN_Google_Brand_USA_Exact Combined LP Test 2"
    },
    {
        "Keyword": "yotel hotel miami",
        "Status": "Active",
        "Match Type": "Exact",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22707648,
        "Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
        "Campaign Status": "Active",
        "Campaign Classification": "Legacy",
        "Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
        "Group Status": "Active",
        "Search Bid": 0.1,
        "Pub. Cost \\u00a3": 0.0,
        "Conv. Rate %": 0.0,
        "CTR %": 0.0,
        "Avg. CPC \\u00a3": 0.0,
        "Impr. share %": 0.0,
        "Hist. QS": 0.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23"
    },
    {
        "Keyword": "yotel hotel miami",
        "Status": "Deleted",
        "Match Type": "Broad",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22707648,
        "Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
        "Campaign Status": "Active",
        "Campaign Classification": "Legacy",
        "Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
        "Group Status": "Active",
        "Search Bid": 0.0,
        "Pub. Cost \\u00a3": 0.0,
        "Conv. Rate %": 0.0,
        "CTR %": 0.0,
        "Avg. CPC \\u00a3": 0.0,
        "Impr. share %": 0.0,
        "Hist. QS": 0.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23"
    }
]
'''
    # string_io = StringIO(test_data_string)
    # inputDf = pd.read_json(string_io)
    inputDf = pd.read_json(test_data_string)
    (outputDf, debugDf) = process(inputDf)

    outputDf = outputDf.fillna('')
    print(">>OutputDf\n", outputDf)

    uniq_rec_campaign_name_count = outputDf \
                    .groupby(by=[RPT_COL_CAMPAIGN_ID])[COL_RECOMMENDED_CAMPAIGN_NAME] \
                    .nunique() \
                    .reset_index(name='Count')
    more_than_one_rec_name = uniq_rec_campaign_name_count.loc[uniq_rec_campaign_name_count['Count'] > 1]
    assert more_than_one_rec_name.empty, "Campaign has more than one Rec Name"

    # 2nd keyword wins
    winner = outputDf.loc[1]
    # print(">>winner\n", winner)
    assert winner[COL_RECOMMENDED_STATUS] == VAL_ACTIVE, "Winning keyword Rec Status not Active"
    assert winner[COL_RECOMMENDED_SEARCH_BID] == 2.25, "Winning keyword not receive high bid from cohort"
    assert winner[COL_RECOMMENDED_CAMPAIGN_NAME] == '', "Winning keyword should not have Rec Campaign Name"
    assert winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == '', "Winning keyword should not have Rec Campaign Status"

    # all but the 2nd keyword are losers
    losers = outputDf.loc[outputDf.index != 1]
    assert all(losers[COL_RECOMMENDED_STATUS] == VAL_PAUSED), "Losing keyword Rec Status not Paused"
    assert all(losers[COL_RECOMMENDED_SEARCH_BID] == ''), "Losing keyword Rec Bid not empty"

    # 2nd, 3rd (filtered out) and 4th keywords belong to the same campaign; campaign has winner
    campaigns_with_winner = outputDf.loc[[1, 3]]
    assert all(campaigns_with_winner[COL_RECOMMENDED_CAMPAIGN_NAME] == ''), "Campaign with winner should not have Rec Campaign Name"
    assert all(campaigns_with_winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == ''), "Campaign with winner should not have Rec Campaign Status"

    # all other campaigns have 100% losing keywords
    campaigns_without_winner = outputDf.loc[(outputDf.index != 1) & (outputDf.index != 2) & (outputDf.index != 3)]
    assert all(campaigns_without_winner[COL_RECOMMENDED_CAMPAIGN_NAME].str.contains('ZZ__')), "Campaigns with no winners should have Rec Campaign Name starting with ZZ__"
    assert all(campaigns_without_winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == VAL_PAUSED), "Campaigns with no winners should have Rec Campaign Status as Paused"


    print ("### Test PASSED ###")
    
# unit test
def test_process_campaign_category():
    print ("### Running Test: test_process_campaign_category ###")
    # from io import StringIO
    test_data_string = '''
[

    {
        "Keyword": "YOTEL HOTEL MIAMI",
        "Status": "Active",
        "Match Type": "Exact",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22506165,
        "Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
        "Campaign Status": "Active",
        "Campaign Classification": "Legacy",
        "Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
        "Group Status": "Active",
        "Search Bid": 0.48,
        "Pub. Cost \\u00a3": 3.66,
        "Conv. Rate %": 0.0,
        "CTR %": 0.1408,
        "Avg. CPC \\u00a3": 0.37,
        "Impr. share %": 0.5038,
        "Hist. QS": 8.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23"
    },
    {
        "Keyword": "yotel hotel miami",
        "Status": "Active",
        "Match Type": "Exact",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22707648,
        "Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
        "Campaign Status": "Active",
        "Campaign Classification": "New",
        "Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
        "Group Status": "Active",
        "Search Bid": 0.1,
        "Pub. Cost \\u00a3": 0.0,
        "Conv. Rate %": 0.0,
        "CTR %": 0.0,
        "Avg. CPC \\u00a3": 0.0,
        "Impr. share %": 0.0,
        "Hist. QS": 0.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23"
    },
    {
        "Keyword": "yotel hotel miami",
        "Status": "Deleted",
        "Match Type": "Broad",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22707648,
        "Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
        "Campaign Status": "Active",
        "Campaign Classification": "New",
        "Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
        "Group Status": "Active",
        "Search Bid": 0.0,
        "Pub. Cost \\u00a3": 0.0,
        "Conv. Rate %": 0.0,
        "CTR %": 0.0,
        "Avg. CPC \\u00a3": 0.0,
        "Impr. share %": 0.0,
        "Hist. QS": 0.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23"
    },
    {
        "Keyword": "yotel hotel miami",
        "Status": "Active",
        "Match Type": "Exact",
        "Publisher": "Google",
        "Account": "YOTEL Miami",
        "Campaign ID": 22557952,
        "Campaign": "YOMIAH_SRC_EN_Google_Brand_USA_Exact Combined LP Test 2",
        "Campaign Status": "Active",
        "Campaign Classification": "Legacy",
        "Group": "YOMIAH_SRC_EN_Google_Brand_USA_Exact",
        "Group Status": "Active",
        "Search Bid": 2.25,
        "Pub. Cost \\u00a3": 0.0,
        "Conv. Rate %": 0.0,
        "CTR %": 0.0,
        "Avg. CPC \\u00a3": 0.0,
        "Impr. share %": 0.0,
        "Hist. QS": 0.0,
        "Dedupe Recommended Campaign Name": "ZZ__YOMIAH_SRC_EN_Google_Brand_USA_Exact Combined LP Test 2"
    }
]
'''
    # string_io = StringIO(test_data_string)
    # inputDf = pd.read_json(string_io)
    inputDf = pd.read_json(test_data_string)
    (outputDf, debugDf) = process(inputDf)

    outputDf = outputDf.fillna('')
    print(">>OutputDf\n", outputDf)

    uniq_rec_campaign_name_count = outputDf \
                    .groupby(by=[RPT_COL_CAMPAIGN_ID])[COL_RECOMMENDED_CAMPAIGN_NAME] \
                    .nunique() \
                    .reset_index(name='Count')
    more_than_one_rec_name = uniq_rec_campaign_name_count.loc[uniq_rec_campaign_name_count['Count'] > 1]
    assert more_than_one_rec_name.empty, "Campaign has more than one Rec Name"

    # 2nd keyword wins
    winner = outputDf.loc[1]
    # print(">>winner\n", winner)
    assert winner[COL_RECOMMENDED_STATUS] == VAL_ACTIVE, "Winning keyword Rec Status not Active"
    assert winner[COL_RECOMMENDED_SEARCH_BID] == 2.25, "Winning keyword not receive high bid from cohort"
    assert winner[COL_RECOMMENDED_CAMPAIGN_NAME] == '', "Winning keyword should not have Rec Campaign Name"
    assert winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == '', "Winning keyword should not have Rec Campaign Status"

    # all but the 2nd keyword are losers
    losers = outputDf.loc[outputDf.index != 1]
    assert all(losers[COL_RECOMMENDED_STATUS] == VAL_PAUSED), "Losing keyword Rec Status not Paused"
    assert all(losers[COL_RECOMMENDED_SEARCH_BID] == ''), "Losing keyword Rec Bid not empty"

    # 2nd, 3rd (filtered out) belong to the same campaign; campaign has winner
    campaigns_with_winner = outputDf.loc[[1]]
    assert all(campaigns_with_winner[COL_RECOMMENDED_CAMPAIGN_NAME] == ''), "Campaign with winner should not have Rec Campaign Name"
    assert all(campaigns_with_winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == ''), "Campaign with winner should not have Rec Campaign Status"

    # all other campaigns have 100% losing keywords
    campaigns_without_winner = outputDf.loc[(outputDf.index != 1)]
    assert all(campaigns_without_winner[COL_RECOMMENDED_CAMPAIGN_NAME].str.contains('ZZ__')), "Campaigns with no winners should have Rec Campaign Name starting with ZZ__"
    assert all(campaigns_without_winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == VAL_PAUSED), "Campaigns with no winners should have Rec Campaign Status as Paused"


    print ("### Test PASSED ###")


# run unit test
test_process_general_case()
test_process_campaign_category()

# actually process data
(outputDf, debugDf) = process(inputDf)

print("outputDf.shape", outputDf.shape)

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")
else:
    print("outputDf", outputDf.head(5))

Post generated on 2025-03-11 01:25:51 GMT

03 Dec 2023

« Script 549: Budget Pacing Script 555: Set Pacing Start & End Date »

MarinOne Scripts Creator's Corner