Script 1503: Sample Scripts Duplicate Keywords last 180 days
Purpose
The Python script identifies duplicate keywords within advertising campaigns and recommends which keywords to pause based on performance metrics.
To Elaborate
The script is designed to manage duplicate keywords in advertising campaigns by identifying and recommending which keywords should remain active and which should be paused. It focuses on keywords that are duplicated within the same publisher and account, ensuring that only active keywords are considered. The script evaluates keywords based on several performance metrics, such as cost, quality score, conversion rate, and impression share, and prioritizes keywords in campaigns classified as “New.” The goal is to optimize campaign performance by retaining the best-performing keywords and suggesting changes for campaigns that only contain underperforming keywords.
Walking Through the Code
-
Initialization and Setup: The script begins by determining whether it is running on a server or locally. If running locally, it loads data from a specified pickle file and sets up the environment with necessary imports and configurations.
-
Data Preparation: The script filters out inactive keywords, groups, and campaigns to focus only on active elements. It constructs a unique key for each keyword to identify duplicates within the same publisher and account.
-
Duplicate Identification: It calculates the count, total spend, and highest bid for each group of duplicate keywords. The script then sorts these duplicates based on performance metrics to identify the best-performing keywords.
-
Recommendation Generation: The script marks the best-performing duplicates as active and recommends pausing the rest. It also identifies campaigns that only contain losing keywords and suggests renaming these campaigns to indicate their status.
-
Output: The processed data is sorted and returned, with recommendations for keyword status and campaign name changes. If running locally, the output is saved to CSV files for further analysis.
Vitals
- Script ID : 1503
- Client ID / Customer ID: 1306928469 / 60270543
- Action Type: Email Report
- Item Changed: None
- Output Columns:
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: emerryfield@marinsoftware.com (emerryfield@marinsoftware.com)
- Created by emerryfield@marinsoftware.com on 2024-11-08 23:34
- Last Updated by Michael Huang on 2024-11-12 08:06
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
##
## name: Duplication Keyword Alert
## description:
## Identify duplicate keywords and alert via email
## * duplicated within same Publisher and Account
## * effective status = Active
## * ignore keyword case
## * recommend winning keyword based on: 'New' campaign category, pub cost, etc
## * recommend to change name of campaigns with only losing keywords
##
## author: Michael S. Huang
## created: 2023-12-03
## updated: 2024-11-12
##
########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/outdoor_network_dup_kw_20241112.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict
# determine if code is running on server or locally
def is_executing_on_server():
try:
# Attempt to access a known restricted builtin
dict_items = dataSourceDict.items()
return True
except NameError:
# NameError: dataSourceDict object is missing (indicating not on server)
return False
local_dev = False
if is_executing_on_server():
print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
print("Code is NOT executing on server. Doing init.")
local_dev = True
# load dataSourceDict via pickled file
import pickle
dataSourceDict = pickle.load(open(pickle_path, 'rb'))
# print shape and first 5 rows for each entry in dataSourceDict
for key, value in dataSourceDict.items():
print(f"Shape of dataSourceDict[{key}]: {value.shape}")
# print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")
# set outputDf same as inputDf
inputDf = dataSourceDict["1"]
outputDf = inputDf.copy()
# setup timezone
import datetime
# Chicago Timezone is GMT-5. Adjust as needed.
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))
# import pandas
import pandas as pd
import numpy as np
# other imports
import re
import urllib
# import Marin util functions
from marin_scripts_utils import tableize, select_changed
# pandas settings
pd.set_option('display.max_columns', None) # Display all columns
pd.set_option('display.max_colwidth', None) # Display full content of each column
else:
print("Running locally but no pickle path defined. dataSourceDict not loaded.")
exit(1)
########### END - Local Mode Setup ###########
MATCH_TYPE = {
'EXACT': 'exact',
'PHRASE': 'phrase',
'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_PUB_ID = 'Pub. ID'
RPT_COL_STATUS = 'Status'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_CAMPAIGN_CLASSIFICATION = 'Campaign Classification'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_GROUP = 'Group'
RPT_COL_GROUP_STATUS = 'Group Status'
RPT_COL_SEARCH_BID = 'Search Bid'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_CONV_RATE = 'Conv. Rate %'
RPT_COL_CTR = 'CTR %'
RPT_COL_AVG_CPC = 'Avg. CPC $'
RPT_COL_IMPR_SHARE = 'Impr. share %'
RPT_COL_HIST_QS = 'Hist. QS'
# user code start here
COL_KEYWORD_UNIQUE_KEY = 'kw_uniq_key'
COL_DUPE_COUNT = 'dupe_count'
COL_DUPE_SPEND = 'dupe_spend'
COL_DUPE_HIGH_BID = 'dupe_high_bid'
COL_RECOMMENDED_STATUS = 'Dedupe Recommended Status'
COL_RECOMMENDED_SEARCH_BID = 'Dedupe Recommended Search Bid'
COL_RECOMMENDED_CAMPAIGN_NAME = 'Dedupe Recommended Campaign Name'
COL_RECOMMENDED_CAMPAIGN_STATUS = 'Dedupe Recommended Campaign Status'
VAL_PAUSED = 'Paused'
VAL_ACTIVE = 'Active'
# main function that takes inputDf and returns (outputDf, debugDf)
def process(inputDf):
print(">> inputDf.shape", inputDf.shape)
print(">> inputDf.info", inputDf.info())
# print(tableize(inputDf.head()))
# COL_RECOMMENDED_CAMPAIGN_NAME was set wrongly before, so clear and recalc
inputDf[COL_RECOMMENDED_CAMPAIGN_NAME] = ''
# Also recalc Rec Campaign Status
inputDf[COL_RECOMMENDED_CAMPAIGN_STATUS] = ''
### Cleanup: exclude inactive keywords
active_keywords = inputDf[RPT_COL_STATUS] == VAL_ACTIVE
active_groups = inputDf[RPT_COL_GROUP_STATUS] == VAL_ACTIVE
active_campaigns = inputDf[RPT_COL_CAMPAIGN_STATUS] == VAL_ACTIVE
inactive_keywords = ~(active_keywords & active_groups & active_campaigns)
# actually exclude them
excluded_keywords = inactive_keywords
print(">> total excluded keywords: ", excluded_keywords.sum())
df_working = inputDf.loc[~excluded_keywords].copy()
print(">> after cleanup. df_working.shape", df_working.shape)
### Find Duplicate Keywords within Account
keys = df_working[RPT_COL_PUBLISHER] + '__' + df_working[RPT_COL_ACCOUNT] + '__' + df_working[RPT_COL_KEYWORD].str.lower() + '__' + df_working[RPT_COL_MATCH_TYPE] + '__' + df_working[RPT_COL_STATUS]
df_working.loc[:, COL_KEYWORD_UNIQUE_KEY] = keys
# calc count for each dupe group
df_working[COL_DUPE_COUNT] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[COL_KEYWORD_UNIQUE_KEY].transform('size')
# calc total spend for each dupe group
df_working[COL_DUPE_SPEND] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[RPT_COL_PUB_COST].transform('sum').round(2)
# find highest bid for each dupe group
df_working[COL_DUPE_HIGH_BID] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[RPT_COL_SEARCH_BID].transform('max').round(2)
# Find dupes and sort them according to performance
df_working = df_working.loc[df_working[COL_DUPE_COUNT] > 1] \
.sort_values(by=[ \
RPT_COL_CAMPAIGN_CLASSIFICATION, \
COL_DUPE_SPEND, \
COL_KEYWORD_UNIQUE_KEY, \
RPT_COL_PUB_COST, \
RPT_COL_HIST_QS, \
RPT_COL_CONV_RATE, \
RPT_COL_IMPR_SHARE], \
ascending=[False, False, False, False, False, False, False])
print(">> keep dupes only. df_working.shape", df_working.shape)
# dupe winners have best performance
df_winners = df_working.copy().drop_duplicates(subset=COL_KEYWORD_UNIQUE_KEY)
df_winners[COL_RECOMMENDED_STATUS] = VAL_ACTIVE
df_winners[COL_RECOMMENDED_SEARCH_BID] = df_winners[COL_DUPE_HIGH_BID]
print(">> df_winners.shape", df_winners.shape)
# join back to df_working, and mark the remaining dupes as Paused
outputDf = df_working.merge(df_winners[[COL_RECOMMENDED_STATUS, COL_RECOMMENDED_SEARCH_BID]], left_index=True, right_index=True, how='left').copy()
outputDf[COL_RECOMMENDED_STATUS].fillna(VAL_PAUSED, inplace=True)
# Identify campaigns that only have losing keywords
losing_keywords = outputDf[COL_RECOMMENDED_STATUS] == VAL_PAUSED
losing_campaigns = outputDf[losing_keywords].groupby(RPT_COL_CAMPAIGN_ID)[COL_RECOMMENDED_STATUS].count().reset_index(name='Count')
all_campaigns = outputDf.groupby(RPT_COL_CAMPAIGN_ID)[COL_RECOMMENDED_STATUS].count().reset_index(name='Count')
only_losing_campaigns = losing_campaigns.merge(all_campaigns, on=RPT_COL_CAMPAIGN_ID, suffixes=('_losing', '_all'))
only_losing_campaigns = only_losing_campaigns[only_losing_campaigns['Count_losing'] == only_losing_campaigns['Count_all']]
only_losing_campaign_ids = only_losing_campaigns['Campaign ID']
# Recommend a name change for campaigns with only losing keywords
only_losing_campaign_index = outputDf[RPT_COL_CAMPAIGN_ID].isin(only_losing_campaign_ids)
outputDf.loc[only_losing_campaign_index, COL_RECOMMENDED_CAMPAIGN_NAME] = 'ZZ__' + outputDf.loc[only_losing_campaign_index, RPT_COL_CAMPAIGN]
outputDf.loc[only_losing_campaign_index, COL_RECOMMENDED_CAMPAIGN_STATUS] = VAL_PAUSED
outputDf = outputDf.sort_values(by=[COL_KEYWORD_UNIQUE_KEY, RPT_COL_CAMPAIGN_CLASSIFICATION], ascending=[True, False])
return (outputDf, df_working)
# actually process data
(outputDf, debugDf) = process(inputDf)
print("outputDf.shape", outputDf.shape)
## local debug
if local_dev:
output_filename = 'outputDf.csv'
outputDf.to_csv(output_filename, index=False)
print(f"Local Dev: Output written to: {output_filename}")
debug_filename = 'debugDf.csv'
debugDf.to_csv(debug_filename, index=False)
print(f"Local Dev: Debug written to: {debug_filename}")
else:
print("outputDf", outputDf.head(5))
Post generated on 2024-11-27 06:58:46 GMT