Script 1501: Sample Script Pause Poor performing Keywords with 0 Conv. and QS 5
Purpose
The Python script identifies and recommends pausing keywords that are at least 30 days old, have zero conversions, and a quality score below 5.
To Elaborate
The script is designed to optimize keyword performance in advertising campaigns by identifying underperforming keywords. It specifically targets keywords that have been active for at least 30 days but have not resulted in any conversions. Additionally, it checks the quality score of these keywords, focusing on those with a score below 5. The script aggregates data from various advertising accounts and campaigns, evaluates the performance of each keyword, and recommends pausing those that meet the criteria of being both old enough and ineffective. This process helps in maintaining efficient ad spend by ensuring that only productive keywords remain active.
Walking Through the Code
- Initialization and Setup:
- The script begins by determining whether it is running on a server or locally. If running locally, it loads data from a specified pickle file.
- It sets up necessary libraries and configurations, including pandas and numpy for data manipulation.
- Data Preparation:
- The script defines the primary data source and relevant columns, such as keyword, date, and quality score.
- It cleans the data by dropping unnecessary columns and converting date strings to datetime objects for accurate comparisons.
- Data Aggregation:
- Keywords are grouped by account, campaign, group, keyword, and match type.
- The script calculates the first date with impressions and aggregates data to determine historical quality scores and other metrics.
- Keyword Evaluation:
- The script identifies keywords that are at least 30 days old, have zero conversions, and meet specific quality score conditions.
- It applies business rules to filter out keywords that should be paused, considering historical and current quality scores.
- Output Preparation:
- The script prepares the output data frame with keywords recommended for pausing, including relevant account and campaign details.
- It adds an automation info note explaining the reason for pausing, based on the date and quality score criteria.
- Local Development and Debugging:
- If running locally, the script writes the output and debug information to CSV files for further analysis.
Vitals
- Script ID : 1501
- Client ID / Customer ID: 1306928469 / 60270543
- Action Type: Bulk Upload (Preview)
- Item Changed: Keyword
- Output Columns: Account, Campaign, Group, Keyword, Match Type, AUTOMATION INFO
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: emerryfield@marinsoftware.com (emerryfield@marinsoftware.com)
- Created by emerryfield@marinsoftware.com on 2024-11-08 23:18
- Last Updated by emerryfield@marinsoftware.com on 2024-11-08 23:26
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
##
## name: Pause Poor performing Keywords with 0 Conv. and QS <5
## description:
## Look at the creation Date and make sure the KW is at least 30 days old. Then, if Conversions are 0 and Quality score is <5 pause the KW
##
## author: Dana Waidhas, Michael S. Huang
## created: 2024-04-02
##
########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/roc_media_keyword_qs_20240410_no_conv_filter.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict
# determine if code is running on server or locally
def is_executing_on_server():
try:
# Attempt to access a known restricted builtin
dict_items = dataSourceDict.items()
return True
except NameError:
# NameError: dataSourceDict object is missing (indicating not on server)
return False
local_dev = False
if is_executing_on_server():
print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
print("Code is NOT executing on server. Doing init.")
local_dev = True
# load dataSourceDict via pickled file
import pickle
dataSourceDict = pickle.load(open(pickle_path, 'rb'))
# print shape and first 5 rows for each entry in dataSourceDict
for key, value in dataSourceDict.items():
print(f"Shape of dataSourceDict[{key}]: {value.shape}")
# print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")
# set outputDf same as inputDf
inputDf = dataSourceDict["1"]
outputDf = inputDf.copy()
# setup timezone
import datetime
# Chicago Timezone is GMT-5. Adjust as needed.
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))
# import pandas
import pandas as pd
import numpy as np
# other imports
import re
import urllib
# import Marin util functions
from marin_scripts_utils import tableize, select_changed
# pandas settings
pd.set_option('display.max_columns', None) # Display all columns
pd.set_option('display.max_colwidth', None) # Display full content of each column
else:
print("Running locally but no pickle path defined. dataSourceDict not loaded.")
exit(1)
########### END - Local Mode Setup ###########
MATCH_TYPE = {
'EXACT': 'exact',
'PHRASE': 'phrase',
'BROAD': 'broad',
}
# Setup timezone
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-4))
# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_DATE = 'Date'
RPT_COL_STATUS = 'Status'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_HIST_QS = 'Hist. QS'
RPT_COL_QUALITY_SCORE = 'Quality Score'
RPT_COL_IMPR = 'Impr.'
RPT_COL_CONV = 'Conv.'
RPT_COL_CREATION_DATE = 'Creation Date'
# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_KEYWORD = 'Keyword'
BULK_COL_MATCH_TYPE = 'Match Type'
BULK_COL_STATUS = 'Status'
BULK_COL_AUTOMATION_INFO = 'AUTOMATION INFO'
outputDf[BULK_COL_STATUS] = "<<YOUR VALUE>>"
outputDf[BULK_COL_AUTOMATION_INFO] = "<<YOUR VALUE>>"
VAL_STATUS_PAUSED = 'Paused'
VAL_STATUS_ACTIVE = 'Active'
VAL_PUBLISHER_GOOGLE = 'Google'
print("inputDf.shape", inputDf.shape)
print(inputDf.info())
### user code starts here
groupby_cols = [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_GROUP, RPT_COL_KEYWORD, RPT_COL_MATCH_TYPE]
## Data Cleanup
inputDf_reduced = inputDf.drop(columns=[RPT_COL_STATUS]) \
.set_index(groupby_cols)
# Ensure RPT_COL_CREATION_DATE is a Pandas DateTime type with the correct format
inputDf_reduced[RPT_COL_CREATION_DATE] = inputDf_reduced[RPT_COL_CREATION_DATE].str.replace('a.m.', 'AM').str.replace('p.m.', 'PM')
inputDf_reduced[RPT_COL_CREATION_DATE] = pd.to_datetime(inputDf_reduced[RPT_COL_CREATION_DATE])
inputDf_reduced[RPT_COL_DATE] = pd.to_datetime(inputDf_reduced[RPT_COL_DATE])
print(f"Report Date Range: {inputDf_reduced[RPT_COL_DATE].min().date()} - {inputDf_reduced[RPT_COL_DATE].max().date()}")
# DEBUG: pick a specific keyword. Format is (kw, matchtype)
# TEST_KWS = [('new townhouses for sale near me','Broad')]
# group_keys = list(inputDf.groupby(groupby_cols).groups.keys())
# test_keys = [key for key in group_keys if any((test_kw[0] in key[3]) & (test_kw[1] in key[4]) for test_kw in TEST_KWS)]
# inputDf_reduced = inputDf_reduced.loc[test_keys]
print("inputDf_reduced.shape", inputDf_reduced.shape)
print(inputDf_reduced.info())
print("inputDf example", inputDf.head())
## Intermediate calculations
# - first_traffic_date: first date with traffic in report
# - effective_creation_date: Creation Date is date created in Marin, so adjust it if there is traffic prior to that
# Group inputDf by Account, Campaign, Group, Keyword, and Match Type
grouped = inputDf_reduced.groupby(groupby_cols)
# Iterate through each group to determine the earliest date with Impressions
# Add this date to a new column called first_traffic_date
def get_first_traffic_date(group):
group_with_impressions = group[group[RPT_COL_IMPR] > 0]
if not group_with_impressions.empty:
return group_with_impressions[RPT_COL_DATE].min()
else:
return None
# Apply the function to each group to get first_traffic_date
first_traffic_date_series = grouped.apply(get_first_traffic_date)
# Custom mean function to ignore zeros in Hist QS
def mean_without_zeros(series):
non_zero_values = series[series != 0]
if len(non_zero_values) == 0:
return 0 # or return None if you prefer
return non_zero_values.mean()
agg_spec = {
RPT_COL_PUBLISHER: 'last',
RPT_COL_IMPR: 'sum',
RPT_COL_CONV: 'sum',
RPT_COL_HIST_QS: mean_without_zeros, # Hist QS is zero on days without impressions, ignore them when calc mean
RPT_COL_QUALITY_SCORE: 'last',
RPT_COL_CREATION_DATE: 'last'
}
# with first_traffic_date calculated, remove Date column by aggregating
df_keywords = grouped.agg(agg_spec)
# Map the first_traffic_date to the original DataFrame using the same groupby columns
df_keywords['first_traffic_date'] = df_keywords.index.map(first_traffic_date_series)
# Compare first_traffic_date and RPT_COL_CREATION_DATE, take the earlier date, and add to new column effective_creation_date
df_keywords['effective_creation_date'] = df_keywords.apply(
lambda row: min(row['first_traffic_date'], row[RPT_COL_CREATION_DATE]).date()
if pd.notnull(row['first_traffic_date']) else row[RPT_COL_CREATION_DATE].date(),
axis=1
)
## Find low-performing keywords to pause
# - at least 30-days old
# - no Conversions
# - and one of below:
# - Hist QS between 0 and 5, non-inclusive
# - Hist QS is 0, but Quality Score is not zero (to exclude publishers that don't support QS)
# - both Hist QS and Quality Score are zero, but only for Google
thirty_days_ago = pd.Timestamp.today(CLIENT_TIMEZONE).date() - pd.Timedelta(days=30)
print(f"30 days ago: {thirty_days_ago}")
at_least_30_days_old = df_keywords['effective_creation_date'] < thirty_days_ago
no_conversions = df_keywords[RPT_COL_CONV] == 0
low_hist_qs = (df_keywords[RPT_COL_HIST_QS] > 0) & (df_keywords[RPT_COL_HIST_QS] < 5)
zero_hist_qs = (df_keywords[RPT_COL_HIST_QS] == 0) & (df_keywords[RPT_COL_QUALITY_SCORE] > 0)
zero_hist_qs_and_qs_google = (df_keywords[RPT_COL_HIST_QS] == 0) & (df_keywords[RPT_COL_QUALITY_SCORE] == 0) & (df_keywords[RPT_COL_PUBLISHER] == VAL_PUBLISHER_GOOGLE)
# apply business rules
df_pause = df_keywords.loc[
at_least_30_days_old &
no_conversions &
(low_hist_qs | zero_hist_qs | zero_hist_qs_and_qs_google)
]
## Prepare output
debugDf = df_keywords.reset_index()
output_cols = [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_GROUP, RPT_COL_KEYWORD, RPT_COL_MATCH_TYPE]
outputDf = df_pause.reset_index()[output_cols].copy()
# outputDf[BULK_COL_STATUS] = VAL_STATUS_PAUSED
info_str = f"Paused by Scripts on {pd.Timestamp.today(CLIENT_TIMEZONE).date()} due to low QS"
outputDf[BULK_COL_AUTOMATION_INFO] = info_str
print("outputDf.shape", outputDf.shape)
print("outputDf example", outputDf.head())
## local debug
if local_dev:
output_filename = 'outputDf.csv'
outputDf.to_csv(output_filename, index=False)
print(f"Local Dev: Output written to: {output_filename}")
debug_filename = 'debugDf.csv'
debugDf.to_csv(debug_filename, index=False)
print(f"Local Dev: Debug written to: {debug_filename}")
else:
print("outputDf", outputDf.head(5))
Post generated on 2024-11-27 06:58:46 GMT