Script 551: Duplication Keywords Report
Purpose
The Python script identifies duplicate keywords within the same publisher and account, recommends which keywords to pause based on performance metrics, and sends an email report.
To Elaborate
The script is designed to manage and optimize keyword usage in advertising campaigns by identifying duplicate keywords within the same publisher and account. It focuses on active keywords, ignoring case sensitivity and placeholder keywords. The script evaluates the performance of duplicate keywords based on criteria such as campaign category, publisher cost, and other performance metrics. It recommends keeping the keyword with the best performance active and suggests pausing the others. Additionally, it identifies campaigns that only contain losing keywords and recommends renaming them. The script ultimately aims to streamline keyword management and improve campaign efficiency by ensuring that only the most effective keywords are active.
Walking Through the Code
- Initialization and Setup:
- The script begins by determining whether it is running on a server or locally. If running locally, it loads data from a specified pickle file.
- It imports necessary libraries such as
pandas
,numpy
, and others for data manipulation and analysis.
- Data Preparation:
- The script sets up the primary data source and defines constants for various columns used in the analysis.
- It filters out placeholder keywords and inactive keywords to focus on active and relevant data.
- Duplicate Keyword Identification:
- The script constructs a unique key for each keyword based on publisher, account, keyword text, match type, and status.
- It calculates the count, total spend, and highest bid for each group of duplicate keywords.
- Performance Evaluation and Recommendation:
- Duplicate keywords are sorted based on performance metrics, prioritizing those in ‘New’ campaign categories.
- The script identifies the best-performing keyword in each duplicate group and recommends it to remain active, while others are marked for pausing.
- Campaign Analysis:
- It identifies campaigns that only contain losing keywords and recommends renaming them to indicate their status.
- The script ensures that campaigns with winning keywords are not renamed or paused.
- Output and Testing:
- The script includes unit tests to verify the functionality of the keyword processing logic.
- It outputs the final recommendations and, if running locally, writes the results to CSV files for further analysis.
Vitals
- Script ID : 551
- Client ID / Customer ID: 1306922797 / 60269073
- Action Type: Email Report
- Item Changed: None
- Output Columns:
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Michael Huang (mhuang@marinsoftware.com)
- Created by Michael Huang on 2023-12-03 23:13
- Last Updated by Grégory Pantaine on 2024-03-11 16:20
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
##
## name: Duplication Keyword Alert
## description:
## Identify duplicate keywords and alert via email
## * duplicated within same Publisher and Account
## * effective status = Active
## * ignore keyword case
## * ignore placeholder keywords
## * recommend winning keyword based on: 'New' campaign category, pub cost, etc
## * recommend to change name of campaigns with only losing keywords
##
## author: Michael S. Huang
## created: 2023-12-03
## updated: 2025-02-06
##
########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/yotel_keyword_20240206_datasource_dict.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict
# determine if code is running on server or locally
def is_executing_on_server():
try:
# Attempt to access a known restricted builtin
dict_items = dataSourceDict.items()
return True
except NameError:
# NameError: dataSourceDict object is missing (indicating not on server)
return False
local_dev = False
if is_executing_on_server():
print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
print("Code is NOT executing on server. Doing init.")
local_dev = True
# load dataSourceDict via pickled file
import pickle
dataSourceDict = pickle.load(open(pickle_path, 'rb'))
# print shape and first 5 rows for each entry in dataSourceDict
for key, value in dataSourceDict.items():
print(f"Shape of dataSourceDict[{key}]: {value.shape}")
# print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")
# set outputDf same as inputDf
inputDf = dataSourceDict["1"]
outputDf = inputDf.copy()
# setup timezone
import datetime
# Chicago Timezone is GMT-5. Adjust as needed.
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))
# import pandas
import pandas as pd
import numpy as np
# other imports
import re
import urllib
# import Marin util functions
from marin_scripts_utils import tableize, select_changed
# pandas settings
pd.set_option('display.max_columns', None) # Display all columns
pd.set_option('display.max_colwidth', None) # Display full content of each column
else:
print("Running locally but no pickle path defined. dataSourceDict not loaded.")
exit(1)
########### END - Local Mode Setup ###########
MATCH_TYPE = {
'EXACT': 'exact',
'PHRASE': 'phrase',
'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_STATUS = 'Status'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_CAMPAIGN_CLASSIFICATION = 'Campaign Classification'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_GROUP = 'Group'
RPT_COL_GROUP_STATUS = 'Group Status'
RPT_COL_SEARCH_BID = 'Search Bid'
RPT_COL_PUB_COST = 'Pub. Cost £'
RPT_COL_CONV_RATE = 'Conv. Rate %'
RPT_COL_CTR = 'CTR %'
RPT_COL_AVG_CPC = 'Avg. CPC £'
RPT_COL_IMPR_SHARE = 'Impr. share %'
RPT_COL_HIST_QS = 'Hist. QS'
# user code start here
COL_KEYWORD_UNIQUE_KEY = 'kw_uniq_key'
COL_DUPE_COUNT = 'dupe_count'
COL_DUPE_SPEND = 'dupe_spend'
COL_DUPE_HIGH_BID = 'dupe_high_bid'
COL_RECOMMENDED_STATUS = 'Dedupe Recommended Status'
COL_RECOMMENDED_SEARCH_BID = 'Dedupe Recommended Search Bid'
COL_RECOMMENDED_CAMPAIGN_NAME = 'Dedupe Recommended Campaign Name'
COL_RECOMMENDED_CAMPAIGN_STATUS = 'Dedupe Recommended Campaign Status'
VAL_PAUSED = 'Paused'
VAL_ACTIVE = 'Active'
# main function that takes inputDf and returns (outputDf, debugDf)
def process(inputDf):
print(">> inputDf.shape", inputDf.shape)
print(">> inputDf.info", inputDf.info())
# print(tableize(inputDf.head()))
# COL_RECOMMENDED_CAMPAIGN_NAME was set wrongly before, so clear and recalc
inputDf[COL_RECOMMENDED_CAMPAIGN_NAME] = ''
# Also recalc Rec Campaign Status
inputDf[COL_RECOMMENDED_CAMPAIGN_STATUS] = ''
### Change name of single-keyword campaigns
# campaigns_with_single_keyword = inputDf.groupby(RPT_COL_CAMPAIGN).filter(lambda x: len(x) == 1)
# print(f"There are {campaigns_with_single_keyword.shape[0]} campaigns with only a single keyword:")
# print(campaigns_with_single_keyword[[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_CAMPAIGN_ID, RPT_COL_KEYWORD]].head())
# inputDf.loc[campaigns_with_single_keyword.index, COL_RECOMMENDED_CAMPAIGN_NAME] = 'ZZ__' + inputDf.loc[campaigns_with_single_keyword.index, RPT_COL_CAMPAIGN]
# inputDf[COL_RECOMMENDED_CAMPAIGN_NAME] = 'ZZ__' + inputDf[RPT_COL_CAMPAIGN]
### Cleanup: exclude placholder keywords and paused keywords
# derbysoft accounts only contain placeholder keywords like `Google/GB`
derbysoft_accounts = inputDf[RPT_COL_ACCOUNT].str.contains('Derbysoft', case=False, regex=False)
# other accounts may contain placeholders like `(not set)/GB`
notset_keywords = inputDf[RPT_COL_KEYWORD].str.contains('not set', regex=False)
# both are considered placeholders
placeholder_keywords = derbysoft_accounts | notset_keywords
# get count of placeholder keywords by account
placeholder_keyword_counts = inputDf.loc[placeholder_keywords, [RPT_COL_ACCOUNT]].value_counts()
print(">> placeholder keyword distribution")
print(placeholder_keyword_counts.head(50))
### Cleanup: exclude inactive keywords
active_keywords = inputDf[RPT_COL_STATUS] == VAL_ACTIVE
active_groups = inputDf[RPT_COL_GROUP_STATUS] == VAL_ACTIVE
active_campaigns = inputDf[RPT_COL_CAMPAIGN_STATUS] == VAL_ACTIVE
inactive_keywords = ~(active_keywords & active_groups & active_campaigns)
# actually exclude them
excluded_keywords = placeholder_keywords | inactive_keywords
print(">> total excluded keywords: ", excluded_keywords.sum())
df_working = inputDf.loc[~excluded_keywords].copy()
print(">> after cleanup. df_working.shape", df_working.shape)
### Find Duplicate Keywords within Account
keys = df_working[RPT_COL_PUBLISHER] + '__' + df_working[RPT_COL_ACCOUNT] + '__' + df_working[RPT_COL_KEYWORD].str.lower() + '__' + df_working[RPT_COL_MATCH_TYPE] + '__' + df_working[RPT_COL_STATUS]
df_working.loc[:, COL_KEYWORD_UNIQUE_KEY] = keys
# calc count for each dupe group
df_working[COL_DUPE_COUNT] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[COL_KEYWORD_UNIQUE_KEY].transform('size')
# calc total spend for each dupe group
df_working[COL_DUPE_SPEND] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[RPT_COL_PUB_COST].transform('sum').round(2)
# find highest bid for each dupe group
df_working[COL_DUPE_HIGH_BID] = df_working.groupby(COL_KEYWORD_UNIQUE_KEY)[RPT_COL_SEARCH_BID].transform('max').round(2)
# Find dupes and sort them according to performance
df_working = df_working.loc[df_working[COL_DUPE_COUNT] > 1] \
.sort_values(by=[ \
RPT_COL_CAMPAIGN_CLASSIFICATION, \
COL_DUPE_SPEND, \
COL_KEYWORD_UNIQUE_KEY, \
RPT_COL_PUB_COST, \
RPT_COL_HIST_QS, \
RPT_COL_CONV_RATE, \
RPT_COL_IMPR_SHARE], \
ascending=[False, False, False, False, False, False, False])
print(">> keep dupes only. df_working.shape", df_working.shape)
# dupe winners have best performance
df_winners = df_working.copy().drop_duplicates(subset=COL_KEYWORD_UNIQUE_KEY)
df_winners[COL_RECOMMENDED_STATUS] = VAL_ACTIVE
df_winners[COL_RECOMMENDED_SEARCH_BID] = df_winners[COL_DUPE_HIGH_BID]
print(">> df_winners.shape", df_winners.shape)
# join back to df_working, and mark the remaining dupes as Paused
outputDf = df_working.merge(df_winners[[COL_RECOMMENDED_STATUS, COL_RECOMMENDED_SEARCH_BID]], left_index=True, right_index=True, how='left').copy()
outputDf[COL_RECOMMENDED_STATUS].fillna(VAL_PAUSED, inplace=True)
# Identify campaigns that only have losing keywords
losing_keywords = outputDf[COL_RECOMMENDED_STATUS] == VAL_PAUSED
losing_campaigns = outputDf[losing_keywords].groupby(RPT_COL_CAMPAIGN_ID)[COL_RECOMMENDED_STATUS].count().reset_index(name='Count')
all_campaigns = outputDf.groupby(RPT_COL_CAMPAIGN_ID)[COL_RECOMMENDED_STATUS].count().reset_index(name='Count')
only_losing_campaigns = losing_campaigns.merge(all_campaigns, on=RPT_COL_CAMPAIGN_ID, suffixes=('_losing', '_all'))
only_losing_campaigns = only_losing_campaigns[only_losing_campaigns['Count_losing'] == only_losing_campaigns['Count_all']]
only_losing_campaign_ids = only_losing_campaigns['Campaign ID']
# Recommend a name change for campaigns with only losing keywords
only_losing_campaign_index = outputDf[RPT_COL_CAMPAIGN_ID].isin(only_losing_campaign_ids)
outputDf.loc[only_losing_campaign_index, COL_RECOMMENDED_CAMPAIGN_NAME] = 'ZZ__' + outputDf.loc[only_losing_campaign_index, RPT_COL_CAMPAIGN]
outputDf.loc[only_losing_campaign_index, COL_RECOMMENDED_CAMPAIGN_STATUS] = VAL_PAUSED
outputDf = outputDf.sort_values(by=[COL_KEYWORD_UNIQUE_KEY, RPT_COL_CAMPAIGN_CLASSIFICATION], ascending=[True, False])
return (outputDf, df_working)
# unit test
def test_process_general_case():
print ("### Running Test: test_process_general_case ###")
# from io import StringIO
test_data_string = '''
[
{
"Keyword": "yotel hotel miami",
"Status": "Active",
"Match Type": "Exact",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 21993144,
"Campaign": "YOMIAH_SRC_EN_Google_Brand_USA_Exact",
"Campaign Status": "Paused",
"Campaign Classification": "Legacy",
"Group": "YOMIAH_SRC_EN_Google_Brand_USA_Exact",
"Group Status": "Active",
"Search Bid": 1.16,
"Pub. Cost \\u00a3": 0.0,
"Conv. Rate %": 0.0,
"CTR %": 0.0,
"Avg. CPC \\u00a3": 0.0,
"Impr. share %": 0.0,
"Hist. QS": 0.0,
"Dedupe Recommended Campaign Name": "ZZ__YOMIAH_SRC_EN_Google_Brand_USA_Exact"
},
{
"Keyword": "YOTEL HOTEL MIAMI",
"Status": "Active",
"Match Type": "Exact",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22506165,
"Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
"Campaign Status": "Active",
"Campaign Classification": "New",
"Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
"Group Status": "Active",
"Search Bid": 0.48,
"Pub. Cost \\u00a3": 3.66,
"Conv. Rate %": 0.0,
"CTR %": 0.1408,
"Avg. CPC \\u00a3": 0.37,
"Impr. share %": 0.5038,
"Hist. QS": 8.0,
"Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23"
},
{
"Keyword": "yotel hotel miami",
"Status": "Deleted",
"Match Type": "Broad",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22506165,
"Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
"Campaign Status": "Active",
"Campaign Classification": "New",
"Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
"Group Status": "Active",
"Search Bid": 0.0,
"Pub. Cost \\u00a3": 0.0,
"Conv. Rate %": 0.0,
"CTR %": 0.0,
"Avg. CPC \\u00a3": 0.0,
"Impr. share %": 0.0,
"Hist. QS": 0.0,
"Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23"
},
{
"Keyword": "yotel hotel miami",
"Status": "Active",
"Match Type": "Exact",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22506165,
"Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
"Campaign Status": "Active",
"Campaign Classification": "New",
"Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
"Group Status": "Active",
"Search Bid": 0.54,
"Pub. Cost \\u00a3": 1.11,
"Conv. Rate %": 0.0,
"CTR %": 0.0952,
"Avg. CPC \\u00a3": 0.28,
"Impr. share %": 0.3824,
"Hist. QS": 8.12,
"Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23"
},
{
"Keyword": "yotel hotel miami",
"Status": "Active",
"Match Type": "Exact",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22551730,
"Campaign": "YOMIAH_SRC_EN_Google_Brand_USA_Exact Landing Page Test",
"Campaign Status": "Active",
"Campaign Classification": "Legacy",
"Group": "YOMIAH_SRC_EN_Google_Brand_USA_Exact",
"Group Status": "Active",
"Search Bid": 0.5,
"Pub. Cost \\u00a3": 0.0,
"Conv. Rate %": 0.0,
"CTR %": 0.0,
"Avg. CPC \\u00a3": 0.0,
"Impr. share %": 0.0,
"Hist. QS": 0.0,
"Dedupe Recommended Campaign Name": "ZZ__YOMIAH_SRC_EN_Google_Brand_USA_Exact Landing Page Test"
},
{
"Keyword": "yotel hotel miami",
"Status": "Active",
"Match Type": "Exact",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22557952,
"Campaign": "YOMIAH_SRC_EN_Google_Brand_USA_Exact Combined LP Test 2",
"Campaign Status": "Active",
"Campaign Classification": "Legacy",
"Group": "YOMIAH_SRC_EN_Google_Brand_USA_Exact",
"Group Status": "Active",
"Search Bid": 2.25,
"Pub. Cost \\u00a3": 0.0,
"Conv. Rate %": 0.0,
"CTR %": 0.0,
"Avg. CPC \\u00a3": 0.0,
"Impr. share %": 0.0,
"Hist. QS": 0.0,
"Dedupe Recommended Campaign Name": "ZZ__YOMIAH_SRC_EN_Google_Brand_USA_Exact Combined LP Test 2"
},
{
"Keyword": "yotel hotel miami",
"Status": "Active",
"Match Type": "Exact",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22707648,
"Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
"Campaign Status": "Active",
"Campaign Classification": "Legacy",
"Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
"Group Status": "Active",
"Search Bid": 0.1,
"Pub. Cost \\u00a3": 0.0,
"Conv. Rate %": 0.0,
"CTR %": 0.0,
"Avg. CPC \\u00a3": 0.0,
"Impr. share %": 0.0,
"Hist. QS": 0.0,
"Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23"
},
{
"Keyword": "yotel hotel miami",
"Status": "Deleted",
"Match Type": "Broad",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22707648,
"Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
"Campaign Status": "Active",
"Campaign Classification": "Legacy",
"Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
"Group Status": "Active",
"Search Bid": 0.0,
"Pub. Cost \\u00a3": 0.0,
"Conv. Rate %": 0.0,
"CTR %": 0.0,
"Avg. CPC \\u00a3": 0.0,
"Impr. share %": 0.0,
"Hist. QS": 0.0,
"Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23"
}
]
'''
# string_io = StringIO(test_data_string)
# inputDf = pd.read_json(string_io)
inputDf = pd.read_json(test_data_string)
(outputDf, debugDf) = process(inputDf)
outputDf = outputDf.fillna('')
print(">>OutputDf\n", outputDf)
uniq_rec_campaign_name_count = outputDf \
.groupby(by=[RPT_COL_CAMPAIGN_ID])[COL_RECOMMENDED_CAMPAIGN_NAME] \
.nunique() \
.reset_index(name='Count')
more_than_one_rec_name = uniq_rec_campaign_name_count.loc[uniq_rec_campaign_name_count['Count'] > 1]
assert more_than_one_rec_name.empty, "Campaign has more than one Rec Name"
# 2nd keyword wins
winner = outputDf.loc[1]
# print(">>winner\n", winner)
assert winner[COL_RECOMMENDED_STATUS] == VAL_ACTIVE, "Winning keyword Rec Status not Active"
assert winner[COL_RECOMMENDED_SEARCH_BID] == 2.25, "Winning keyword not receive high bid from cohort"
assert winner[COL_RECOMMENDED_CAMPAIGN_NAME] == '', "Winning keyword should not have Rec Campaign Name"
assert winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == '', "Winning keyword should not have Rec Campaign Status"
# all but the 2nd keyword are losers
losers = outputDf.loc[outputDf.index != 1]
assert all(losers[COL_RECOMMENDED_STATUS] == VAL_PAUSED), "Losing keyword Rec Status not Paused"
assert all(losers[COL_RECOMMENDED_SEARCH_BID] == ''), "Losing keyword Rec Bid not empty"
# 2nd, 3rd (filtered out) and 4th keywords belong to the same campaign; campaign has winner
campaigns_with_winner = outputDf.loc[[1, 3]]
assert all(campaigns_with_winner[COL_RECOMMENDED_CAMPAIGN_NAME] == ''), "Campaign with winner should not have Rec Campaign Name"
assert all(campaigns_with_winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == ''), "Campaign with winner should not have Rec Campaign Status"
# all other campaigns have 100% losing keywords
campaigns_without_winner = outputDf.loc[(outputDf.index != 1) & (outputDf.index != 2) & (outputDf.index != 3)]
assert all(campaigns_without_winner[COL_RECOMMENDED_CAMPAIGN_NAME].str.contains('ZZ__')), "Campaigns with no winners should have Rec Campaign Name starting with ZZ__"
assert all(campaigns_without_winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == VAL_PAUSED), "Campaigns with no winners should have Rec Campaign Status as Paused"
print ("### Test PASSED ###")
# unit test
def test_process_campaign_category():
print ("### Running Test: test_process_campaign_category ###")
# from io import StringIO
test_data_string = '''
[
{
"Keyword": "YOTEL HOTEL MIAMI",
"Status": "Active",
"Match Type": "Exact",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22506165,
"Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
"Campaign Status": "Active",
"Campaign Classification": "Legacy",
"Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23",
"Group Status": "Active",
"Search Bid": 0.48,
"Pub. Cost \\u00a3": 3.66,
"Conv. Rate %": 0.0,
"CTR %": 0.1408,
"Avg. CPC \\u00a3": 0.37,
"Impr. share %": 0.5038,
"Hist. QS": 8.0,
"Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_YOTEL HOTEL MIAMI_12.01.23"
},
{
"Keyword": "yotel hotel miami",
"Status": "Active",
"Match Type": "Exact",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22707648,
"Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
"Campaign Status": "Active",
"Campaign Classification": "New",
"Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
"Group Status": "Active",
"Search Bid": 0.1,
"Pub. Cost \\u00a3": 0.0,
"Conv. Rate %": 0.0,
"CTR %": 0.0,
"Avg. CPC \\u00a3": 0.0,
"Impr. share %": 0.0,
"Hist. QS": 0.0,
"Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23"
},
{
"Keyword": "yotel hotel miami",
"Status": "Deleted",
"Match Type": "Broad",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22707648,
"Campaign": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
"Campaign Status": "Active",
"Campaign Classification": "New",
"Group": "YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23",
"Group Status": "Active",
"Search Bid": 0.0,
"Pub. Cost \\u00a3": 0.0,
"Conv. Rate %": 0.0,
"CTR %": 0.0,
"Avg. CPC \\u00a3": 0.0,
"Impr. share %": 0.0,
"Hist. QS": 0.0,
"Dedupe Recommended Campaign Name": "ZZ__YOTEL_MIA_SRC_GOOGLE_ENG_BRAND_GLOBAL_yotel hotel miami_12.04.23"
},
{
"Keyword": "yotel hotel miami",
"Status": "Active",
"Match Type": "Exact",
"Publisher": "Google",
"Account": "YOTEL Miami",
"Campaign ID": 22557952,
"Campaign": "YOMIAH_SRC_EN_Google_Brand_USA_Exact Combined LP Test 2",
"Campaign Status": "Active",
"Campaign Classification": "Legacy",
"Group": "YOMIAH_SRC_EN_Google_Brand_USA_Exact",
"Group Status": "Active",
"Search Bid": 2.25,
"Pub. Cost \\u00a3": 0.0,
"Conv. Rate %": 0.0,
"CTR %": 0.0,
"Avg. CPC \\u00a3": 0.0,
"Impr. share %": 0.0,
"Hist. QS": 0.0,
"Dedupe Recommended Campaign Name": "ZZ__YOMIAH_SRC_EN_Google_Brand_USA_Exact Combined LP Test 2"
}
]
'''
# string_io = StringIO(test_data_string)
# inputDf = pd.read_json(string_io)
inputDf = pd.read_json(test_data_string)
(outputDf, debugDf) = process(inputDf)
outputDf = outputDf.fillna('')
print(">>OutputDf\n", outputDf)
uniq_rec_campaign_name_count = outputDf \
.groupby(by=[RPT_COL_CAMPAIGN_ID])[COL_RECOMMENDED_CAMPAIGN_NAME] \
.nunique() \
.reset_index(name='Count')
more_than_one_rec_name = uniq_rec_campaign_name_count.loc[uniq_rec_campaign_name_count['Count'] > 1]
assert more_than_one_rec_name.empty, "Campaign has more than one Rec Name"
# 2nd keyword wins
winner = outputDf.loc[1]
# print(">>winner\n", winner)
assert winner[COL_RECOMMENDED_STATUS] == VAL_ACTIVE, "Winning keyword Rec Status not Active"
assert winner[COL_RECOMMENDED_SEARCH_BID] == 2.25, "Winning keyword not receive high bid from cohort"
assert winner[COL_RECOMMENDED_CAMPAIGN_NAME] == '', "Winning keyword should not have Rec Campaign Name"
assert winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == '', "Winning keyword should not have Rec Campaign Status"
# all but the 2nd keyword are losers
losers = outputDf.loc[outputDf.index != 1]
assert all(losers[COL_RECOMMENDED_STATUS] == VAL_PAUSED), "Losing keyword Rec Status not Paused"
assert all(losers[COL_RECOMMENDED_SEARCH_BID] == ''), "Losing keyword Rec Bid not empty"
# 2nd, 3rd (filtered out) belong to the same campaign; campaign has winner
campaigns_with_winner = outputDf.loc[[1]]
assert all(campaigns_with_winner[COL_RECOMMENDED_CAMPAIGN_NAME] == ''), "Campaign with winner should not have Rec Campaign Name"
assert all(campaigns_with_winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == ''), "Campaign with winner should not have Rec Campaign Status"
# all other campaigns have 100% losing keywords
campaigns_without_winner = outputDf.loc[(outputDf.index != 1)]
assert all(campaigns_without_winner[COL_RECOMMENDED_CAMPAIGN_NAME].str.contains('ZZ__')), "Campaigns with no winners should have Rec Campaign Name starting with ZZ__"
assert all(campaigns_without_winner[COL_RECOMMENDED_CAMPAIGN_STATUS] == VAL_PAUSED), "Campaigns with no winners should have Rec Campaign Status as Paused"
print ("### Test PASSED ###")
# run unit test
test_process_general_case()
test_process_campaign_category()
# actually process data
(outputDf, debugDf) = process(inputDf)
print("outputDf.shape", outputDf.shape)
## local debug
if local_dev:
output_filename = 'outputDf.csv'
outputDf.to_csv(output_filename, index=False)
print(f"Local Dev: Output written to: {output_filename}")
debug_filename = 'debugDf.csv'
debugDf.to_csv(debug_filename, index=False)
print(f"Local Dev: Debug written to: {debug_filename}")
else:
print("outputDf", outputDf.head(5))
Post generated on 2024-11-27 06:58:46 GMT