Script 839: Duplicate MKWID replacement
Purpose:
The Python script identifies keywords with duplicate MKWID values and prepares them for reupload with blank custom parameters.
To Elaborate
This script addresses duplicate MKWID values within a keyword dataset. MKWID values are custom parameters attached to keywords in advertising campaigns, and duplicates can cause inefficiencies or errors in tracking and reporting. The script scans the dataset for duplicate MKWID values, keeps only the oldest entry for each value, and marks the rest for modification. Those duplicates are then prepared for reupload with their custom parameters set to blank, which removes the duplicate MKWID values and keeps keyword tracking in the advertising campaigns accurate.
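Before looking at the script itself, the core idea can be shown in a minimal, self-contained sketch. It assumes only pandas and a made-up DataFrame that mirrors the report's ‘Keyword’, ‘Custom Parameters’, and ‘Creation Date’ columns; the keyword names, MKWID strings, and dates are invented for illustration.

import pandas as pd

# Toy keyword data; two rows share the same MKWID custom parameter
toy = pd.DataFrame({
    'Keyword': ['shoes', 'running shoes', 'boots'],
    'Custom Parameters': ['mkwid_abc123', 'mkwid_abc123', 'mkwid_xyz789'],
    'Creation Date': ['2023-01-05', '2024-02-10', '2023-06-01'],
})

# Convert to datetime and sort so the oldest row per custom parameter comes first
toy['Creation Date'] = pd.to_datetime(toy['Creation Date'])
toy = toy.sort_values(by=['Custom Parameters', 'Creation Date'], ascending=[True, True])

# keep='first' retains the oldest row and flags every later occurrence as a duplicate
duplicates = toy[toy.duplicated(subset=['Custom Parameters'], keep='first')]
print(duplicates)  # only 'running shoes' is flagged; the older 'shoes' row is kept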
Walking Through the Code
- Data Preparation
- The script begins by converting the ‘Creation Date’ column in the input DataFrame to a datetime format, which is essential for chronological sorting.
- The DataFrame is then sorted by ‘Custom Parameters’ and ‘Creation Date’ in ascending order, ensuring that the oldest entry for each MKWID value appears first.
- Identifying Duplicates
- Using the duplicated method, the script identifies duplicate entries based on the ‘Custom Parameters’ column, retaining only the first occurrence (the oldest entry, thanks to the prior sort) and marking the others as duplicates.
- Output Preparation
- A new DataFrame is created to store the output, with columns for account, campaign, group, keyword, match type, and custom parameters.
- For the identified duplicates, relevant columns are copied to the output DataFrame, and the ‘Custom Parameters’ are set to blank, preparing them for reupload (a standalone sketch of this step follows the list).
- Final Output
- The script concludes by printing the output DataFrame, which contains keywords with blank custom parameters, ready for reupload to rectify the duplicate MKWID issue.
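The output-preparation step can also be sketched on its own. This standalone illustration uses invented row values; it builds the bulk frame by selecting the identifying columns and blanking ‘Custom Parameters’, which produces the same result as the script's approach of assigning columns one by one into an empty DataFrame.

import pandas as pd

# Stand-in for the duplicate rows flagged in the previous step (values invented)
duplicates = pd.DataFrame({
    'Account': ['Acct A'],
    'Campaign': ['Brand Campaign'],
    'Group': ['Shoes Group'],
    'Keyword': ['running shoes'],
    'Match Type': ['exact'],
    'Custom Parameters': ['mkwid_abc123'],
})

# Copy the identifying columns and blank the custom parameters for reupload
outputDf = duplicates[['Account', 'Campaign', 'Group', 'Keyword', 'Match Type']].copy()
outputDf['Custom Parameters'] = ''
print(outputDf)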
Vitals
- Script ID : 839
- Client ID / Customer ID: 58118572 / 5572
- Action Type: Bulk Upload
- Item Changed: Keyword
- Output Columns: Account, Campaign, Group, Keyword, Match Type, Custom Parameters
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Jeff Sands (jsands@marinsoftware.com)
- Created by Jeff Sands on 2024-03-22 19:32
- Last Updated by Jeff Sands on 2024-03-22 19:40
> See it in Action
Python Code
##
## name: Duplicate MKWID replacement
## description: Locate keywords with duplicate MKWID values and reupload with blanks.
##
##
## author: Jeff Sands
## created: 2024-03-22
##
MATCH_TYPE = {
'EXACT': 'exact',
'PHRASE': 'phrase',
'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()  # today's date in the client's timezone (not used later in this script)
# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_STATUS = 'Status'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
RPT_COL_CREATION_DATE = 'Creation Date'
# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_KEYWORD = 'Keyword'
BULK_COL_MATCH_TYPE = 'Match Type'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
outputDf[BULK_COL_CUSTOM_PARAMETERS] = "<<YOUR VALUE>>"
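# note: this placeholder default is superseded below, where outputDf is rebuilt from the duplicate rows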
# user code start here
print(tableize(inputDf))
# Convert 'Creation Date' to datetime for comparison
inputDf[RPT_COL_CREATION_DATE] = pd.to_datetime(inputDf[RPT_COL_CREATION_DATE])
# Sort the DataFrame by 'Custom Parameters' and 'Creation Date' in ascending order
# to ensure the oldest entry for each custom parameter value is first
inputDf = inputDf.sort_values(by=[RPT_COL_CUSTOM_PARAMETERS, RPT_COL_CREATION_DATE], ascending=[True, True])
# Using `duplicated` to select duplicates based on 'Custom Parameters', keeping the oldest entry
# 'keep' is set to 'first' to retain the first occurrence (oldest due to sorting) and mark others as duplicates
duplicates = inputDf[inputDf.duplicated(subset=[RPT_COL_CUSTOM_PARAMETERS], keep='first')]
# Create a new DataFrame for output
outputDf = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_GROUP, BULK_COL_KEYWORD, BULK_COL_MATCH_TYPE, BULK_COL_CUSTOM_PARAMETERS])
# Copy relevant columns and set 'Custom Parameters' to blank for the selected duplicates
outputDf[BULK_COL_ACCOUNT] = duplicates[RPT_COL_ACCOUNT]
outputDf[BULK_COL_CAMPAIGN] = duplicates[RPT_COL_CAMPAIGN]
outputDf[BULK_COL_GROUP] = duplicates[RPT_COL_GROUP]
outputDf[BULK_COL_KEYWORD] = duplicates[RPT_COL_KEYWORD]
outputDf[BULK_COL_MATCH_TYPE] = duplicates[RPT_COL_MATCH_TYPE]
outputDf[BULK_COL_CUSTOM_PARAMETERS] = ''
# Now outputDf contains only the keywords with blank custom parameters
print(outputDf)