Script 839: Duplicate MKWID replacement

Purpose

The script identifies keywords with duplicate MKWID values and prepares them for re-upload with blank custom parameters.

To Elaborate

The Python script addresses the issue of duplicate MKWID values within a dataset of keywords. MKWID, a custom parameter, is used to track and manage keywords in advertising campaigns. Duplicate MKWID values can lead to data inconsistencies and tracking errors. The script identifies these duplicates and prepares a new dataset where the custom parameters for these duplicates are set to blank, allowing for a clean re-upload and correction of the dataset. This ensures that each keyword entry is unique in terms of its custom parameters, maintaining data integrity and accuracy in keyword management.

Walking Through the Code

  1. Data Preparation:
    • The script begins by converting the ‘Creation Date’ column in the input DataFrame to a datetime format. This conversion is crucial for accurate sorting and comparison of dates.
  2. Sorting:
    • The DataFrame is sorted by ‘Custom Parameters’ and ‘Creation Date’ in ascending order. This ensures that the oldest entry for each custom parameter value appears first, which is important for identifying duplicates correctly.
  3. Identifying Duplicates:
    • The script uses the duplicated method to find duplicate entries based on the ‘Custom Parameters’ column. By keeping the first occurrence (the oldest due to prior sorting), it marks subsequent entries as duplicates.
  4. Preparing Output:
    • A new DataFrame, outputDf, is created to store the results. It includes columns for account, campaign, group, keyword, match type, and custom parameters.
    • For each duplicate identified, the script copies relevant data into outputDf and sets the ‘Custom Parameters’ column to blank, preparing the data for re-upload.
  5. Final Output:
    • The script concludes by printing the outputDf, which contains the keywords with blank custom parameters, ready for further processing or re-upload.

Vitals

  • Script ID : 839
  • Client ID / Customer ID: 58118572 / 5572
  • Action Type: Bulk Upload
  • Item Changed: Keyword
  • Output Columns: Account, Campaign, Group, Keyword, Match Type, Custom Parameters
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Jeff Sands (jsands@marinsoftware.com)
  • Created by Jeff Sands on 2024-03-22 19:32
  • Last Updated by Jeff Sands on 2024-03-22 19:40
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
##
## name: Duplicate MKWID replacement
## description: Locate keywords with duplicate MKWID values and reupload with blanks.
##  
## 
## author: Jeff Sands
## created: 2024-03-22
## 

MATCH_TYPE = {
  'EXACT': 'exact',
  'PHRASE': 'phrase',
  'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_STATUS = 'Status'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
RPT_COL_CREATION_DATE = 'Creation Date'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_KEYWORD = 'Keyword'
BULK_COL_MATCH_TYPE = 'Match Type'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
outputDf[BULK_COL_CUSTOM_PARAMETERS] = "<<YOUR VALUE>>"

# user code start here
print(tableize(inputDf))

# Convert 'Creation Date' to datetime for comparison
inputDf[RPT_COL_CREATION_DATE] = pd.to_datetime(inputDf[RPT_COL_CREATION_DATE])

# Sort the DataFrame by 'Custom Parameters' and 'Creation Date' in ascending order
# to ensure the oldest entry for each custom parameter value is first
inputDf = inputDf.sort_values(by=[RPT_COL_CUSTOM_PARAMETERS, RPT_COL_CREATION_DATE], ascending=[True, True])

# Using `duplicated` to select duplicates based on 'Custom Parameters', keeping the oldest entry
# 'keep' is set to 'first' to retain the first occurrence (oldest due to sorting) and mark others as duplicates
duplicates = inputDf[inputDf.duplicated(subset=[RPT_COL_CUSTOM_PARAMETERS], keep='first')]

# Create a new DataFrame for output
outputDf = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_GROUP, BULK_COL_KEYWORD, BULK_COL_MATCH_TYPE, BULK_COL_CUSTOM_PARAMETERS])

# Copy relevant columns and set 'Custom Parameters' to blank for the selected duplicates
outputDf[BULK_COL_ACCOUNT] = duplicates[RPT_COL_ACCOUNT]
outputDf[BULK_COL_CAMPAIGN] = duplicates[RPT_COL_CAMPAIGN]
outputDf[BULK_COL_GROUP] = duplicates[RPT_COL_GROUP]
outputDf[BULK_COL_KEYWORD] = duplicates[RPT_COL_KEYWORD]
outputDf[BULK_COL_MATCH_TYPE] = duplicates[RPT_COL_MATCH_TYPE]
outputDf[BULK_COL_CUSTOM_PARAMETERS] = ''

# Now outputDf contains only the keywords with blank custom parameters
print(outputDf)

Post generated on 2024-11-27 06:58:46 GMT

comments powered by Disqus