Script 839: Duplicate MKWID replacement
Purpose
Locate keywords with duplicate MKWID values and reupload with blanks.
To Elaborate
The Python script aims to identify keywords with duplicate MKWID values and replace them with blank values. MKWID is a unique identifier for each keyword. The script helps ensure that there are no duplicate MKWID values in the dataset, which can cause issues in data analysis and reporting.
Walking Through the Code
- The script defines a dictionary
MATCH_TYPE
that maps different match types. - The script retrieves the current date and time using the
datetime.now()
function and assigns it to the variabletoday
. - The script accesses a primary data source and assigns it to the variable
inputDf
. - The script defines constants for column names used in the input and output dataframes.
- The script initializes the output dataframe
outputDf
with the desired columns and sets the value of theBULK_COL_CUSTOM_PARAMETERS
column to a placeholder value. - The script prints the
inputDf
dataframe in a tabular format using thetableize()
function. - The script converts the ‘Creation Date’ column in the
inputDf
dataframe to datetime format for comparison purposes. - The script sorts the
inputDf
dataframe by ‘Custom Parameters’ and ‘Creation Date’ in ascending order to ensure the oldest entry for each custom parameter value is first. - The script identifies duplicates in the
inputDf
dataframe based on the ‘Custom Parameters’ column using theduplicated()
function and assigns them to theduplicates
dataframe. - The script creates a new dataframe
outputDf
with the desired columns. - The script copies relevant columns from the
duplicates
dataframe to theoutputDf
dataframe and sets the ‘Custom Parameters’ column to blank for the selected duplicates. - The
outputDf
dataframe now contains only the keywords with blank custom parameters. - The script prints the
outputDf
dataframe.
Vitals
- Script ID : 839
- Client ID / Customer ID: 58118572 / 5572
- Action Type: Bulk Upload
- Item Changed: Keyword
- Output Columns: Account, Campaign, Group, Keyword, Match Type, Custom Parameters
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Jeff Sands (jsands@marinsoftware.com)
- Created by Jeff Sands on 2024-03-22 19:32
- Last Updated by Jeff Sands on 2024-03-22 19:40
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
##
## name: Duplicate MKWID replacement
## description: Locate keywords with duplicate MKWID values and reupload with blanks.
##
##
## author: Jeff Sands
## created: 2024-03-22
##
MATCH_TYPE = {
'EXACT': 'exact',
'PHRASE': 'phrase',
'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_STATUS = 'Status'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
RPT_COL_CREATION_DATE = 'Creation Date'
# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_KEYWORD = 'Keyword'
BULK_COL_MATCH_TYPE = 'Match Type'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
outputDf[BULK_COL_CUSTOM_PARAMETERS] = "<<YOUR VALUE>>"
# user code start here
print(tableize(inputDf))
# Convert 'Creation Date' to datetime for comparison
inputDf[RPT_COL_CREATION_DATE] = pd.to_datetime(inputDf[RPT_COL_CREATION_DATE])
# Sort the DataFrame by 'Custom Parameters' and 'Creation Date' in ascending order
# to ensure the oldest entry for each custom parameter value is first
inputDf = inputDf.sort_values(by=[RPT_COL_CUSTOM_PARAMETERS, RPT_COL_CREATION_DATE], ascending=[True, True])
# Using `duplicated` to select duplicates based on 'Custom Parameters', keeping the oldest entry
# 'keep' is set to 'first' to retain the first occurrence (oldest due to sorting) and mark others as duplicates
duplicates = inputDf[inputDf.duplicated(subset=[RPT_COL_CUSTOM_PARAMETERS], keep='first')]
# Create a new DataFrame for output
outputDf = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_GROUP, BULK_COL_KEYWORD, BULK_COL_MATCH_TYPE, BULK_COL_CUSTOM_PARAMETERS])
# Copy relevant columns and set 'Custom Parameters' to blank for the selected duplicates
outputDf[BULK_COL_ACCOUNT] = duplicates[RPT_COL_ACCOUNT]
outputDf[BULK_COL_CAMPAIGN] = duplicates[RPT_COL_CAMPAIGN]
outputDf[BULK_COL_GROUP] = duplicates[RPT_COL_GROUP]
outputDf[BULK_COL_KEYWORD] = duplicates[RPT_COL_KEYWORD]
outputDf[BULK_COL_MATCH_TYPE] = duplicates[RPT_COL_MATCH_TYPE]
outputDf[BULK_COL_CUSTOM_PARAMETERS] = ''
# Now outputDf contains only the keywords with blank custom parameters
print(outputDf)
Post generated on 2024-05-15 07:44:05 GMT