Script 839: Duplicate MKWID replacement

Purpose

Locate keywords with duplicate MKWID values and reupload with blanks.

To Elaborate

The Python script aims to identify keywords with duplicate MKWID values and replace them with blank values. MKWID is a unique identifier for each keyword. The script helps ensure that there are no duplicate MKWID values in the dataset, which can cause issues in data analysis and reporting.

Walking Through the Code

  1. The script defines a dictionary MATCH_TYPE that maps different match types.
  2. The script retrieves the current date and time using the datetime.now() function and assigns it to the variable today.
  3. The script accesses a primary data source and assigns it to the variable inputDf.
  4. The script defines constants for column names used in the input and output dataframes.
  5. The script initializes the output dataframe outputDf with the desired columns and sets the value of the BULK_COL_CUSTOM_PARAMETERS column to a placeholder value.
  6. The script prints the inputDf dataframe in a tabular format using the tableize() function.
  7. The script converts the ‘Creation Date’ column in the inputDf dataframe to datetime format for comparison purposes.
  8. The script sorts the inputDf dataframe by ‘Custom Parameters’ and ‘Creation Date’ in ascending order to ensure the oldest entry for each custom parameter value is first.
  9. The script identifies duplicates in the inputDf dataframe based on the ‘Custom Parameters’ column using the duplicated() function and assigns them to the duplicates dataframe.
  10. The script creates a new dataframe outputDf with the desired columns.
  11. The script copies relevant columns from the duplicates dataframe to the outputDf dataframe and sets the ‘Custom Parameters’ column to blank for the selected duplicates.
  12. The outputDf dataframe now contains only the keywords with blank custom parameters.
  13. The script prints the outputDf dataframe.

Vitals

  • Script ID : 839
  • Client ID / Customer ID: 58118572 / 5572
  • Action Type: Bulk Upload
  • Item Changed: Keyword
  • Output Columns: Account, Campaign, Group, Keyword, Match Type, Custom Parameters
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Jeff Sands (jsands@marinsoftware.com)
  • Created by Jeff Sands on 2024-03-22 19:32
  • Last Updated by Jeff Sands on 2024-03-22 19:40
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
##
## name: Duplicate MKWID replacement
## description: Locate keywords with duplicate MKWID values and reupload with blanks.
##  
## 
## author: Jeff Sands
## created: 2024-03-22
## 

MATCH_TYPE = {
  'EXACT': 'exact',
  'PHRASE': 'phrase',
  'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_STATUS = 'Status'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
RPT_COL_CREATION_DATE = 'Creation Date'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_KEYWORD = 'Keyword'
BULK_COL_MATCH_TYPE = 'Match Type'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
outputDf[BULK_COL_CUSTOM_PARAMETERS] = "<<YOUR VALUE>>"

# user code start here
print(tableize(inputDf))

# Convert 'Creation Date' to datetime for comparison
inputDf[RPT_COL_CREATION_DATE] = pd.to_datetime(inputDf[RPT_COL_CREATION_DATE])

# Sort the DataFrame by 'Custom Parameters' and 'Creation Date' in ascending order
# to ensure the oldest entry for each custom parameter value is first
inputDf = inputDf.sort_values(by=[RPT_COL_CUSTOM_PARAMETERS, RPT_COL_CREATION_DATE], ascending=[True, True])

# Using `duplicated` to select duplicates based on 'Custom Parameters', keeping the oldest entry
# 'keep' is set to 'first' to retain the first occurrence (oldest due to sorting) and mark others as duplicates
duplicates = inputDf[inputDf.duplicated(subset=[RPT_COL_CUSTOM_PARAMETERS], keep='first')]

# Create a new DataFrame for output
outputDf = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_GROUP, BULK_COL_KEYWORD, BULK_COL_MATCH_TYPE, BULK_COL_CUSTOM_PARAMETERS])

# Copy relevant columns and set 'Custom Parameters' to blank for the selected duplicates
outputDf[BULK_COL_ACCOUNT] = duplicates[RPT_COL_ACCOUNT]
outputDf[BULK_COL_CAMPAIGN] = duplicates[RPT_COL_CAMPAIGN]
outputDf[BULK_COL_GROUP] = duplicates[RPT_COL_GROUP]
outputDf[BULK_COL_KEYWORD] = duplicates[RPT_COL_KEYWORD]
outputDf[BULK_COL_MATCH_TYPE] = duplicates[RPT_COL_MATCH_TYPE]
outputDf[BULK_COL_CUSTOM_PARAMETERS] = ''

# Now outputDf contains only the keywords with blank custom parameters
print(outputDf)

Post generated on 2024-05-15 07:44:05 GMT

comments powered by Disqus