Script 1355: Dimension Update Brand
Purpose
Automates the process of populating the brand dimension in a dataset based on specific rules applied to campaign name values.
To Elaborate
The Python script is designed to automate the task of populating the ‘Brand’ dimension in a dataset by analyzing the ‘Campaign’ and ‘Campaign Type’ columns. It applies a set of predefined rules to determine the appropriate brand classification for each campaign. The script processes only those rows where the ‘Brand’ column is initially empty, ensuring that existing data is not overwritten. The rules are based on specific keywords found in the campaign names and types, which categorize campaigns into ‘Brand’, ‘Non-Brand’, ‘N/A’, or other categories. This automation helps streamline the data preparation process, ensuring consistency and accuracy in brand classification across large datasets.
Walking Through the Code
- Data Preparation
- The script begins by defining the primary data source and relevant columns, such as ‘Campaign’, ‘Account’, and ‘Brand’.
- It filters the input data to exclude rows where the ‘Brand’ column is already populated, focusing only on rows that need brand determination.
- Brand Determination Function
- A function
determine_brand
is defined to classify campaigns based on their names and types. - The function converts campaign names and types to lowercase to ensure case-insensitive matching.
- It checks for specific keywords in the campaign name and type to assign a brand category, such as ‘Non-Brand’, ‘Brand’, ‘N/A’, or ‘Other’.
- A function
- Applying the Function
- The script applies the
determine_brand
function to each row of the filtered dataset using theapply
method. - The results are stored in the ‘Brand’ column of the output dataframe.
- The script applies the
- Output
- Finally, the script prints the first few rows of the updated dataframe to display the results of the brand classification process.
Vitals
- Script ID : 1355
- Client ID / Customer ID: 1306913420 / 60268008
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Brand
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Kyle Perkins (kyle.perkins@genesys.com)
- Created by Kyle Perkins on 2024-08-28 20:14
- Last Updated by Kent Pearce on 2024-09-26 16:41
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
##
## name: Dimension Update - Brand
## description:
## Automates populating brand dimension based on campaign name values.
##
## author:
## created: 2024-08-28
##
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# Primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_BRAND = 'Brand' # Reference to the brand column
RPT_COL_CAMPAIGN_TYPE = 'Campaign Type' # Reference to the campaign type column
# Output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_BRAND = 'Brand' # Output brand column
# Define the function to determine the brand based on the provided rules
def determine_brand(campaign_name, campaign_type):
# Ensure campaign_name and campaign_type are strings
if not isinstance(campaign_name, str):
campaign_name = str(campaign_name)
if not isinstance(campaign_type, str):
campaign_type = str(campaign_type)
campaign_name = campaign_name.lower() # Convert to lowercase to ignore case
campaign_type = campaign_type.lower() # Convert campaign type to lowercase
# Check if the campaign type is one of the specified types where Brand should be "N/A"
if campaign_type in ["display", "audience", "video", "discovery", "performance max"]:
return "N/A"
if "ivr" in campaign_name:
return "Non-Brand"
elif "compet" in campaign_name:
return "Non-Brand"
elif "nonbrand" in campaign_name:
return "Non-Brand"
elif "dsa" in campaign_name:
return "N/A"
elif "na_pt" in campaign_name:
return "Non-Brand"
elif "chat" in campaign_name:
return "Non-Brand"
elif "callcenter" in campaign_name:
return "Non-Brand"
elif "rlsa-brand" in campaign_name or "rlsa_brand" in campaign_name:
return "Brand"
elif "rlsa-nonbrand" in campaign_name or "rlsa_nonbrand" in campaign_name:
return "Non-Brand"
elif "brand" in campaign_name:
return "Brand"
elif "rlsa-competitor" in campaign_name or "rlsa_competitor" in campaign_name:
return "Non-Brand"
elif "generic" in campaign_name:
return "Non-Brand"
elif "feature" in campaign_name:
return "Non-Brand"
elif "callmanagement" in campaign_name:
return "Non-Brand"
elif "contact" in campaign_name:
return "Non-Brand"
elif "dialer" in campaign_name:
return "Non-Brand"
elif "pointillist" in campaign_name:
return "Non-Brand"
elif "customerjourney" in campaign_name:
return "Non-Brand"
elif "rlsa" in campaign_name:
return "RLSA (Both)"
elif "customerservice" in campaign_name:
return "Non-Brand"
elif "wfm" in campaign_name or "wem" in campaign_name:
return "Non-Brand"
elif "affinity" in campaign_name or "customintent" in campaign_name or "inmarket" in campaign_name:
return "N/A"
elif "video" in campaign_name or "youtube" in campaign_name or "videoremarketing" in campaign_name or "remarketing" in campaign_name or "retargeting" in campaign_name:
return "N/A"
elif "max" in campaign_name:
return "N/A"
elif "cx" in campaign_name:
return "Non-Brand"
else:
return "Other"
# Filter inputDf to skip rows where Brand is already populated regardless of campaign type
filteredDf = inputDf[inputDf[RPT_COL_BRAND].isna()].copy()
# Initialize outputDf so that it only includes rows that have been filtered
outputDf = filteredDf
# Apply the function to each row in the filtered dataframe and set result as the output dataframe value
outputDf[BULK_COL_BRAND] = filteredDf.apply(
lambda row: determine_brand(row[RPT_COL_CAMPAIGN], row[RPT_COL_CAMPAIGN_TYPE]), axis=1
)
# Printing the result
print(tableize(outputDf.head()))
Post generated on 2024-11-27 06:58:46 GMT