Script 1031: Tagging social dimension Audience and Segment
Purpose
The Python script processes campaign data to extract and tag audience and segment information based on predefined keywords.
To Elaborate
The script analyzes campaign names and tags each campaign with an audience and a segment. Both are driven by predefined keyword lists: an audience keyword matches if it appears anywhere in the lowercased campaign name, while a segment keyword matches only when it stands alone as a whole word. When several keywords match, they are joined into a comma-separated string. The results are written to the Audience and Segment columns of the dataset, which helps categorize campaigns for analysis and reporting. The script also drops any rows with missing audience or segment information.
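To make the matching rules concrete, here is a minimal sketch that reuses the keyword lists and extraction function from the script. The sample campaign names are invented for illustration. Note that the regex word boundary `\b` treats an underscore as a word character, so underscore-delimited names would not yield a segment match under these rules.

```python
import re

AUDIENCE_KEYWORDS = ['pricing', 'downloads', 'awv']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']

def extract_audience_and_segment(campaign_name):
    campaign_lower = campaign_name.lower()
    # Audience: plain substring match anywhere in the name
    audience_list = [k for k in AUDIENCE_KEYWORDS if k in campaign_lower]
    # Segment: whole-word match only, using regex word boundaries
    segment_list = re.findall(r'\b(?:' + '|'.join(SEGMENT_KEYWORDS) + r')\b', campaign_lower)
    return ', '.join(audience_list), ', '.join(segment_list)

# Hypothetical campaign names, not taken from real data
print(extract_audience_and_segment("US-Pricing-NB"))     # ('pricing', 'nb')
print(extract_audience_and_segment("US_Pricing_NB"))     # ('pricing', '')  underscores block \b
print(extract_audience_and_segment("EMEA Downloads B"))  # ('downloads', 'b')
print(extract_audience_and_segment("Brand Search"))      # ('', '')
```

If campaign names in the account use underscores as separators, the segment pattern would likely need a different delimiter rule.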
Walking Through the Code
- Define Configurable Parameters:
  - The script begins by defining lists of keywords for audiences (`AUDIENCE_KEYWORDS`) and segments (`SEGMENT_KEYWORDS`). These lists are user-changeable parameters that determine how campaigns are categorized.
- Extract Audience and Segment:
  - A function, `extract_audience_and_segment`, is defined to process each campaign name. It converts the name to lowercase and checks for the presence of audience and segment keywords. The function returns the matched keywords as strings.
- Copy and Process Data:
  - The script creates a copy of the input data (`inputDf`) to `outputDf` for processing. It iterates over each row of the input data, extracting the campaign name and using the function to determine the audience and segment.
- Update and Clean Data:
  - For each campaign, the extracted audience and segment are printed and then updated in the `outputDf`. The script removes any rows from `outputDf` that have missing values in the audience or segment columns to ensure completeness.
- Output Results:
  - Finally, the script checks if the `outputDf` is empty and prints the results in a tabular format if data is present.
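The copy, tag, and clean steps above can be sketched on a small DataFrame. This is an illustrative variant, not the script itself: the sample rows are invented, and it uses `Series.apply` in place of the row-by-row loop. It also highlights one detail worth checking: the extraction function returns an empty string for an unmatched campaign, and `dropna` treats an empty string as a present value, so an explicit filter is needed to remove those rows.

```python
import re
import pandas as pd

AUDIENCE_KEYWORDS = ['pricing', 'downloads', 'awv']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
SEGMENT_PATTERN = re.compile(r'\b(?:' + '|'.join(SEGMENT_KEYWORDS) + r')\b')

def extract_audience_and_segment(campaign_name):
    campaign_lower = campaign_name.lower()
    audience_list = [k for k in AUDIENCE_KEYWORDS if k in campaign_lower]
    segment_list = SEGMENT_PATTERN.findall(campaign_lower)
    return ', '.join(audience_list), ', '.join(segment_list)

# Invented sample rows standing in for the input data
inputDf = pd.DataFrame({
    'Account': ['Acme', 'Acme', 'Acme'],
    'Campaign': ['Pricing - NB', 'AWV - B', 'Generic Search'],
})

outputDf = inputDf.copy()
tags = inputDf['Campaign'].apply(extract_audience_and_segment)
outputDf['Audience'] = [audience for audience, _ in tags]
outputDf['Segment'] = [segment for _, segment in tags]

# dropna only removes NaN/None; empty strings survive it
after_dropna = outputDf.dropna(subset=['Audience', 'Segment'])
# An explicit filter removes campaigns with no matched audience or segment
matched = after_dropna[(after_dropna['Audience'] != '') & (after_dropna['Segment'] != '')]

print(len(after_dropna))  # 3
print(len(matched))       # 2
```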
Vitals
- Script ID: 1031
- Client ID / Customer ID: 1306927457 / 60270313
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Audience, Segment
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Autumn Archibald (aarchibald@marinsoftware.com)
- Created by Autumn Archibald on 2024-04-30 00:09
- Last Updated by Autumn Archibald on 2024-04-30 00:13
Python Code
import re

# Define the configurable parameters for the script
AUDIENCE_KEYWORDS = ['pricing', 'downloads', 'awv']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_AUDIENCE = 'Audience'
BULK_COL_SEGMENT = 'Segment'

# Function to extract audience and segment from campaign name
def extract_audience_and_segment(campaign_name):
    campaign_lower = campaign_name.lower()
    # Audience keywords match as substrings anywhere in the name
    audience_list = [keyword for keyword in AUDIENCE_KEYWORDS if keyword in campaign_lower]
    # Segment keywords match only as whole words
    segment_list = re.findall(r'\b(?:' + '|'.join(SEGMENT_KEYWORDS) + r')\b', campaign_lower)
    return ', '.join(audience_list), ', '.join(segment_list)

# Copy input rows to output
# (inputDf and tableize are not defined here; they are assumed to be provided by the script runtime)
outputDf = inputDf.copy()

# Loop through all rows
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]
    # Extract audience and segment from campaign name
    audience, segment = extract_audience_and_segment(campaign_name)
    print("Campaign [%s] => Audience [%s], Segment [%s]" % (campaign_name, audience, segment))
    # Update columns
    outputDf.at[index, BULK_COL_AUDIENCE] = audience
    outputDf.at[index, BULK_COL_SEGMENT] = segment

# Drop any rows with missing values
outputDf = outputDf.dropna(subset=[BULK_COL_AUDIENCE, BULK_COL_SEGMENT])
# Unmatched campaigns yield empty strings rather than NaN, so filter those out as well
outputDf = outputDf[(outputDf[BULK_COL_AUDIENCE] != '') & (outputDf[BULK_COL_SEGMENT] != '')]

if not outputDf.empty:
    print("outputDf", tableize(outputDf))
else:
    print("Empty outputDf")
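
The listing uses `inputDf` and `tableize` without defining them, which suggests both are supplied by the environment the script runs in. To try the logic locally, one option is to define stand-ins before running it. The sketch below is an assumption throughout: the `tableize` body is a hypothetical substitute built on pandas `DataFrame.to_string`, not the platform's implementation, and the rows are invented.

```python
import pandas as pd

def tableize(df):
    """Hypothetical local stand-in for the runtime's tableize helper."""
    return df.to_string(index=False)

# Invented rows using the report column names from the script
inputDf = pd.DataFrame({
    'Account': ['Acme', 'Acme'],
    'Campaign': ['Pricing - NB', 'Generic Search'],
})

print(tableize(inputDf))
```

With these two names defined, the script body can be pasted below them and run unchanged.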
Post generated on 2024-11-27 06:58:46 GMT