Script 1031: Tagging social dimension Audience and Segment

Purpose:

The Python script processes campaign data to identify and tag audience and segment information based on predefined keywords.

To Elaborate

The script reads each campaign name and extracts audience and segment tags from it, using two predefined keyword lists. Matching is case-insensitive: audience keywords are matched anywhere in the name, while segment keywords (short codes such as b or nb) are matched only as whole words. The matched keywords are written to the Audience and Segment columns of the output, so campaigns can be grouped by audience and segment for targeting and performance analysis. Rows that lack audience or segment information are meant to be dropped from the output.

Walking Through the Code

  1. Define Configurable Parameters:
    • The script begins by setting up lists of keywords for audiences and segments. These lists (AUDIENCE_KEYWORDS and SEGMENT_KEYWORDS) are user-changeable parameters that determine what keywords the script will look for in campaign names.
  2. Function Definition:
    • A function extract_audience_and_segment is defined to process each campaign name. It converts the campaign name to lowercase, then collects every audience keyword that appears anywhere in the name and every segment keyword that appears as a whole word. It returns two strings, one for audiences and one for segments, with multiple matches joined by ", ".
  3. Data Preparation:
    • The script creates a copy of the input DataFrame (inputDf) to outputDf to preserve the original data while making modifications.
  4. Iterate Through Data:
    • The script loops through each row of the input DataFrame. For each campaign, it extracts the audience and segment information using the previously defined function and prints the result as a log line.
  5. Update DataFrame:
    • The extracted audience and segment information is used to update the corresponding columns in the outputDf.
  6. Data Cleaning:
    • The script drops rows from outputDf in which the audience or segment column is missing, so that only campaigns carrying both tags remain in the output.
  7. Output:
    • Finally, the script prints outputDf if it contains any rows; otherwise it prints a message saying that the output is empty.
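
Steps 1 and 2 above can be tried in isolation. The following is a minimal sketch, not the platform's own code path: the keyword lists are copied from the script, but the sample campaign names are made up for illustration and the compiled SEGMENT_PATTERN constant is an assumption added here for readability.

```python
import re

# Keyword lists copied from the script's configurable parameters
AUDIENCE_KEYWORDS = ['pricing', 'downloads', 'awv']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']

# Assumption: a precompiled whole-word pattern (the script builds it inline)
SEGMENT_PATTERN = re.compile(
    r'\b(?:' + '|'.join(re.escape(k) for k in SEGMENT_KEYWORDS) + r')\b'
)


def extract_audience_and_segment(campaign_name):
    """Return (audiences, segments) as comma-separated strings."""
    campaign_lower = campaign_name.lower()
    # Audience keywords match anywhere in the name (substring test)
    audience_list = [k for k in AUDIENCE_KEYWORDS if k in campaign_lower]
    # Segment keywords match only as whole words
    segment_list = SEGMENT_PATTERN.findall(campaign_lower)
    return ', '.join(audience_list), ', '.join(segment_list)


# Made-up campaign names, for illustration only
print(extract_audience_and_segment('US - Pricing - NB'))  # ('pricing', 'nb')
print(extract_audience_and_segment('Downloads A/B'))      # ('downloads', 'a, b')
print(extract_audience_and_segment('pricing_nb'))         # ('pricing', '')
print(extract_audience_and_segment('Brand Awareness'))    # ('', '')
```

Two details are worth noting. The underscore counts as a word character in Python regular expressions, so a name such as pricing_nb yields no segment. And a campaign with no match produces empty strings rather than missing values.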

Vitals

  • Script ID : 1031
  • Client ID / Customer ID: 1306927457 / 60270313
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Audience, Segment
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Autumn Archibald (aarchibald@marinsoftware.com)
  • Created by Autumn Archibald on 2024-04-30 00:09
  • Last Updated by Autumn Archibald on 2024-04-30 00:13

Python Code

import re

# Define the configurable parameters for the script
AUDIENCE_KEYWORDS = ['pricing', 'downloads', 'awv']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_AUDIENCE = 'Audience'
BULK_COL_SEGMENT = 'Segment'

# Whole-word pattern for segment codes; re.escape guards against
# regex metacharacters in user-supplied keywords
SEGMENT_PATTERN = re.compile(
    r'\b(?:' + '|'.join(re.escape(k) for k in SEGMENT_KEYWORDS) + r')\b'
)

# Function to extract audience and segment from campaign name
def extract_audience_and_segment(campaign_name):
    campaign_lower = campaign_name.lower()
    # Audience keywords match anywhere in the name
    audience_list = [keyword for keyword in AUDIENCE_KEYWORDS if keyword in campaign_lower]
    # Segment keywords match only as whole words
    segment_list = SEGMENT_PATTERN.findall(campaign_lower)
    return ', '.join(audience_list), ', '.join(segment_list)

# Copy input rows to output (inputDf is not defined in this script;
# it is expected to be supplied by the scripting environment)
outputDf = inputDf.copy()

# Loop through all rows
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]

    # Extract audience and segment from campaign name
    audience, segment = extract_audience_and_segment(campaign_name)
    print("Campaign [%s] => Audience [%s], Segment [%s]" % (campaign_name, audience, segment))

    # Update columns
    outputDf.at[index, BULK_COL_AUDIENCE] = audience
    outputDf.at[index, BULK_COL_SEGMENT] = segment

# Drop rows with no audience or no segment. The extractor returns ''
# (not NaN) when nothing matches, so filter empty strings as well.
outputDf = outputDf.dropna(subset=[BULK_COL_AUDIENCE, BULK_COL_SEGMENT])
outputDf = outputDf[(outputDf[BULK_COL_AUDIENCE] != '') & (outputDf[BULK_COL_SEGMENT] != '')]

# tableize is likewise expected to come from the scripting environment
if not outputDf.empty:
    print("outputDf", tableize(outputDf))
else:
    print("Empty outputDf")
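
One pandas detail matters for the data-cleaning step: dropna treats only NaN and None as missing, so a row whose tag is an empty string survives it. The sketch below illustrates the difference on made-up rows. The helper name drop_untagged is hypothetical and is not part of the script.

```python
import pandas as pd


def drop_untagged(df, columns):
    """Drop rows where any of `columns` is NaN/None or an empty string."""
    cleaned = df.dropna(subset=columns)
    non_empty = (cleaned[columns] != '').all(axis=1)
    return cleaned[non_empty]


# Made-up rows: one tagged, one with empty strings, one with missing values
toy = pd.DataFrame({
    'Campaign': ['US - Pricing - NB', 'Brand Awareness', 'Untitled'],
    'Audience': ['pricing', '', None],
    'Segment': ['nb', '', None],
})

only_dropna = toy.dropna(subset=['Audience', 'Segment'])
fully_cleaned = drop_untagged(toy, ['Audience', 'Segment'])

print(len(only_dropna))    # 2: the empty-string row is kept
print(len(fully_cleaned))  # 1: only the tagged row remains
```

Whether untagged campaigns should be uploaded with blank tags or left out entirely is a choice for the script owner; the sketch only shows how each option behaves.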

Post generated on 2025-03-11 01:25:51 GMT
