Script 1025: Tagging dimensions geo segment targeting and platform

Purpose:

The Python script extracts and categorizes geographical, segment, targeting, and platform information from campaign names in a dataset.

To Elaborate

The Python script is designed to process a dataset containing campaign names and account information, extracting specific dimensions such as geographical location, segment type, targeting strategy, and platform used. These dimensions are identified based on predefined keywords and are extracted from the campaign names. The script then organizes this information into a structured format, ensuring that each campaign is tagged with the appropriate dimensions. This process helps in categorizing and analyzing marketing campaigns based on their characteristics, which can be crucial for reporting and strategic decision-making.

Walking Through the Code

Define Keywords: The script begins by defining lists of keywords for geographical locations, segments, targeting strategies, and platforms. These keywords are used to identify and extract relevant information from campaign names.
Extract Dimensions Function: A function named extract_dimensions is defined to process each campaign name. It converts the campaign name to lowercase and searches for keywords using regular expressions. The function:
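- Identifies geographical keywords that appear as whole words and stores them in a list.
- Sorts segment keywords by length so that longer keywords (such as nb) are tried before shorter ones (such as b), and keeps only the first match.
- Searches for targeting keywords as whole words, and for platform keywords only at the very end of the campaign name, storing the matches in respective lists.
- Returns geo, targeting, and platform as comma-separated strings, and the segment as a single string.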
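The matching rules above can be tried in isolation. The sketch below condenses the script's function and runs it on hypothetical campaign names (they are illustrations, not taken from any real account). It also shows a point worth keeping in mind: in Python regular expressions the underscore counts as a word character, so `\b` does not treat it as a delimiter, and underscore-separated names yield no matches.

```python
import re

# Keyword lists copied from the script
GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
TARGETING_KEYWORDS = ['rm', 'pr']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']


def extract_dimensions(campaign_name):
    """Same matching rules as the script, condensed for illustration."""
    name = campaign_name.lower()

    def whole_words(keywords):
        # Keep keywords that appear as whole words, in the order given
        return [k for k in keywords if re.search(r'\b{}\b'.format(k), name)]

    geo = whole_words(GEO_KEYWORDS)
    # Longest segment keyword first; keep only the first hit
    by_length = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    segment = next(iter(whole_words(by_length)), '')
    targeting = whole_words(TARGETING_KEYWORDS)
    # Platform keywords must sit at the very end of the name
    platform = [k for k in PLATFORM_KEYWORDS if re.search(r'\b{}\Z'.format(k), name)]
    return ', '.join(geo), segment, ', '.join(targeting), ', '.join(platform)


# Hypothetical campaign names
hyphenated = extract_dimensions('US-NB-RM-G')
underscored = extract_dimensions('US_NB_RM_G')
print(hyphenated)   # ('us', 'nb', 'rm', 'g')
print(underscored)  # ('', '', '', '')
```

If real campaign names use underscores as separators, the patterns would need adjusting, for example by splitting the name on the delimiter before comparing tokens.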
Prepare DataFrame: The script assumes the existence of an input DataFrame inputDf containing campaign names and account information. It copies the Campaign and Account columns into an output DataFrame outputDf.
Process Each Campaign: The script iterates over each row in the input DataFrame, using the extract_dimensions function to extract dimensions from the campaign name. It updates the output DataFrame with these extracted values.
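Clean and Output Data: After processing all rows, the script removes any rows in the output DataFrame that have missing values in the Geo, Segment, Targeting, Platform, or Account columns. Because unmatched dimensions are stored as empty strings rather than missing values, campaigns with no matching keyword are not removed by this step. The script then prints the resulting DataFrame if it is not empty, providing a structured view of the extracted dimensions.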
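The cleanup step relies on pandas' `dropna`, which only removes NaN/None values. A minimal sketch with hypothetical rows shows that an empty string survives `dropna`, and how an explicit filter could be used instead if untagged campaigns should be excluded (that filter is an assumption about intent, not something the script does):

```python
import pandas as pd

# Hypothetical rows: the second campaign produced no geo match
df = pd.DataFrame({
    'Campaign': ['US-NB-RM-G', 'Brand Campaign'],
    'Geo': ['us', ''],
})

after_dropna = df.dropna(subset=['Geo'])  # '' is not NaN, so nothing is dropped
non_empty = df[df['Geo'].ne('')]          # explicit filter on empty strings

print(len(after_dropna), len(non_empty))  # 2 1
```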

Vitals

Script ID : 1025
Client ID / Customer ID: 1306927457 / 60270313
Action Type: Bulk Upload
Item Changed: Campaign
Output Columns: Account, Campaign, Targeting, Segment, Platform, Geo
Linked Datasource: M1 Report
Reference Datasource: None
Owner: Autumn Archibald (aarchibald@marinsoftware.com)
Created by Autumn Archibald on 2024-04-29 20:42
Last Updated by Autumn Archibald on 2024-04-30 17:48

Python Code

import re

# Keywords are matched against the lowercased campaign name
GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
TARGETING_KEYWORDS = ['rm', 'pr']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GEO = 'Geo'
BULK_COL_SEGMENT = 'Segment'
BULK_COL_TARGETING = 'Targeting'
BULK_COL_PLATFORM = 'Platform'

# Function to extract geo, segment, targeting, and platform from campaign name
def extract_dimensions(campaign_name):
    campaign_lower = campaign_name.lower()
    geo_list = [keyword for keyword in GEO_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]
    
    # Sort segment keywords by length (longest to shortest)
    sorted_segment_keywords = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    segment = next((keyword for keyword in sorted_segment_keywords if re.search(r'\b{}\b'.format(keyword), campaign_lower)), '')

    targeting_list = [keyword for keyword in TARGETING_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]
    # Platform keywords are matched only at the very end of the campaign name (\Z)
    platform_list = [keyword for keyword in PLATFORM_KEYWORDS if re.search(r'\b{}\Z'.format(keyword), campaign_lower)]
    
    # Join lists to form comma-separated strings
    geo_str = ', '.join(geo_list)
    targeting_str = ', '.join(targeting_list)
    platform_str = ', '.join(platform_list)
    
    return geo_str, segment, targeting_str, platform_str

# Assuming input DataFrame 'inputDf' contains campaign names and account information
# Create a DataFrame with sample data (replace this with your actual input data)
# inputDf = ...

# Copy input rows to output DataFrame
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT]].copy()

# Loop through all rows in the input DataFrame
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]

    # Extract geo, segment, targeting, and platform from campaign name
    geo, segment, targeting, platform = extract_dimensions(campaign_name)

    # Update columns in the output DataFrame
    outputDf.at[index, BULK_COL_GEO] = geo
    outputDf.at[index, BULK_COL_SEGMENT] = segment
    outputDf.at[index, BULK_COL_TARGETING] = targeting
    outputDf.at[index, BULK_COL_PLATFORM] = platform
    outputDf.at[index, BULK_COL_ACCOUNT] = row[RPT_COL_ACCOUNT]

# Drop any rows with missing values (NaN) in geo, segment, targeting, platform, or account columns;
# empty strings from unmatched keywords are not treated as missing and are kept
outputDf = outputDf.dropna(subset=[BULK_COL_GEO, BULK_COL_SEGMENT, BULK_COL_TARGETING, BULK_COL_PLATFORM, BULK_COL_ACCOUNT])

if not outputDf.empty:
    print("outputDf:\n", outputDf.to_string(index=False))
else:
    print("Empty outputDf")
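To try the script outside the platform, where inputDf is presumably supplied by the linked report, a small stand-in DataFrame could be defined before the copy step. The rows below are hypothetical sample data; only the column names come from the script.

```python
import pandas as pd

# Hypothetical sample rows; column names match RPT_COL_CAMPAIGN and RPT_COL_ACCOUNT
inputDf = pd.DataFrame({
    'Campaign': ['US-NB-RM-G', 'UK-B-PR-B'],
    'Account': ['Example Account A', 'Example Account B'],
})

print(inputDf.to_string(index=False))
```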

Post generated on 2025-03-11 01:25:51 GMT

29 Apr 2024
