Script 681: Dimension Value From Campaign Names

Purpose:

This Python script extracts pacing start and end dates and a goal from each campaign name, filling in rows whose Goal is blank.

To Elaborate

The script processes a campaign dataset and extracts the pacing start date, pacing end date, and goal from each campaign name. This is useful when those details are embedded in the name rather than supplied in separate columns. Regular expressions locate a date range (two month/day/year dates separated by a hyphen, space, or underscore) and a goal keyword within the name. The date strings are parsed into date objects, and the keyword is mapped to a goal of CPM or MS. Only rows whose Goal is blank are processed, and only rows where at least one value was extracted appear in the output. The result is a more complete dataset for analysis or reporting.
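
As a rough illustration of the matching step, the sketch below applies the script's two patterns to campaign names. Both sample names are invented for this example and are not taken from any real account. The second name shows a behavior worth knowing: because an underscore counts as a word character, `\b` does not treat it as a boundary, so a keyword wrapped in underscores is not picked up by the goal pattern even though the dates still match.

```python
import re

# Patterns copied from the script's extraction function.
DATE_PATTERN = r'\(?(\d{1,2}/\d{1,2}/\d{2,4})[-\s_]+(\d{1,2}/\d{1,2}/\d{2,4})\)?'
GOAL_PATTERN = r'\b(Search|RGD|SGD|SBD|CPM|MS)\b'

# Hypothetical campaign names, invented for illustration only.
spaced_name = 'Spring Promo SGD (3/1/2024-3/31/24)'
underscored_name = 'Promo_SGD_3/1/24_3/31/24'

date_match = re.search(DATE_PATTERN, spaced_name)
goal_match = re.search(GOAL_PATTERN, spaced_name)

# Same fallbacks the script uses when a pattern does not match.
date_strings = date_match.groups() if date_match else (None, None)
goal_keyword = goal_match.group(1) if goal_match else None

print(date_strings)
print(goal_keyword)

# Underscore-delimited name: the date range matches, the goal keyword does not.
underscored_dates = re.search(DATE_PATTERN, underscored_name)
underscored_goal = re.search(GOAL_PATTERN, underscored_name)
print(underscored_dates.groups(), underscored_goal)
```

If underscore-delimited names are common in an account, the goal pattern may need a different boundary rule; that would be a change to the script, not something it does today.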

Walking Through the Code

Date Conversion Function:
- The convert_date function parses a date string, trying a four-digit year (%m/%d/%Y) first and falling back to a two-digit year (%m/%d/%y). It returns None when the input is None or matches neither format.
Information Extraction Function:
- The extract_info_from_campaign_name_enhanced function uses two regular expressions: one for a date range and one for a goal keyword (Search, RGD, SGD, SBD, CPM, or MS). CPM and MS are used as the goal directly, RGD, SGD, and SBD are mapped to CPM, and Search or no match yields an empty goal.
Data Preparation:
- The script assigns the seven expected column names by position, drops the 'Unnamed' column, and strips leading and trailing spaces from the column names.
Filtering and Processing:
- The dataset is filtered to include only rows where the ‘Goal’ column is blank. This subset is then processed to extract and assign pacing dates and goals using the previously defined function.
Updating DataFrame:
- For each row in the filtered dataset, the script writes the extracted pacing start date, end date, and goal. Rows where nothing could be extracted keep a missing Goal and are dropped afterwards.
Output Preparation:
- The script creates a new DataFrame containing only the relevant columns with extracted information and prints this DataFrame for verification.
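
The steps above can be sketched end to end on a tiny DataFrame. This is a minimal sketch under stated assumptions, not the script itself: the sample rows are invented, and `parse_date`, `extract`, and `GOAL_BY_KEYWORD` are hypothetical names introduced here. It uses a format loop and a dictionary lookup in place of the script's nested try/except and if/elif chain, and a `map` plus `join` in place of the row-by-row loop, with the intent of producing the same rows.

```python
import datetime
import re

import pandas as pd

DATE_PATTERN = r'\(?(\d{1,2}/\d{1,2}/\d{2,4})[-\s_]+(\d{1,2}/\d{1,2}/\d{2,4})\)?'
GOAL_PATTERN = r'\b(Search|RGD|SGD|SBD|CPM|MS)\b'
# Hypothetical lookup mirroring the script's branches; 'Search' yields no goal.
GOAL_BY_KEYWORD = {'CPM': 'CPM', 'MS': 'MS', 'RGD': 'CPM', 'SGD': 'CPM', 'SBD': 'CPM'}


def parse_date(text):
    """Try a four-digit year first, then a two-digit year; None if neither fits."""
    if text is None:
        return None
    for fmt in ('%m/%d/%Y', '%m/%d/%y'):
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def extract(name):
    """Return (start_date, end_date, goal) for one campaign name."""
    date_match = re.search(DATE_PATTERN, name)
    goal_match = re.search(GOAL_PATTERN, name)
    start, end = date_match.groups() if date_match else (None, None)
    keyword = goal_match.group(1) if goal_match else None
    return parse_date(start), parse_date(end), GOAL_BY_KEYWORD.get(keyword, '')


# Hypothetical sample rows; the real input comes from the linked report.
df = pd.DataFrame({
    'Campaign': ['Promo SGD (3/1/2024-3/31/24)', 'Evergreen Brand', 'Has Goal MS'],
    'Account': ['Acct A', 'Acct A', 'Acct B'],
    'Goal': [None, None, 'MS'],
})

# Keep only rows whose Goal is blank, then extract values from the names.
blank = df[df['Goal'].isna()].copy()
extracted = pd.DataFrame(
    blank['Campaign'].astype(str).map(extract).tolist(),
    index=blank.index,
    columns=['Pacing - Start Date', 'Pacing - End Date', 'Goal'],
)

# Retain rows where at least one value was extracted.
has_info = (
    extracted['Pacing - Start Date'].notna()
    | extracted['Pacing - End Date'].notna()
    | (extracted['Goal'] != '')
)
result = blank[['Campaign', 'Account']].join(extracted[has_info], how='inner')
print(result)
```

In this sample, only the first row survives: the second has nothing to extract, and the third already has a goal and is never processed.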

Vitals

Script ID : 681
Client ID / Customer ID: 1306927189 / 60270139
Action Type: Bulk Upload
Item Changed: Campaign
Output Columns: Account, Campaign, Goal, Pacing - End Date, Pacing - Start Date
Linked Datasource: M1 Report
Reference Datasource: None
Owner: ascott@marinsoftware.com
Created by ascott@marinsoftware.com on 2024-02-08 18:44
Last Updated by Jesus Garza on 2024-07-23 21:41

Python Code

## name: Dimension Tags from Campaign Name
## description: Extracts pacing dates and goal from campaign name
## author: 
## created: 2023-12-04
## 7/1 Updated version with CPM and MS for Social Campaigns 
## 6/27 Updated version w/o 'Target (Impr/Spend/Views)'

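# Imports used below (harmless if the script environment already provides them)
import datetime
import re

import numpy as np

# Column Definitions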
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PACING_START_DATE = 'Pacing - Start Date'
RPT_COL_PACING_END_DATE = 'Pacing - End Date'
RPT_COL_GOAL = 'Goal'

# Function to convert date with enhanced logic to handle two-digit years
def convert_date(date_str):
    if date_str is None:
        return None
    try:
        # Try to parse date with four-digit year
        return datetime.datetime.strptime(date_str, '%m/%d/%Y').date()
    except ValueError:
        try:
            # Try to parse date with two-digit year
            return datetime.datetime.strptime(date_str, '%m/%d/%y').date()
        except ValueError:
            return None

# Function to extract information from the campaign name with enhanced logic
def extract_info_from_campaign_name_enhanced(campaign_name):
    # Updated date pattern to handle various delimiters and formats
    date_pattern = r'\(?(\d{1,2}/\d{1,2}/\d{2,4})[-\s_]+(\d{1,2}/\d{1,2}/\d{2,4})\)?'
    goal_pattern = r'\b(Search|RGD|SGD|SBD|CPM|MS)\b'
    
    date_match = re.search(date_pattern, campaign_name)
    goal_match = re.search(goal_pattern, campaign_name)
    
    start_date_str, end_date_str = (date_match.groups() if date_match else (None, None))
    goal_segment = goal_match.group(1) if goal_match else None
    
    # Map the matched keyword to a goal; 'Search' or no match yields an empty goal
    if goal_segment in ('CPM', 'RGD', 'SGD', 'SBD'):
        goal = 'CPM'
    elif goal_segment == 'MS':
        goal = 'MS'
    else:
        goal = ''
    
    start_date = convert_date(start_date_str)
    end_date = convert_date(end_date_str)
    
    return start_date, end_date, goal

# Rename columns by position; this assumes inputDf has exactly these seven columns in this order
inputDf.columns = ['Campaign', 'Account', 'Pacing - Start Date', 'Pacing - End Date', 'Goal', 'Unnamed', 'Target (Impr/Spend/Views)']

# Drop the 'Unnamed' column
inputDf.drop(columns=['Unnamed'], inplace=True)

# Clean any leading/trailing spaces in the column names
inputDf.columns = inputDf.columns.str.strip()

# Filter for rows where the Goal column is blank and create a copy to avoid SettingWithCopyWarning
inputDf_filtered = inputDf[inputDf[RPT_COL_GOAL].isna()].copy()

# Ensure that the campaign names are treated as strings to avoid TypeError
inputDf_filtered[RPT_COL_CAMPAIGN] = inputDf_filtered[RPT_COL_CAMPAIGN].astype(str)

# Reset the extracted-value columns; object dtype lets them hold dates and strings
for col in (RPT_COL_PACING_START_DATE, RPT_COL_PACING_END_DATE, RPT_COL_GOAL):
    inputDf_filtered[col] = np.nan
    inputDf_filtered[col] = inputDf_filtered[col].astype(object)

# Process each row in inputDf_filtered to extract information
for index, row in inputDf_filtered.iterrows():
    start_date, end_date, goal = extract_info_from_campaign_name_enhanced(row[RPT_COL_CAMPAIGN])
    if start_date or end_date or goal:  # Include rows with any valid extracted information
        inputDf_filtered.loc[index, RPT_COL_PACING_START_DATE] = start_date
        inputDf_filtered.loc[index, RPT_COL_PACING_END_DATE] = end_date
        inputDf_filtered.loc[index, RPT_COL_GOAL] = goal

# Filter inputDf_filtered for rows with extracted information
filteredDf = inputDf_filtered.dropna(subset=[RPT_COL_GOAL])

# Define the columns to be included in the output DataFrame
cols = [
    RPT_COL_CAMPAIGN,
    RPT_COL_ACCOUNT,
    RPT_COL_PACING_START_DATE,
    RPT_COL_PACING_END_DATE,
    RPT_COL_GOAL
]

# Create output DataFrame with the selected columns
outputDf = filteredDf[cols].copy()

# Print the output DataFrame to check the extracted information
print("Output DataFrame with extracted information:")
print(outputDf)
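
The script uses `inputDf` without defining it, so the DataFrame is presumably supplied by the script environment. For trying the data preparation step elsewhere, a stand-in can be built by hand. The sketch below is an assumption-laden example: the row values and the original header names are invented, and the explicit column-count check is an addition that the script itself does not perform.

```python
import pandas as pd

# Hypothetical stand-in for the provided inputDf: seven columns whose headers
# differ from the names the script expects.
input_df = pd.DataFrame(
    [['Promo SGD (3/1/2024-3/31/24)', 'Acct A', None, None, None, None, 1000]],
    columns=['campaign', 'account', 'start', 'end', 'goal ', 'extra', 'target'],
)

expected = ['Campaign', 'Account', 'Pacing - Start Date', 'Pacing - End Date',
            'Goal', 'Unnamed', 'Target (Impr/Spend/Views)']

# Added guard (not in the script): report a clear error on a column-count mismatch.
if len(input_df.columns) != len(expected):
    raise ValueError(f'expected {len(expected)} columns, got {len(input_df.columns)}')

# Same preparation steps as the script: positional rename, drop, strip.
input_df.columns = expected
input_df = input_df.drop(columns=['Unnamed'])
input_df.columns = input_df.columns.str.strip()

print(list(input_df.columns))
```

Because the rename is positional, a report whose columns arrive in a different order would be relabeled incorrectly without any error, so the column order of the linked report matters.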

Post generated on 2025-03-11 01:25:51 GMT

08 Feb 2024
