Script 697: Dimensions from Campaign Name

Purpose

The Python script extracts pacing dates and goals from campaign names in a dataset.

To Elaborate

The script is designed to process a dataset containing campaign information, specifically focusing on extracting pacing start and end dates, as well as campaign goals from the campaign names. The campaign names are expected to contain these details in specific formats, and the script uses regular expressions to identify and extract this information. The extracted data is then used to populate corresponding columns in the dataset, ensuring that campaigns with missing goal information are updated. This process is particularly useful for marketing and advertising teams who need to manage and analyze campaign performance based on specific timeframes and objectives.

Walking Through the Code

  1. Function Definitions:
    • The script defines a convert_date function to handle date conversion from strings, accommodating both two-digit and four-digit year formats.
    • The extract_info_from_campaign_name_enhanced function uses regular expressions to extract start and end dates, and campaign goals from the campaign name.
  2. Data Preparation:
    • The script renames columns in the input DataFrame to match expected names and removes unnecessary columns.
    • It filters the DataFrame to focus on rows where the goal is missing, ensuring these rows are processed for information extraction.
  3. Information Extraction:
    • For each campaign with a missing goal, the script extracts start and end dates, and goals using the defined functions.
    • Extracted information is added to the DataFrame, updating the relevant columns.
  4. Output Preparation:
    • The script filters the DataFrame to include only rows with extracted information and selects specific columns for the final output.
    • The resulting DataFrame is printed to verify the extracted information.

Vitals

  • Script ID : 697
  • Client ID / Customer ID: 1306927185 / 60270139
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Goal, Pacing - End Date, Pacing - Start Date
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
  • Created by ascott@marinsoftware.com on 2024-02-13 15:18
  • Last Updated by Jesus Garza on 2024-07-09 00:50
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
## name: Dimension Tags from Campaign Name
## description: Extracts pacing dates and goal from campaign name
## author: 
## created: 2023-12-04
## 7/1 Updated version with CPM and MS for Social Campaigns 
## 6/27 Updated version w/o 'Target (Impr/Spend/Views)'

# Column Definitions
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PACING_START_DATE = 'Pacing - Start Date'
RPT_COL_PACING_END_DATE = 'Pacing - End Date'
RPT_COL_GOAL = 'Goal'

# Function to convert date with enhanced logic to handle two-digit years
def convert_date(date_str):
    if date_str is None:
        return None
    try:
        # Try to parse date with four-digit year
        return datetime.datetime.strptime(date_str, '%m/%d/%Y').date()
    except ValueError:
        try:
            # Try to parse date with two-digit year
            return datetime.datetime.strptime(date_str, '%m/%d/%y').date()
        except ValueError:
            return None

# Function to extract information from the campaign name with enhanced logic
def extract_info_from_campaign_name_enhanced(campaign_name):
    # Updated date pattern to handle various delimiters and formats
    date_pattern = r'\(?(\d{1,2}/\d{1,2}/\d{2,4})[-\s_]+(\d{1,2}/\d{1,2}/\d{2,4})\)?'
    goal_pattern = r'(Search|RGD|SGD|SBD|CPM|MS)'
    
    date_match = re.search(date_pattern, campaign_name)
    goal_match = re.search(goal_pattern, campaign_name)
    
    start_date_str, end_date_str = (date_match.groups() if date_match else (None, None))
    goal_segment = goal_match.group(1) if goal_match else None
    
    if 'CPM' in campaign_name:
        goal = 'CPM'
    elif 'MS' in campaign_name:
        goal = 'MS'
    else:
        goal = 'MS' if goal_segment and 'Search' in goal_segment else 'CPM' if goal_segment in ['RGD', 'SGD', 'SBD'] else ''
    
    start_date = convert_date(start_date_str)
    end_date = convert_date(end_date_str)
    
    return start_date, end_date, goal

# Rename columns to match the script's expectations
inputDf.columns = ['Campaign', 'Account', 'Pacing - Start Date', 'Pacing - End Date', 'Goal', 'Unnamed', 'Target (Impr/Spend/Views)']

# Drop the 'Unnamed' column
inputDf.drop(columns=['Unnamed'], inplace=True)

# Clean any leading/trailing spaces in the column names
inputDf.columns = inputDf.columns.str.strip()

# Filter for rows where the Goal column is blank and create a copy to avoid SettingWithCopyWarning
inputDf_filtered = inputDf[inputDf[RPT_COL_GOAL].isna()].copy()

# Ensure that the campaign names are treated as strings to avoid TypeError
inputDf_filtered[RPT_COL_CAMPAIGN] = inputDf_filtered[RPT_COL_CAMPAIGN].astype(str)

# Adding columns for extracted information to inputDf_filtered
inputDf_filtered[RPT_COL_PACING_START_DATE] = np.nan
inputDf_filtered[RPT_COL_PACING_END_DATE] = np.nan
inputDf_filtered[RPT_COL_GOAL] = np.nan

# Process each row in inputDf_filtered to extract information
for index, row in inputDf_filtered.iterrows():
    start_date, end_date, goal = extract_info_from_campaign_name_enhanced(row[RPT_COL_CAMPAIGN])
    if start_date or end_date or goal:  # Include rows with any valid extracted information
        inputDf_filtered.loc[index, RPT_COL_PACING_START_DATE] = start_date
        inputDf_filtered.loc[index, RPT_COL_PACING_END_DATE] = end_date
        inputDf_filtered.loc[index, RPT_COL_GOAL] = goal

# Filter inputDf_filtered for rows with extracted information
filteredDf = inputDf_filtered.dropna(subset=[RPT_COL_GOAL])

# Define the columns to be included in the output DataFrame
cols = [
    RPT_COL_CAMPAIGN,
    RPT_COL_ACCOUNT,
    RPT_COL_PACING_START_DATE,
    RPT_COL_PACING_END_DATE,
    RPT_COL_GOAL
]

# Create output DataFrame with the selected columns
outputDf = filteredDf[cols].copy()

# Print the output DataFrame to check the extracted information
print("Output DataFrame with extracted information:")
print(outputDf)

Post generated on 2024-11-27 06:58:46 GMT

comments powered by Disqus