Script 699: Dimension from Campaign Name
Purpose:
The Python script extracts pacing dates and goals from campaign names in a dataset to enhance structured budget allocation (SBA) processes.
To Elaborate
The script is designed to parse campaign names from a dataset to extract specific information related to pacing dates and goals. This is particularly useful for structured budget allocation (SBA) processes in marketing and advertising campaigns. The script identifies patterns within campaign names to determine start and end dates, as well as the campaign goal, which can be one of several predefined types such as CPM or MS. By automating the extraction of these details, the script helps streamline the management and analysis of campaign data, ensuring that campaigns are correctly categorized and budgeted according to their pacing and objectives.
Walking Through the Code
- Date Conversion Function:
- The script includes a function
convert_date
that attempts to convert date strings into date objects, handling both four-digit and two-digit year formats. This ensures flexibility in date parsing from campaign names.
- The script includes a function
- Information Extraction Function:
- The
extract_info_from_campaign_name_enhanced
function uses regular expressions to identify date patterns and goal types within campaign names. It extracts start and end dates and determines the campaign goal based on specific keywords.
- The
- Data Preparation:
- The script renames columns in the input DataFrame to match expected names and drops unnecessary columns. It also cleans column names by stripping spaces.
- Filtering and Processing:
- Rows with blank goals are filtered, and campaign names are converted to strings to prevent errors. New columns for pacing dates and goals are added to the filtered DataFrame.
- Row Processing:
- Each row in the filtered DataFrame is processed to extract pacing dates and goals using the extraction function. Valid extracted information is added to the DataFrame.
- Output Preparation:
- The script filters the DataFrame to include only rows with extracted goals and selects specific columns for the output. The final DataFrame is printed to display the extracted information.
Vitals
- Script ID : 699
- Client ID / Customer ID: 1306927183 / 60270139
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Pacing - End Date, Pacing - Start Date, Goal
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
- Created by ascott@marinsoftware.com on 2024-02-13 19:49
- Last Updated by Jesus Garza on 2024-07-09 00:40
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
## name: Dimension Tags from Campaign Name
## description: Extracts pacing dates and goal from campaign name
## author:
## created: 2023-12-04
## 7/1 Updated version with CPM and MS for Social Campaigns
## 6/27 Updated version w/o 'Target (Impr/Spend/Views)'
# Column Definitions
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PACING_START_DATE = 'Pacing - Start Date'
RPT_COL_PACING_END_DATE = 'Pacing - End Date'
RPT_COL_GOAL = 'Goal'
# Function to convert date with enhanced logic to handle two-digit years
def convert_date(date_str):
if date_str is None:
return None
try:
# Try to parse date with four-digit year
return datetime.datetime.strptime(date_str, '%m/%d/%Y').date()
except ValueError:
try:
# Try to parse date with two-digit year
return datetime.datetime.strptime(date_str, '%m/%d/%y').date()
except ValueError:
return None
# Function to extract information from the campaign name with enhanced logic
def extract_info_from_campaign_name_enhanced(campaign_name):
# Updated date pattern to handle various delimiters and formats
date_pattern = r'\(?(\d{1,2}/\d{1,2}/\d{2,4})[-\s_]+(\d{1,2}/\d{1,2}/\d{2,4})\)?'
goal_pattern = r'(Search|RGD|SGD|SBD|CPM|MS)'
date_match = re.search(date_pattern, campaign_name)
goal_match = re.search(goal_pattern, campaign_name)
start_date_str, end_date_str = (date_match.groups() if date_match else (None, None))
goal_segment = goal_match.group(1) if goal_match else None
if 'CPM' in campaign_name:
goal = 'CPM'
elif 'MS' in campaign_name:
goal = 'MS'
else:
goal = 'MS' if goal_segment and 'Search' in goal_segment else 'CPM' if goal_segment in ['RGD', 'SGD', 'SBD'] else ''
start_date = convert_date(start_date_str)
end_date = convert_date(end_date_str)
return start_date, end_date, goal
# Rename columns to match the script's expectations
inputDf.columns = ['Campaign', 'Account', 'Pacing - Start Date', 'Pacing - End Date', 'Goal', 'Unnamed', 'Target (Impr/Spend/Views)']
# Drop the 'Unnamed' column
inputDf.drop(columns=['Unnamed'], inplace=True)
# Clean any leading/trailing spaces in the column names
inputDf.columns = inputDf.columns.str.strip()
# Filter for rows where the Goal column is blank and create a copy to avoid SettingWithCopyWarning
inputDf_filtered = inputDf[inputDf[RPT_COL_GOAL].isna()].copy()
# Ensure that the campaign names are treated as strings to avoid TypeError
inputDf_filtered[RPT_COL_CAMPAIGN] = inputDf_filtered[RPT_COL_CAMPAIGN].astype(str)
# Adding columns for extracted information to inputDf_filtered
inputDf_filtered[RPT_COL_PACING_START_DATE] = np.nan
inputDf_filtered[RPT_COL_PACING_END_DATE] = np.nan
inputDf_filtered[RPT_COL_GOAL] = np.nan
# Process each row in inputDf_filtered to extract information
for index, row in inputDf_filtered.iterrows():
start_date, end_date, goal = extract_info_from_campaign_name_enhanced(row[RPT_COL_CAMPAIGN])
if start_date or end_date or goal: # Include rows with any valid extracted information
inputDf_filtered.loc[index, RPT_COL_PACING_START_DATE] = start_date
inputDf_filtered.loc[index, RPT_COL_PACING_END_DATE] = end_date
inputDf_filtered.loc[index, RPT_COL_GOAL] = goal
# Filter inputDf_filtered for rows with extracted information
filteredDf = inputDf_filtered.dropna(subset=[RPT_COL_GOAL])
# Define the columns to be included in the output DataFrame
cols = [
RPT_COL_CAMPAIGN,
RPT_COL_ACCOUNT,
RPT_COL_PACING_START_DATE,
RPT_COL_PACING_END_DATE,
RPT_COL_GOAL
]
# Create output DataFrame with the selected columns
outputDf = filteredDf[cols].copy()
# Print the output DataFrame to check the extracted information
print("Output DataFrame with extracted information:")
print(outputDf)
Post generated on 2025-03-11 01:25:51 GMT