Script 1039: Jeremy Script
Purpose:
The Python script processes campaign data to extract and categorize specific dimensions such as geo, segment, targeting, and platform from campaign names.
To Elaborate
The script analyzes campaign data by extracting four dimensions from each campaign name: geographical region, market segment, targeting strategy, and platform. Each dimension is identified by matching predefined keywords against the lowercased name. The extracted values are written to a structured output with one column per dimension, which supports analysis and reporting. After extraction, the script drops rows with missing values in the key columns. Note that a dimension with no keyword match is recorded as an empty string rather than a missing value, so such rows are kept. The resulting categorization supports structured budget allocation (SBA) by grouping campaigns according to their attributes.
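The keyword matching described above can be sketched as follows. This is a minimal illustration, not the script itself: the campaign names and the shortened keyword lists are hypothetical, and `match_keywords` is a helper name introduced here. It also shows one consequence of word-boundary matching: the underscore counts as a word character in Python regular expressions, so keywords in underscore-delimited names are not found.

```python
import re

# Hypothetical keyword lists, shortened for illustration.
GEO_KEYWORDS = ['us', 'ca', 'uk']
TARGETING_KEYWORDS = ['rm', 'pr']


def match_keywords(name, keywords):
    """Return the keywords that appear in `name` as whole words."""
    lowered = name.lower()
    return [kw for kw in keywords
            if re.search(r'\b{}\b'.format(re.escape(kw)), lowered)]


# Hypothetical campaign names, for illustration only.
hyphenated = 'US-NB-RM-G'
underscored = 'US_NB_RM_G'

geo_hyphen = match_keywords(hyphenated, GEO_KEYWORDS)            # ['us']
targeting_hyphen = match_keywords(hyphenated, TARGETING_KEYWORDS)  # ['rm']
# '_' is a word character, so there is no \b between 'us' and '_'.
geo_underscore = match_keywords(underscored, GEO_KEYWORDS)       # []

print(geo_hyphen, targeting_hyphen, geo_underscore)
```

If campaign names in the account use underscores as separators, the patterns would need a different delimiter rule; whether that applies here is not stated in the source.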
Walking Through the Code
- Keyword Definitions: The script begins by defining lists of keywords for geographical regions, segments, targeting strategies, and platforms. These keywords are used to identify and categorize parts of the campaign names.
- Function Definition: A function, extract_dimensions, is defined to process each campaign name. It converts the campaign name to lowercase and searches for whole-word matches against the predefined keywords. Geo and targeting matches are returned as comma-separated strings. The segment is the first matching keyword, with longer keywords tried first. Platform keywords are matched only at the very end of the campaign name. A dimension with no match is returned as an empty string.
- DataFrame Preparation: The script copies the Campaign and Account columns from the input DataFrame to a new output DataFrame, which will store the processed data.
- Data Processing Loop: The script iterates over each row in the input DataFrame, extracting dimensions using the extract_dimensions function. It updates the output DataFrame with the extracted information.
- Data Cleaning: After processing, the script removes any rows from the output DataFrame that have missing (NaN) values in the key columns. Because unmatched dimensions are stored as empty strings, which are not treated as missing, this step does not remove rows that simply failed to match a keyword.
- Output: Finally, the script checks whether the output DataFrame is empty and prints its contents if it has data, or a short message otherwise.
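The data cleaning step can be illustrated with a small sketch of how pandas treats empty strings. The rows below are hypothetical, and the stricter variant (converting empty strings to missing values before dropping) is an assumption about what a stricter cleanup might look like, not something the script does.

```python
import pandas as pd

# Hypothetical output rows: the second campaign matched no geo keyword,
# so its Geo value is an empty string.
df = pd.DataFrame({
    'Campaign': ['US-NB-RM-G', 'Brand Awareness'],
    'Geo': ['us', ''],
})

# dropna only removes NaN/None, so the empty string survives.
kept_as_is = df.dropna(subset=['Geo'])

# Stricter variant: mask empty strings as NaN first, then drop.
kept_strict = df.mask(df == '').dropna(subset=['Geo'])

print(len(kept_as_is), len(kept_strict))
```

The first count includes both rows and the second only the row with a matched geo, which is the difference between dropping missing values and dropping unmatched ones.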
Vitals
- Script ID: 1039
- Client ID / Customer ID: 1306927457 / 60270313
- Action Type: Bulk Upload (Preview)
- Item Changed: Campaign
- Output Columns: Account, Campaign, Geo, Platform, Segment, Targeting
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Jeremy Brown (jbrown@marinsoftware.com)
- Created by Jeremy Brown on 2024-04-30 16:43
- Last Updated by Jeremy Brown on 2024-04-30 16:59
Python Code
import re

# Keywords used to recognize each dimension inside a campaign name
GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
TARGETING_KEYWORDS = ['rm', 'pr']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']

# Column names in the input report
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'

# Column names in the bulk output
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GEO = 'Geo'
BULK_COL_SEGMENT = 'Segment'
BULK_COL_TARGETING = 'Targeting'
BULK_COL_PLATFORM = 'Platform'


# Function to extract geo, segment, targeting, and platform from campaign name
def extract_dimensions(campaign_name):
    campaign_lower = campaign_name.lower()

    # Whole-word matches; every matching geo keyword is kept
    geo_list = [keyword for keyword in GEO_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]

    # Sort segment keywords by length (longest to shortest) and keep the first match
    sorted_segment_keywords = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    segment = next((keyword for keyword in sorted_segment_keywords if re.search(r'\b{}\b'.format(keyword), campaign_lower)), '')

    targeting_list = [keyword for keyword in TARGETING_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]

    # Platform keywords must appear at the very end of the name (\Z)
    platform_list = [keyword for keyword in PLATFORM_KEYWORDS if re.search(r'\b{}\Z'.format(keyword), campaign_lower)]

    # Join lists to form comma-separated strings (empty string when nothing matched)
    geo_str = ', '.join(geo_list)
    targeting_str = ', '.join(targeting_list)
    platform_str = ', '.join(platform_list)

    return geo_str, segment, targeting_str, platform_str


# inputDf is not defined in this script; it is expected to be supplied by the host environment.
# Copy input rows to output DataFrame
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT]].copy()

# Loop through all rows in the input DataFrame
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]

    # Extract geo, segment, targeting, and platform from campaign name
    geo, segment, targeting, platform = extract_dimensions(campaign_name)

    # Update columns in the output DataFrame
    outputDf.at[index, BULK_COL_GEO] = geo
    outputDf.at[index, BULK_COL_SEGMENT] = segment
    outputDf.at[index, BULK_COL_TARGETING] = targeting
    outputDf.at[index, BULK_COL_PLATFORM] = platform
    outputDf.at[index, BULK_COL_ACCOUNT] = row[RPT_COL_ACCOUNT]

# Drop any rows with missing values in geo, segment, targeting, platform, or account columns.
# Unmatched dimensions are empty strings, which dropna does not treat as missing.
outputDf = outputDf.dropna(subset=[BULK_COL_GEO, BULK_COL_SEGMENT, BULK_COL_TARGETING, BULK_COL_PLATFORM, BULK_COL_ACCOUNT])

if not outputDf.empty:
    print("outputDf:\n", outputDf.to_string(index=False))
else:
    print("Empty outputDf")
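As a possible alternative to the row-by-row loop, the extracted values could be built in one pass and attached as new columns. The sketch below is an assumption about how that might look, not the script's method: it uses a reduced, hypothetical extractor (`extract_geo_platform`), shortened keyword lists, and made-up campaign and account values.

```python
import re

import pandas as pd

GEO_KEYWORDS = ['us', 'uk']        # hypothetical subset
PLATFORM_KEYWORDS = ['g', 'fb']    # hypothetical subset


def extract_geo_platform(name):
    """Reduced, illustrative extractor returning (geo, platform) strings."""
    lowered = name.lower()
    geo = [kw for kw in GEO_KEYWORDS
           if re.search(r'\b{}\b'.format(kw), lowered)]
    # Platform keywords are matched only at the very end of the name (\Z).
    platform = [kw for kw in PLATFORM_KEYWORDS
                if re.search(r'\b{}\Z'.format(kw), lowered)]
    return ', '.join(geo), ', '.join(platform)


# Hypothetical input rows.
input_df = pd.DataFrame({
    'Campaign': ['US-NB-RM-G', 'UK-B-PR-FB'],
    'Account': ['A1', 'A1'],
})

# Build all extracted values at once, aligned to the input index.
dims = pd.DataFrame(
    [extract_geo_platform(name) for name in input_df['Campaign']],
    columns=['Geo', 'Platform'],
    index=input_df.index,
)
output_df = pd.concat([input_df[['Account', 'Campaign']], dims], axis=1)

print(output_df.to_string(index=False))
```

Building the columns in one step avoids per-cell assignment inside a loop, which tends to be slower on large reports; for small inputs the difference is unlikely to matter.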
Post generated on 2025-03-11 01:25:51 GMT