Script 1039: Jeremy Script
Purpose
The script processes campaign data to extract and categorize geographical, segment, targeting, and platform information from campaign names.
To Elaborate
The Python script analyzes campaign data by extracting four dimensions from each campaign name: geographical location, segment, targeting, and platform. It lowercases each name and matches it against predefined keyword lists to identify and categorize these dimensions. The results are written to Geo, Segment, Targeting, and Platform columns alongside each campaign's account, which gives a structured view for analysis and reporting. The script keeps only complete entries by removing any rows with missing values in the key columns. This breakdown of campaign attributes supports structured budget allocation (SBA) and wider marketing analysis and decision-making.
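The keyword matching described above relies on whole-word regular expressions built with the `\b` word boundary. The sketch below is a minimal illustration and is not part of the script. It reuses the script's `GEO_KEYWORDS` list, while the helper `match_keywords` and the campaign names are hypothetical. It shows two things. First, keywords embedded in longer words are ignored. Second, `\b` treats underscores as word characters, so a name like `Brand_US_Search` matches nothing. Whether that matters depends on the account's naming convention, which the source does not describe.

```python
import re

# Geo keywords copied from the script; the campaign names below are hypothetical.
GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']


def match_keywords(campaign_name, keywords):
    """Hypothetical helper: return every keyword that appears as a whole word."""
    name = campaign_name.lower()
    return [kw for kw in keywords if re.search(r'\b{}\b'.format(re.escape(kw)), name)]


# Tokens separated by spaces, hyphens, or pipes are matched as whole words.
print(match_keywords('Brand - US - Search', GEO_KEYWORDS))    # ['us']
# Substrings inside longer words are ignored ("us" in "Business", "ca" in "Canada").
print(match_keywords('Business Canada Promo', GEO_KEYWORDS))  # []
# Underscores are word characters, so \b does not split on them.
print(match_keywords('Brand_US_Search', GEO_KEYWORDS))        # []
```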
Walking Through the Code
- Define Keywords: The script begins by defining keyword lists for geographical locations, segments, targeting, and platforms. These keywords are used to identify and categorize parts of the campaign names.
- Extract Dimensions Function: A function named `extract_dimensions` processes each campaign name. It lowercases the name and searches it for whole-word keyword matches. Every matching geo and targeting keyword is kept. Only the first matching segment keyword is used, with longer keywords tried first. A platform keyword counts only when it ends the name. The function returns the four dimensions as strings, joining multiple matches with commas and returning an empty string when nothing matches.
- Prepare Output DataFrame: The script copies the campaign and account columns of the input DataFrame to form the basis of the output DataFrame.
- Process Each Campaign: The script iterates over each row in the input DataFrame, extracts the dimensions with `extract_dimensions`, and writes them to the matching row of the output DataFrame.
- Clean Data: After processing all rows, the script removes any rows from the output DataFrame that have missing values in the key columns, so that only complete data is retained.
- Output Results: Finally, the script checks whether the output DataFrame is empty. If it is not, the script prints the DataFrame; otherwise, it prints a message saying the DataFrame is empty.
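Two of the rules in `extract_dimensions` are easy to miss:

- Only one segment is kept.
- Platforms are matched only at the end of the name.

The hypothetical helpers `first_segment` and `trailing_platforms` below are a sketch that copies the script's keyword lists and regex patterns, so these rules can be tried in isolation. The campaign names are invented examples.

```python
import re

# Keyword lists copied from the script; the campaign names below are hypothetical.
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']


def first_segment(name):
    """Hypothetical helper mirroring the segment rule: first whole-word match, longest first."""
    name = name.lower()
    # sorted() is stable, so equal-length keywords keep their list order (b, d, a, v, n)
    for kw in sorted(SEGMENT_KEYWORDS, key=len, reverse=True):
        if re.search(r'\b{}\b'.format(re.escape(kw)), name):
            return kw
    return ''


def trailing_platforms(name):
    """Hypothetical helper mirroring the platform rule: the keyword must end the name."""
    name = name.lower()
    # \Z anchors the match to the very end of the string
    return [kw for kw in PLATFORM_KEYWORDS if re.search(r'\b{}\Z'.format(re.escape(kw)), name)]


print(first_segment('US - NB - RM - G'), trailing_platforms('US - NB - RM - G'))  # nb ['g']
print(first_segment('US - B - FB'), trailing_platforms('US - B - FB'))            # b ['fb']
print(first_segment('CA - A - B'), trailing_platforms('CA - A - B'))              # b ['b']
print(trailing_platforms('G - US - NB'))                                          # []
```

The third case is worth noting. `b` appears in both `SEGMENT_KEYWORDS` and `PLATFORM_KEYWORDS`, and it is tried before `a`. As a result, a name ending in `B` can report `b` as both its segment and its platform.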
Vitals
- Script ID: 1039
- Client ID / Customer ID: 1306927457 / 60270313
- Action Type: Bulk Upload (Preview)
- Item Changed: Campaign
- Output Columns: Account, Campaign, Geo, Platform, Segment, Targeting
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Jeremy Brown (jbrown@marinsoftware.com)
- Created by Jeremy Brown on 2024-04-30 16:43
- Last Updated by Jeremy Brown on 2024-04-30 16:59
Python Code
```python
import re

import numpy as np

# inputDf (the M1 Report rows) is supplied by the script runtime.

# Keywords recognized in campaign names (matched case-insensitively as whole words)
GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
TARGETING_KEYWORDS = ['rm', 'pr']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']

# Report (input) and bulk upload (output) column names
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GEO = 'Geo'
BULK_COL_SEGMENT = 'Segment'
BULK_COL_TARGETING = 'Targeting'
BULK_COL_PLATFORM = 'Platform'
DIMENSION_COLS = [BULK_COL_GEO, BULK_COL_SEGMENT, BULK_COL_TARGETING, BULK_COL_PLATFORM]


# Function to extract geo, segment, targeting, and platform from campaign name
def extract_dimensions(campaign_name):
    # A missing (NaN) campaign name yields no dimensions instead of failing on .lower()
    if not isinstance(campaign_name, str):
        return '', '', '', ''
    campaign_lower = campaign_name.lower()
    # Geo: every geo keyword that appears as a whole word
    geo_list = [kw for kw in GEO_KEYWORDS
                if re.search(r'\b{}\b'.format(re.escape(kw)), campaign_lower)]
    # Segment: a single value, the first whole-word match with longer keywords tried first
    sorted_segment_keywords = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    segment = next((kw for kw in sorted_segment_keywords
                    if re.search(r'\b{}\b'.format(re.escape(kw)), campaign_lower)), '')
    # Targeting: every targeting keyword that appears as a whole word
    targeting_list = [kw for kw in TARGETING_KEYWORDS
                      if re.search(r'\b{}\b'.format(re.escape(kw)), campaign_lower)]
    # Platform: the keyword must be the last token of the name (\Z anchors to the end)
    platform_list = [kw for kw in PLATFORM_KEYWORDS
                     if re.search(r'\b{}\Z'.format(re.escape(kw)), campaign_lower)]
    # Join lists to form comma-separated strings ('' when nothing matched)
    return ', '.join(geo_list), segment, ', '.join(targeting_list), ', '.join(platform_list)


# Copy the campaign and account columns to form the output DataFrame
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT]].copy()
# Create the dimension columns up front as strings so per-row assignment keeps a string dtype
for col in DIMENSION_COLS:
    outputDf[col] = ''

# Loop through all rows in the input DataFrame
for index, row in inputDf.iterrows():
    geo, segment, targeting, platform = extract_dimensions(row[RPT_COL_CAMPAIGN])
    # Update the dimension columns for this campaign (Account was already copied above)
    outputDf.at[index, BULK_COL_GEO] = geo
    outputDf.at[index, BULK_COL_SEGMENT] = segment
    outputDf.at[index, BULK_COL_TARGETING] = targeting
    outputDf.at[index, BULK_COL_PLATFORM] = platform

# extract_dimensions returns '' (not NaN) when nothing matches, so convert empty strings
# to NaN, then drop rows missing geo, segment, targeting, platform, or account
outputDf = outputDf.replace('', np.nan).dropna(subset=DIMENSION_COLS + [BULK_COL_ACCOUNT])

if not outputDf.empty:
    print("outputDf:\n", outputDf.to_string(index=False))
else:
    print("Empty outputDf")
```
Post generated on 2024-11-27 06:58:46 GMT