Script 1039: Jeremy Script
Purpose
Python script to extract dimensions (geo, segment, targeting, and platform) from campaign names and create an output DataFrame.
To Elaborate
The script parses each campaign name into four dimensions (geo, segment, targeting, and platform) and writes them to a structured output DataFrame. It takes an input DataFrame with campaign names and account information and produces an output DataFrame holding the campaign, the account, and the extracted dimensions. As a final step it drops rows with missing (NaN) values in the dimension or account columns. A dimension with no matching keyword is stored as an empty string, which dropna does not treat as missing, so those rows stay in the output.
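To see what counts as a missing value in that final step, here is a minimal sketch. The rows are hypothetical, invented for illustration, and it assumes an unmatched dimension arrives as an empty string, as it does when the extraction function finds no keyword. The stricter variant at the end is one possible adjustment, not part of the original script.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one complete, one with an empty-string Geo, one with a NaN Account.
df = pd.DataFrame({
    "Account": ["Acct A", "Acct B", np.nan],
    "Campaign": ["uk | nb | rm | fb", "brand campaign", "us | b | pr | g"],
    "Geo": ["uk", "", "us"],
})

# dropna removes only true missing values (NaN/None); the empty string survives.
kept = df.dropna(subset=["Geo", "Account"])
print(kept["Campaign"].tolist())

# Assumed variant: turn empty strings into NaN first so unmatched rows are dropped too.
strict = df.assign(Geo=df["Geo"].mask(df["Geo"] == "")).dropna(subset=["Geo", "Account"])
print(strict["Campaign"].tolist())
```

In this sketch the first call keeps two rows and the stricter variant keeps one. Whether unmatched rows should be dropped depends on how the preview is used downstream.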
Walking Through the Code
- Define lists of keywords for geo, segment, targeting, and platform.
- Define column constants for the input and output DataFrames.
- Define the extract_dimensions function, which extracts the dimensions from a campaign name:
  - Convert the campaign name to lowercase.
  - Use regular expressions to find whole-word keyword matches for geo, segment, and targeting. Platform keywords are matched only at the very end of the campaign name.
  - For segment, keep only the first match, checking longer keywords before shorter ones.
  - Join the geo, targeting, and platform matches into comma-separated strings.
  - Return the extracted dimensions.
- Create an output DataFrame by copying the campaign and account columns from the input DataFrame.
- Loop through each row in the input DataFrame:
  - Get the campaign name from the current row.
  - Call the extract_dimensions function to extract the dimensions from the campaign name.
  - Update the corresponding columns in the output DataFrame with the extracted dimensions and account information.
- Drop any rows in the output DataFrame that have missing values in the extracted dimensions or account columns.
- Check if the output DataFrame is empty:
  - If not empty, print the output DataFrame.
  - If empty, print a message indicating an empty output DataFrame.
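The keyword-matching step can be tried on its own. The sketch below reuses two keyword lists and the two regular-expression patterns from the script. The helper name find_keywords and the campaign names are hypothetical, and the use of re.escape is an added precaution rather than something the script does. It also illustrates a consequence of the \b word boundary: an underscore counts as a word character, so underscore-delimited names produce no matches.

```python
import re

# Keyword lists copied from the script; everything else here is illustrative.
GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']

WHOLE_WORD = r'\b{}\b'   # keyword anywhere in the name, as a whole word
AT_END = r'\b{}\Z'       # keyword as the last word of the name


def find_keywords(keywords, text, pattern):
    """Return the keywords whose pattern matches the lowercased text."""
    text = text.lower()
    return [kw for kw in keywords if re.search(pattern.format(re.escape(kw)), text)]


# Space- or pipe-delimited names expose word boundaries around each token.
print(find_keywords(GEO_KEYWORDS, "UK | NB | RM | FB", WHOLE_WORD))    # ['uk']
print(find_keywords(PLATFORM_KEYWORDS, "UK | NB | RM | FB", AT_END))   # ['fb']

# Underscores are word characters, so there is no boundary next to them.
print(find_keywords(GEO_KEYWORDS, "UK_NB_RM_FB", WHOLE_WORD))          # []
```

If campaign names in the account use underscores as separators, the patterns would need a different delimiter rule for any dimension to be extracted.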
Vitals
- Script ID: 1039
- Client ID / Customer ID: 1306927457 / 60270313
- Action Type: Bulk Upload (Preview)
- Item Changed: Campaign
- Output Columns: Account, Campaign, Geo, Platform, Segment, Targeting
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Jeremy Brown (jbrown@marinsoftware.com)
- Created by Jeremy Brown on 2024-04-30 16:43
- Last Updated by Jeremy Brown on 2024-04-30 16:59
Python Code
import re

# Keywords recognized for each dimension (matched against the lowercased campaign name)
GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
TARGETING_KEYWORDS = ['rm', 'pr']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']

# Input (report) column names
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'

# Output (bulk) column names
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GEO = 'Geo'
BULK_COL_SEGMENT = 'Segment'
BULK_COL_TARGETING = 'Targeting'
BULK_COL_PLATFORM = 'Platform'


# Function to extract geo, segment, targeting, and platform from campaign name
def extract_dimensions(campaign_name):
    campaign_lower = campaign_name.lower()

    # Geo and targeting: every keyword that appears as a whole word
    geo_list = [keyword for keyword in GEO_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]

    # Sort segment keywords by length (longest to shortest) and keep the first match only
    sorted_segment_keywords = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    segment = next((keyword for keyword in sorted_segment_keywords if re.search(r'\b{}\b'.format(keyword), campaign_lower)), '')

    targeting_list = [keyword for keyword in TARGETING_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]

    # Platform: keyword must be the last word of the campaign name (\Z anchors to the end)
    platform_list = [keyword for keyword in PLATFORM_KEYWORDS if re.search(r'\b{}\Z'.format(keyword), campaign_lower)]

    # Join lists to form comma-separated strings
    geo_str = ', '.join(geo_list)
    targeting_str = ', '.join(targeting_list)
    platform_str = ', '.join(platform_list)

    return geo_str, segment, targeting_str, platform_str


# Copy input rows to output DataFrame
# (inputDf is not defined in this script; it is expected to be supplied by the execution environment)
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT]].copy()

# Loop through all rows in the input DataFrame
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]

    # Extract geo, segment, targeting, and platform from campaign name
    geo, segment, targeting, platform = extract_dimensions(campaign_name)

    # Update columns in the output DataFrame
    outputDf.at[index, BULK_COL_GEO] = geo
    outputDf.at[index, BULK_COL_SEGMENT] = segment
    outputDf.at[index, BULK_COL_TARGETING] = targeting
    outputDf.at[index, BULK_COL_PLATFORM] = platform
    outputDf.at[index, BULK_COL_ACCOUNT] = row[RPT_COL_ACCOUNT]

# Drop any rows with missing values in geo, segment, targeting, platform, or account columns
# Note: unmatched dimensions are empty strings, not NaN, so dropna does not remove them
outputDf = outputDf.dropna(subset=[BULK_COL_GEO, BULK_COL_SEGMENT, BULK_COL_TARGETING, BULK_COL_PLATFORM, BULK_COL_ACCOUNT])

if not outputDf.empty:
    print("outputDf:\n", outputDf.to_string(index=False))
else:
    print("Empty outputDf")
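The row-by-row loop above could also be expressed without iterrows. The following is a sketch of that alternative, not the author's method. It uses a condensed copy of the extraction logic, a hypothetical two-row input_df standing in for the input that the platform supplies, and a hypothetical helper named matches.

```python
import re
import pandas as pd

GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
TARGETING_KEYWORDS = ['rm', 'pr']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']


def matches(keywords, text, pattern=r'\b{}\b'):
    """Keywords whose pattern matches text (hypothetical helper)."""
    return [kw for kw in keywords if re.search(pattern.format(kw), text)]


def extract_dimensions(campaign_name):
    """Condensed copy of the script's logic: returns (geo, segment, targeting, platform)."""
    text = campaign_name.lower()
    by_length = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    segment = next(iter(matches(by_length, text)), '')
    return (
        ', '.join(matches(GEO_KEYWORDS, text)),
        segment,
        ', '.join(matches(TARGETING_KEYWORDS, text)),
        ', '.join(matches(PLATFORM_KEYWORDS, text, r'\b{}\Z')),
    )


# Hypothetical input; in the real script the platform provides inputDf.
input_df = pd.DataFrame({
    "Campaign": ["UK | NB | RM | FB", "US CA | B | PR | G"],
    "Account": ["Acct A", "Acct B"],
})

# Build all four dimension columns at once, aligned on the input index.
dims = pd.DataFrame(
    [extract_dimensions(name) for name in input_df["Campaign"]],
    columns=["Geo", "Segment", "Targeting", "Platform"],
    index=input_df.index,
)
output_df = input_df[["Campaign", "Account"]].join(dims)
print(output_df.to_string(index=False))
```

Building the dimension columns in one step avoids repeated single-cell writes with .at. For a small preview the difference is unlikely to matter, so the choice is mostly one of readability.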
Post generated on 2024-05-15 07:44:05 GMT