Script 1025: Tagging dimensions geo, segment, targeting, and platform
Purpose
Tagging dimensions geo, segment, targeting, and platform.
To Elaborate
The Python script extracts four dimensions (geo, segment, targeting, and platform) from the campaign names in a DataFrame and tags each campaign with them. It builds an output DataFrame containing the extracted dimensions and drops any rows with missing (NaN) values in the dimension or account columns.
Walking Through the Code
- The script defines lists of keywords for each dimension: GEO_KEYWORDS, SEGMENT_KEYWORDS, TARGETING_KEYWORDS, and PLATFORM_KEYWORDS.
- The function extract_dimensions(campaign_name) takes a campaign name as input and extracts the dimensions from it:
  - It converts the campaign name to lowercase.
  - It uses regular expressions to find whole-word matches for each geo and targeting keyword anywhere in the campaign name. Platform keywords match only as a whole word at the very end of the name.
  - It sorts the segment keywords by length (longest to shortest) and keeps only the first one that matches, so the segment is always a single value.
  - It joins the matched geo, targeting, and platform keywords into comma-separated strings.
  - It returns the extracted dimensions as a tuple: geo_str, segment, targeting_str, platform_str.
- The script creates an output DataFrame by copying the columns RPT_COL_CAMPAIGN and RPT_COL_ACCOUNT from the input DataFrame.
- It loops through each row in the input DataFrame using the iterrows() function.
- For each row, it extracts the campaign name and calls the extract_dimensions() function to get the dimensions.
- It updates the corresponding columns in the output DataFrame with the extracted dimensions and the account value from the input DataFrame.
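- After the loop, it drops any rows in the output DataFrame that have missing (NaN) values in the dimension or account columns. A dimension with no keyword match is stored as an empty string, which is not a missing value, so such rows are kept.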
- If the output DataFrame is not empty, it prints the DataFrame as a string without the index. Otherwise, it prints “Empty outputDf”.
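The matching rules described above can be tried in isolation. The following is a minimal sketch, not the script itself: the helper names (match_words, match_suffix, first_segment) and the sample campaign names are hypothetical, and re.escape is added here as a precaution that the original does not use. It also shows one consequence of word-boundary matching: an underscore counts as a word character, so underscore-separated names produce no matches.

```python
import re

GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']


def match_words(keywords, text):
    """Return keywords that occur as whole words anywhere in text."""
    return [k for k in keywords
            if re.search(r'\b{}\b'.format(re.escape(k)), text)]


def match_suffix(keywords, text):
    """Return keywords that occur as a whole word at the very end of text."""
    return [k for k in keywords
            if re.search(r'\b{}\Z'.format(re.escape(k)), text)]


def first_segment(text):
    """Return the first matching segment keyword, trying longer keywords first."""
    ordered = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    return next((k for k in ordered
                 if re.search(r'\b{}\b'.format(re.escape(k)), text)), '')


# Hypothetical campaign names, lowercased as the script does
hyphenated = 'US-NB-RM-G'.lower()
underscored = 'US_NB_RM_G'.lower()

print(match_words(GEO_KEYWORDS, hyphenated))        # ['us']
print(first_segment(hyphenated))                    # nb
print(match_suffix(PLATFORM_KEYWORDS, hyphenated))  # ['g']

# '_' is a word character, so \b never fires between 'us' and '_'
print(match_words(GEO_KEYWORDS, underscored))       # []
```

If campaign names use underscores as separators, the patterns would need a different delimiter rule than \b; whether that applies depends on the account's naming convention, which the source does not state.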
Vitals
- Script ID : 1025
- Client ID / Customer ID: 1306927457 / 60270313
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Targeting, Segment, Platform, Geo
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Autumn Archibald (aarchibald@marinsoftware.com)
- Created by Autumn Archibald on 2024-04-29 20:42
- Last Updated by Autumn Archibald on 2024-04-30 17:48
See it in Action
Python Code
import re

GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
TARGETING_KEYWORDS = ['rm', 'pr']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']

RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GEO = 'Geo'
BULK_COL_SEGMENT = 'Segment'
BULK_COL_TARGETING = 'Targeting'
BULK_COL_PLATFORM = 'Platform'

# Function to extract geo, segment, targeting, and platform from campaign name
def extract_dimensions(campaign_name):
    campaign_lower = campaign_name.lower()
    # Geo: every keyword that appears as a whole word
    geo_list = [keyword for keyword in GEO_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]
    # Sort segment keywords by length (longest to shortest) and keep the first match
    sorted_segment_keywords = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    segment = next((keyword for keyword in sorted_segment_keywords if re.search(r'\b{}\b'.format(keyword), campaign_lower)), '')
    # Targeting: every keyword that appears as a whole word
    targeting_list = [keyword for keyword in TARGETING_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]
    # Platform: \Z anchors the match to the very end of the campaign name
    platform_list = [keyword for keyword in PLATFORM_KEYWORDS if re.search(r'\b{}\Z'.format(keyword), campaign_lower)]
    # Join lists to form comma-separated strings
    geo_str = ', '.join(geo_list)
    targeting_str = ', '.join(targeting_list)
    platform_str = ', '.join(platform_list)
    return geo_str, segment, targeting_str, platform_str

# Assuming input DataFrame 'inputDf' contains campaign names and account information
# Create a DataFrame with sample data (replace this with your actual input data)
# inputDf = ...

# Copy input rows to output DataFrame
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT]].copy()

# Loop through all rows in the input DataFrame
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]
    # Extract geo, segment, targeting, and platform from campaign name
    geo, segment, targeting, platform = extract_dimensions(campaign_name)
    # Update columns in the output DataFrame
    outputDf.at[index, BULK_COL_GEO] = geo
    outputDf.at[index, BULK_COL_SEGMENT] = segment
    outputDf.at[index, BULK_COL_TARGETING] = targeting
    outputDf.at[index, BULK_COL_PLATFORM] = platform
    outputDf.at[index, BULK_COL_ACCOUNT] = row[RPT_COL_ACCOUNT]

# Drop any rows with missing (NaN) values in geo, segment, targeting, platform, or account columns.
# Unmatched dimensions are empty strings, not NaN, so they do not cause a row to be dropped.
outputDf = outputDf.dropna(subset=[BULK_COL_GEO, BULK_COL_SEGMENT, BULK_COL_TARGETING, BULK_COL_PLATFORM, BULK_COL_ACCOUNT])

if not outputDf.empty:
    print("outputDf:\n", outputDf.to_string(index=False))
else:
    print("Empty outputDf")
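In the script, a dimension with no keyword match is stored as an empty string, and pandas dropna does not treat empty strings as missing. If the intent is to exclude campaigns where a dimension could not be found (the source does not say whether it is), one possible approach is to turn empty strings into missing values before dropping. The sketch below uses invented sample rows and a hypothetical dimension_cols list. It illustrates that option and is not part of the original script.

```python
import pandas as pd

# Hypothetical tagged output; all values are invented for illustration
outputDf = pd.DataFrame({
    'Campaign': ['US-NB-RM-G', 'Brand Awareness'],
    'Account': ['Acct A', 'Acct B'],
    'Geo': ['us', ''],
    'Segment': ['nb', ''],
    'Targeting': ['rm', ''],
    'Platform': ['g', ''],
})

dimension_cols = ['Geo', 'Segment', 'Targeting', 'Platform']

# dropna alone keeps both rows, because '' is not a missing value
kept_as_is = outputDf.dropna(subset=dimension_cols + ['Account'])

# Mask empty strings as missing first, then drop rows lacking any dimension
cleaned = outputDf.copy()
cleaned[dimension_cols] = cleaned[dimension_cols].mask(cleaned[dimension_cols] == '')
cleaned = cleaned.dropna(subset=dimension_cols + ['Account'])

print(len(kept_as_is), len(cleaned))
```

Requiring all four dimensions may be too strict for campaigns that legitimately lack one (for example, no targeting tag); in that case dropna(how='all', subset=dimension_cols) would drop only rows where every dimension is missing.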
Post generated on 2024-05-15 07:44:05 GMT