Script 1039: Jeremy Script

Purpose

The script processes campaign data to extract and categorize geographical, segment, targeting, and platform information from campaign names.

To Elaborate

The Python script analyzes campaign data by reading four dimensions out of each campaign name: geographical location, segment, targeting, and platform. It lowercases each name and checks it against predefined keyword lists, then writes the matches to a structured output alongside the campaign and account. Rows with missing values in the key columns are dropped so that only complete entries are retained. The resulting breakdown of campaign attributes supports structured budget allocation (SBA) and related marketing analysis and reporting.
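
To make the idea concrete, here is a minimal sketch of keyword tagging on a single campaign name. The campaign name, the `KEYWORDS` dictionary, and the `tag_campaign` helper are hypothetical and only illustrate the approach; the keyword lists are copied from the script, and the sketch uses plain whole-word matching for every dimension, which is a simplification of what the script does.

```python
import re

# Keyword lists copied from the script; the dictionary layout is an assumption.
KEYWORDS = {
    'Geo': ['us', 'ca', 'apac', 'nordics', 'uk', 'au'],
    'Segment': ['b', 'nb', 'd', 'a', 'v', 'n'],
    'Targeting': ['rm', 'pr'],
    'Platform': ['g', 'b', 'li', 't', 'fb', 'm'],
}

def tag_campaign(name):
    """Map each dimension to the keywords found as whole words in name."""
    lowered = name.lower()
    return {
        dim: [k for k in words if re.search(r'\b' + re.escape(k) + r'\b', lowered)]
        for dim, words in KEYWORDS.items()
    }

# Hypothetical campaign name using ' | ' as the delimiter.
tags = tag_campaign('UK | NB | RM | LI')
print(tags)
```

Each dimension resolves to exactly one keyword here because the name contains one token per dimension, separated by non-word characters.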

Walking Through the Code

  1. Define Keywords: The script begins by defining lists of keywords for geographical locations, segments, targeting, and platforms. These keywords are used to identify and categorize parts of the campaign names.

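  2. Extract Dimensions Function: A function named extract_dimensions is defined to process each campaign name. It converts the campaign name to lowercase and runs a whole-word regular-expression search for each keyword. Geo and targeting matches are returned as comma-separated strings. Segment is a single value: the first matching keyword, with longer keywords checked first, or an empty string if nothing matches. Platform keywords count only when they appear at the very end of the campaign name.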

  3. Prepare Output DataFrame: The script creates a copy of the input DataFrame, selecting only the campaign and account columns to form the basis of the output DataFrame.

  4. Process Each Campaign: The script iterates over each row in the input DataFrame, extracting dimensions using the extract_dimensions function. It updates the output DataFrame with the extracted information for each campaign.

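  5. Clean Data: After processing all rows, the script calls dropna to remove rows from the output DataFrame that have missing (NaN) values in the key columns. Note that a keyword search with no match produces an empty string, which dropna does not treat as missing, so such rows remain in the output.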

  6. Output Results: Finally, the script checks if the output DataFrame is empty. If not, it prints the DataFrame; otherwise, it indicates that the DataFrame is empty.
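
Two details of the matching in step 2 are easy to miss, and the sketch below illustrates them. The helper names `has_word` and `ends_with_word` and all campaign names are hypothetical; the regular-expression patterns mirror the ones in the script. First, `\b` treats the underscore as a word character, so keywords separated by underscores are not found. Second, 'b' appears in both the segment and platform keyword lists, so a trailing platform 'b' also satisfies the segment search.

```python
import re

def has_word(keyword, name):
    """True if keyword appears in name between regex word boundaries."""
    return re.search(r'\b' + re.escape(keyword) + r'\b', name.lower()) is not None

def ends_with_word(keyword, name):
    """True if keyword is the final word of name (end-of-string anchor)."""
    return re.search(r'\b' + re.escape(keyword) + r'\Z', name.lower()) is not None

# All campaign names below are made up for illustration.
spaced = has_word('us', 'US | NB | RM | G')        # ' | ' is made of non-word characters
underscored = has_word('us', 'US_NB_RM_G')         # '_' counts as a word character
platform_end = ends_with_word('g', 'US | NB | RM | G')
platform_mid = ends_with_word('g', 'US | G | NB')  # 'g' is not at the end

# A trailing platform 'b' is also a whole-word match for the segment keyword 'b'.
segment_collision = has_word('b', 'UK | D | B')

print(spaced, underscored, platform_end, platform_mid, segment_collision)
```

For a name like 'UK | D | B', the script's segment search checks 'nb' first and then the single-letter keywords in list order, so it would return 'b' before it ever reaches 'd'. Whether that matters depends on the naming convention actually in use, which the source does not state.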

Vitals

  • Script ID: 1039
  • Client ID / Customer ID: 1306927457 / 60270313
  • Action Type: Bulk Upload (Preview)
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Geo, Platform, Segment, Targeting
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Jeremy Brown (jbrown@marinsoftware.com)
  • Created by Jeremy Brown on 2024-04-30 16:43
  • Last Updated by Jeremy Brown on 2024-04-30 16:59

Python Code

import re  # needed for the re.search calls in extract_dimensions

GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
TARGETING_KEYWORDS = ['rm', 'pr']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GEO = 'Geo'
BULK_COL_SEGMENT = 'Segment'
BULK_COL_TARGETING = 'Targeting'
BULK_COL_PLATFORM = 'Platform'

# Function to extract geo, segment, targeting, and platform from campaign name
def extract_dimensions(campaign_name):
    campaign_lower = campaign_name.lower()
    geo_list = [keyword for keyword in GEO_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]
    
    # Sort segment keywords by length (longest to shortest)
    sorted_segment_keywords = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    segment = next((keyword for keyword in sorted_segment_keywords if re.search(r'\b{}\b'.format(keyword), campaign_lower)), '')

    targeting_list = [keyword for keyword in TARGETING_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]
    platform_list = [keyword for keyword in PLATFORM_KEYWORDS if re.search(r'\b{}\Z'.format(keyword), campaign_lower)]
    
    # Join lists to form comma-separated strings
    geo_str = ', '.join(geo_list)
    targeting_str = ', '.join(targeting_list)
    platform_str = ', '.join(platform_list)
    
    return geo_str, segment, targeting_str, platform_str

# Copy input rows to output DataFrame
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT]].copy()

# Loop through all rows in the input DataFrame
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]

    # Extract geo, segment, targeting, and platform from campaign name
    geo, segment, targeting, platform = extract_dimensions(campaign_name)

    # Update columns in the output DataFrame
    outputDf.at[index, BULK_COL_GEO] = geo
    outputDf.at[index, BULK_COL_SEGMENT] = segment
    outputDf.at[index, BULK_COL_TARGETING] = targeting
    outputDf.at[index, BULK_COL_PLATFORM] = platform
    outputDf.at[index, BULK_COL_ACCOUNT] = row[RPT_COL_ACCOUNT]

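# Drop any rows with missing (NaN) values in geo, segment, targeting, platform, or account columns.
# Note: empty strings returned for unmatched keywords are not NaN, so dropna keeps those rows.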
outputDf = outputDf.dropna(subset=[BULK_COL_GEO, BULK_COL_SEGMENT, BULK_COL_TARGETING, BULK_COL_PLATFORM, BULK_COL_ACCOUNT])

if not outputDf.empty:
    print("outputDf:\n", outputDf.to_string(index=False))
else:
    print("Empty outputDf")
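
The cleaning step relies on `dropna`, which only removes NaN-like values. The sketch below, using a made-up two-row DataFrame, shows that an empty string survives `dropna`, and one possible way to drop such rows if that is the intent. The source does not say whether unmatched rows should be removed, so the stricter variant is an assumption, not the script's behavior.

```python
import pandas as pd

# Hypothetical data: the second row has no geo keyword, so Geo is ''.
df = pd.DataFrame({
    'Campaign': ['US | NB | RM | G', 'Unlabelled campaign'],
    'Account': ['Acct A', 'Acct B'],
    'Geo': ['us', ''],
})

# dropna ignores empty strings, so both rows are kept.
kept_as_is = df.dropna(subset=['Geo'])

# Assumed stricter variant: turn '' into NaN first, then drop.
strict = df.assign(Geo=df['Geo'].mask(df['Geo'] == '')).dropna(subset=['Geo'])

print(len(kept_as_is), len(strict))
```

Applying the stricter variant to every key column would also drop campaigns that simply have no targeting keyword, so it should only be used where an empty value really means incomplete data.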

Post generated on 2024-11-27 06:58:46 GMT
