Script 1039: Jeremy Script

Purpose:

The Python script processes campaign data to extract and categorize specific dimensions such as geo, segment, targeting, and platform from campaign names.

To Elaborate

The script analyzes campaign data by extracting four dimensions from each campaign name: geographical region, market segment, targeting strategy, and platform. Each dimension is identified by matching predefined keywords as whole words in the lowercased name. The extracted values are written to a structured output so campaigns can be analyzed and reported on by attribute. A final cleaning step drops rows with missing values in the key dimensions. Note that a campaign name with no keyword match yields empty strings rather than missing values, so in practice this step only removes rows whose Account value is missing. The resulting categorization supports structured budget allocation (SBA), which can be crucial for marketing analysis and decision-making.
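
Because keywords are matched as whole words, the separator used in campaign names matters. The sketch below illustrates this with a hypothetical helper, has_keyword, and two made-up campaign names; neither is part of the script. It relies on the fact that Python's \b treats the underscore as a word character.

```python
import re

def has_keyword(keyword, name):
    """Return True if keyword appears as a whole word in name (case-insensitive)."""
    pattern = r'\b{}\b'.format(re.escape(keyword))
    return re.search(pattern, name.lower()) is not None

# Hypothetical campaign names, used only for illustration.
hyphenated = "US-NB-RM-G"
underscored = "US_NB_RM_G"

print(has_keyword('us', hyphenated))   # True: '-' is not a word character
print(has_keyword('us', underscored))  # False: '_' counts as a word character
```

If campaign names in an account use underscores as separators, the keyword patterns would likely need to be adjusted before the script finds any matches.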

Walking Through the Code

  1. Keyword Definitions: The script begins by defining lists of keywords for geographical regions, segments, targeting strategies, and platforms. These keywords are used to identify and categorize parts of the campaign names.

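  2. Function Definition: A function extract_dimensions is defined to process each campaign name. It converts the campaign name to lowercase and searches for whole-word matches with the predefined keywords. Geo and targeting are returned as comma-separated strings of every matching keyword. Segment is a single keyword: the first match found when checking the longest keywords first. Platform is matched only when a platform keyword ends the campaign name.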

  3. DataFrame Preparation: The script copies relevant columns from the input DataFrame to a new output DataFrame, which will store the processed data.

  4. Data Processing Loop: The script iterates over each row in the input DataFrame, extracting dimensions using the extract_dimensions function. It updates the output DataFrame with the extracted information.

  5. Data Cleaning: After processing, the script drops any rows from the output DataFrame that have missing values in the key columns. Dimensions with no keyword match are stored as empty strings, which are not treated as missing, so such rows are retained.

  6. Output: Finally, the script prints the output DataFrame if it contains data; otherwise it prints "Empty outputDf".
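
The extraction steps above can be sketched on a few sample names. This is a condensed restatement of the script's logic, not a replacement for it: the helper find_all and the sample campaign names are assumptions introduced for illustration, while the keyword lists are copied from the script.

```python
import re

# Keyword lists copied from the script.
GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
TARGETING_KEYWORDS = ['rm', 'pr']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']

def find_all(keywords, text, pattern=r'\b{}\b'):
    """Hypothetical helper: keywords that match text, in keyword-list order."""
    return [k for k in keywords if re.search(pattern.format(re.escape(k)), text)]

def extract_dimensions(campaign_name):
    name = campaign_name.lower()
    geo = find_all(GEO_KEYWORDS, name)
    # Longest segment keywords are checked first; only the first match is kept.
    by_length = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    segment = next(iter(find_all(by_length, name)), '')
    targeting = find_all(TARGETING_KEYWORDS, name)
    # \Z anchors the platform keyword to the end of the campaign name.
    platform = find_all(PLATFORM_KEYWORDS, name, pattern=r'\b{}\Z')
    return ', '.join(geo), segment, ', '.join(targeting), ', '.join(platform)

# Hypothetical campaign names.
samples = ["US - NB - RM - G", "UK | CA | B | PR | FB", "Generic Campaign"]
results = [extract_dimensions(s) for s in samples]
for sample, dims in zip(samples, results):
    print(sample, '->', dims)
```

Two behaviors are worth noticing in the output. Multiple geos are listed in keyword-list order rather than in the order they appear in the name, so the second sample yields 'ca, uk'. A name with no recognizable keywords yields four empty strings rather than missing values.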

Vitals

  • Script ID : 1039
  • Client ID / Customer ID: 1306927457 / 60270313
  • Action Type: Bulk Upload (Preview)
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Geo, Platform, Segment, Targeting
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Jeremy Brown (jbrown@marinsoftware.com)
  • Created by Jeremy Brown on 2024-04-30 16:43
  • Last Updated by Jeremy Brown on 2024-04-30 16:59

Python Code

import re

# Keywords matched as whole words against the lowercased campaign name
GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
SEGMENT_KEYWORDS = ['b', 'nb', 'd', 'a', 'v', 'n']
TARGETING_KEYWORDS = ['rm', 'pr']
PLATFORM_KEYWORDS = ['g', 'b', 'li', 't', 'fb', 'm']

# Column names in the input report
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'

# Column names in the bulk output
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GEO = 'Geo'
BULK_COL_SEGMENT = 'Segment'
BULK_COL_TARGETING = 'Targeting'
BULK_COL_PLATFORM = 'Platform'

# Function to extract geo, segment, targeting, and platform from campaign name
def extract_dimensions(campaign_name):
    campaign_lower = campaign_name.lower()
    geo_list = [keyword for keyword in GEO_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]
    
    # Sort segment keywords by length (longest to shortest)
    sorted_segment_keywords = sorted(SEGMENT_KEYWORDS, key=len, reverse=True)
    segment = next((keyword for keyword in sorted_segment_keywords if re.search(r'\b{}\b'.format(keyword), campaign_lower)), '')

    targeting_list = [keyword for keyword in TARGETING_KEYWORDS if re.search(r'\b{}\b'.format(keyword), campaign_lower)]
    # Platform keywords only count when they end the campaign name (\Z anchors at end of string)
    platform_list = [keyword for keyword in PLATFORM_KEYWORDS if re.search(r'\b{}\Z'.format(keyword), campaign_lower)]
    
    # Join lists to form comma-separated strings
    geo_str = ', '.join(geo_list)
    targeting_str = ', '.join(targeting_list)
    platform_str = ', '.join(platform_list)
    
    return geo_str, segment, targeting_str, platform_str

# Copy input rows to output DataFrame
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT]].copy()

# Loop through all rows in the input DataFrame
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]

    # Extract geo, segment, targeting, and platform from campaign name
    geo, segment, targeting, platform = extract_dimensions(campaign_name)

    # Update columns in the output DataFrame
    outputDf.at[index, BULK_COL_GEO] = geo
    outputDf.at[index, BULK_COL_SEGMENT] = segment
    outputDf.at[index, BULK_COL_TARGETING] = targeting
    outputDf.at[index, BULK_COL_PLATFORM] = platform
    outputDf.at[index, BULK_COL_ACCOUNT] = row[RPT_COL_ACCOUNT]

# Drop any rows with missing values in geo, segment, targeting, platform, or account columns
outputDf = outputDf.dropna(subset=[BULK_COL_GEO, BULK_COL_SEGMENT, BULK_COL_TARGETING, BULK_COL_PLATFORM, BULK_COL_ACCOUNT])

if not outputDf.empty:
    print("outputDf:\n", outputDf.to_string(index=False))
else:
    print("Empty outputDf")
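
The dropna call in the script treats only missing values (NaN/None) as incomplete, while unmatched dimensions are stored as empty strings. The following sketch, which assumes a small hand-built DataFrame in place of the platform-provided inputDf, shows the difference and one possible way to also exclude rows with empty dimensions. Whether such rows should be excluded is a decision for the script owner, not something the source states.

```python
import pandas as pd

# Hypothetical output rows: the second campaign matched no keywords.
df = pd.DataFrame({
    'Campaign': ['US - NB - RM - G', 'Generic Campaign'],
    'Geo': ['us', ''],
    'Segment': ['nb', ''],
    'Targeting': ['rm', ''],
    'Platform': ['g', ''],
})
key_cols = ['Geo', 'Segment', 'Targeting', 'Platform']

# Empty strings are not missing values, so dropna keeps both rows.
after_dropna = df.dropna(subset=key_cols)

# One option: keep only rows where every key column is non-empty.
non_empty = (df[key_cols] != '').all(axis=1)
cleaned = df[non_empty]

print(len(after_dropna), len(cleaned))
```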

Post generated on 2025-03-11 01:25:51 GMT
