Script 1029: Tagging social dimension Geo

Purpose:

The Python script identifies and tags geographical regions in campaign names based on predefined keywords.

To Elaborate

The script scans each campaign name for a predefined list of geographical keywords and collects every keyword that appears in the name. The matches are written to a Geo tag on the campaign, so campaigns can be grouped and analyzed by region. Campaigns with no identified geographical tag are meant to be removed from the output, leaving only tagged campaigns.

Walking Through the Code

  1. Define Configurable Parameters:
    • The script begins by defining a list of geographical keywords (GEO_KEYWORDS) used to identify regions within campaign names, along with constants for the report and bulk-upload column names. The keyword list is intended to be edited and can be extended with additional regions as needed.
  2. Extract Geo Function:
    • A function named extract_geo processes each campaign name. It converts the name to lowercase and tests every keyword in GEO_KEYWORDS with a plain substring check, so a short keyword such as 'us' or 'ca' also matches inside longer words (for example 'business' or 'campaign'). Matching keywords are collected in a list and returned as a comma-separated string; if nothing matches, the function returns an empty string.
  3. Copy and Process Data:
    • The script copies the input data to a new DataFrame (outputDf) to preserve the original data. It iterates over each row, extracting the campaign name and using the extract_geo function to identify geographical tags.
  4. Update Geo Column:
    • For each campaign, the identified geographical tags are updated in the Geo column of the outputDf. The script prints the campaign name alongside its identified geographical tags for verification.
  5. Filter Data:
    • The script removes any rows from outputDf that do not have geographical tags, ensuring that only relevant campaigns are retained for further analysis.
  6. Output Verification:
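    • Finally, the script checks whether outputDf is empty. If it contains data, the script prints it using tableize (a helper that is not defined in this script and is presumably provided by the runtime); otherwise it prints a message saying the output is empty.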
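
The steps above can be sketched end to end on a small, made-up table. This is a minimal illustration, not the production script: the sample rows are hypothetical, inputDf is built by hand instead of coming from the linked report, and a column-wise apply stands in for the row-by-row loop. Because extract_geo returns an empty string when nothing matches, the filter compares against '' directly.

```python
import pandas as pd

GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']

def extract_geo(campaign_name):
    """Return matching keywords as a comma-separated string ('' if none)."""
    campaign_lower = str(campaign_name).lower()
    return ', '.join(geo for geo in GEO_KEYWORDS if geo in campaign_lower)

# Hypothetical sample rows; the real script receives inputDf from the platform.
inputDf = pd.DataFrame({
    'Account': ['Acme', 'Acme', 'Acme'],
    'Campaign': ['Brand - UK', 'Retargeting - Nordics', 'Generic - EMEA'],
})

# Copy the input, tag each campaign, then keep only rows with a non-empty tag.
outputDf = inputDf.copy()
outputDf['Geo'] = outputDf['Campaign'].apply(extract_geo)
outputDf = outputDf[outputDf['Geo'] != '']

print(outputDf)
```

With these sample rows, the first two campaigns are tagged 'uk' and 'nordics', and the third is dropped because none of the keywords appear in its name.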

Vitals

  • Script ID: 1029
  • Client ID / Customer ID: 1306927457 / 60270313
  • Action Type: Bulk Upload (Preview)
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Geo
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Autumn Archibald (aarchibald@marinsoftware.com)
  • Created by Autumn Archibald on 2024-04-29 23:18
  • Last Updated by Autumn Archibald on 2024-04-30 04:55

Python Code

# Define the configurable parameters for the script
GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GEO = 'Geo'

# Function to extract geo from campaign name
def extract_geo(campaign_name):
    campaign_lower = campaign_name.lower()
    geo_list = []
    for geo in GEO_KEYWORDS:
        if geo in campaign_lower:
            geo_list.append(geo)
    return ', '.join(geo_list)

# Copy input rows to output
outputDf = inputDf.copy()

# Loop through all rows
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]

    # Extract geo from campaign name
    geo = extract_geo(campaign_name)
    print("Campaign [%s] => Geo [%s]" % (campaign_name, geo))

    # Update geo column
    outputDf.at[index, BULK_COL_GEO] = geo

# Drop rows with no geo tag. extract_geo returns '' (not NaN) when nothing
# matches, so dropna alone would keep those rows.
outputDf = outputDf[outputDf[BULK_COL_GEO].fillna('') != '']

if not outputDf.empty:
    print("outputDf", tableize(outputDf))
else:
    print("Empty outputDf")
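
One possible refinement, shown here only as a sketch: because the keyword check is a substring test, short keywords can match inside unrelated words. The hypothetical helper extract_geo_tokens below is not part of the original script. It assumes that geo codes appear in campaign names as separate tokens delimited by non-alphanumeric characters (spaces, hyphens, underscores), which may not hold for every naming convention. The substring version is repeated under the name extract_geo_substring for comparison.

```python
import re

GEO_KEYWORDS = ['us', 'ca', 'apac', 'nordics', 'uk', 'au']

def extract_geo_substring(campaign_name):
    """Substring matching, as in the script above."""
    campaign_lower = str(campaign_name).lower()
    return ', '.join(geo for geo in GEO_KEYWORDS if geo in campaign_lower)

def extract_geo_tokens(campaign_name):
    """Hypothetical variant: match keywords only against whole tokens."""
    # Split the lowercased name on any run of non-alphanumeric characters.
    tokens = set(re.split(r'[^a-z0-9]+', str(campaign_name).lower()))
    return ', '.join(geo for geo in GEO_KEYWORDS if geo in tokens)

for name in ['Business Campaign - UK', 'Autumn_Sale_CA']:
    print(name, '| substring:', extract_geo_substring(name),
          '| tokens:', extract_geo_tokens(name))
```

For 'Business Campaign - UK', the substring check yields 'us, ca, uk' while the token check yields only 'uk'. The trade-off is that a name with no delimiter around the code (for example 'UKBrand') would no longer be tagged.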

Post generated on 2025-03-11 01:25:51 GMT
