Script 1459: Tagging County Dimension for Google

Purpose:

The Python script tags each campaign in a DataFrame with the county code found in its name.

To Elaborate

The script scans each campaign name in the input DataFrame for one of a predefined set of county codes and writes the matched code to a new county column. Matching is a case-insensitive substring check, so a code is found wherever it appears in the name. Campaigns with no identifiable county code are filtered out, leaving only rows that can be tagged. Grouping campaigns by county in this way supports region-level targeting and analysis.
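
As a quick illustration of the matching described above, here is a minimal sketch in plain Python. The county codes are the ones the script defines; the campaign names are invented for the example and are not taken from any real account.

```python
# County codes as defined in the script
COUNTY_CODES = ['DAL', 'LUB', 'MDO', 'FTW', 'AMA']

def extract_county(campaign_name):
    """Return the first county code found in the name (case-insensitive), else ''."""
    campaign_lower = campaign_name.lower()
    for county in COUNTY_CODES:
        if county.lower() in campaign_lower:
            return county
    return ''

# Hypothetical campaign names, invented for illustration
campaigns = ['Brand_DAL_Search', 'generic_ftw_display', 'Statewide_Awareness']

# Tag every campaign, then keep only those where a code was found
tagged = {name: extract_county(name) for name in campaigns}
matched = {name: code for name, code in tagged.items() if code}
print(matched)
```

The third name contains none of the codes, so it receives an empty string and is left out of the final mapping.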

Walking Through the Code

  1. Define County Codes and Parameters:
    • The script begins by defining a list of county codes (COUNTY_CODES) that it will search for within campaign names.
    • It also sets up configurable parameters for column names, such as RPT_COL_CAMPAIGN, BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, and BULK_COL_COUNTY, which can be adjusted by the user to match the specific structure of their input data.
  2. Extract County Code Function:
    • A function extract_county is defined to search for county codes within a campaign name. It lowercases both the campaign name and each code, so the match is case-insensitive, and checks whether any code from the predefined list appears as a substring. The first matching code, in list order, is returned; if none matches, the function returns an empty string.
  3. Validate Input DataFrame:
    • The script checks that the required campaign column (RPT_COL_CAMPAIGN) is present in the input DataFrame. If it is missing, the script raises a ValueError naming the required column, so incorrectly structured input fails early.
  4. Process DataFrame:
    • A copy of the input DataFrame is created to preserve the original data. The script then applies the extract_county function to each campaign name, populating a new column (BULK_COL_COUNTY) with the extracted county codes.
  5. Filter and Output DataFrame:
    • Rows with no county value are dropped from the DataFrame, so only tagged campaigns remain. If the resulting DataFrame is not empty, it is printed using a function such as tableize to format the output; otherwise the script prints "Empty outputDf".
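
Steps 3 to 5 can be sketched end to end as follows. This is an illustrative sketch, not the script itself: input_df stands in for the inputDf that the platform normally supplies, and its rows are invented. Because a non-match is an empty string and not a missing value, the sketch filters on the empty string explicitly.

```python
import pandas as pd

COUNTY_CODES = ['DAL', 'LUB', 'MDO', 'FTW', 'AMA']
RPT_COL_CAMPAIGN = 'Campaign'
BULK_COL_COUNTY = 'County'

def extract_county(campaign_name):
    # Non-string values (e.g. NaN) cannot contain a code
    if not isinstance(campaign_name, str):
        return ''
    campaign_lower = campaign_name.lower()
    for county in COUNTY_CODES:
        if county.lower() in campaign_lower:
            return county
    return ''

# Stand-in for inputDf; all values are invented for illustration
input_df = pd.DataFrame({
    'Account': ['Acct A', 'Acct A', 'Acct B'],
    'Campaign': ['Brand_LUB_Search', 'Statewide_Awareness', 'promo_mdo_video'],
})

# Step 3: fail early if the campaign column is missing
if RPT_COL_CAMPAIGN not in input_df.columns:
    raise ValueError(f"Input DataFrame must contain the column: {RPT_COL_CAMPAIGN}")

# Step 4: work on a copy and add the county column
output_df = input_df.copy()
output_df[BULK_COL_COUNTY] = output_df[RPT_COL_CAMPAIGN].apply(extract_county)

# Step 5: keep only rows where a county code was found
output_df = output_df[output_df[BULK_COL_COUNTY] != ''].reset_index(drop=True)
print(output_df)
```

Working on a copy leaves input_df unchanged, which makes it easy to compare the input with the filtered result.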

Vitals

  • Script ID : 1459
  • Client ID / Customer ID: 1306928165 / 60270439
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, County
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Autumn Archibald (aarchibald@marinsoftware.com)
  • Created by Autumn Archibald on 2024-10-25 05:42
  • Last Updated by Autumn Archibald on 2024-10-25 05:42

Python Code

# Define the list of county codes
COUNTY_CODES = ['DAL', 'LUB', 'MDO', 'FTW', 'AMA']

# Define the configurable parameters for the script
RPT_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_COUNTY = 'County'

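# Function to extract county code from campaign name (case-insensitive substring match)
def extract_county(campaign_name):
    # Non-string values (e.g. NaN for a blank campaign) cannot contain a code
    if not isinstance(campaign_name, str):
        return ''
    campaign_lower = campaign_name.lower()
    for county in COUNTY_CODES:
        if county.lower() in campaign_lower:
            return county  # first match in COUNTY_CODES order wins
    return ''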

# Check if required columns are in the input DataFrame
required_columns = [RPT_COL_CAMPAIGN]
if not all(col in inputDf.columns for col in required_columns):
    raise ValueError(f"Input DataFrame must contain the following columns: {', '.join(required_columns)}")

# Copy input rows to output
outputDf = inputDf.copy()

# Apply county extraction to the DataFrame
outputDf[BULK_COL_COUNTY] = outputDf[RPT_COL_CAMPAIGN].apply(extract_county)

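# Drop rows with no matched county. extract_county returns '' for a non-match,
# which dropna() alone would not remove, so filter on the empty string as well.
outputDf = outputDf[outputDf[BULK_COL_COUNTY].fillna('') != '']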

# Output the resulting DataFrame
if not outputDf.empty:
    print("outputDf", tableize(outputDf))  # Ensure tableize function is defined or use an alternative method
else:
    print("Empty outputDf")
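
One consequence of substring matching is that a code can match inside a longer word: 'AMA' is found in a campaign named 'Amazon_Brand', for instance. If campaign names use consistent separators, a stricter variant could match whole tokens only. The sketch below assumes names are delimited by underscores, hyphens, spaces, or pipes; that is an assumption about naming conventions, not something the script specifies, and extract_county_token is a hypothetical helper that is not part of the script.

```python
import re

COUNTY_CODES = ['DAL', 'LUB', 'MDO', 'FTW', 'AMA']

def extract_county_token(campaign_name):
    """Return a county code only when it appears as a whole token in the name."""
    if not isinstance(campaign_name, str):
        return ''
    # Assumed separators: underscore, hyphen, whitespace, pipe
    tokens = {t.upper() for t in re.split(r'[_\-\s|]+', campaign_name) if t}
    for county in COUNTY_CODES:
        if county in tokens:
            return county
    return ''

print(repr(extract_county_token('Amazon_Brand')))      # no whole-token match
print(repr(extract_county_token('Brand_ama_Search')))  # 'ama' is its own token
```

Whether this trade-off is worthwhile depends on how strictly campaign names follow a delimiter convention; the substring approach is more forgiving of names such as 'BrandDAL'.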

Post generated on 2025-03-11 01:25:51 GMT
