Script 1459: Tagging County Dimension for Google
Purpose:
The Python script tags county codes to campaigns based on their names in a DataFrame.
To Elaborate
The script identifies county codes in campaign names from a given DataFrame. It checks each campaign name against a predefined list of county codes, using a case-insensitive substring match, and writes the matched code to a new column. Campaigns without an identifiable county code are filtered out, so only tagged rows are retained. Tagging campaigns this way lets campaign data be grouped by geographical region for targeted marketing and analysis.
Walking Through the Code
- Define County Codes and Parameters:
  - The script begins by defining a list of county codes (`COUNTY_CODES`) that it will search for within campaign names.
  - It also sets up configurable parameters for column names, such as `RPT_COL_CAMPAIGN`, `BULK_COL_ACCOUNT`, `BULK_COL_CAMPAIGN`, and `BULK_COL_COUNTY`, which can be adjusted by the user to match the specific structure of their input data.
- Extract County Code Function:
  - A function `extract_county` is defined to search for county codes within a campaign name. It converts the campaign name to lowercase and checks for the presence of any county code from the predefined list. If a match is found, it returns the county code; otherwise, it returns an empty string.
- Validate Input DataFrame:
  - The script checks if the required columns, specifically the campaign column (`RPT_COL_CAMPAIGN`), are present in the input DataFrame. If not, it raises an error, ensuring that the input data is correctly structured.
- Process DataFrame:
  - A copy of the input DataFrame is created to preserve the original data. The script then applies the `extract_county` function to each campaign name, populating a new column (`BULK_COL_COUNTY`) with the extracted county codes.
- Filter and Output DataFrame:
  - Rows with missing county values are dropped from the DataFrame, ensuring that only relevant data is retained. If the resulting DataFrame is not empty, it is printed using a function like `tableize` to format the output; otherwise the script prints "Empty outputDf".
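To make the matching rule concrete, the sketch below reproduces the `extract_county` logic described above and runs it on a few campaign names. The sample names are invented for illustration and do not come from any real account. It shows three properties of the substring check: it is case-insensitive, the first code in `COUNTY_CODES` order wins when several are present, and a code can also match inside an unrelated word.

```python
# County codes as defined in the script
COUNTY_CODES = ['DAL', 'LUB', 'MDO', 'FTW', 'AMA']

def extract_county(campaign_name):
    """Return the first county code contained in the name, or ''."""
    campaign_lower = campaign_name.lower()
    for county in COUNTY_CODES:
        if county.lower() in campaign_lower:
            return county
    return ''

# Hypothetical campaign names, made up for this example
samples = [
    'TX_dal_Brand',          # lowercase code still matches
    'FTW and DAL Combined',  # both present: DAL comes first in COUNTY_CODES
    'Summer Sandals Sale',   # 'dal' inside 'sandals' also matches
    'Generic Brand',         # no code present, so '' is returned
]
results = {name: extract_county(name) for name in samples}
for name, code in results.items():
    print(f"{name!r} -> {code!r}")
```

The third sample is worth noting: because the check is a plain substring test, campaign naming conventions need to keep county codes from appearing inside other words.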
Vitals
- Script ID : 1459
- Client ID / Customer ID: 1306928165 / 60270439
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, County
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Autumn Archibald (aarchibald@marinsoftware.com)
- Created by Autumn Archibald on 2024-10-25 05:42
- Last Updated by Autumn Archibald on 2024-10-25 05:42
Python Code
# Define the list of county codes
COUNTY_CODES = ['DAL', 'LUB', 'MDO', 'FTW', 'AMA']

# Define the configurable parameters for the script
RPT_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_COUNTY = 'County'

# Function to extract county code from campaign name
def extract_county(campaign_name):
    """Return the first county code found in the name (case-insensitive), or ''."""
    # Non-string values (e.g. NaN for a blank campaign name) cannot match
    if not isinstance(campaign_name, str):
        return ''
    campaign_lower = campaign_name.lower()
    for county in COUNTY_CODES:
        if county.lower() in campaign_lower:
            return county
    return ''

# Check if required columns are in the input DataFrame
# (inputDf is not defined here; it is expected to be provided by the environment running the script)
required_columns = [RPT_COL_CAMPAIGN]
if not all(col in inputDf.columns for col in required_columns):
    raise ValueError(f"Input DataFrame must contain the following columns: {', '.join(required_columns)}")

# Copy input rows to output
outputDf = inputDf.copy()

# Apply county extraction to the DataFrame
outputDf[BULK_COL_COUNTY] = outputDf[RPT_COL_CAMPAIGN].apply(extract_county)

# Drop any rows without a county match. extract_county returns '' for those rows,
# and dropna() does not treat an empty string as missing, so filter on '' explicitly
outputDf = outputDf[outputDf[BULK_COL_COUNTY] != '']

# Output the resulting DataFrame
if not outputDf.empty:
    print("outputDf", tableize(outputDf))  # Ensure tableize function is defined or use an alternative method
else:
    print("Empty outputDf")
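Because the script uses a plain substring check, a code such as DAL also matches inside longer words. If that turns out to be too loose for a given account, one possible alternative (not part of the original script) is to split the campaign name into tokens and compare whole tokens only. The helper name `extract_county_token` is hypothetical, as is the assumption that codes are separated from the rest of the name by non-alphanumeric characters such as underscores, spaces, or hyphens; the delimiter pattern would need to follow the naming convention actually in use.

```python
import re

COUNTY_CODES = ['DAL', 'LUB', 'MDO', 'FTW', 'AMA']

def extract_county_token(campaign_name):
    """Return the first county code that appears as a whole token, or ''.

    Hypothetical stricter variant of extract_county: the name is split on
    runs of non-alphanumeric characters, and each code is compared against
    whole tokens, case-insensitively.
    """
    if not isinstance(campaign_name, str):
        return ''
    # Assumed delimiters: anything that is not a letter or a digit
    tokens = {t.upper() for t in re.split(r'[^A-Za-z0-9]+', campaign_name) if t}
    for county in COUNTY_CODES:
        if county.upper() in tokens:
            return county
    return ''

print(repr(extract_county_token('TX_dal_Brand')))         # code is its own token
print(repr(extract_county_token('Summer Sandals Sale')))  # 'dal' is only part of a word
```

The trade-off is that names which embed a code without a delimiter (for example a code fused to a product name) would no longer be tagged, so the substring version may still be the better fit for some accounts.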
Post generated on 2025-03-11 01:25:51 GMT