Script 1459: Tagging County Dimension for Google
Purpose
The script identifies and tags county codes within campaign names in a DataFrame for further processing.
To Elaborate
The Python script is designed to process a DataFrame containing campaign data and identify specific county codes within the campaign names. It uses a predefined list of county codes and checks each campaign name, case-insensitively, to see whether it contains any of these codes. If a county code is found, it is extracted and assigned to a new column in the DataFrame. Only rows with an identified county code are kept, which filters out campaigns that do not correspond to the specified counties. Tagging campaigns this way groups campaign data by geographic region, which supports region-targeted marketing and analysis.
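The script itself relies on an `inputDf` and a `tableize` helper supplied by its runtime, so it cannot be run on its own as written. To try the same tagging logic locally, it can be wrapped in a function and run on a small DataFrame. This is a minimal sketch, assuming pandas is installed. The helper name `tag_counties` and the sample campaign names are hypothetical, and the empty-string filter reflects the intended "keep only tagged rows" behavior.

```python
import pandas as pd

# County codes taken from the script; the sample campaign names below are made up.
COUNTY_CODES = ['DAL', 'LUB', 'MDO', 'FTW', 'AMA']


def extract_county(campaign_name):
    """Return the first code in COUNTY_CODES found (case-insensitively) in the name, else ''."""
    if not isinstance(campaign_name, str):
        return ''  # NaN / None campaign names get no county
    lower = campaign_name.lower()
    for code in COUNTY_CODES:
        if code.lower() in lower:
            return code
    return ''


def tag_counties(df, campaign_col='Campaign', county_col='County'):
    """Hypothetical helper: add a county column and keep only rows where a code was found."""
    out = df.copy()
    out[county_col] = out[campaign_col].apply(extract_county)
    return out[out[county_col] != ''].reset_index(drop=True)


if __name__ == '__main__':
    sample = pd.DataFrame({
        'Account': ['Acct1', 'Acct1', 'Acct2'],
        'Campaign': ['Brand_DAL_Search', 'Generic_Search', 'ftw-display'],
    })
    # Expect two rows: Brand_DAL_Search -> DAL, ftw-display -> FTW
    print(tag_counties(sample))
```

Returning a filtered copy rather than mutating the input mirrors the script's own `inputDf.copy()` step.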
Walking Through the Code
- Define County Codes and Parameters
  - The script begins by defining a list of county codes (COUNTY_CODES) that it searches for within campaign names.
  - It also sets up configurable column-name parameters: RPT_COL_CAMPAIGN, BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, and BULK_COL_COUNTY. These let users specify which DataFrame columns hold campaign and county data. Only RPT_COL_CAMPAIGN and BULK_COL_COUNTY are referenced by the processing logic; BULK_COL_ACCOUNT and BULK_COL_CAMPAIGN are defined but not used.
- Extract County Code Function
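  - A function extract_county is defined to search for county codes within a given campaign name. It lowercases the campaign name and checks, in list order, whether each county code (also lowercased) occurs anywhere in it. It returns the first matching code in its original uppercase form, or an empty string if none is found. Because this is a plain substring match, a code can also match inside an unrelated word (for example, "LUB" inside "Club").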
- Validate Input DataFrame
  - The script checks that the required columns (here, only the campaign column) are present in the input DataFrame. If any are missing, it raises a ValueError, so malformed input is rejected before any processing takes place.
- Process DataFrame
  - A copy of the input DataFrame is created to preserve the original data.
  - The extract_county function is applied to the campaign column, and the results are stored in a new column designated for county codes.
  - Rows without an identified county code (an empty string in the county column) are removed, so only relevant data is retained.
- Output Processed DataFrame
  - The script checks whether the resulting DataFrame is empty. If it is not, it prints the DataFrame using a tableize function, which is not defined in the script itself and must be supplied elsewhere (or replaced with another way of displaying the data). If it is empty, the script prints "Empty outputDf".
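One detail worth checking in the filtering step: extract_county signals "no match" with an empty string, not a missing value, and pandas `DataFrame.dropna` only drops missing values (NaN/None). The toy sketch below (made-up campaign names) contrasts `dropna` with a boolean mask on the empty string.

```python
import pandas as pd

df = pd.DataFrame({
    'Campaign': ['Brand_DAL_Search', 'Generic_Search'],
    'County': ['DAL', ''],  # '' is what extract_county returns on no match
})

# dropna() treats '' as a real value, so no row is removed here.
after_dropna = df.dropna(subset=['County'])

# A boolean mask on the empty string removes the unmatched row.
after_mask = df[df['County'] != '']

print(len(after_dropna))  # 2
print(len(after_mask))    # 1
```

Equivalently, one could replace `''` with `pd.NA` before calling `dropna`; the mask is simply the more direct expression of "keep rows where a code was found".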
Vitals
- Script ID : 1459
- Client ID / Customer ID: 1306928165 / 60270439
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, County
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Autumn Archibald (aarchibald@marinsoftware.com)
- Created by Autumn Archibald on 2024-10-25 05:42
- Last Updated by Autumn Archibald on 2024-10-25 05:42
Python Code
```python
# Define the list of county codes
COUNTY_CODES = ['DAL', 'LUB', 'MDO', 'FTW', 'AMA']

# Define the configurable parameters for the script
RPT_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_COUNTY = 'County'

# Function to extract county code from campaign name.
# Returns the first code in COUNTY_CODES that appears (case-insensitively)
# anywhere in the name, or '' if none does.
def extract_county(campaign_name):
    if not isinstance(campaign_name, str):
        return ''  # missing (NaN) or non-string campaign names get no county
    campaign_lower = campaign_name.lower()
    for county in COUNTY_CODES:
        if county.lower() in campaign_lower:
            return county
    return ''

# Check if required columns are in the input DataFrame
# (inputDf is supplied by the script's runtime, not defined here)
required_columns = [RPT_COL_CAMPAIGN]
if not all(col in inputDf.columns for col in required_columns):
    raise ValueError(f"Input DataFrame must contain the following columns: {', '.join(required_columns)}")

# Copy input rows to output
outputDf = inputDf.copy()

# Apply county extraction to the DataFrame
outputDf[BULK_COL_COUNTY] = outputDf[RPT_COL_CAMPAIGN].apply(extract_county)

# Drop rows with no county code. extract_county returns '' (not NaN) when
# nothing matches, so dropna() alone would keep those rows.
outputDf = outputDf[outputDf[BULK_COL_COUNTY] != '']

# Output the resulting DataFrame
if not outputDf.empty:
    print("outputDf", tableize(outputDf))  # Ensure tableize function is defined or use an alternative method
else:
    print("Empty outputDf")
```
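Because the script matches codes as plain substrings, a code can fire inside an unrelated word (for example, "LUB" inside "Golf_Club_Promo", or "AMA" inside "Amazon"). If campaign names separate codes with underscores, hyphens, spaces, or similar delimiters, a stricter whole-token match is one option. This is a hypothetical variant, not part of script 1459; the function name `extract_county_token` and the assumption that any non-alphanumeric character acts as a separator are illustrative only.

```python
import re

COUNTY_CODES = ['DAL', 'LUB', 'MDO', 'FTW', 'AMA']

# Assumption: any run of non-alphanumeric characters separates tokens in a campaign name.
_TOKEN_SPLIT = re.compile(r'[^A-Za-z0-9]+')


def extract_county_token(campaign_name):
    """Hypothetical: return the first code (in COUNTY_CODES order) present as a whole token, else ''."""
    if not isinstance(campaign_name, str):
        return ''
    tokens = {t.upper() for t in _TOKEN_SPLIT.split(campaign_name) if t}
    for code in COUNTY_CODES:
        if code in tokens:
            return code
    return ''


print(repr(extract_county_token('Brand_DAL_Search')))  # 'DAL'
print(repr(extract_county_token('Golf_Club_Promo')))   # '' (substring matching would return 'LUB')
print(repr(extract_county_token('ftw-display')))       # 'FTW'
```

The trade-off is recall: a code glued to other letters or digits (for example, "DAL2024") is no longer recognized, so this variant only fits if the naming convention reliably delimits county codes.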
Post generated on 2024-11-27 06:58:46 GMT