Script 1393: Tagging State Dimension
Purpose:
The script identifies and tags US states mentioned in the “Campaign” column of a DataFrame.
To Elaborate
The Python script is designed to process a DataFrame containing campaign data and identify any US state names mentioned within the “Campaign” column. The primary goal is to extract and tag these state names, thereby enriching the dataset with geographical information. This can be particularly useful for businesses or organizations that need to analyze campaign performance or allocate resources based on geographic regions. The script ensures that only campaigns with identifiable state names are retained, thus filtering out irrelevant data and focusing on location-specific insights.
Walking Through the Code
-
Define US States: The script begins by defining a list of all US states. This list is used to check against the campaign names to identify any state mentions.
-
Configurable Parameters: Several column names are defined as constants, which can be adjusted by the user if the DataFrame structure changes. These include the campaign and state columns.
-
State Extraction Function: A function named
extract_state
is defined to search for state names within a given campaign name. It converts the campaign name to lowercase and checks for the presence of any state name from the predefined list. -
Input Validation: The script checks if the required “Campaign” column is present in the input DataFrame. If not, it raises an error, ensuring that the input data is correctly structured.
-
Data Processing: The input DataFrame is copied to a new DataFrame, and the state extraction function is applied to each campaign name. This populates a new “State” column with the identified state names.
-
Data Cleaning: Rows with missing state values are dropped from the DataFrame, ensuring that only relevant data is retained for further analysis.
-
Output: The resulting DataFrame, now enriched with state information, is printed if it contains any data. If the DataFrame is empty, a message indicating this is printed instead.
Vitals
- Script ID : 1393
- Client ID / Customer ID: 569613644 / 42130977
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, State
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Autumn Archibald (aarchibald@marinsoftware.com)
- Created by Autumn Archibald on 2024-09-18 02:37
- Last Updated by Autumn Archibald on 2024-09-18 02:37
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
# Define the list of US states
US_STATES = [
'Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut',
'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa',
'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan',
'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire',
'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio',
'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota',
'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia',
'Wisconsin', 'Wyoming'
]
# Define the configurable parameters for the script
RPT_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_STATE = 'State'
# Function to extract state from campaign name
def extract_state(campaign_name):
campaign_lower = campaign_name.lower()
for state in US_STATES:
if state.lower() in campaign_lower:
return state
return ''
# Check if required columns are in the input DataFrame
required_columns = [RPT_COL_CAMPAIGN]
if not all(col in inputDf.columns for col in required_columns):
raise ValueError(f"Input DataFrame must contain the following columns: {', '.join(required_columns)}")
# Copy input rows to output
outputDf = inputDf.copy()
# Apply state extraction to the DataFrame
outputDf[BULK_COL_STATE] = outputDf[RPT_COL_CAMPAIGN].apply(extract_state)
# Drop any rows with missing state values
outputDf = outputDf.dropna(subset=[BULK_COL_STATE])
# Output the resulting DataFrame
if not outputDf.empty:
print("outputDf", tableize(outputDf)) # Ensure tableize function is defined or use an alternative method
else:
print("Empty outputDf")
Post generated on 2025-03-11 01:25:51 GMT