Script 1393: Tagging State Dimension
Purpose
The script identifies and tags U.S. states mentioned in campaign names within a dataset.
To Elaborate
The Python script processes a dataset of campaign information and looks for U.S. state names within the campaign names, using a case-insensitive match against a fixed list of states. When a state name is found, it is written to a new State column. This supports analysis and reporting on campaign data by geography. The script is intended to keep only campaigns with an identifiable state name, filtering out rows that cannot be tagged.
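As a rough illustration of the tagging described above, the sketch below applies a case-insensitive substring check to a few campaign names. The shortened state list, the `tag_state` name, and the sample campaign names are hypothetical and chosen only for demonstration; the script itself checks all 50 states.

```python
# Hypothetical, shortened state list for illustration (the script lists all 50)
SAMPLE_STATES = ['California', 'New York', 'Texas']


def tag_state(campaign_name):
    """Return the first sample state found in the campaign name, or ''."""
    campaign_lower = campaign_name.lower()
    for state in SAMPLE_STATES:
        if state.lower() in campaign_lower:
            return state
    return ''


# Hypothetical campaign names
campaigns = ['Brand - california - Search', 'NEW YORK Retargeting', 'Generic Display']
tags = {name: tag_state(name) for name in campaigns}
for name, state in tags.items():
    print(f'{name!r} -> {state!r}')
```

Matching ignores letter case, and a campaign with no recognizable state maps to an empty string rather than a missing value.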
Walking Through the Code
- Define U.S. States: The script begins by defining a list of all U.S. states, which will be used to check against the campaign names.
- Configurable Parameters: It sets up several configurable parameters that define the column names used in the dataset, such as RPT_COL_CAMPAIGN for the campaign column and BULK_COL_STATE for the state column.
- State Extraction Function: A function extract_state is defined to convert campaign names to lowercase and check whether any state name is present. If a state is found, it returns the state name; otherwise, it returns an empty string.
- Input Validation: The script checks whether the required columns are present in the input DataFrame. If not, it raises an error so that incorrectly formatted input is caught early.
- Data Processing: It copies the input DataFrame to an output DataFrame and applies the extract_state function to populate the state column.
- Data Cleaning: Rows with missing state values are dropped from the output DataFrame so that only relevant data is retained.
- Output: Finally, the script prints the resulting DataFrame if it is not empty; otherwise it prints a message saying the output is empty.
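The validation, processing, and cleaning steps above can be sketched on a toy DataFrame. This is a minimal illustration rather than the production script: the sample rows, the two-state list, and the `sample_df` and `result` names are assumptions, and pandas is assumed to be installed. Because the extraction function returns an empty string rather than a missing value, the sketch filters on the empty string explicitly.

```python
import pandas as pd

# Hypothetical two-state list and sample rows, for illustration only
STATES = ['Ohio', 'Texas']
sample_df = pd.DataFrame({
    'Account': ['A1', 'A1', 'A2'],
    'Campaign': ['Texas - Brand', 'National - Generic', 'ohio remarketing'],
})


def extract_state(campaign_name):
    """Case-insensitive substring match; returns '' when nothing matches."""
    lowered = campaign_name.lower()
    for state in STATES:
        if state.lower() in lowered:
            return state
    return ''


# Input validation: fail early if the campaign column is absent
if 'Campaign' not in sample_df.columns:
    raise ValueError('Input DataFrame must contain the following columns: Campaign')

# Data processing: copy the input and populate the state column
result = sample_df.copy()
result['State'] = result['Campaign'].apply(extract_state)

# Data cleaning: '' is not treated as missing by dropna, so filter it directly
result = result[result['State'] != ''].reset_index(drop=True)

print(result)
```

Only the two rows whose campaign names contain a state remain; the original `sample_df` is left unchanged because the work is done on a copy.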
Vitals
- Script ID : 1393
- Client ID / Customer ID: 569613644 / 42130977
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, State
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Autumn Archibald (aarchibald@marinsoftware.com)
- Created by Autumn Archibald on 2024-09-18 02:37
- Last Updated by Autumn Archibald on 2024-09-18 02:37
Python Code
# Define the list of US states
US_STATES = [
    'Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut',
    'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa',
    'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan',
    'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire',
    'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio',
    'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota',
    'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia',
    'Wisconsin', 'Wyoming'
]

# Define the configurable parameters for the script
RPT_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_STATE = 'State'

# Function to extract state from campaign name
def extract_state(campaign_name):
    # Treat non-string values (e.g. NaN) as having no state
    if not isinstance(campaign_name, str):
        return ''
    campaign_lower = campaign_name.lower()
    # Case-insensitive substring check; the first match in list order wins
    for state in US_STATES:
        if state.lower() in campaign_lower:
            return state
    return ''

# Check if required columns are in the input DataFrame
required_columns = [RPT_COL_CAMPAIGN]
if not all(col in inputDf.columns for col in required_columns):
    raise ValueError(f"Input DataFrame must contain the following columns: {', '.join(required_columns)}")

# Copy input rows to output
outputDf = inputDf.copy()

# Apply state extraction to the DataFrame
outputDf[BULK_COL_STATE] = outputDf[RPT_COL_CAMPAIGN].apply(extract_state)

# Drop any rows with missing state values.
# extract_state returns '' (not NaN) when no state is found, so dropna alone
# would keep those rows; filter out the empty string as well.
outputDf = outputDf.dropna(subset=[BULK_COL_STATE])
outputDf = outputDf[outputDf[BULK_COL_STATE] != '']

# Output the resulting DataFrame
if not outputDf.empty:
    print("outputDf", tableize(outputDf))  # Ensure tableize function is defined or use an alternative method
else:
    print("Empty outputDf")
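One caveat with plain substring matching is that a state name contained in a longer one can match first: in the list order above, 'Virginia' is checked before 'West Virginia', so a West Virginia campaign would be tagged 'Virginia'. A possible alternative, shown here as a sketch and not as part of the script, uses a regular expression that rejects matches embedded in longer words. The function name `extract_state_strict`, the shortened list, and the sample campaign names are hypothetical.

```python
import re

# Hypothetical shortened list; the same idea applies to the full US_STATES list
STATES = ['Kansas', 'Arkansas', 'Virginia', 'West Virginia']

# Longer names are listed first so the longer alternative is preferred when two
# could start at the same position. The letter lookarounds stop 'Kansas' from
# matching inside 'Arkansas' while still allowing separators such as '_' or '-'.
_ALTERNATIVES = '|'.join(
    re.escape(s) for s in sorted(STATES, key=len, reverse=True)
)
_PATTERN = re.compile(
    r'(?<![A-Za-z])(' + _ALTERNATIVES + r')(?![A-Za-z])',
    re.IGNORECASE,
)
_CANONICAL = {s.lower(): s for s in STATES}


def extract_state_strict(campaign_name):
    """Return the canonical state name found in campaign_name, or ''."""
    if not isinstance(campaign_name, str):
        return ''
    match = _PATTERN.search(campaign_name)
    return _CANONICAL[match.group(1).lower()] if match else ''


print(extract_state_strict('west virginia - Brand'))
print(extract_state_strict('Arkansas_Search'))
```

Because the regular expression search returns the leftmost match, 'West Virginia' is found before the 'Virginia' that starts later in the same string. Whether stricter matching is appropriate depends on how campaign names are written in the account.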
Post generated on 2024-11-27 06:58:46 GMT