Script 1395: Tagging product type dimension
Purpose
The Python script tags product types based on campaign names in a DataFrame.
To Elaborate
The script processes a DataFrame of campaign data and tags each row with a product type derived from its campaign name. It checks each campaign name against a predefined list of product types; when a match is found, the matched type (or types) is written to a new column in the DataFrame. Rows whose campaign names match none of the predefined product types are dropped, so the output contains only campaigns with an identified product type. This categorization makes it easier to report on and analyze campaign data by product type.
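As a quick illustration, and assuming the PRODUCT_TYPES list and extract_product_types function defined in the Python Code section below, a few hypothetical campaign names (not taken from the client's account) would be tagged like this:

extract_product_types('US_Brand_Exact')            # -> 'brand'
extract_product_types('FL Pre-Licensing Search')   # -> 'pre-licensing'
extract_product_types('Remarketing - Brand - US')  # -> 'remarketing, brand'
extract_product_types('Generic Terms')             # -> ''  (no match; such rows are filtered out)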
Walking Through the Code
- Define Product Types and Parameters
  - A list of product types is defined: 'post-licensing', 'pre-licensing', 'remarketing', 'continuing education', 'syc', and 'brand'.
  - Configurable parameters for column names are set: RPT_COL_CAMPAIGN, BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, and BULK_COL_PRODUCT_TYPE.
- Extract Product Types
  - A function extract_product_types is defined to identify product types within a campaign name by converting the name to lowercase and checking it against the predefined product types (illustrated above).
  - The function returns a comma-separated string of matched product types, or an empty string if no matches are found.
- Validate Input DataFrame
  - The script checks that the required column, specified by RPT_COL_CAMPAIGN, exists in the input DataFrame; if not, it raises a ValueError.
- Process DataFrame
  - A copy of the input DataFrame is created to preserve the original data.
  - The extract_product_types function is applied to the campaign column, and the results are stored in a new column specified by BULK_COL_PRODUCT_TYPE.
- Filter and Output Data
  - Rows with no matched product type are dropped from the DataFrame (see the sketch after this list).
  - The resulting DataFrame is printed if it is not empty, using the tableize helper to format the output.
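A minimal sketch of the apply-and-filter steps, using hypothetical campaign names and assuming the definitions from the Python Code section below (in production, inputDf is supplied by the scripting environment):

import pandas as pd

# Hypothetical campaign data for illustration only
inputDf = pd.DataFrame({
    'Account': ['Acct A', 'Acct A', 'Acct B'],
    'Campaign': ['Brand - Exact', 'Post-Licensing - Broad', 'Competitor Terms'],
})

outputDf = inputDf.copy()
outputDf['Product Type'] = outputDf['Campaign'].apply(extract_product_types)
outputDf = outputDf[outputDf['Product Type'] != '']  # drops 'Competitor Terms' (no match)
# Remaining rows are tagged 'brand' and 'post-licensing'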
Vitals
- Script ID: 1395
- Client ID / Customer ID: 569613644 / 42130977
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Product Type
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Autumn Archibald (aarchibald@marinsoftware.com)
- Created by Autumn Archibald on 2024-09-18 04:08
- Last Updated by Autumn Archibald on 2024-09-18 04:10
Python Code
# Define the list of product types
PRODUCT_TYPES = [
'post-licensing', 'pre-licensing', 'remarketing',
'continuing education', 'syc', 'brand'
]
# Define the configurable parameters for the script
RPT_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PRODUCT_TYPE = 'Product Type'
# Function to extract product types from campaign name
def extract_product_types(campaign_name):
    campaign_lower = campaign_name.lower()
    matched_types = [product_type for product_type in PRODUCT_TYPES if product_type in campaign_lower]
    return ', '.join(matched_types) if matched_types else ''
# Check if required columns are in the input DataFrame
required_columns = [RPT_COL_CAMPAIGN]
if not all(col in inputDf.columns for col in required_columns):
    raise ValueError(f"Input DataFrame must contain the following columns: {', '.join(required_columns)}")
# Copy input rows to output
outputDf = inputDf.copy()
# Apply product type extraction to the DataFrame
outputDf[BULK_COL_PRODUCT_TYPE] = outputDf[RPT_COL_CAMPAIGN].apply(extract_product_types)
# Drop any rows with no matched product type (extract_product_types returns an empty string for non-matches)
outputDf = outputDf[outputDf[BULK_COL_PRODUCT_TYPE] != '']
# Output the resulting DataFrame
if not outputDf.empty:
    print("outputDf", tableize(outputDf))  # Ensure tableize function is defined or use an alternative method
else:
    print("Empty outputDf")
Post generated on 2024-11-27 06:58:46 GMT