Script 1395: Tagging product type dimension

Purpose:

The Python script tags each campaign in a DataFrame with the product type(s) found in its name.

To Elaborate

The script identifies product types in a dataset by analyzing marketing campaign names. It scans each campaign name, ignoring case, for keywords that correspond to predefined product types. When a name contains one or more of these keywords, the campaign is tagged with every matching product type. Tagging campaigns this way categorizes marketing data for reporting and analysis. Campaigns whose names contain none of the keywords are dropped, so the output contains only campaigns with an identifiable product type.
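To make the keyword matching concrete, here is a minimal sketch that reproduces the script's case-insensitive substring test on a few campaign names. The names are hypothetical examples, not data from the M1 Report, and the non-string guard is an addition for blank cells. Because the check is a plain substring test, a short keyword can also match inside a longer word, as the last example shows.

```python
# Keyword list copied from the script
PRODUCT_TYPES = [
    'post-licensing', 'pre-licensing', 'remarketing',
    'continuing education', 'syc', 'brand'
]

def extract_product_types(campaign_name):
    """Return every keyword found in campaign_name, comma-separated, or ''."""
    # Guard for blank cells (NaN/None); added for this sketch
    if not isinstance(campaign_name, str):
        return ''
    campaign_lower = campaign_name.lower()  # matching ignores case
    matched = [pt for pt in PRODUCT_TYPES if pt in campaign_lower]
    return ', '.join(matched)  # joining an empty list yields ''

# Hypothetical campaign names, for illustration only
examples = [
    'Pre-Licensing | Search',    # one keyword
    'Remarketing - Brand',       # two keywords, reported in list order
    'Generic Search',            # no keyword
    'Psychology Course Promo',   # 'syc' hides inside 'psychology'
]
for name in examples:
    print(repr(name), '->', repr(extract_product_types(name)))
```

The last name is tagged syc only because "psychology" contains those three letters, which is worth keeping in mind when choosing short keywords.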

Walking Through the Code

  1. Define Product Types: The script begins by defining the list of product-type keywords it searches for in campaign names: post-licensing, pre-licensing, remarketing, continuing education, syc, and brand.

  2. Configurable Parameters: Column names are defined as constants (RPT_COL_CAMPAIGN, BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_PRODUCT_TYPE) so they can be changed if a dataset uses different column names.

  3. Extract Product Types Function: A function named extract_product_types processes each campaign name. It converts the name to lowercase and checks whether each keyword appears anywhere in it (a substring match). It returns the matched product types as a comma-separated string, or an empty string if none match.

  4. Validate Input DataFrame: The script checks that the required column (Campaign) is present in the input DataFrame. If it is missing, the script raises a ValueError, so malformed input is rejected before any processing.

  5. Copy and Process DataFrame: The input DataFrame is copied to a new DataFrame that will be used for output. The script applies extract_product_types to each campaign name and stores the results in a new Product Type column.

  6. Filter DataFrame: Rows whose Product Type value is empty (no keyword matched) are removed, so the output contains only tagged campaigns.

  7. Output the DataFrame: Finally, the script checks whether the resulting DataFrame is empty. If it is not, the script prints it using tableize, a helper that formats the DataFrame as a table for display and is not defined in the script itself. Otherwise, it prints a message saying the output is empty.
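The seven steps above can be exercised outside the platform with a small, self-contained sketch. In the real script, inputDf and tableize come from the runtime; here inputDf is a hypothetical three-row DataFrame and tableize is replaced by DataFrame.to_string(), both assumptions made only so the sketch runs locally. The helper name tag_campaigns is likewise hypothetical.

```python
import pandas as pd

# Step 1-2: keywords and column names, as in the script
PRODUCT_TYPES = [
    'post-licensing', 'pre-licensing', 'remarketing',
    'continuing education', 'syc', 'brand'
]
RPT_COL_CAMPAIGN = 'Campaign'
BULK_COL_PRODUCT_TYPE = 'Product Type'

# Step 3: substring matching, with a guard for non-string cells
def extract_product_types(campaign_name):
    if not isinstance(campaign_name, str):
        return ''
    campaign_lower = campaign_name.lower()
    return ', '.join(pt for pt in PRODUCT_TYPES if pt in campaign_lower)

def tag_campaigns(input_df):
    """Hypothetical helper covering steps 4-6: validate, copy, tag, filter."""
    missing = [c for c in [RPT_COL_CAMPAIGN] if c not in input_df.columns]
    if missing:
        raise ValueError(f"Input DataFrame must contain the following columns: {', '.join(missing)}")
    output_df = input_df.copy()
    output_df[BULK_COL_PRODUCT_TYPE] = output_df[RPT_COL_CAMPAIGN].apply(extract_product_types)
    # No match yields '' rather than NaN, so filter on the empty string
    return output_df[output_df[BULK_COL_PRODUCT_TYPE] != '']

# Hypothetical sample input standing in for the M1 Report
inputDf = pd.DataFrame({
    'Account': ['Acct A', 'Acct A', 'Acct B'],
    'Campaign': ['Continuing Education - Search', 'Generic Search', 'SYC Remarketing'],
})

outputDf = tag_campaigns(inputDf)

# Step 7: to_string() stands in for the runtime's tableize()
if not outputDf.empty:
    print("outputDf", outputDf.to_string())
else:
    print("Empty outputDf")
```

With this input, the "Generic Search" row is dropped and "SYC Remarketing" is tagged "remarketing, syc", because matches are reported in the order of PRODUCT_TYPES rather than the order they appear in the name.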

Vitals

  • Script ID: 1395
  • Client ID / Customer ID: 569613644 / 42130977
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Product Type
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Autumn Archibald (aarchibald@marinsoftware.com)
  • Created by Autumn Archibald on 2024-09-18 04:08
  • Last Updated by Autumn Archibald on 2024-09-18 04:10
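The Vitals list the bulk-upload output columns as Account, Campaign, and Product Type, whereas the script's outputDf keeps every column of the input report plus Product Type. If the upload should contain only those three columns, one option is to project the tagged frame onto them before output. This is a minimal sketch under that assumption; it also assumes the report supplies an Account column, which the script does not validate. OUTPUT_COLUMNS and to_bulk_columns are hypothetical names.

```python
import pandas as pd

BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PRODUCT_TYPE = 'Product Type'

# Hypothetical: the output columns named in the Vitals, in that order
OUTPUT_COLUMNS = [BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_PRODUCT_TYPE]

def to_bulk_columns(tagged_df):
    """Keep only the bulk-upload columns (assumed requirement)."""
    missing = [c for c in OUTPUT_COLUMNS if c not in tagged_df.columns]
    if missing:
        raise ValueError(f"Missing output columns: {', '.join(missing)}")
    return tagged_df[OUTPUT_COLUMNS].reset_index(drop=True)

# Hypothetical tagged row with an extra report column (Cost) the upload does not need
tagged = pd.DataFrame({
    'Account': ['Acct A'],
    'Campaign': ['Brand - Search'],
    'Cost': [12.5],
    'Product Type': ['brand'],
})
print(to_bulk_columns(tagged))
```

Because the report column and the bulk column are both named Campaign, no renaming is needed; only the extra columns are dropped.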

Python Code

# Define the list of product types
PRODUCT_TYPES = [
    'post-licensing', 'pre-licensing', 'remarketing',
    'continuing education', 'syc', 'brand'
]

# Define the configurable parameters for the script
RPT_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PRODUCT_TYPE = 'Product Type'

# Function to extract product types from campaign name
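def extract_product_types(campaign_name):
    # Non-string values (e.g. NaN from a blank Campaign cell) cannot contain a keyword
    if not isinstance(campaign_name, str):
        return ''
    campaign_lower = campaign_name.lower()
    # Case-insensitive substring match against each keyword, in list order
    matched_types = [product_type for product_type in PRODUCT_TYPES if product_type in campaign_lower]
    return ', '.join(matched_types) if matched_types else ''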

# Check if required columns are in the input DataFrame
required_columns = [RPT_COL_CAMPAIGN]
if not all(col in inputDf.columns for col in required_columns):
    raise ValueError(f"Input DataFrame must contain the following columns: {', '.join(required_columns)}")

# Copy input rows to output
outputDf = inputDf.copy()

# Apply product type extraction to the DataFrame
outputDf[BULK_COL_PRODUCT_TYPE] = outputDf[RPT_COL_CAMPAIGN].apply(extract_product_types)

# Drop rows with no matched product type
# (extract_product_types returns '' rather than NaN, so filter on the empty string)
outputDf = outputDf[outputDf[BULK_COL_PRODUCT_TYPE] != '']

# Output the resulting DataFrame
if not outputDf.empty:
    print("outputDf", tableize(outputDf))  # tableize is not defined in this script; it must be supplied by the runtime (otherwise use e.g. outputDf.to_string())
else:
    print("Empty outputDf")
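Because extract_product_types uses a plain substring test, a short keyword such as syc or brand also matches inside a longer word (for example, "psychology" contains "syc" and "nonbrand" contains "brand"). If that matters for an account's naming convention, one possible variant, sketched here as an assumption rather than the script's current behavior, counts a keyword only when it is not directly preceded or followed by a letter or digit. Spaces, hyphens, pipes, and underscores still act as separators, so "Non-Brand" would still be tagged brand. The names KEYWORD_PATTERNS and extract_product_types_bounded are hypothetical.

```python
import re

PRODUCT_TYPES = [
    'post-licensing', 'pre-licensing', 'remarketing',
    'continuing education', 'syc', 'brand'
]

# One compiled pattern per keyword: the keyword must not touch a letter or
# digit on either side. re.escape keeps characters like '-' literal.
KEYWORD_PATTERNS = [
    (pt, re.compile(r'(?<![a-z0-9])' + re.escape(pt) + r'(?![a-z0-9])'))
    for pt in PRODUCT_TYPES
]

def extract_product_types_bounded(campaign_name):
    """Hypothetical boundary-aware variant of extract_product_types."""
    if not isinstance(campaign_name, str):
        return ''
    campaign_lower = campaign_name.lower()  # patterns are lowercase
    return ', '.join(pt for pt, pattern in KEYWORD_PATTERNS if pattern.search(campaign_lower))

print(repr(extract_product_types_bounded('Psychology Course Promo')))  # prints ''
print(repr(extract_product_types_bounded('SYC_Remarketing')))          # prints 'remarketing, syc'
```

Treating only letters and digits as word characters (instead of using `\b`, which also treats underscore as part of a word) keeps underscore-delimited names like "SYC_Remarketing" matching; the trade-off is that hyphenated negations such as "Non-Brand" are not excluded.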

Post generated on 2025-03-11 01:25:51 GMT
