Script 1523: Script AutoTag Campaign ID Number

Purpose

The script extracts and tags a numerical identifier from campaign names based on a specific format.

To Elaborate

The Python script processes a dataset of campaign information and extracts a numerical identifier from each campaign name. The identifier is expected to be enclosed in parentheses at the very start of the campaign name, before the first dash ('-'); for example, a name beginning with "(5802)" yields 5802. The script tags this identifier as the "Campaign ID #" for each campaign entry. Only valid identifiers are tagged, and rows left without a tag are dropped from the output. The script also strips leading and trailing whitespace from campaign names, because extra whitespace breaks the bulk upload preview.
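
To make the expected format concrete, here is a minimal sketch of the extraction rule using the same regular expression as the script. The campaign names and the helper name `extract_id` are invented for illustration and do not come from the client data.

```python
import re

# Hypothetical campaign names, invented for illustration only.
samples = [
    "(5802) - Spring Sale",        # identifier at the start, before the dash
    "Brand (5802) - Spring Sale",  # parentheses present, but not at the start
    "(58A2) - Spring Sale",        # content in parentheses is not purely numeric
]

def extract_id(name):
    # re.match only matches at the beginning of the string.
    m = re.match(r"\((\d+)\)", name)
    return m.group(1) if m else None

results = [extract_id(s) for s in samples]
print(results)
```

Only the first name yields an identifier. Because `re.match` is anchored at the start of the string, parentheses that appear later in the name are ignored.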

Walking Through the Code

  1. Configurable Parameters:
    • The script begins by defining a configurable parameter PLACEMENT_KEY, which is set to the dash ('-') character. Campaign names that do not contain this key are skipped without any extraction attempt.
  2. Data Preparation:
    • The script retrieves the primary data source, inputDf, which contains the campaign data. It also defines the column names for both input and output data, focusing on the campaign name and the campaign ID number.
  3. Function Definition:
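    • A function get_value_before_dash is defined to extract the numerical identifier from the campaign name. It calls re.match with the pattern \((\d+)\), which matches one or more digits enclosed in parentheses at the start of the name, and returns the digits without the parentheses. If nothing matches, it prints the campaign name and returns NaN.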
  4. Data Processing:
    • The script copies the input data into outputDf for processing. It iterates over each row in the dataset and checks whether the campaign name contains the PLACEMENT_KEY. If the key is present, it attempts to extract the numerical identifier using the function defined above; otherwise the row is left as it was in the input.
  5. Tagging and Cleaning:
    • If a valid identifier is found, it is written to the "Campaign ID #" column of the output data; if extraction fails, the column is set to NaN. The script then drops every row whose "Campaign ID #" is empty, so only tagged rows remain. It also strips leading and trailing whitespace from the campaign names to keep the data clean.
  6. Output Handling:
    • Finally, if the output data is not empty, the script prints a preview of its first five rows. Otherwise it prints a message saying the output is empty.
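
The steps above can be sketched end to end on a small table. This is an illustrative sketch, not the production script: `input_df` is a hypothetical stand-in for `dataSourceDict["1"]`, and the account and campaign values are invented.

```python
import re
import numpy as np
import pandas as pd

# Hypothetical stand-in for dataSourceDict["1"]; all values are invented.
input_df = pd.DataFrame({
    "Account": ["Acme", "Acme", "Acme"],
    "Campaign": ["(5802) - Spring Sale ", "Brand Awareness", "Promo - (77)"],
    "Campaign ID #": pd.Series([None, None, None], dtype=object),
})

def get_value_before_dash(name):
    # Digits in parentheses at the start of the name, or NaN if absent.
    m = re.match(r"\((\d+)\)", name)
    return m.group(1) if m else np.nan

output_df = input_df.copy()
for index, row in input_df.iterrows():
    name = row["Campaign"]
    if "-" not in name:
        continue  # no placement key: leave the row untouched
    output_df.at[index, "Campaign ID #"] = get_value_before_dash(name)

# Keep only tagged rows, then trim whitespace from campaign names.
output_df = output_df.dropna(subset=["Campaign ID #"]).copy()
output_df["Campaign"] = output_df["Campaign"].str.strip()
print(output_df.to_string())
```

In this sketch only the first row survives. The second name has no dash and no existing tag, and the third has a dash but its parentheses are not at the start of the name.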

Vitals

  • Script ID: 1523
  • Client ID / Customer ID: 1306928453 / 60270539
  • Action Type: Bulk Upload (Preview)
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Campaign ID #
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Grégory Pantaine (gpantaine@marinsoftware.com)
  • Created by Grégory Pantaine on 2024-11-14 17:23
  • Last Updated by Grégory Pantaine on 2024-11-14 17:23

Python Code

##
## name: Script AutoTag - Campaign ID Number
## description: Tags the number in parentheses that precedes the first '-' in the campaign name,
## without the brackets, e.g. 5802
## 
## author: G Pantaine with help from ChatGPT & M Huang.
## created: 2024-11-14
## 

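# Imports (the hosting environment may already provide these names)
import re
import numpy as np
import pandas as pd

# Configurable Params - START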
PLACEMENT_KEY = '-'

# Primary data source and columns
inputDf = dataSourceDict["1"]

# Output columns and initial values
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_NUMERO_PROJET = 'Campaign ID #'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_NUMERO_PROJET = 'Campaign ID #'

# Extract the digits enclosed in parentheses at the start of the campaign name
# (re.match anchors at the beginning of the string); returns NaN if not found
def get_value_before_dash(campaign_name):
    match = re.match(r"\((\d+)\)", campaign_name)
    if match:
        return match.group(1)
    else:
        print("Value not found: " + campaign_name)
        return np.nan

# Copy all input rows to output
outputDf = inputDf.copy()

# Loop through all rows
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]
    
    # Skip processing if campaign name does not contain the placement key
    if PLACEMENT_KEY not in campaign_name:
        continue

    value = get_value_before_dash(campaign_name)

    # Tag the extracted value; otherwise clear the tag so the row is dropped below
    if pd.notna(value):
        outputDf.at[index, BULK_COL_NUMERO_PROJET] = value
    else:
        outputDf.at[index, BULK_COL_NUMERO_PROJET] = np.nan

# Only include non-empty tags in bulk
outputDf = outputDf.dropna(subset=[BULK_COL_NUMERO_PROJET])

# Remove extra whitespace from campaign name that breaks Preview
outputDf[RPT_COL_CAMPAIGN] = outputDf[RPT_COL_CAMPAIGN].str.strip()

if not outputDf.empty:
    print("outputDf", outputDf.head().to_string())
else:
    print("Empty outputDf")
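
As a design note, the row-by-row loop could also be expressed with vectorized pandas string methods. The following is a hedged sketch of that alternative, not the author's implementation; the DataFrame contents are invented. Unlike the loop, it overwrites the tag column for every row, so rows without the dash never keep a pre-existing tag.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    "Campaign": ["(5802) - Spring Sale ", "Brand Awareness", "Promo - (77)", "(42) No dash"],
})

# Rows whose name contains the placement key (plain substring, not a regex).
has_key = df["Campaign"].str.contains("-", regex=False)

# Digits in parentheses at the start of the name; NaN where there is no match.
ids = df["Campaign"].str.extract(r"^\((\d+)\)", expand=False)

# Keep an identifier only where the placement key is present.
df["Campaign ID #"] = ids.where(has_key)

tagged = df.dropna(subset=["Campaign ID #"]).copy()
tagged["Campaign"] = tagged["Campaign"].str.strip()
print(tagged.to_string())
```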

Post generated on 2024-11-27 06:58:46 GMT
