Script 1523: Script AutoTag Campaign ID Number
Purpose
The script extracts the numeric identifier that appears in parentheses at the start of each campaign name and tags it in the Campaign ID # column.
To Elaborate
The Python script processes a dataset of campaign information and extracts a numeric identifier from each campaign name. The identifier is expected to be enclosed in parentheses and to appear before the first dash ('-') in the name, for example 5802 in a name that begins with (5802). Because the extraction uses re.match, the parenthesized number must sit at the very start of the name. The script tags this identifier as the “Campaign ID #” for each campaign entry, drops rows that end up without an identifier, and strips leading and trailing whitespace from campaign names so that it does not break the bulk preview.
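To make the expected name format concrete, here is a minimal sketch of the extraction step on its own. The regular expression is the one used in the script; the helper name extract_campaign_id and the sample campaign names are hypothetical and chosen only for illustration.

```python
import re

# Same pattern as in the script: digits wrapped in parentheses.
ID_PATTERN = re.compile(r"\((\d+)\)")


def extract_campaign_id(campaign_name):
    """Return the digits inside leading parentheses, or None if absent."""
    # match() only succeeds if the pattern starts at position 0 of the string.
    match = ID_PATTERN.match(campaign_name)
    return match.group(1) if match else None


# Hypothetical campaign names, not taken from any real account.
samples = [
    "(5802) - Brand Search",   # ID at the very start
    "Brand (5802) - Search",   # ID present, but not at the start
    " (5802) - Brand Search",  # leading space before the parenthesis
    "(58a02) - Brand Search",  # non-digit inside the parentheses
]

for name in samples:
    print(repr(name), "->", extract_campaign_id(name))
```

Only the first sample yields an identifier ("5802"); the other three return None because re.match requires the parenthesized digits to begin the string. If identifiers can appear elsewhere in a name, re.search would be the usual alternative, but that would change the script's behavior.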
Walking Through the Code
- Configurable Parameters:
  - The script begins by defining a configurable parameter, PLACEMENT_KEY, which is set to the dash ('-') character. The numeric identifier is expected to appear before this key; the code itself uses the key only to decide whether a campaign name is processed.
- Data Preparation:
  - The script retrieves the primary data source, inputDf, which contains the campaign data. It also defines the column names for both input and output data: Account, Campaign, and Campaign ID #.
- Function Definition:
  - A function, get_value_before_dash, is defined to extract the numeric identifier from the campaign name. It uses a regular expression to match digits enclosed in parentheses at the start of the name and returns them without the parentheses. When there is no match, it prints a "Value not found" message and returns NaN.
- Data Processing:
  - The script copies the input data to outputDf for processing. It iterates over each row in the dataset, checking whether the campaign name contains the PLACEMENT_KEY. If the key is present, it attempts to extract the numeric identifier using the function above; otherwise the row is skipped and its existing tag is left untouched.
- Tagging and Cleaning:
  - If a valid identifier is found, it is written to the Campaign ID # column of the output data. Rows whose Campaign ID # is empty are then dropped, so only tagged rows are retained. The script also strips leading and trailing whitespace from the campaign names to maintain clean data formatting.
- Output Handling:
  - Finally, the script prints a preview of the first rows of the processed data if the output is not empty; otherwise it prints a message saying the output is empty.
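The steps above can be tried outside the scripting environment with a small stand-in for the data source. This is a sketch under stated assumptions: the three sample rows and the constant names COL_CAMPAIGN and COL_CAMPAIGN_ID are hypothetical, and the real inputDf comes from dataSourceDict["1"], which is not available here.

```python
import re

import numpy as np
import pandas as pd

PLACEMENT_KEY = "-"
COL_CAMPAIGN = "Campaign"
COL_CAMPAIGN_ID = "Campaign ID #"


def get_value_before_dash(campaign_name):
    """Return the digits inside leading parentheses, or NaN if absent."""
    match = re.match(r"\((\d+)\)", campaign_name)
    return match.group(1) if match else np.nan


# Hypothetical sample rows standing in for dataSourceDict["1"].
inputDf = pd.DataFrame(
    {
        "Account": ["Acme", "Acme", "Acme"],
        COL_CAMPAIGN: ["(5802) - Brand Search ", "Generic - No ID", "(77) No Dash"],
        # object dtype so that string IDs can be written into the column
        COL_CAMPAIGN_ID: pd.Series([None, None, None], dtype=object),
    }
)

outputDf = inputDf.copy()
for index, row in inputDf.iterrows():
    campaign_name = row[COL_CAMPAIGN]
    if PLACEMENT_KEY not in campaign_name:
        continue  # no dash: the row's existing tag is left untouched
    outputDf.at[index, COL_CAMPAIGN_ID] = get_value_before_dash(campaign_name)

# Keep only rows that ended up with a tag, then trim the campaign names.
outputDf = outputDf.dropna(subset=[COL_CAMPAIGN_ID])
outputDf[COL_CAMPAIGN] = outputDf[COL_CAMPAIGN].str.strip()

print(outputDf.to_string() if not outputDf.empty else "Empty outputDf")
```

With these sample rows, only the first campaign survives: the second contains a dash but no leading identifier, and the third has an identifier but no dash, so it is never processed and its empty tag causes it to be dropped.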
Vitals
- Script ID: 1523
- Client ID / Customer ID: 1306928453 / 60270539
- Action Type: Bulk Upload (Preview)
- Item Changed: Campaign
- Output Columns: Account, Campaign, Campaign ID #
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Grégory Pantaine (gpantaine@marinsoftware.com)
- Created by Grégory Pantaine on 2024-11-14 17:23
- Last Updated by Grégory Pantaine on 2024-11-14 17:23
> See it in Action
Python Code
##
## name: Script AutoTag - Campaign ID Number
## description: Tags the value before the first - in the campaign name as the number without the brackets,
##  ie: 5802
##
## author: G Pantaine with help from ChatGPT & M Huang.
## created: 2024-11-14
##

# Imports used below (harmless if the scripting environment already provides them)
import re
import numpy as np
import pandas as pd

# Configurable Params - START
PLACEMENT_KEY = '-'

# Primary data source and columns
# (dataSourceDict is expected to be supplied by the scripting environment)
inputDf = dataSourceDict["1"]

# Output columns and initial values
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_NUMERO_PROJET = 'Campaign ID #'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_NUMERO_PROJET = 'Campaign ID #'

# Function to extract the number in parentheses at the start of the name,
# returned without the parentheses
def get_value_before_dash(campaign_name):
    # re.match only matches at the beginning of the string
    match = re.match(r"\((\d+)\)", campaign_name)
    if match:
        return match.group(1)
    else:
        print("Value not found: " + campaign_name)
        return np.nan

# Copy all input rows to output
outputDf = inputDf.copy()

# Loop through all rows
for index, row in inputDf.iterrows():
    campaign_name = row[RPT_COL_CAMPAIGN]

    # Skip processing if campaign name does not contain the placement key
    if PLACEMENT_KEY not in campaign_name:
        continue

    value = get_value_before_dash(campaign_name)

    # Tag the extracted ID; rows with no match are cleared and dropped below
    if pd.notna(value):
        outputDf.at[index, BULK_COL_NUMERO_PROJET] = value
    else:
        outputDf.at[index, BULK_COL_NUMERO_PROJET] = np.nan

# Only include non-empty tags in bulk
outputDf = outputDf.dropna(subset=[BULK_COL_NUMERO_PROJET])

# Remove extra whitespace from campaign name that breaks Preview
outputDf[RPT_COL_CAMPAIGN] = outputDf[RPT_COL_CAMPAIGN].str.strip()

if not outputDf.empty:
    print("outputDf", outputDf.head().to_string())
else:
    print("Empty outputDf")
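For larger reports, the row-by-row loop could in principle be replaced by vectorized pandas string methods. The following is only a sketch of that alternative, not the script's actual method: the sample DataFrame is hypothetical, and unlike the loop it clears the tag on rows that lack the dash instead of leaving any existing tag in place.

```python
import pandas as pd

PLACEMENT_KEY = "-"

# Hypothetical sample rows; the real data would come from the linked report.
df = pd.DataFrame(
    {"Campaign": ["(5802) - Brand ", "Generic - No ID", "(77) No Dash"]}
)

# True where the campaign name contains the dash (plain substring test).
has_key = df["Campaign"].str.contains(PLACEMENT_KEY, regex=False)

# Capture the digits inside leading parentheses; NaN where there is no match.
ids = df["Campaign"].str.extract(r"^\((\d+)\)", expand=False)

# Keep an ID only on rows that also contain the dash.
df["Campaign ID #"] = ids.where(has_key)

# Drop untagged rows and trim whitespace from the campaign names.
tagged = df.dropna(subset=["Campaign ID #"]).copy()
tagged["Campaign"] = tagged["Campaign"].str.strip()

print(tagged.to_string())
```

The explicit ^ anchor reproduces the start-of-string behavior of re.match in the original function. Whether this trade-off is acceptable depends on whether existing tags on dash-less campaigns need to be preserved.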
Post generated on 2024-11-27 06:58:46 GMT