Script 1773: Set GUID ID
Purpose:
The Python script extracts and updates GUIDs from campaign names in a dataset, ensuring only valid entries are retained.
To Elaborate
The Python script is designed to process a dataset containing campaign information, specifically focusing on extracting GUIDs from campaign names. Each campaign name may contain a GUID, which is the segment following the last occurrence of a specified delimiter (‘|’). The script iterates through each campaign name, extracts the GUID if present, and updates the dataset with this information. It ensures that only campaigns with valid GUIDs are included in the final output, removing any entries without a GUID. Additionally, the script trims any extra whitespace from campaign names to prevent potential issues in data presentation or further processing.
Walking Through the Code
- Configurable Parameters:
-
The script begins by defining a configurable parameter PLACEMENT_KEY
, which is set to the delimiter ‘’. This key is used to identify and extract the GUID from the campaign names.
-
- Data Source and Columns:
- The primary data source is assigned to
inputDf
, and several column names are defined for both input and output data structures. These columns include campaign details and GUIDs.
- The primary data source is assigned to
- GUID Extraction Function:
- A function
get_guid
is defined to extract the GUID from a campaign name. It splits the campaign name using thePLACEMENT_KEY
and returns the last segment if the delimiter is present. If not, it returnsNaN
.
- A function
- Data Processing:
- The script copies the input data to
outputDf
and iterates over each row. For each campaign name, it checks for the presence of the delimiter. If found, it extracts the GUID and updates the corresponding row inoutputDf
.
- The script copies the input data to
- Filtering and Cleaning:
- After processing, the script filters out rows without a valid GUID and removes any extra whitespace from campaign names to ensure data integrity.
- Output Handling:
- Finally, the script checks if the processed data is non-empty and prints the results. If the dataset is empty after processing, it outputs a corresponding message.
Vitals
- Script ID : 1773
- Client ID / Customer ID: 1306928593 / 60270593
- Action Type: Bulk Upload (Preview)
- Item Changed: Campaign
- Output Columns: Account, Campaign, GUID, Campaign ID
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
- Created by ascott@marinsoftware.com on 2025-02-28 19:48
- Last Updated by ascott@marinsoftware.com on 2025-02-28 20:07
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
# Configurable Params - START
PLACEMENT_KEY = '|'
# Primary data source and columns
inputDf = dataSourceDict["1"]
# Define primary data source columns
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_GUID = 'GUID'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUB_COST = 'Pub. Cost $'
# Define output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GUID = 'GUID'
BULK_COL_CAMPAIGN_ID = 'Campaign ID'
# Function to extract the GUID (value after the last '|')
def get_guid(campaign_name):
parts = campaign_name.split(PLACEMENT_KEY)
if len(parts) > 1:
return parts[-1].strip() # Extract and return the value after the last '|'
else:
return np.nan # Return NaN if no '|' exists
# Copy all input rows to output
outputDf = inputDf.copy()
# Loop through all rows to populate the GUID column
for index, row in inputDf.iterrows():
campaign_name = row[RPT_COL_CAMPAIGN]
# Skip processing if campaign name does not contain at least one placement key
if campaign_name.count(PLACEMENT_KEY) < 1:
continue
guid = get_guid(campaign_name)
# Only tag if it's different than the existing tag
if pd.notna(guid):
outputDf.at[index, BULK_COL_GUID] = guid
else:
outputDf.at[index, BULK_COL_GUID] = np.nan
# Only include non-empty GUID values in the bulk output
outputDf = outputDf.dropna(subset=[BULK_COL_GUID])
# Remove extra whitespace from campaign names that might break Preview
outputDf[RPT_COL_CAMPAIGN] = outputDf[RPT_COL_CAMPAIGN].str.strip()
# Display the results or handle empty DataFrame case
if not outputDf.empty:
print("Updated outputDf:\n", outputDf.head().to_string())
else:
print("The outputDf is empty after processing.")
Post generated on 2025-03-11 01:25:51 GMT