Script 1293: Extract Title from Group name
Purpose:
The Python script extracts a title from a “Group” column using a regex pattern and updates the “Title” column accordingly.
To Elaborate
The Python script processes a DataFrame by extracting the leading part of each value in the “Group” column and assigning it to the “Title” column. A regular expression captures everything from the start of the group name up to the first underscore, or up to the text “ - Season”, whichever comes first; group names that contain neither receive an empty title. The script then normalizes the text by converting any occurrence of “'S” to “'s” in the “Title” column. Finally, it sets the “g-check” column to “YES” to indicate that processing has been completed. The script is useful for data cleaning and preparation where a structured title is needed for further analysis or reporting.
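As a rough illustration of the extraction and normalization rules described above, the sketch below applies the script's regex to a few group names. The helper name `extract_title` and the sample group names are hypothetical and chosen only for demonstration; they do not come from the script or from real account data.

```python
import re

# Same pattern the script uses: capture lazily from the start of the string
# up to the first underscore, or up to (not including) " - Season".
TITLE_PATTERN = re.compile(r'^(.*?)(?:[_]|(?= - Season))')


def extract_title(group_name):
    """Return the captured title, or "" when the pattern does not match."""
    match = TITLE_PATTERN.match(group_name)
    return match.group(1) if match else ""


# Hypothetical group names, made up for illustration.
print(extract_title("Doctor Who_Brand_Exact"))   # Doctor Who
print(extract_title("The Crown - Season 2"))     # The Crown
print(extract_title("No delimiter here"))        # prints an empty line

# The follow-up normalization step: "'S" becomes "'s".
print(extract_title("MASTER'S CUP_Generic").replace("'S", "'s"))  # MASTER's CUP
```

Note that a group name with neither an underscore nor “ - Season” produces an empty title rather than the full group name.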
Walking Through the Code
- Data Source Initialization:
  - The script begins by defining the primary data source, `inputDf`, which is a DataFrame containing various columns including “Group”, “Campaign”, “Account”, “Title”, “Impr.”, “Campaign Status”, “Group Status”, and “g-check”.
- Function Definition:
  - A function named `process` is defined to handle the transformation of the input DataFrame.
  - Within this function, the regex pattern `r'^(.*?)(?:[_]|(?= - Season))'` is specified to extract the title from the “Group” column. It captures everything from the start of the group name up to the first underscore, or up to the text “ - Season”.
- DataFrame Processing:
  - The input DataFrame is copied to `outputDf` to preserve the original data.
  - The regex pattern is applied to the “Group” column to extract the title, which is then assigned to the “Title” column. Rows whose group name does not match the pattern receive an empty title.
  - The script replaces any occurrence of “'S” with “'s” in the “Title” column to ensure consistency in text formatting.
- Final Adjustments:
  - The “g-check” column is set to “YES” for all rows, indicating that the processing step has been completed.
  - The processed DataFrame is printed for debugging purposes.
- Execution:
  - The `process` function is called with `inputDf` as an argument, and the resulting DataFrame is stored in `outputDf`.
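The steps above can be tried outside the platform with a small stand-in for the data source. This is a minimal sketch, not the production setup: the `dataSourceDict` stub, the helper `title_from_group`, and all sample rows are assumptions made up for illustration, and pandas is assumed to be installed.

```python
import re

import pandas as pd

# Hypothetical stand-in for the data source the platform normally provides.
dataSourceDict = {
    "1": pd.DataFrame({
        "Account": ["Acct A", "Acct A", "Acct B"],
        "Campaign": ["Camp 1", "Camp 1", "Camp 2"],
        "Group": ["HANDMAID'S TALE_Exact", "The Crown - Season 2", "Generic"],
        "Title": ["", "", ""],
        "g-check": ["", "", ""],
    })
}

inputDf = dataSourceDict["1"]
pattern = re.compile(r'^(.*?)(?:[_]|(?= - Season))')


def title_from_group(group):
    """Return the captured title, or "" when the pattern does not match."""
    match = pattern.match(group)
    return match.group(1) if match else ""


# Copy first so the original DataFrame stays untouched.
outputDf = inputDf.copy()
# Extract the title from each group name.
outputDf["Title"] = outputDf["Group"].apply(title_from_group)
# Normalize "'S" to "'s" as a literal (non-regex) replacement.
outputDf["Title"] = outputDf["Title"].str.replace("'S", "'s", regex=False)
# Mark every row as processed.
outputDf["g-check"] = "YES"

print(outputDf[["Group", "Title", "g-check"]])
```

In this sample the third row has no underscore and no “ - Season” text, so its title comes out empty, while `inputDf` keeps its original blank “Title” and “g-check” values.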
Vitals
- Script ID: 1293
- Client ID / Customer ID: 1306912147 / 69058
- Action Type: Bulk Upload
- Item Changed: AdGroup
- Output Columns: Account, Campaign, Group, Title
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Jeremy Brown (jbrown@marinsoftware.com)
- Created by Jeremy Brown on 2024-07-26 10:55
- Last Updated by Jeremy Brown on 2024-07-31 11:49
Python Code
##
## Name: Extract Title from Group name
## description: Use the regex pattern = r'^(.*?)(?:[_]|(?= - Season))' to extract the title from the "Group" column and insert into `Title` dimension column.
##
## author: Jeremy Brown
## created: 2024-07-26
##

# Explicit imports; harmless if the environment already provides these modules
import datetime
import re

# dataSourceDict and CLIENT_TIMEZONE are not defined in this script;
# they are expected to be provided by the environment that runs it
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_GROUP = 'Group'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_TITLE = 'Title'
RPT_COL_IMPR = 'Impr.'
# Distinct names so the second status column no longer overwrites the first
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_GROUP_STATUS = 'Group Status'
RPT_COL_CHECK = 'g-check'


# Function to process the input DataFrame and return the output DataFrame
def process(inputDf):
    # Define the regex pattern to extract the title from the "Group" column
    regex_pattern = r'^(.*?)(?:[_]|(?= - Season))'

    def extract_title(group):
        # Non-string values (e.g. NaN) cannot be matched and get an empty title
        match = re.match(regex_pattern, group) if isinstance(group, str) else None
        return match.group(1) if match else ""

    # Copy the input DataFrame to the output DataFrame
    outputDf = inputDf.copy()

    # Extract the title using the regex pattern and assign it to the "Title" column
    outputDf[RPT_COL_TITLE] = outputDf[RPT_COL_GROUP].apply(extract_title)

    # Convert any occurrence of "'S" to "'s" in the 'Title' column (literal match)
    outputDf[RPT_COL_TITLE] = outputDf[RPT_COL_TITLE].str.replace("'S", "'s", regex=False)

    # Set the 'g-check' column to "YES"
    outputDf[RPT_COL_CHECK] = "YES"

    # Print the changed data for debugging
    print(outputDf)

    return outputDf


# Trigger the main process
outputDf = process(inputDf)
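For comparison, the same extraction can be written with pandas' vectorized string methods instead of a per-row function. This is an alternative sketch, not the script's own approach: the function name `extract_titles` and the sample values are hypothetical, and it assumes the “Group” column holds strings (missing values are treated as non-matching).

```python
import pandas as pd

REGEX_PATTERN = r'^(.*?)(?:[_]|(?= - Season))'


def extract_titles(groups: pd.Series) -> pd.Series:
    """Vectorized variant: extract titles and normalize "'S" to "'s"."""
    # With a single capture group and expand=False, str.extract returns a Series;
    # rows that do not match (or are missing) come back as NaN.
    titles = groups.str.extract(REGEX_PATTERN, expand=False).fillna("")
    return titles.str.replace("'S", "'s", regex=False)


# Hypothetical group names, made up for illustration.
sample = pd.Series(["Doctor Who_Brand", "The Crown - Season 2", "Generic", None])
print(extract_titles(sample).tolist())
```

The per-row version in the script is easier to read and debug; the vectorized form mainly saves a Python-level function call per row on large reports.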
Post generated on 2025-03-11 01:25:51 GMT