Script 1295: Extract Search Phrase from Keyword

Purpose:

The Python script processes a DataFrame to extract and replace search phrases from keywords based on a given title, creating a new column with the modified keyword.

To Elaborate

The script is designed to process a DataFrame containing marketing data, specifically focusing on keywords and titles. It aims to identify and extract search phrases from the ‘Keyword’ column by matching them with the ‘Title’ column. The script cleans both the keyword and title by removing special characters and converting them to lowercase for accurate comparison. If a match is found, the script replaces the matched portion of the keyword with a placeholder ‘[TITLE]’. This processed keyword is then stored in a new column called ‘Verb’. The script ensures that the original keyword’s special characters and casing are preserved in the final output. The processed data is then returned as a new DataFrame.

Walking Through the Code

Initialization: The script begins by defining a function clean_string to remove special characters and convert strings to lowercase. This function is used to prepare the ‘Keyword’ and ‘Title’ for comparison.
Processing Function: The process function is defined to handle the main logic. It initializes an empty list to store processed rows and iterates over each row in the input DataFrame.
Data Cleaning: For each row, the script cleans the ‘Keyword’ and ‘Title’ using the clean_string function. It also creates a version of the cleaned title without spaces for additional matching.
Keyword Matching: The script checks if the cleaned title (with or without spaces) is present in the cleaned keyword. If a match is found, it replaces the matched portion with ‘[TITLE]’ using regular expressions.
Constructing New Rows: A new dictionary is created for each row, containing all original columns and the newly created ‘Verb’ column. The ‘k-check’ column is set to “YES”.
DataFrame Creation: The list of dictionaries is converted into a new DataFrame, which includes the processed data with the ‘Verb’ column.
Output: The processed DataFrame is returned as the output of the function.

Vitals

Script ID : 1295
Client ID / Customer ID: 1306912147 / 69058
Action Type: Bulk Upload
Item Changed: Keyword
Output Columns: Account, Campaign, Group, Keyword, Match Type, Verb
Linked Datasource: M1 Report
Reference Datasource: None
Owner: Jeremy Brown (jbrown@marinsoftware.com)
Created by Jeremy Brown on 2024-07-26 13:04
Last Updated by Jeremy Brown on 2024-07-31 12:07

> See it in Action

Python Code

##
## Name: Extract Search Phrase from Keyword
## Description:
    ## 1. Initialize: Sets up an empty list to collect processed rows.
    ## 2. Special Character Removal: Uses a regular expression to remove specified special characters from the Title.
    ## 3. Iterate Rows: Loops through each row in the input DataFrame.
    ## 4. Title Processing: Converts the Title to lowercase and removes spaces and special characters for comparison.
    ## 5. Keyword Matching: Checks if the cleaned Title (with and without spaces) is present in the Keyword. Replaces the matching part with [TITLE].
    ## 6. Construct Rows: Creates a dictionary for each row with the processed Verb and appends it to the list.
    ## 7. Create DataFrame: Converts the list of dictionaries into a new DataFrame.
    ## 8. Print for Debugging: Outputs the changed DataFrame for verification.
    ## 9. Return: Returns the processed DataFrame.
    ## FYI This code also ensures that special characters in the Title are removed before matching and processing the Keyword.

## author: Jeremy Brown
## created: 2024-07-26 

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_GROUP = 'Group'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_TITLE = 'Title'
RPT_COL_VERB = 'Verb'
RPT_COL_STATUS = 'Status'
RPT_COL_IMPR = 'Impr.'
RPT_COL_CHECK = 'k-check'

def clean_string(s):
    """
    Clean the input string by removing special characters and converting to lowercase.
    """
    special_chars = r"[\\(\)\*,\/\\|]"
    return re.sub(special_chars, "", s).lower()

def process(inputDf):
    """
    Process the input DataFrame to populate the output DataFrame with specific columns
    and create a new 'Verb' column based on the 'Keyword' and 'Title'.
    """
    # Initialize a list to collect new rows
    rows = []
    
    # Iterate through each row in the input DataFrame
    for idx, row in inputDf.iterrows():
        original_keyword = row[RPT_COL_KEYWORD]
        keyword_lower = original_keyword.lower()
        title = str(row[RPT_COL_TITLE])  # Ensure Title is a string
        
        # Clean both keyword and title for comparison
        cleaned_keyword = clean_string(keyword_lower)
        cleaned_title = clean_string(title)
        cleaned_title_no_spaces = cleaned_title.replace(" ", "")
        
        # Check if the cleaned title or title without spaces is in the cleaned keyword
        if cleaned_title in cleaned_keyword:
            # Use regex to replace the cleaned title in the original keyword, ignoring case
            verb = re.sub(re.escape(cleaned_title), "[TITLE]", keyword_lower, flags=re.IGNORECASE).strip()
        elif cleaned_title_no_spaces in cleaned_keyword:
            # Use regex to replace the cleaned title without spaces in the original keyword, ignoring case
            verb = re.sub(re.escape(cleaned_title_no_spaces), "[TITLE]", keyword_lower, flags=re.IGNORECASE).strip()
        else:
            verb = original_keyword  # No match found; use the original keyword

        # Replace with the original keyword's special characters and casing preserved
        verb = re.sub(re.escape(clean_string(original_keyword)), verb, original_keyword, flags=re.IGNORECASE)

        # Capitalize the first letter of the verb if the original keyword started with a capital letter
        if original_keyword[0].isupper():
            verb = verb.capitalize()

        # Construct the new row as a dictionary
        new_row = {
            RPT_COL_KEYWORD: row[RPT_COL_KEYWORD],
            RPT_COL_MATCH_TYPE: row[RPT_COL_MATCH_TYPE],
            RPT_COL_GROUP: row[RPT_COL_GROUP],
            RPT_COL_CAMPAIGN: row[RPT_COL_CAMPAIGN],
            RPT_COL_ACCOUNT: row[RPT_COL_ACCOUNT],
            RPT_COL_TITLE: row[RPT_COL_TITLE],
            RPT_COL_VERB: verb,
            RPT_COL_CHECK: "YES"  # Set 'k-check' to "YES"
        }

        # Add the new row to the list
        rows.append(new_row)
    
    # Create a DataFrame from the list of rows
    df_out = pd.DataFrame(rows, columns=[
        RPT_COL_KEYWORD,
        RPT_COL_MATCH_TYPE,
        RPT_COL_GROUP,
        RPT_COL_CAMPAIGN,
        RPT_COL_ACCOUNT,
        RPT_COL_TITLE,
        RPT_COL_VERB,
        RPT_COL_CHECK
    ])
    
    # Print the data changed for debugging purposes
    print("Data changed:")
    print(df_out)
    
    return df_out

# Trigger the main process
outputDf = process(inputDf)

Post generated on 2025-03-11 01:25:51 GMT

26 Jul 2024

« Script 1293: Extract Title from Group name Script 1299: Benchmark Assignments from Gsheet »

MarinOne Scripts Creator's Corner