Script 565: Pacing Start & End Date

Purpose:

The Python script extracts and updates start and end dates from campaign names in a dataset, ensuring they are in a standardized format.

To Elaborate

The script is designed to parse campaign names in a dataset to extract start and end dates, which are then formatted into a standardized ISO format (YYYY-MM-DD). This is particularly useful for campaigns where the dates are embedded within the campaign name string. The script identifies these dates using regular expressions and updates the dataset with these parsed dates. It also checks for any changes in the campaign dates compared to existing records and outputs only the modified entries. This process helps maintain accurate and up-to-date records of campaign durations, which is crucial for effective campaign management and reporting.

Walking Through the Code

Initialization and Setup:
- The script begins by defining constants for column names used in the input and output dataframes.
- Temporary columns are created in the input dataframe to store parsed start and end dates.
Date Extraction Function:
- A function get_dates_from_campaign_name is defined to extract start and end dates from the ‘Campaign’ column using regular expressions.
- The function attempts to parse dates in both four-digit and two-digit year formats, defaulting to NaN if parsing fails.
Testing the Date Extraction:
- A test function test_get_dates_from_campaign_name is implemented to verify the date extraction logic against various campaign name formats.
- The test ensures that the function correctly identifies and formats the dates.
Parsing and Updating Dataframe:
- The script iterates over each row in the input dataframe, applying the date extraction function to populate the temporary date columns.
- It identifies rows where the parsed dates differ from existing records, marking them as changed.
Output Preparation:
- Only rows with changed dates are selected and copied to the output dataframe.
- The temporary columns are renamed to match the output column names, and the output dataframe is printed or initialized as empty if no changes are detected.

Vitals

Script ID : 565
Client ID / Customer ID: 1306927187 / 60270139
Action Type: Bulk Upload
Item Changed: Campaign
Output Columns: Account, Campaign, Pacing - End Date, Pacing - Start Date
Linked Datasource: M1 Report
Reference Datasource: None
Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
Created by ascott@marinsoftware.com on 2023-12-04 16:18
Last Updated by ascott@marinsoftware.com on 2024-07-12 14:11

> See it in Action

Python Code

# 
# Add Dimensions Tag based on Campaign Name
# * `Pacing - Start Date`
# * `Pacing - End Date`
# Example: `Search - Display Campaign & Google Ad Words (2/1/2023 - 1/31/2024)`
#
# Author: Michael S. Huang
# Created: 2023-11-25
#

RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PACING_START_DATE = 'Pacing - Start Date'
RPT_COL_PACING_END_DATE = 'Pacing - End Date'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PACING_START_DATE = 'Pacing - Start Date'
BULK_COL_PACING_END_DATE = 'Pacing - End Date'

outputDf[BULK_COL_PACING_START_DATE] = "<<YOUR VALUE>>"
outputDf[BULK_COL_PACING_END_DATE] = "<<YOUR VALUE>>"

# create temp column to store new dim tag and default to empty
TMP_START_DATE = RPT_COL_PACING_START_DATE + '_'
TMP_END_DATE = RPT_COL_PACING_END_DATE + '_'
inputDf[TMP_START_DATE] = np.nan
inputDf[TMP_END_DATE] = np.nan

# define function to parse out Start Date and End Date from `Campaign` column of row
# output should be in ISO format YYYY-MM-DD
def get_dates_from_campaign_name(row):
    campaign_name = row['Campaign']
    # Updated regex to handle optional spaces, parentheses, and missing closing parenthesis
    date_pattern_full = r'(\d{1,2}/\d{1,2}/\d{2,4})\s*-\s*?(\d{1,2}/\d{1,2}/\d{2,4})'
    match_full = re.search(date_pattern_full, campaign_name)
    if match_full:
        start_date_str, end_date_str = match_full.groups()
        # Try parsing with four-digit year format
        try:
            start_date = datetime.datetime.strptime(start_date_str, '%m/%d/%Y').strftime('%Y-%m-%d')
            end_date = datetime.datetime.strptime(end_date_str, '%m/%d/%Y').strftime('%Y-%m-%d')
        except ValueError:
            # If it fails, try parsing with two-digit year format
            try:
                start_date = datetime.datetime.strptime(start_date_str, '%m/%d/%y').strftime('%Y-%m-%d')
                end_date = datetime.datetime.strptime(end_date_str, '%m/%d/%y').strftime('%Y-%m-%d')
            except ValueError:
                # If both fail, set to NaN
                start_date, end_date = np.nan, np.nan
    else:
        start_date, end_date = np.nan, np.nan
    return start_date, end_date

def test_get_dates_from_campaign_name():
    campaign_formats = [
        "Traffic_Array Variety Show_3/1/2024-3/31/2024",
        "Traffic_Array Variety Show_(3/1/2024-3/31/2024)",
        "Traffic_Array Variety Show_(3/1/2024-3/31/2024",
        "Traffic_Array Variety Show_3/1/2024-3/31/2024)",
        "Traffic_Array Variety Show_3/1/2024 - 3/31/2024",
        "Traffic_Array Variety Show_(3/1/2024 - 3/31/2024)",
        "Traffic_Array Variety Show_(3/1/2024 - 3/31/2024",
        "Traffic_Array Variety Show_3/1/2024 - 3/31/2024)",
        "Traffic_Array Variety Show_3/1/2024- 3/31/2024",
        "Traffic_Array Variety Show_(3/1/2024- 3/31/2024)",
        "Traffic_Array Variety Show_(3/1/2024- 3/31/2024",
        "Traffic_Array Variety Show_3/1/2024- 3/31/2024)",
        "Traffic_Array Variety Show_3/1/2024 -3/31/2024",
        "Traffic_Array Variety Show_(3/1/2024 -3/31/2024)",
        "Traffic_Array Variety Show_(3/1/2024 -3/31/2024",
        "Traffic_Array Variety Show_3/1/2024 -3/31/2024)"
    ]

    expected_result = ('2024-03-01', '2024-03-31')
    for campaign in campaign_formats:
        row = pd.Series({'Campaign': campaign})
        assert get_dates_from_campaign_name(row) == expected_result, f"Test failed for format: {campaign}"

# Run the unit test
test_get_dates_from_campaign_name()

# parse out Start/End Date from Campaign name
for index, row in inputDf.iterrows():
    start_date, end_date = get_dates_from_campaign_name(row)
    inputDf.at[index, TMP_START_DATE] = start_date
    inputDf.at[index, TMP_END_DATE] = end_date

print("inputDf with parsed tags", tableize(inputDf))

# find changed campaigns
changed = ((inputDf[TMP_START_DATE].notnull() & (inputDf[RPT_COL_PACING_START_DATE] != inputDf[TMP_START_DATE])) | \
          (inputDf[TMP_END_DATE].notnull() & (inputDf[RPT_COL_PACING_END_DATE] != inputDf[TMP_END_DATE])))

print("== Campaigns with Changed Tag ==", tableize(inputDf.loc[changed]))

# only select changed rows
cols = [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, TMP_START_DATE, TMP_END_DATE]
outputDf = inputDf.loc[ changed, cols ].copy() \
                  .rename(columns = { \
                    TMP_START_DATE: BULK_COL_PACING_START_DATE, \
                    TMP_END_DATE: BULK_COL_PACING_END_DATE \
                  })

if not outputDf.empty:
  print("outputDf", tableize(outputDf))
else:
  print("Empty outputDf")
  outputDf = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_PACING_START_DATE, BULK_COL_PACING_END_DATE])

Post generated on 2025-03-11 01:25:51 GMT

04 Dec 2023

« Script 563: Pacing Start & End Date Script 567: Scripts Campaign Anomaly Detection by Hotel with Summary »

MarinOne Scripts Creator's Corner