Script 565: Pacing Start & End Date

Purpose

The Python script extracts and updates pacing start and end dates from campaign names in a dataset.

To Elaborate

The script is designed to parse campaign names to extract start and end dates, which are then formatted into ISO format (YYYY-MM-DD). It identifies campaigns with changed pacing dates by comparing the extracted dates with existing ones in the dataset. If changes are detected, the script updates the dataset with the new dates. This process ensures that the pacing dates in the dataset are accurate and up-to-date, which is crucial for effective campaign management and reporting.

Walking Through the Code

  1. Initialization and Setup
    • The script begins by defining constants for column names used in the dataset.
    • Temporary columns are created in the input DataFrame to store the parsed start and end dates.
  2. Date Extraction Function
    • A function get_dates_from_campaign_name is defined to extract dates from the campaign name using regular expressions.
    • The function handles various date formats and converts them into ISO format. If parsing fails, it returns NaN.
  3. Testing the Date Extraction
    • A test function test_get_dates_from_campaign_name is implemented to verify the date extraction logic against multiple campaign name formats.
  4. Parsing and Updating Dates
    • The script iterates over each row in the input DataFrame, applying the date extraction function.
    • It updates the temporary columns with the parsed dates.
  5. Identifying and Processing Changes
    • The script identifies campaigns with changed pacing dates by comparing the parsed dates with existing ones.
    • It selects only the rows with changes and updates the output DataFrame with the new pacing dates.
  6. Output Handling
    • The script checks if there are any changes and prints the updated DataFrame. If no changes are detected, it initializes an empty DataFrame with the necessary columns.

Vitals

  • Script ID : 565
  • Client ID / Customer ID: 1306927187 / 60270139
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Pacing - End Date, Pacing - Start Date
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
  • Created by ascott@marinsoftware.com on 2023-12-04 16:18
  • Last Updated by ascott@marinsoftware.com on 2024-07-12 14:11
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
# 
# Add Dimensions Tag based on Campaign Name
# * `Pacing - Start Date`
# * `Pacing - End Date`
# Example: `Search - Display Campaign & Google Ad Words (2/1/2023 - 1/31/2024)`
#
# Author: Michael S. Huang
# Created: 2023-11-25
#

RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PACING_START_DATE = 'Pacing - Start Date'
RPT_COL_PACING_END_DATE = 'Pacing - End Date'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PACING_START_DATE = 'Pacing - Start Date'
BULK_COL_PACING_END_DATE = 'Pacing - End Date'

outputDf[BULK_COL_PACING_START_DATE] = "<<YOUR VALUE>>"
outputDf[BULK_COL_PACING_END_DATE] = "<<YOUR VALUE>>"

# create temp column to store new dim tag and default to empty
TMP_START_DATE = RPT_COL_PACING_START_DATE + '_'
TMP_END_DATE = RPT_COL_PACING_END_DATE + '_'
inputDf[TMP_START_DATE] = np.nan
inputDf[TMP_END_DATE] = np.nan

# define function to parse out Start Date and End Date from `Campaign` column of row
# output should be in ISO format YYYY-MM-DD
def get_dates_from_campaign_name(row):
    campaign_name = row['Campaign']
    # Updated regex to handle optional spaces, parentheses, and missing closing parenthesis
    date_pattern_full = r'(\d{1,2}/\d{1,2}/\d{2,4})\s*-\s*?(\d{1,2}/\d{1,2}/\d{2,4})'
    match_full = re.search(date_pattern_full, campaign_name)
    if match_full:
        start_date_str, end_date_str = match_full.groups()
        # Try parsing with four-digit year format
        try:
            start_date = datetime.datetime.strptime(start_date_str, '%m/%d/%Y').strftime('%Y-%m-%d')
            end_date = datetime.datetime.strptime(end_date_str, '%m/%d/%Y').strftime('%Y-%m-%d')
        except ValueError:
            # If it fails, try parsing with two-digit year format
            try:
                start_date = datetime.datetime.strptime(start_date_str, '%m/%d/%y').strftime('%Y-%m-%d')
                end_date = datetime.datetime.strptime(end_date_str, '%m/%d/%y').strftime('%Y-%m-%d')
            except ValueError:
                # If both fail, set to NaN
                start_date, end_date = np.nan, np.nan
    else:
        start_date, end_date = np.nan, np.nan
    return start_date, end_date

def test_get_dates_from_campaign_name():
    campaign_formats = [
        "Traffic_Array Variety Show_3/1/2024-3/31/2024",
        "Traffic_Array Variety Show_(3/1/2024-3/31/2024)",
        "Traffic_Array Variety Show_(3/1/2024-3/31/2024",
        "Traffic_Array Variety Show_3/1/2024-3/31/2024)",
        "Traffic_Array Variety Show_3/1/2024 - 3/31/2024",
        "Traffic_Array Variety Show_(3/1/2024 - 3/31/2024)",
        "Traffic_Array Variety Show_(3/1/2024 - 3/31/2024",
        "Traffic_Array Variety Show_3/1/2024 - 3/31/2024)",
        "Traffic_Array Variety Show_3/1/2024- 3/31/2024",
        "Traffic_Array Variety Show_(3/1/2024- 3/31/2024)",
        "Traffic_Array Variety Show_(3/1/2024- 3/31/2024",
        "Traffic_Array Variety Show_3/1/2024- 3/31/2024)",
        "Traffic_Array Variety Show_3/1/2024 -3/31/2024",
        "Traffic_Array Variety Show_(3/1/2024 -3/31/2024)",
        "Traffic_Array Variety Show_(3/1/2024 -3/31/2024",
        "Traffic_Array Variety Show_3/1/2024 -3/31/2024)"
    ]

    expected_result = ('2024-03-01', '2024-03-31')
    for campaign in campaign_formats:
        row = pd.Series({'Campaign': campaign})
        assert get_dates_from_campaign_name(row) == expected_result, f"Test failed for format: {campaign}"

# Run the unit test
test_get_dates_from_campaign_name()

# parse out Start/End Date from Campaign name
for index, row in inputDf.iterrows():
    start_date, end_date = get_dates_from_campaign_name(row)
    inputDf.at[index, TMP_START_DATE] = start_date
    inputDf.at[index, TMP_END_DATE] = end_date

print("inputDf with parsed tags", tableize(inputDf))

# find changed campaigns
changed = ((inputDf[TMP_START_DATE].notnull() & (inputDf[RPT_COL_PACING_START_DATE] != inputDf[TMP_START_DATE])) | \
          (inputDf[TMP_END_DATE].notnull() & (inputDf[RPT_COL_PACING_END_DATE] != inputDf[TMP_END_DATE])))

print("== Campaigns with Changed Tag ==", tableize(inputDf.loc[changed]))

# only select changed rows
cols = [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, TMP_START_DATE, TMP_END_DATE]
outputDf = inputDf.loc[ changed, cols ].copy() \
                  .rename(columns = { \
                    TMP_START_DATE: BULK_COL_PACING_START_DATE, \
                    TMP_END_DATE: BULK_COL_PACING_END_DATE \
                  })

if not outputDf.empty:
  print("outputDf", tableize(outputDf))
else:
  print("Empty outputDf")
  outputDf = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_PACING_START_DATE, BULK_COL_PACING_END_DATE])


Post generated on 2024-11-27 06:58:46 GMT

comments powered by Disqus