Script 561: Pacing Start & End Date
Purpose:
The Python script extracts and updates pacing start and end dates from campaign names in a dataset.
To Elaborate
The Python script is designed to parse campaign names to extract pacing start and end dates, which are then formatted into ISO date format (YYYY-MM-DD). It identifies campaigns with changed pacing dates and updates the dataset accordingly. The script handles various campaign name formats, including those with optional spaces and parentheses, ensuring robust date extraction. The primary goal is to maintain accurate pacing information for campaigns by detecting changes and updating the relevant fields in the dataset.
Walking Through the Code
- Initialization and Setup:
- The script begins by defining constants for column names related to campaigns and pacing dates.
- Temporary columns are created in the input DataFrame to store parsed start and end dates, initialized with NaN values.
- Date Extraction Function:
- A function
get_dates_from_campaign_name
is defined to extract start and end dates from the campaign name using regular expressions. - The function attempts to parse dates in both four-digit and two-digit year formats, defaulting to NaN if parsing fails.
- A function
- Testing Date Extraction:
- A test function
test_get_dates_from_campaign_name
is implemented to verify the date extraction logic against various campaign name formats. - The test ensures that the function correctly parses dates and handles different formatting scenarios.
- A test function
- Parsing and Updating DataFrame:
- The script iterates over each row in the input DataFrame, applying the date extraction function to populate the temporary columns with parsed dates.
- It identifies campaigns with changed pacing dates by comparing parsed dates with existing ones.
- Output Preparation:
- Changed campaigns are selected, and their relevant columns are copied to a new DataFrame, with temporary columns renamed to bulk column names.
- The script checks if the output DataFrame is empty and prints the results accordingly.
Vitals
- Script ID : 561
- Client ID / Customer ID: 1306927179 / 60270139
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Pacing - End Date, Pacing - Start Date
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
- Created by ascott@marinsoftware.com on 2023-12-04 16:14
- Last Updated by ascott@marinsoftware.com on 2024-07-18 13:55
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
#
# Add Dimensions Tag based on Campaign Name
# * `Pacing - Start Date`
# * `Pacing - End Date`
# Example: `Search - Display Campaign & Google Ad Words (2/1/2023 - 1/31/2024)`
#
# Author: Michael S. Huang
# Created: 2023-11-25
#
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PACING_START_DATE = 'Pacing - Start Date'
RPT_COL_PACING_END_DATE = 'Pacing - End Date'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PACING_START_DATE = 'Pacing - Start Date'
BULK_COL_PACING_END_DATE = 'Pacing - End Date'
outputDf[BULK_COL_PACING_START_DATE] = "<<YOUR VALUE>>"
outputDf[BULK_COL_PACING_END_DATE] = "<<YOUR VALUE>>"
# create temp column to store new dim tag and default to empty
TMP_START_DATE = RPT_COL_PACING_START_DATE + '_'
TMP_END_DATE = RPT_COL_PACING_END_DATE + '_'
inputDf[TMP_START_DATE] = np.nan
inputDf[TMP_END_DATE] = np.nan
# define function to parse out Start Date and End Date from `Campaign` column of row
# output should be in ISO format YYYY-MM-DD
def get_dates_from_campaign_name(row):
campaign_name = row['Campaign']
# Updated regex to handle optional spaces, parentheses, and missing closing parenthesis
date_pattern_full = r'(\d{1,2}/\d{1,2}/\d{2,4})\s*-\s*?(\d{1,2}/\d{1,2}/\d{2,4})'
match_full = re.search(date_pattern_full, campaign_name)
if match_full:
start_date_str, end_date_str = match_full.groups()
# Try parsing with four-digit year format
try:
start_date = datetime.datetime.strptime(start_date_str, '%m/%d/%Y').strftime('%Y-%m-%d')
end_date = datetime.datetime.strptime(end_date_str, '%m/%d/%Y').strftime('%Y-%m-%d')
except ValueError:
# If it fails, try parsing with two-digit year format
try:
start_date = datetime.datetime.strptime(start_date_str, '%m/%d/%y').strftime('%Y-%m-%d')
end_date = datetime.datetime.strptime(end_date_str, '%m/%d/%y').strftime('%Y-%m-%d')
except ValueError:
# If both fail, set to NaN
start_date, end_date = np.nan, np.nan
else:
start_date, end_date = np.nan, np.nan
return start_date, end_date
def test_get_dates_from_campaign_name():
campaign_formats = [
"Traffic_Array Variety Show_3/1/2024-3/31/2024",
"Traffic_Array Variety Show_(3/1/2024-3/31/2024)",
"Traffic_Array Variety Show_(3/1/2024-3/31/2024",
"Traffic_Array Variety Show_3/1/2024-3/31/2024)",
"Traffic_Array Variety Show_3/1/2024 - 3/31/2024",
"Traffic_Array Variety Show_(3/1/2024 - 3/31/2024)",
"Traffic_Array Variety Show_(3/1/2024 - 3/31/2024",
"Traffic_Array Variety Show_3/1/2024 - 3/31/2024)",
"Traffic_Array Variety Show_3/1/2024- 3/31/2024",
"Traffic_Array Variety Show_(3/1/2024- 3/31/2024)",
"Traffic_Array Variety Show_(3/1/2024- 3/31/2024",
"Traffic_Array Variety Show_3/1/2024- 3/31/2024)",
"Traffic_Array Variety Show_3/1/2024 -3/31/2024",
"Traffic_Array Variety Show_(3/1/2024 -3/31/2024)",
"Traffic_Array Variety Show_(3/1/2024 -3/31/2024",
"Traffic_Array Variety Show_3/1/2024 -3/31/2024)"
]
expected_result = ('2024-03-01', '2024-03-31')
for campaign in campaign_formats:
row = pd.Series({'Campaign': campaign})
assert get_dates_from_campaign_name(row) == expected_result, f"Test failed for format: {campaign}"
# Run the unit test
test_get_dates_from_campaign_name()
# parse out Start/End Date from Campaign name
for index, row in inputDf.iterrows():
start_date, end_date = get_dates_from_campaign_name(row)
inputDf.at[index, TMP_START_DATE] = start_date
inputDf.at[index, TMP_END_DATE] = end_date
print("inputDf with parsed tags", tableize(inputDf))
# find changed campaigns
changed = ((inputDf[TMP_START_DATE].notnull() & (inputDf[RPT_COL_PACING_START_DATE] != inputDf[TMP_START_DATE])) | \
(inputDf[TMP_END_DATE].notnull() & (inputDf[RPT_COL_PACING_END_DATE] != inputDf[TMP_END_DATE])))
print("== Campaigns with Changed Tag ==", tableize(inputDf.loc[changed]))
# only select changed rows
cols = [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, TMP_START_DATE, TMP_END_DATE]
outputDf = inputDf.loc[ changed, cols ].copy() \
.rename(columns = { \
TMP_START_DATE: BULK_COL_PACING_START_DATE, \
TMP_END_DATE: BULK_COL_PACING_END_DATE \
})
if not outputDf.empty:
print("outputDf", tableize(outputDf))
else:
print("Empty outputDf")
outputDf = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_PACING_START_DATE, BULK_COL_PACING_END_DATE])
Post generated on 2025-03-11 01:25:51 GMT