Script 537: Tag Pacing Start and End Dates from Campaign Name
Purpose
Parse out and populate Pacing - Start Date
and Pacing - End Date
in ISO format from the Campaign
column of a DataFrame.
To Elaborate
The Python script aims to extract the start and end dates from the campaign names in the Campaign
column of a DataFrame. It then populates the Pacing - Start Date
and Pacing - End Date
columns with the extracted dates in ISO format (YYYY-MM-DD). The script also identifies campaigns with changed tags and creates a new DataFrame with the relevant information.
Walking Through the Code
- The script starts by setting up the local mode configuration, including the flag to download the preview input and the path to the pickle file containing the
dataSourceDict
object. - It checks whether the code is executing on the server or locally and loads the
dataSourceDict
object if running locally. - The script imports the necessary libraries and sets up the required constants and column names.
- It creates a temporary column in the input DataFrame to store the new dimension tags and initializes it with NaN values.
- The script defines a function
get_dates_from_campaign_name
to parse the start and end dates from the campaign name using regular expressions. - A unit test is performed on the
get_dates_from_campaign_name
function to ensure its correctness. - The script iterates over each row in the input DataFrame and calls the
get_dates_from_campaign_name
function to extract the dates. It then updates the temporary columns with the extracted dates. - The input DataFrame with the parsed tags is displayed.
- The script identifies campaigns with changed tags by comparing the values in the temporary columns with the existing
Pacing - Start Date
andPacing - End Date
columns. - The campaigns with changed tags are selected, and the relevant columns are copied to the output DataFrame, renaming the temporary columns to the desired column names.
- If the output DataFrame is not empty, it is displayed; otherwise, an empty DataFrame with the required columns is created.
Vitals
- Script ID : 537
- Client ID / Customer ID: 1306926843 / 60270139
- Action Type: Bulk Upload (Preview)
- Item Changed: Campaign
- Output Columns: Account, Campaign, Pacing - Start Date, Pacing - End Date
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
- Created by ascott@marinsoftware.com on 2023-11-20 18:21
- Last Updated by ascott@marinsoftware.com on 2024-03-07 18:34
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
#
# Add Dimensions Tag based on Campaign Name
# * `Pacing - Start Date`
# * `Pacing - End Date`
# Examples:
# * `Search - Display Campaign & Google Ad Words (2/1/2023 - 1/31/2024)`
# * `Search - Casino Gambling - General Gambling (10/30/2023-12/31/2023)`
# * `TrueView - NUPAC - Quit Now (10/3/2023 -4/28/2024)`
#
# Author: Michael S. Huang
# Created: 2023-11-25
#
########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=True
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/infinite_digital_datasource_dict_1701749926621.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict
# determine if code is running on server or locally
def is_executing_on_server():
try:
# Attempt to access a known restricted builtin
dict_items = dataSourceDict.items()
return True
except NameError:
# NameError: dataSourceDict object is missing (indicating not on server)
return False
if is_executing_on_server():
print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
print("Code is NOT executing on server. Doing init.")
# load dataSourceDict via pickled file
import pickle
dataSourceDict = pickle.load(open(pickle_path, 'rb'))
# print shape and first 5 rows for each entry in dataSourceDict
for key, value in dataSourceDict.items():
print(f"Shape of dataSourceDict[{key}]: {value.shape}")
# print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")
# set outputDf same as inputDf
outputDf = dataSourceDict["1"].copy()
# setup timezone
import datetime
# Chicago Timezone is GMT-5. Adjust as needed.
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))
# import pandas
import pandas as pd
import numpy as np
import re
# import Marin util functions
from marin_scripts_utils import tableize, select_changed
else:
print("Running locally but no pickle path defined. dataSourceDict not loaded.")
exit(1)
########### END - Local Mode Setup ###########
inputDf = dataSourceDict["1"].copy()
RPT_COL_CLIENT = 'Client'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PACING_START_DATE = 'Pacing - Start Date'
RPT_COL_PACING_END_DATE = 'Pacing - End Date'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PACING_START_DATE = 'Pacing - Start Date'
BULK_COL_PACING_END_DATE = 'Pacing - End Date'
outputDf[BULK_COL_PACING_START_DATE] = "<<YOUR VALUE>>"
outputDf[BULK_COL_PACING_END_DATE] = "<<YOUR VALUE>>"
# create temp column to store new dim tag and default to empty
TMP_START_DATE = RPT_COL_PACING_START_DATE + '_'
TMP_END_DATE = RPT_COL_PACING_END_DATE + '_'
inputDf[TMP_START_DATE] = np.nan
inputDf[TMP_END_DATE] = np.nan
# define function to parse out Start Date and End Date from `Campaign` column of row
# output should be in ISO format YYYY-MM-DD
# Examples Campaign Names:
# * `Search - Display Campaign & Google Ad Words (2/1/2023 - 1/31/2024)`
# * `Search - Casino Gambling - General Gambling (10/30/2023-12/31/2023)`
# * `TrueView - NUPAC - Quit Now (10/3/2023 -4/28/2024)`
def get_dates_from_campaign_name(row):
campaign_name = row[RPT_COL_CAMPAIGN]
# Extract dates in the format (MM/DD/YYYY - MM/DD/YYYY), where spaces before and after hyphen are optional
date_pattern_full = r'\((\d{1,2}/\d{1,2}/\d{4})\s*-\s*(\d{1,2}/\d{1,2}/\d{4})\)'
date_pattern_partial = r'\((\d{1,2}/\d{1,2}/\d{4})\)'
match_full = re.search(date_pattern_full, campaign_name)
match_partial = re.search(date_pattern_partial, campaign_name)
if match_full:
start_date, end_date = match_full.groups()
# Convert dates to ISO format
start_date = datetime.datetime.strptime(start_date, '%m/%d/%Y').strftime('%Y-%m-%d')
end_date = datetime.datetime.strptime(end_date, '%m/%d/%Y').strftime('%Y-%m-%d')
elif match_partial:
start_date = match_partial.group(1)
# Convert date to ISO format
start_date = datetime.datetime.strptime(start_date, '%m/%d/%Y').strftime('%Y-%m-%d')
end_date = np.nan
else:
start_date, end_date = np.nan, np.nan
return start_date, end_date
# Unit test for the function `get_dates_from_campaign_name`
def test_get_dates_from_campaign_name():
# Test case: Campaign name with dates containing spaces
row = pd.Series({'Campaign': 'Search - Display Campaign & Google Ad Words (2/1/2023 - 1/31/2024)'})
assert get_dates_from_campaign_name(row) == ('2023-02-01', '2024-01-31')
# Test case: Campaign name with dates without spaces
row = pd.Series({'Campaign': 'Melanie McLendon - SBD - (2/16/2023-7/31/2023)'})
assert get_dates_from_campaign_name(row) == ('2023-02-16', '2023-07-31')
# Test case: Campaign name with only start date
row = pd.Series({'Campaign': 'Search - Display Campaign & Google Ad Words (2/1/2023)'})
assert get_dates_from_campaign_name(row) == ('2023-02-01', np.nan)
# Test case: Campaign name without dates
row = pd.Series({'Campaign': 'Search - Display Campaign & Google Ad Words'})
assert get_dates_from_campaign_name(row) == (np.nan, np.nan)
# Run the unit test
test_get_dates_from_campaign_name()
# parse out Start/End Date from Campaign name
for index, row in inputDf.iterrows():
start_date, end_date = get_dates_from_campaign_name(row)
inputDf.at[index, TMP_START_DATE] = start_date
inputDf.at[index, TMP_END_DATE] = end_date
print("inputDf with parsed tags", tableize(inputDf.tail(5)))
# find changed campaigns
changed = ((inputDf[TMP_START_DATE].notnull() & (inputDf[RPT_COL_PACING_START_DATE] != inputDf[TMP_START_DATE])) | \
(inputDf[TMP_END_DATE].notnull() & (inputDf[RPT_COL_PACING_END_DATE] != inputDf[TMP_END_DATE])))
print("== Campaigns with Changed Tag ==", tableize(inputDf.loc[changed].head(5)))
# only select changed rows
cols = [RPT_COL_CLIENT, RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, TMP_START_DATE, TMP_END_DATE]
outputDf = inputDf.loc[ changed, cols ].copy() \
.rename(columns = { \
TMP_START_DATE: BULK_COL_PACING_START_DATE, \
TMP_END_DATE: BULK_COL_PACING_END_DATE \
})
if not outputDf.empty:
print("outputDf", tableize(outputDf.head(5)))
else:
print("Empty outputDf")
outputDf = pd.DataFrame(columns=[RPT_COL_CLIENT, BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_PACING_START_DATE, BULK_COL_PACING_END_DATE])
Post generated on 2024-03-10 06:34:12 GMT