Script 537: Tag Pacing Start and End Dates from Campaign Name

Purpose

Parse out and populate Pacing - Start Date and Pacing - End Date in ISO format from the Campaign column of a DataFrame.

To Elaborate

The Python script aims to extract the start and end dates from the campaign names in the Campaign column of a DataFrame. It then populates the Pacing - Start Date and Pacing - End Date columns with the extracted dates in ISO format (YYYY-MM-DD). The script also identifies campaigns with changed tags and creates a new DataFrame with the relevant information.

Walking Through the Code

  1. The script starts by setting up the local mode configuration, including the flag to download the preview input and the path to the pickle file containing the dataSourceDict object.
  2. It checks whether the code is executing on the server or locally and loads the dataSourceDict object if running locally.
  3. The script imports the necessary libraries and sets up the required constants and column names.
  4. It creates a temporary column in the input DataFrame to store the new dimension tags and initializes it with NaN values.
  5. The script defines a function get_dates_from_campaign_name to parse the start and end dates from the campaign name using regular expressions.
  6. A unit test is performed on the get_dates_from_campaign_name function to ensure its correctness.
  7. The script iterates over each row in the input DataFrame and calls the get_dates_from_campaign_name function to extract the dates. It then updates the temporary columns with the extracted dates.
  8. The input DataFrame with the parsed tags is displayed.
  9. The script identifies campaigns with changed tags by comparing the values in the temporary columns with the existing Pacing - Start Date and Pacing - End Date columns.
  10. The campaigns with changed tags are selected, and the relevant columns are copied to the output DataFrame, renaming the temporary columns to the desired column names.
  11. If the output DataFrame is not empty, it is displayed; otherwise, an empty DataFrame with the required columns is created.

Vitals

  • Script ID : 537
  • Client ID / Customer ID: 1306926843 / 60270139
  • Action Type: Bulk Upload (Preview)
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Pacing - Start Date, Pacing - End Date
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
  • Created by ascott@marinsoftware.com on 2023-11-20 18:21
  • Last Updated by ascott@marinsoftware.com on 2024-03-07 18:34
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
#
# Add Dimensions Tag based on Campaign Name
# * `Pacing - Start Date`
# * `Pacing - End Date`
# Examples: 
# * `Search - Display Campaign & Google Ad Words (2/1/2023 - 1/31/2024)`
# * `Search - Casino Gambling - General Gambling (10/30/2023-12/31/2023)`
# * `TrueView -  NUPAC - Quit Now (10/3/2023 -4/28/2024)`
#
# Author: Michael S. Huang
# Created: 2023-11-25
#

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=True
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/infinite_digital_datasource_dict_1701749926621.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    outputDf = dataSourceDict["1"].copy()

    # setup timezone
    import datetime
    # Chicago Timezone is GMT-5. Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))

    # import pandas
    import pandas as pd
    import numpy as np
    import re

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########

inputDf = dataSourceDict["1"].copy()
RPT_COL_CLIENT = 'Client'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PACING_START_DATE = 'Pacing - Start Date'
RPT_COL_PACING_END_DATE = 'Pacing - End Date'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PACING_START_DATE = 'Pacing - Start Date'
BULK_COL_PACING_END_DATE = 'Pacing - End Date'

outputDf[BULK_COL_PACING_START_DATE] = "<<YOUR VALUE>>"
outputDf[BULK_COL_PACING_END_DATE] = "<<YOUR VALUE>>"

# create temp column to store new dim tag and default to empty
TMP_START_DATE = RPT_COL_PACING_START_DATE + '_'
TMP_END_DATE = RPT_COL_PACING_END_DATE + '_'
inputDf[TMP_START_DATE] = np.nan
inputDf[TMP_END_DATE] = np.nan

# define function to parse out Start Date and End Date from `Campaign` column of row
# output should be in ISO format YYYY-MM-DD
# Examples Campaign Names:
# * `Search - Display Campaign & Google Ad Words (2/1/2023 - 1/31/2024)`
# * `Search - Casino Gambling - General Gambling (10/30/2023-12/31/2023)`
# * `TrueView -  NUPAC - Quit Now (10/3/2023 -4/28/2024)`
def get_dates_from_campaign_name(row):
    campaign_name = row[RPT_COL_CAMPAIGN]
    # Extract dates in the format (MM/DD/YYYY - MM/DD/YYYY), where spaces before and after hyphen are optional
    date_pattern_full = r'\((\d{1,2}/\d{1,2}/\d{4})\s*-\s*(\d{1,2}/\d{1,2}/\d{4})\)'
    date_pattern_partial = r'\((\d{1,2}/\d{1,2}/\d{4})\)'
    match_full = re.search(date_pattern_full, campaign_name)
    match_partial = re.search(date_pattern_partial, campaign_name)
    if match_full:
        start_date, end_date = match_full.groups()
        # Convert dates to ISO format
        start_date = datetime.datetime.strptime(start_date, '%m/%d/%Y').strftime('%Y-%m-%d')
        end_date = datetime.datetime.strptime(end_date, '%m/%d/%Y').strftime('%Y-%m-%d')
    elif match_partial:
        start_date = match_partial.group(1)
        # Convert date to ISO format
        start_date = datetime.datetime.strptime(start_date, '%m/%d/%Y').strftime('%Y-%m-%d')
        end_date = np.nan
    else:
        start_date, end_date = np.nan, np.nan
    return start_date, end_date

# Unit test for the function `get_dates_from_campaign_name`
def test_get_dates_from_campaign_name():
    # Test case: Campaign name with dates containing spaces
    row = pd.Series({'Campaign': 'Search - Display Campaign & Google Ad Words (2/1/2023 - 1/31/2024)'})
    assert get_dates_from_campaign_name(row) == ('2023-02-01', '2024-01-31')

    # Test case: Campaign name with dates without spaces
    row = pd.Series({'Campaign': 'Melanie McLendon - SBD - (2/16/2023-7/31/2023)'})
    assert get_dates_from_campaign_name(row) == ('2023-02-16', '2023-07-31')

    # Test case: Campaign name with only start date
    row = pd.Series({'Campaign': 'Search - Display Campaign & Google Ad Words (2/1/2023)'})
    assert get_dates_from_campaign_name(row) == ('2023-02-01', np.nan)

    # Test case: Campaign name without dates
    row = pd.Series({'Campaign': 'Search - Display Campaign & Google Ad Words'})
    assert get_dates_from_campaign_name(row) == (np.nan, np.nan)


# Run the unit test
test_get_dates_from_campaign_name()

# parse out Start/End Date from Campaign name
for index, row in inputDf.iterrows():
    start_date, end_date = get_dates_from_campaign_name(row)
    inputDf.at[index, TMP_START_DATE] = start_date
    inputDf.at[index, TMP_END_DATE] = end_date

print("inputDf with parsed tags", tableize(inputDf.tail(5)))

# find changed campaigns
changed = ((inputDf[TMP_START_DATE].notnull() & (inputDf[RPT_COL_PACING_START_DATE] != inputDf[TMP_START_DATE])) | \
          (inputDf[TMP_END_DATE].notnull() & (inputDf[RPT_COL_PACING_END_DATE] != inputDf[TMP_END_DATE])))

print("== Campaigns with Changed Tag ==", tableize(inputDf.loc[changed].head(5)))

# only select changed rows
cols = [RPT_COL_CLIENT, RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, TMP_START_DATE, TMP_END_DATE]
outputDf = inputDf.loc[ changed, cols ].copy() \
                  .rename(columns = { \
                    TMP_START_DATE: BULK_COL_PACING_START_DATE, \
                    TMP_END_DATE: BULK_COL_PACING_END_DATE \
                  })

if not outputDf.empty:
  print("outputDf", tableize(outputDf.head(5)))
else:
  print("Empty outputDf")
  outputDf = pd.DataFrame(columns=[RPT_COL_CLIENT, BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_PACING_START_DATE, BULK_COL_PACING_END_DATE])


Post generated on 2024-03-10 06:34:12 GMT

comments powered by Disqus