Script 1045: Auto Tag Campaigns with Program Manager Dim
Purpose
The Python script automates the process of tagging campaigns with program manager information using data from Google Sheets.
To Elaborate
The script is designed to automate the tagging of campaigns with program manager information at the campaign level. It utilizes data from a Google Sheet, specifically from a sheet named “Dynamic,” to extract program manager names and school programs. The script processes this data to ensure that each school program is associated with a single program manager, removing any duplicate assignments. It then merges this cleaned data with an existing dataset of campaigns, updating the campaign records with the appropriate program manager information. This process helps streamline the management of campaign data by ensuring that each campaign is accurately tagged with the relevant program manager, facilitating better organization and oversight.
Walking Through the Code
- Initialization and Setup
- The script begins by determining whether it is running on a server or locally. If running locally, it loads a pickled data source dictionary (
dataSourceDict
) from a specified file path. - The script imports necessary libraries such as
pandas
,numpy
, and utility functions frommarin_scripts_utils
.
- The script begins by determining whether it is running on a server or locally. If running locally, it loads a pickled data source dictionary (
- Data Extraction and Cleanup
- The script extracts program manager and school program data from the “Dynamic” sheet in the Google Sheets data source.
- It cleans the data by removing rows with missing values and standardizing the school program names to uppercase.
- Duplicate Handling
- The script identifies and handles duplicate assignments where a school program is associated with multiple program managers, ensuring only the first assignment is kept.
- Data Merging and Output Preparation
- The cleaned program manager data is merged with the input campaign data to update the campaign records with the correct program manager information.
- The script uses the
select_changed
function to identify changes and prepare the final output dataset.
- Local Development and Debugging
- If running in local development mode, the script writes the output and debug data to CSV files for further inspection.
Vitals
- Script ID : 1045
- Client ID / Customer ID: 1306926629 / 60270083
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Program Manager, Campaign ID
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
- Created by dwaidhas@marinsoftware.com on 2024-05-02 21:08
- Last Updated by Michael Huang on 2024-11-26 12:38
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
##
## name: Program Manager Tagging - GSheets - All Campus
## description:
## Use "DMM" column from Sheet "Dynamic" to tag "Program Manager" at Campaign-level
##
## author: Michael S. Huang, Dana Waidhas
## created: 2024-07-24
##
########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/allcampus_campaign_program_managaer_tagging_20241126.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict
# determine if code is running on server or locally
def is_executing_on_server():
try:
# Attempt to access a known restricted builtin
dict_items = dataSourceDict.items()
return True
except NameError:
# NameError: dataSourceDict object is missing (indicating not on server)
return False
local_dev = False
if is_executing_on_server():
print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
print("Code is NOT executing on server. Doing init.")
local_dev = True
# load dataSourceDict via pickled file
import pickle
dataSourceDict = pickle.load(open(pickle_path, 'rb'))
# print shape and first 5 rows for each entry in dataSourceDict
for key, value in dataSourceDict.items():
print(f"Shape of dataSourceDict[{key}]: {value.shape}")
print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")
# set outputDf same as inputDf
outputDf = dataSourceDict["1"].copy()
# setup timezone
import datetime
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))
# import pandas
import pandas as pd
import numpy as np
# import Marin util functions
from marin_scripts_utils import tableize, select_changed
else:
print("Running locally but no pickle path defined. dataSourceDict not loaded.")
exit(1)
########### END - Local Mode Setup ###########
# dial forward to preview next month budgets
# CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))
# today in client timezone
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_SCHOOL_PROGRAM = 'School_Program'
RPT_COL_PROGRAM_MANAGER = 'Program Manager'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PROGRAM_MANAGER = 'Program Manager'
BULK_COL_CAMPAIGN_ID = 'Campaign ID'
########### User Code Starts Here ###########
# "Dynamic" is the second sheet, so datasource key is 2_2
# "DIMM" is column D, which holds Program Manager names
# "Digital Ref" is column E, which holds School Program
program_managers = dataSourceDict['2_2'] \
.loc[0:, ['E', 'D']] \
.rename(columns={ \
'E' : RPT_COL_SCHOOL_PROGRAM, \
'D' : RPT_COL_PROGRAM_MANAGER \
})
# print("program_managers.info", program_managers.info())
print("program_managers.shape", program_managers.shape)
### cleanup values from GSheets
# remove rows without School Program or Program Manager
program_managers = program_managers.loc[program_managers[RPT_COL_SCHOOL_PROGRAM].notnull() & program_managers[RPT_COL_PROGRAM_MANAGER].notnull()]
# make School Program all caps and strip extra whitespace
program_managers[RPT_COL_SCHOOL_PROGRAM] = program_managers[RPT_COL_SCHOOL_PROGRAM].str.strip().str.upper()
print("program_managers.shape after cleanup", program_managers.shape)
## find dupe and remove
# Find rows where the same SCHOOL_PROGRAM is assigned to more than one PROGRAM_MANAGER
same_assignment_dupes = program_managers.duplicated(subset=[RPT_COL_SCHOOL_PROGRAM,RPT_COL_PROGRAM_MANAGER], keep='first')
deduped = program_managers[~same_assignment_dupes]
school_program_multiple_managers = deduped[deduped.duplicated(subset=[RPT_COL_SCHOOL_PROGRAM], keep=False)]
# Print out rows where the same SCHOOL_PROGRAM is assigned to more than one PROGRAM_MANAGER
if not school_program_multiple_managers.empty:
print("Same SCHOOL_PROGRAM assigned to more than one PROGRAM_MANAGER:")
print(school_program_multiple_managers)
# actually dedupe; if dupe, keep first assignment only
assignment_dupes = program_managers.duplicated(subset=[RPT_COL_SCHOOL_PROGRAM], keep='first')
program_managers_dedupe = program_managers.loc[~assignment_dupes]
print("after cleanup gsheets")
print("program_managers_dedupe.shape", program_managers_dedupe.shape)
print("program_managers_dedupe sample", program_managers_dedupe.head(5))
### Cleanup input
# only keep rows with both School Program and Program Manager
inputDf = inputDf.loc[inputDf[RPT_COL_SCHOOL_PROGRAM].notnull()]
# make School Program all caps
inputDf[RPT_COL_SCHOOL_PROGRAM] = inputDf[RPT_COL_SCHOOL_PROGRAM].str.strip().str.upper()
print("after cleanup inputDf")
print("inputDf.shape", inputDf.shape)
# make copy of input for use with select_changed
originalDf = inputDf.copy()
### Merge inputDf with program_managers
mergedDf = inputDf.merge(program_managers_dedupe, on=RPT_COL_SCHOOL_PROGRAM, how='left', suffixes=('_old', '')) \
.fillna(value={RPT_COL_PROGRAM_MANAGER: ''})
print("mergedDf shape", mergedDf.shape)
print("mergedDf", mergedDf.tail(5))
outputDf, debugDf = select_changed(mergedDf,
originalDf,
diff_cols=[RPT_COL_PROGRAM_MANAGER],
select_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_PROGRAM_MANAGER, RPT_COL_CAMPAIGN_ID],
merged_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN_ID]
)
print("outputDf.shape", outputDf.shape)
print("outputDf sample")
print(outputDf.tail(10))
## local debug
if local_dev:
output_filename = 'outputDf.csv'
outputDf.to_csv(output_filename, index=False)
print(f"Local Dev: Output written to: {output_filename}")
debug_filename = 'debugDf.csv'
debugDf.to_csv(debug_filename, index=False)
print(f"Local Dev: Debug written to: {debug_filename}")
Post generated on 2024-11-27 06:58:46 GMT