Script 1045: Auto Tag Campaigns with Program Manager Dim
Purpose:
The Python script automates the process of tagging campaigns with the appropriate program manager based on data from Google Sheets.
To Elaborate
The script is designed to automate the tagging of campaigns with the correct program manager by utilizing data from a Google Sheets document. It extracts program manager information from a specific sheet and column, cleans the data by removing duplicates and ensuring consistency in formatting, and then merges this information with existing campaign data. The goal is to ensure that each campaign is accurately tagged with the corresponding program manager, facilitating better management and reporting. The script handles potential data issues such as duplicate assignments and missing values, ensuring that the final output is clean and reliable.
Walking Through the Code
- Initialization and Configuration:
- The script begins by determining whether it is running on a server or locally. If running locally, it loads data from a specified pickle file.
- It imports necessary libraries such as
pandas
,numpy
, and utility functions frommarin_scripts_utils
.
- Data Extraction and Cleaning:
- The script extracts program manager data from a specific sheet and columns in the Google Sheets document.
- It cleans the data by removing rows with missing values and standardizing the format of the ‘School Program’ column to uppercase.
- Duplicate entries are identified and removed, ensuring each school program is assigned to only one program manager.
- Data Merging and Processing:
- The cleaned program manager data is merged with the campaign data based on the ‘School Program’ column.
- The script uses the
select_changed
function to identify changes and prepare the final output, ensuring that only relevant columns are included.
- Output Generation:
- If running locally, the script writes the output and debug data to CSV files for further analysis or debugging.
Vitals
- Script ID : 1045
- Client ID / Customer ID: 1306926629 / 60270083
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Program Manager, Campaign ID
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
- Created by dwaidhas@marinsoftware.com on 2024-05-02 21:08
- Last Updated by Michael Huang on 2024-11-26 12:38
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
##
## name: Program Manager Tagging - GSheets - All Campus
## description:
## Use "DMM" column from Sheet "Dynamic" to tag "Program Manager" at Campaign-level
##
## author: Michael S. Huang, Dana Waidhas
## created: 2024-07-24
##
########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/allcampus_campaign_program_managaer_tagging_20241126.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict
# determine if code is running on server or locally
def is_executing_on_server():
try:
# Attempt to access a known restricted builtin
dict_items = dataSourceDict.items()
return True
except NameError:
# NameError: dataSourceDict object is missing (indicating not on server)
return False
local_dev = False
if is_executing_on_server():
print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
print("Code is NOT executing on server. Doing init.")
local_dev = True
# load dataSourceDict via pickled file
import pickle
dataSourceDict = pickle.load(open(pickle_path, 'rb'))
# print shape and first 5 rows for each entry in dataSourceDict
for key, value in dataSourceDict.items():
print(f"Shape of dataSourceDict[{key}]: {value.shape}")
print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")
# set outputDf same as inputDf
outputDf = dataSourceDict["1"].copy()
# setup timezone
import datetime
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))
# import pandas
import pandas as pd
import numpy as np
# import Marin util functions
from marin_scripts_utils import tableize, select_changed
else:
print("Running locally but no pickle path defined. dataSourceDict not loaded.")
exit(1)
########### END - Local Mode Setup ###########
# dial forward to preview next month budgets
# CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))
# today in client timezone
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_SCHOOL_PROGRAM = 'School_Program'
RPT_COL_PROGRAM_MANAGER = 'Program Manager'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PROGRAM_MANAGER = 'Program Manager'
BULK_COL_CAMPAIGN_ID = 'Campaign ID'
########### User Code Starts Here ###########
# "Dynamic" is the second sheet, so datasource key is 2_2
# "DIMM" is column D, which holds Program Manager names
# "Digital Ref" is column E, which holds School Program
program_managers = dataSourceDict['2_2'] \
.loc[0:, ['E', 'D']] \
.rename(columns={ \
'E' : RPT_COL_SCHOOL_PROGRAM, \
'D' : RPT_COL_PROGRAM_MANAGER \
})
# print("program_managers.info", program_managers.info())
print("program_managers.shape", program_managers.shape)
### cleanup values from GSheets
# remove rows without School Program or Program Manager
program_managers = program_managers.loc[program_managers[RPT_COL_SCHOOL_PROGRAM].notnull() & program_managers[RPT_COL_PROGRAM_MANAGER].notnull()]
# make School Program all caps and strip extra whitespace
program_managers[RPT_COL_SCHOOL_PROGRAM] = program_managers[RPT_COL_SCHOOL_PROGRAM].str.strip().str.upper()
print("program_managers.shape after cleanup", program_managers.shape)
## find dupe and remove
# Find rows where the same SCHOOL_PROGRAM is assigned to more than one PROGRAM_MANAGER
same_assignment_dupes = program_managers.duplicated(subset=[RPT_COL_SCHOOL_PROGRAM,RPT_COL_PROGRAM_MANAGER], keep='first')
deduped = program_managers[~same_assignment_dupes]
school_program_multiple_managers = deduped[deduped.duplicated(subset=[RPT_COL_SCHOOL_PROGRAM], keep=False)]
# Print out rows where the same SCHOOL_PROGRAM is assigned to more than one PROGRAM_MANAGER
if not school_program_multiple_managers.empty:
print("Same SCHOOL_PROGRAM assigned to more than one PROGRAM_MANAGER:")
print(school_program_multiple_managers)
# actually dedupe; if dupe, keep first assignment only
assignment_dupes = program_managers.duplicated(subset=[RPT_COL_SCHOOL_PROGRAM], keep='first')
program_managers_dedupe = program_managers.loc[~assignment_dupes]
print("after cleanup gsheets")
print("program_managers_dedupe.shape", program_managers_dedupe.shape)
print("program_managers_dedupe sample", program_managers_dedupe.head(5))
### Cleanup input
# only keep rows with both School Program and Program Manager
inputDf = inputDf.loc[inputDf[RPT_COL_SCHOOL_PROGRAM].notnull()]
# make School Program all caps
inputDf[RPT_COL_SCHOOL_PROGRAM] = inputDf[RPT_COL_SCHOOL_PROGRAM].str.strip().str.upper()
print("after cleanup inputDf")
print("inputDf.shape", inputDf.shape)
# make copy of input for use with select_changed
originalDf = inputDf.copy()
### Merge inputDf with program_managers
mergedDf = inputDf.merge(program_managers_dedupe, on=RPT_COL_SCHOOL_PROGRAM, how='left', suffixes=('_old', '')) \
.fillna(value={RPT_COL_PROGRAM_MANAGER: ''})
print("mergedDf shape", mergedDf.shape)
print("mergedDf", mergedDf.tail(5))
outputDf, debugDf = select_changed(mergedDf,
originalDf,
diff_cols=[RPT_COL_PROGRAM_MANAGER],
select_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_PROGRAM_MANAGER, RPT_COL_CAMPAIGN_ID],
merged_cols=[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN_ID]
)
print("outputDf.shape", outputDf.shape)
print("outputDf sample")
print(outputDf.tail(10))
## local debug
if local_dev:
output_filename = 'outputDf.csv'
outputDf.to_csv(output_filename, index=False)
print(f"Local Dev: Output written to: {output_filename}")
debug_filename = 'debugDf.csv'
debugDf.to_csv(debug_filename, index=False)
print(f"Local Dev: Debug written to: {debug_filename}")
Post generated on 2025-03-11 01:25:51 GMT