Script 1299: Benchmark Assignments from Gsheet
Purpose
This Python script updates benchmark dimensions in Marin by copying benchmark values from a staging Google Sheet, using the sheet's 'Abbreviation' column to match rows to strategies.
To Elaborate
The script updates benchmark dimensions in Marin with data extracted from a staging Google Sheet, matching the sheet's 'Abbreviation' column against the 'Strategy' column in the data source. It runs either on the server or locally; local execution requires the user to configure a few parameters. For each strategy in the input data, the script looks up the matching row in the Google Sheet and copies the benchmark values into the output. Strategies without a matching benchmark are assigned NaN (or default values), so the benchmark data in Marin reflects the latest information from the Google Sheet.
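The matching step described above can be sketched with a pandas left join. This is an illustrative example only, with made-up data; the column names mirror the script, but the actual script iterates row by row rather than merging:

```python
import pandas as pd

# Hypothetical input rows (campaigns with a Strategy) and benchmark sheet rows
input_df = pd.DataFrame({
    'Campaign': ['Camp 1', 'Camp 2', 'Camp 3'],
    'Strategy': ['BRND', 'COMP', 'UNKNOWN'],
})
benchmarks = pd.DataFrame({
    'Strategy': ['BRND', 'COMP'],
    'CPL - Benchmark': [10.456, 20.111],
})

# A left join keeps every input row; strategies without a benchmark get NaN
merged = input_df.merge(benchmarks, on='Strategy', how='left')
merged['CPL - Benchmark'] = merged['CPL - Benchmark'].round(2)
print(merged)
```

Strategies with a match receive the rounded benchmark value; 'UNKNOWN' falls through to NaN, mirroring the script's fallback behavior.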
Walking Through the Code
- Local Mode Configuration:
  - The script begins with a configuration section for local execution, where the user can set a flag to download preview input and specify the path to a pickle file containing data source information.
  - It checks whether the script is running on a server or locally by attempting to access a specific object, dataSourceDict.
- Data Loading and Initialization:
  - If running locally, the script loads dataSourceDict from the specified pickle file and prints the shape and first few rows of each data entry.
  - The script sets up the output DataFrame (outputDf) as a copy of the input DataFrame (inputDf) and initializes the benchmark columns with placeholder values.
- Benchmark Data Processing:
  - The script extracts the relevant columns from the 'Benchmark Upload' sheet in the Google Sheet, renaming them to match the expected column names.
  - It converts these columns to numeric types, coercing any conversion errors to NaN.
- Benchmark Assignment:
  - For each row in the input DataFrame, the script matches the 'Strategy' with the corresponding row in the benchmark data.
  - If a match is found, it updates the output DataFrame with the benchmark values, rounded to two decimal places.
  - If no match is found, it logs a message and assigns NaN (or default values) to the benchmark columns.
- Output and Debugging:
  - The script prints the first few rows of the output DataFrame for verification.
  - If running in local development mode, it writes the output and debug data to CSV files for further inspection.
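The numeric-conversion step described above relies on pandas' `errors='coerce'` mode, which turns unparsable cells into NaN instead of raising. A small illustrative sketch with made-up sheet values:

```python
import pandas as pd

# Google Sheet cells typically arrive as strings; coerce any
# non-numeric entries (e.g. 'N/A', blanks) to NaN instead of raising
raw = pd.Series(['12.5', 'N/A', '', '7'])
nums = pd.to_numeric(raw, errors='coerce')
print(nums.tolist())
```

This is why downstream code must tolerate NaN in the benchmark columns: any cell that fails to parse silently becomes NaN.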
Vitals
- Script ID : 1299
- Client ID / Customer ID: 1306926629 / 60270083
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Gross Lead - Benchmark - High, Gross Lead - Benchmark - Medium, Gross Lead - Benchmark - Low, Intv Rate - Benchmark, CPL - Benchmark, Campaign ID
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
- Created by dwaidhas@marinsoftware.com on 2024-07-26 19:50
- Last Updated by dwaidhas@marinsoftware.com on 2024-08-20 15:40
> See it in Action
Python Code
##
## name: Benchmark Assignments from Gsheet
## description:
## Copy Benchmarks from staging GSheets to update Benchmark Dimensions in Marin.
## Use GSheets 'Abbreviation' column to match with 'Strategy'.
##
## author: Dana Waidhas
## created: 2024-07-26
##
########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/dwaidhas/Downloads/pickle/allcampus_'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict
# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # dataSourceDict is only defined on the server
        dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    with open(pickle_path, 'rb') as f:
        dataSourceDict = pickle.load(f)
    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")
    # set outputDf same as inputDf
    outputDf = dataSourceDict["1"].copy()
    # setup timezone
    import datetime
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=+8))
    # import pandas and numpy
    import pandas as pd
    import numpy as np
    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########
# Primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_GROSS_LEAD__BENCHMARK__HIGH = 'Gross Lead - Benchmark - High'
RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM = 'Gross Lead - Benchmark - Medium'
RPT_COL_GROSS_LEAD__BENCHMARK__LOW = 'Gross Lead - Benchmark - Low'
RPT_COL_CPL__BENCHMARK = 'CPL - Benchmark'
RPT_COL_INTV_RATE__BENCHMARK = 'Intv Rate - Benchmark'
# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_CAMPAIGN_ID = 'Campaign ID'
BULK_COL_CPL__BENCHMARK = 'CPL - Benchmark'
BULK_COL_GROSS_LEAD__BENCHMARK__HIGH = 'Gross Lead - Benchmark - High'
BULK_COL_GROSS_LEAD__BENCHMARK__LOW = 'Gross Lead - Benchmark - Low'
BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM = 'Gross Lead - Benchmark - Medium'
BULK_COL_INTV_RATE__BENCHMARK = 'Intv Rate - Benchmark'
outputDf[BULK_COL_CPL__BENCHMARK] = "<<YOUR VALUE>>"
outputDf[BULK_COL_GROSS_LEAD__BENCHMARK__HIGH] = "<<YOUR VALUE>>"
outputDf[BULK_COL_GROSS_LEAD__BENCHMARK__LOW] = "<<YOUR VALUE>>"
outputDf[BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM] = "<<YOUR VALUE>>"
outputDf[BULK_COL_INTV_RATE__BENCHMARK] = "<<YOUR VALUE>>"
########### User Code Starts Here ###########
# "Benchmark Upload" is the fourth sheet, so datasource key is 2_4
# "DIM1" is column B, which holds Gross Lead - Benchmark - High
# "DIM2" is column C, which holds Gross Lead - Benchmark - Medium
# "DIM3" is column D, which holds Gross Lead - Benchmark - Low
# "DIM4" is column E, which holds CPL Target
# "DIM5" is column F, which holds Intv Rate Target
# "Digital Ref" is column A, which holds Strategy
Benchmark_Upload = dataSourceDict['2_4'] \
    .loc[0:, ['A', 'B', 'C', 'D', 'E', 'F']] \
    .rename(columns={
        'A': RPT_COL_STRATEGY,
        'B': RPT_COL_GROSS_LEAD__BENCHMARK__HIGH,
        'C': RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM,
        'D': RPT_COL_GROSS_LEAD__BENCHMARK__LOW,
        'E': RPT_COL_CPL__BENCHMARK,
        'F': RPT_COL_INTV_RATE__BENCHMARK
    })
# Convert columns to numeric, handling any errors
Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__HIGH] = pd.to_numeric(Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__HIGH], errors='coerce')
Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM] = pd.to_numeric(Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM], errors='coerce')
Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__LOW] = pd.to_numeric(Benchmark_Upload[RPT_COL_GROSS_LEAD__BENCHMARK__LOW], errors='coerce')
Benchmark_Upload[RPT_COL_CPL__BENCHMARK] = pd.to_numeric(Benchmark_Upload[RPT_COL_CPL__BENCHMARK], errors='coerce')
Benchmark_Upload[RPT_COL_INTV_RATE__BENCHMARK] = pd.to_numeric(Benchmark_Upload[RPT_COL_INTV_RATE__BENCHMARK], errors='coerce')
# Match 'Strategy' from inputDf with 'Strategy' in Benchmark_Upload and assign benchmarks
for index, row in inputDf.iterrows():
    strategy = row[RPT_COL_STRATEGY]
    matching_row = Benchmark_Upload[Benchmark_Upload[RPT_COL_STRATEGY] == strategy]
    if not matching_row.empty:
        outputDf.loc[index] = {
            BULK_COL_ACCOUNT: row[RPT_COL_ACCOUNT],
            BULK_COL_CAMPAIGN: row[RPT_COL_CAMPAIGN],
            BULK_COL_CAMPAIGN_ID: row[RPT_COL_CAMPAIGN_ID],
            BULK_COL_CPL__BENCHMARK: round(matching_row[RPT_COL_CPL__BENCHMARK].values[0], 2),
            BULK_COL_GROSS_LEAD__BENCHMARK__HIGH: round(matching_row[RPT_COL_GROSS_LEAD__BENCHMARK__HIGH].values[0], 2),
            BULK_COL_GROSS_LEAD__BENCHMARK__LOW: round(matching_row[RPT_COL_GROSS_LEAD__BENCHMARK__LOW].values[0], 2),
            BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM: round(matching_row[RPT_COL_GROSS_LEAD__BENCHMARK__MEDIUM].values[0], 2),
            BULK_COL_INTV_RATE__BENCHMARK: round(matching_row[RPT_COL_INTV_RATE__BENCHMARK].values[0], 2)
        }
    else:
        print(f"No matching benchmark found for strategy: {strategy}")
        # No benchmark available: assign NaN (substitute default values here if desired)
        outputDf.loc[index] = {
            BULK_COL_ACCOUNT: row[RPT_COL_ACCOUNT],
            BULK_COL_CAMPAIGN: row[RPT_COL_CAMPAIGN],
            BULK_COL_CAMPAIGN_ID: row[RPT_COL_CAMPAIGN_ID],
            BULK_COL_CPL__BENCHMARK: np.nan,
            BULK_COL_GROSS_LEAD__BENCHMARK__HIGH: np.nan,
            BULK_COL_GROSS_LEAD__BENCHMARK__LOW: np.nan,
            BULK_COL_GROSS_LEAD__BENCHMARK__MEDIUM: np.nan,
            BULK_COL_INTV_RATE__BENCHMARK: np.nan
        }
# Display the first few rows of the output DataFrame
print(outputDf.head())
## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")
    # debugDf is optional; only write it if this script defined one
    if 'debugDf' in locals():
        debug_filename = 'debugDf.csv'
        debugDf.to_csv(debug_filename, index=False)
        print(f"Local Dev: Debug written to: {debug_filename}")
Post generated on 2024-11-27 06:58:46 GMT