Script 1023: Total Pub Cost & Pub Cost Percentage Column
Purpose
Python script to calculate the total pub cost for each ‘SBA Strategy’ and ‘Campaign’ group, and compute the percentage of ‘Pub. Cost $’ for each row relative to the total cost within its group.
To Elaborate
The Python script solves the problem of calculating the total pub cost and percentage of pub cost for each row in a dataset. The key business rules are as follows:
- The script clears existing data in the ‘Total $ Pub. Cost’ and ‘% of Total Pub. Cost’ columns.
- It calculates the total pub cost for each ‘SBA Strategy’ group and ‘Campaign’ group.
- It computes the percentage of ‘Pub. Cost $’ for each row relative to the total cost within its group.
- The script rounds the percentage to two decimal places.
- It replaces NaN values in the ‘% of Total Pub. Cost’ column with “N/A”.
- The script converts the ‘% of Total Pub. Cost’ column to strings.
- It appends a ‘%’ symbol to non-“N/A” values in the ‘% of Total Pub. Cost’ column.
- The output dataframe is displayed.
Walking Through the Code
- The script sets up the client timezone as Chicago Timezone (GMT-5).
- The input data is loaded.
- Column names are defined.
- The ‘Total $ Pub. Cost’ and ‘% of Total Pub. Cost’ columns are cleared.
- The script calculates the total pub cost for each ‘SBA Strategy’ group using the
groupby
function. - The percentage of ‘Pub. Cost $’ for each row is calculated by dividing the ‘Pub. Cost $’ column by the ‘Total $ Pub. Cost’ column and multiplying by 100.
- The ‘% of Total Pub. Cost’ column is rounded to two decimal places.
- Output column names are defined.
- Required columns are selected for the output dataframe.
- Column names are renamed.
- NaN values in the ‘% of Total Pub. Cost’ column are replaced with “N/A”.
- The ‘% of Total Pub. Cost’ column is converted to strings.
- The ‘%’ symbol is appended to non-“N/A” values in the ‘% of Total Pub. Cost’ column.
- The output dataframe is displayed.
Vitals
- Script ID : 1023
- Client ID / Customer ID: 1306927027 / 60270153
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, % of Total Pub. Cost, Total $ Pub. Cost, SBA Strategy
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
- Created by dwaidhas@marinsoftware.com on 2024-04-29 20:04
- Last Updated by dwaidhas@marinsoftware.com on 2024-04-29 20:15
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
##
## name: Total Pub Cost & Pub Cost Percentage Column
## description:
## The script clears existing data, calculates the total pub cost for each 'SBA Strategy' and 'Campaign' group, and computes the percentage of 'Pub. Cost $' for each row relative to the total cost within its group.
##
## author: Dana Waidhas
## created: 2024-03-18
##
# setup timezone
# Chicago Timezone is GMT-5. Adjust as needed.
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))
# Load input data
inputDf = dataSourceDict["1"]
# Define column names
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_SBA_STRATEGY = 'SBA Strategy'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_TOTAL_PUB_COST = 'Total $ Pub. Cost'
RPT_COL_PCT_OF_TOTAL_PUB_COST = '% of Total Pub. Cost'
# Clear 'Total $ Pub. Cost' and '% of Total Pub. Cost'
inputDf[RPT_COL_TOTAL_PUB_COST] = 0
inputDf[RPT_COL_PCT_OF_TOTAL_PUB_COST] = 0
# Calculate Total $ Pub. Cost for each SBA Strategy group
inputDf[RPT_COL_TOTAL_PUB_COST] = round(inputDf.groupby(RPT_COL_SBA_STRATEGY)[RPT_COL_PUB_COST].transform('sum'), 2)
# Calculate % of Total Pub. Cost for each row
inputDf[RPT_COL_PCT_OF_TOTAL_PUB_COST] = (inputDf[RPT_COL_PUB_COST] / inputDf[RPT_COL_TOTAL_PUB_COST]) * 100
# Round the '% of Total Pub. Cost' column to two decimal places
inputDf[RPT_COL_PCT_OF_TOTAL_PUB_COST] = round(inputDf[RPT_COL_PCT_OF_TOTAL_PUB_COST], 2)
# Define output column names
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_SBA_STRATEGY = 'SBA Strategy'
BULK_COL_TOTAL_PUB_COST = 'Total $ Pub. Cost'
BULK_COL_PCT_OF_TOTAL_PUB_COST = '% of Total Pub. Cost'
# Select required columns for output
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT, RPT_COL_SBA_STRATEGY, RPT_COL_TOTAL_PUB_COST, RPT_COL_PCT_OF_TOTAL_PUB_COST]]
# Rename columns
outputDf.columns = [BULK_COL_CAMPAIGN, BULK_COL_ACCOUNT, BULK_COL_SBA_STRATEGY, BULK_COL_TOTAL_PUB_COST, BULK_COL_PCT_OF_TOTAL_PUB_COST]
# Replace NaN values with "N/A" in the '% of Total Pub. Cost' column
outputDf[BULK_COL_PCT_OF_TOTAL_PUB_COST].fillna('N/A', inplace=True)
# Convert the '% of Total Pub. Cost' column to strings
outputDf[BULK_COL_PCT_OF_TOTAL_PUB_COST] = outputDf[BULK_COL_PCT_OF_TOTAL_PUB_COST].astype(str)
# Append '%' to non-"N/A" values
outputDf[BULK_COL_PCT_OF_TOTAL_PUB_COST] = outputDf[BULK_COL_PCT_OF_TOTAL_PUB_COST].apply(lambda x: x + '%' if x != 'N/A' else 'N/A')
# Display the output dataframe
print(outputDf)
Post generated on 2024-05-15 07:44:05 GMT