Script 1023: Total Pub Cost & Pub Cost Percentage Column
Purpose:
The Python script calculates the total publication cost for each ‘SBA Strategy’ and ‘Campaign’ group and determines the percentage of ‘Pub. Cost $’ for each row relative to the total cost within its group.
To Elaborate
The script is designed to process a dataset containing publication costs associated with different marketing strategies and campaigns. It begins by clearing existing data in specific columns related to total publication costs and their percentages. The script then calculates the total publication cost for each group defined by the ‘SBA Strategy’ and ‘Campaign’ columns. Subsequently, it computes the percentage of the publication cost for each row relative to the total cost within its respective group. This percentage is rounded to two decimal places for precision. The script also handles missing values by replacing them with “N/A” and formats the percentage values as strings with a ‘%’ symbol appended. The final output is a cleaned and formatted dataframe ready for further analysis or reporting.
Walking Through the Code
- Initial Setup and Data Loading
- The script begins by loading input data into a dataframe named
inputDf
. - It defines several column names that will be used throughout the script for calculations and output formatting.
- The script begins by loading input data into a dataframe named
- Data Clearing and Calculation
- The script clears existing values in the ‘Total $ Pub. Cost’ and ‘% of Total Pub. Cost’ columns by setting them to zero.
- It calculates the total publication cost for each ‘SBA Strategy’ group using the
groupby
andtransform
functions, rounding the results to two decimal places.
- Percentage Calculation
- The script computes the percentage of publication cost for each row relative to the total cost within its group.
- It rounds these percentage values to two decimal places for consistency.
- Output Preparation
- The script selects the necessary columns for output and renames them for clarity.
- It replaces any NaN values in the percentage column with “N/A” and converts the column to string format.
- A ‘%’ symbol is appended to all non-“N/A” values to indicate percentage.
- Final Output
- The script prints the final output dataframe, which includes the calculated total publication costs and their respective percentages formatted for easy interpretation.
Vitals
- Script ID : 1023
- Client ID / Customer ID: 1306927027 / 60270153
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, % of Total Pub. Cost, Total $ Pub. Cost, SBA Strategy
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
- Created by dwaidhas@marinsoftware.com on 2024-04-29 20:04
- Last Updated by dwaidhas@marinsoftware.com on 2024-04-29 20:15
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
##
## name: Total Pub Cost & Pub Cost Percentage Column
## description:
## The script clears existing data, calculates the total pub cost for each 'SBA Strategy' and 'Campaign' group, and computes the percentage of 'Pub. Cost $' for each row relative to the total cost within its group.
##
## author: Dana Waidhas
## created: 2024-03-18
##
# setup timezone
# Chicago Timezone is GMT-5. Adjust as needed.
CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))
# Load input data
inputDf = dataSourceDict["1"]
# Define column names
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_SBA_STRATEGY = 'SBA Strategy'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_TOTAL_PUB_COST = 'Total $ Pub. Cost'
RPT_COL_PCT_OF_TOTAL_PUB_COST = '% of Total Pub. Cost'
# Clear 'Total $ Pub. Cost' and '% of Total Pub. Cost'
inputDf[RPT_COL_TOTAL_PUB_COST] = 0
inputDf[RPT_COL_PCT_OF_TOTAL_PUB_COST] = 0
# Calculate Total $ Pub. Cost for each SBA Strategy group
inputDf[RPT_COL_TOTAL_PUB_COST] = round(inputDf.groupby(RPT_COL_SBA_STRATEGY)[RPT_COL_PUB_COST].transform('sum'), 2)
# Calculate % of Total Pub. Cost for each row
inputDf[RPT_COL_PCT_OF_TOTAL_PUB_COST] = (inputDf[RPT_COL_PUB_COST] / inputDf[RPT_COL_TOTAL_PUB_COST]) * 100
# Round the '% of Total Pub. Cost' column to two decimal places
inputDf[RPT_COL_PCT_OF_TOTAL_PUB_COST] = round(inputDf[RPT_COL_PCT_OF_TOTAL_PUB_COST], 2)
# Define output column names
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_SBA_STRATEGY = 'SBA Strategy'
BULK_COL_TOTAL_PUB_COST = 'Total $ Pub. Cost'
BULK_COL_PCT_OF_TOTAL_PUB_COST = '% of Total Pub. Cost'
# Select required columns for output
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT, RPT_COL_SBA_STRATEGY, RPT_COL_TOTAL_PUB_COST, RPT_COL_PCT_OF_TOTAL_PUB_COST]]
# Rename columns
outputDf.columns = [BULK_COL_CAMPAIGN, BULK_COL_ACCOUNT, BULK_COL_SBA_STRATEGY, BULK_COL_TOTAL_PUB_COST, BULK_COL_PCT_OF_TOTAL_PUB_COST]
# Replace NaN values with "N/A" in the '% of Total Pub. Cost' column
outputDf[BULK_COL_PCT_OF_TOTAL_PUB_COST].fillna('N/A', inplace=True)
# Convert the '% of Total Pub. Cost' column to strings
outputDf[BULK_COL_PCT_OF_TOTAL_PUB_COST] = outputDf[BULK_COL_PCT_OF_TOTAL_PUB_COST].astype(str)
# Append '%' to non-"N/A" values
outputDf[BULK_COL_PCT_OF_TOTAL_PUB_COST] = outputDf[BULK_COL_PCT_OF_TOTAL_PUB_COST].apply(lambda x: x + '%' if x != 'N/A' else 'N/A')
# Display the output dataframe
print(outputDf)
Post generated on 2025-03-11 01:25:51 GMT