Script 825: Percentage Difference Column
Purpose
Calculates the difference between SBA allocation and the percentage of total publication cost.
To Elaborate
The Python script is designed to compute the difference between the Structured Budget Allocation (SBA) and the percentage of total publication cost for various campaigns and accounts. It processes input data to ensure that the values are clean and numeric, checks for any missing data in critical columns, and then calculates the percentage difference only if all necessary data is present. The script outputs the results in a structured format, making it easier to analyze the financial performance of campaigns in relation to their budget allocations.
Walking Through the Code
- Data Loading and Initialization
- The script begins by loading input data into a DataFrame from a specified data source.
- It defines constants for column names related to campaigns, accounts, SBA strategy, SBA allocation, and percentage of total publication cost.
- Data Validation and Cleaning
- The script checks for unique values in the ‘Percentage Difference’ column and prints them.
- It identifies any missing values in the ‘SBA Allocation’ and ‘% of Total Pub. Cost’ columns, issuing a warning if any are found.
- A function is defined to clean and convert values to floats, removing non-numeric characters.
- Calculation of Percentage Difference
- If there are no missing values, the script calculates the percentage difference between the ‘% of Total Pub. Cost’ and ‘SBA Allocation’.
- The calculated differences are rounded to two decimal places for precision.
- Output Preparation
- The script selects the necessary columns for output and renames them for clarity.
- It formats the ‘Percentage Difference’ column to display values with a percentage symbol.
- Finally, the script prints the output DataFrame, showing the calculated differences for each campaign and account.
Vitals
- Script ID : 825
- Client ID / Customer ID: 1306926629 / 60270083
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Percentage Difference
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
- Created by dwaidhas@marinsoftware.com on 2024-03-19 20:47
- Last Updated by dwaidhas@marinsoftware.com on 2024-04-22 17:09
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
##
## name: Difference Column Percentages
## description:
## Calculates the difference between SBA allocation and % of total pub cost
##
## author: Dana Waidhas
## created: 2024-03-19
##
# Load input data into DataFrame
inputDf = dataSourceDict["1"]
# primary data source and columns
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_SBA_STRATEGY = 'SBA Strategy'
RPT_COL_SBA_ALLOCATION = 'SBA Allocation'
RPT_COL_OF_TOTAL_PUB_COST = '% of Total Pub. Cost'
RPT_COL_PERCENTAGE_DIFFERENCE = 'Percentage Difference'
# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PERCENTAGE_DIFFERENCE = 'Percentage Difference'
# user code start here
# Check unique values in 'Percentage Difference' column
unique_values = inputDf[RPT_COL_PERCENTAGE_DIFFERENCE].unique()
print("Unique values in 'Percentage Difference' column:", unique_values)
# Check for missing values in 'SBA Allocation' and '% of Total Pub. Cost' columns
missing_values_allocation = inputDf[RPT_COL_SBA_ALLOCATION].isnull().sum()
missing_values_pub_cost = inputDf[RPT_COL_OF_TOTAL_PUB_COST].isnull().sum()
if missing_values_allocation > 0 or missing_values_pub_cost > 0:
print("Warning: There are missing values in either 'SBA Allocation' or '% of Total Pub. Cost' column. Please handle them before continuing.")
# Define a function to clean and convert values to float
def clean_and_convert(value):
try:
# Remove non-numeric characters and convert to float
cleaned_value = float(''.join(filter(lambda x: x.isdigit() or x == '.' or x == '-', str(value))))
return cleaned_value
except ValueError:
return float('nan')
# Remove non-numeric characters and convert 'Percentage Difference' column to float
inputDf[RPT_COL_PERCENTAGE_DIFFERENCE] = inputDf[RPT_COL_PERCENTAGE_DIFFERENCE].apply(clean_and_convert)
# Calculate difference only if there are no missing values
if missing_values_allocation == 0 and missing_values_pub_cost == 0:
# Calculate difference
inputDf[BULK_COL_PERCENTAGE_DIFFERENCE] = (inputDf[RPT_COL_OF_TOTAL_PUB_COST] - inputDf[RPT_COL_SBA_ALLOCATION]) * 100
# Round the 'Percentage Difference' column to two decimal places
inputDf[RPT_COL_PERCENTAGE_DIFFERENCE] = round(inputDf[RPT_COL_PERCENTAGE_DIFFERENCE], 2)
# Define output column names
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_PERCENTAGE_DIFFERENCE = 'Percentage Difference'
# Select required columns for output
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT, RPT_COL_PERCENTAGE_DIFFERENCE]]
# Rename columns
outputDf.columns = [BULK_COL_CAMPAIGN, BULK_COL_ACCOUNT, BULK_COL_PERCENTAGE_DIFFERENCE]
# Format 'Percentage Difference' column to display with a percentage symbol
outputDf[BULK_COL_PERCENTAGE_DIFFERENCE] = outputDf[BULK_COL_PERCENTAGE_DIFFERENCE].astype(str) + '%'
# Output
print(outputDf)
Post generated on 2024-11-27 06:58:46 GMT