Script 825: Percentage Difference Column
Purpose
Calculates the difference between SBA allocation and % of total pub cost.
To Elaborate
This Python script solves the problem of calculating the difference between the SBA allocation and the percentage of total publication cost. It takes input data from a DataFrame and performs the following tasks:
- Checks for unique values in the “Percentage Difference” column.
- Checks for missing values in the “SBA Allocation” and “% of Total Pub. Cost” columns.
- Cleans and converts values in the “Percentage Difference” column to float.
- Calculates the difference between the “SBA Allocation” and “% of Total Pub. Cost” columns, only if there are no missing values.
- Rounds the “Percentage Difference” column to two decimal places.
- Selects the required columns for output.
- Renames the columns in the output DataFrame.
- Formats the “Percentage Difference” column to display with a percentage symbol.
- Outputs the final DataFrame.
Walking Through the Code
- The input data is loaded into a DataFrame named
inputDf
. - Constants for column names are defined.
- Unique values in the “Percentage Difference” column are checked and printed.
- Missing values in the “SBA Allocation” and “% of Total Pub. Cost” columns are checked.
- A function named
clean_and_convert
is defined to remove non-numeric characters and convert values to float. - The “Percentage Difference” column in
inputDf
is cleaned and converted using theclean_and_convert
function. - If there are no missing values in the “SBA Allocation” and “% of Total Pub. Cost” columns, the difference between the two columns is calculated and stored in the “Percentage Difference” column of
inputDf
. - The “Percentage Difference” column in
inputDf
is rounded to two decimal places. - Output column names are defined.
- The required columns are selected from
inputDf
and stored inoutputDf
. - The columns in
outputDf
are renamed. - The “Percentage Difference” column in
outputDf
is formatted to display with a percentage symbol. - The final DataFrame
outputDf
is printed.
Vitals
- Script ID : 825
- Client ID / Customer ID: 1306926629 / 60270083
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Percentage Difference
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
- Created by dwaidhas@marinsoftware.com on 2024-03-19 20:47
- Last Updated by dwaidhas@marinsoftware.com on 2024-04-22 17:09
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
##
## name: Difference Column Percentages
## description:
## Calculates the difference between SBA allocation and % of total pub cost
##
## author: Dana Waidhas
## created: 2024-03-19
##
# Load input data into DataFrame
inputDf = dataSourceDict["1"]
# primary data source and columns
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_SBA_STRATEGY = 'SBA Strategy'
RPT_COL_SBA_ALLOCATION = 'SBA Allocation'
RPT_COL_OF_TOTAL_PUB_COST = '% of Total Pub. Cost'
RPT_COL_PERCENTAGE_DIFFERENCE = 'Percentage Difference'
# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PERCENTAGE_DIFFERENCE = 'Percentage Difference'
# user code start here
# Check unique values in 'Percentage Difference' column
unique_values = inputDf[RPT_COL_PERCENTAGE_DIFFERENCE].unique()
print("Unique values in 'Percentage Difference' column:", unique_values)
# Check for missing values in 'SBA Allocation' and '% of Total Pub. Cost' columns
missing_values_allocation = inputDf[RPT_COL_SBA_ALLOCATION].isnull().sum()
missing_values_pub_cost = inputDf[RPT_COL_OF_TOTAL_PUB_COST].isnull().sum()
if missing_values_allocation > 0 or missing_values_pub_cost > 0:
print("Warning: There are missing values in either 'SBA Allocation' or '% of Total Pub. Cost' column. Please handle them before continuing.")
# Define a function to clean and convert values to float
def clean_and_convert(value):
try:
# Remove non-numeric characters and convert to float
cleaned_value = float(''.join(filter(lambda x: x.isdigit() or x == '.' or x == '-', str(value))))
return cleaned_value
except ValueError:
return float('nan')
# Remove non-numeric characters and convert 'Percentage Difference' column to float
inputDf[RPT_COL_PERCENTAGE_DIFFERENCE] = inputDf[RPT_COL_PERCENTAGE_DIFFERENCE].apply(clean_and_convert)
# Calculate difference only if there are no missing values
if missing_values_allocation == 0 and missing_values_pub_cost == 0:
# Calculate difference
inputDf[BULK_COL_PERCENTAGE_DIFFERENCE] = (inputDf[RPT_COL_OF_TOTAL_PUB_COST] - inputDf[RPT_COL_SBA_ALLOCATION]) * 100
# Round the 'Percentage Difference' column to two decimal places
inputDf[RPT_COL_PERCENTAGE_DIFFERENCE] = round(inputDf[RPT_COL_PERCENTAGE_DIFFERENCE], 2)
# Define output column names
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_PERCENTAGE_DIFFERENCE = 'Percentage Difference'
# Select required columns for output
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT, RPT_COL_PERCENTAGE_DIFFERENCE]]
# Rename columns
outputDf.columns = [BULK_COL_CAMPAIGN, BULK_COL_ACCOUNT, BULK_COL_PERCENTAGE_DIFFERENCE]
# Format 'Percentage Difference' column to display with a percentage symbol
outputDf[BULK_COL_PERCENTAGE_DIFFERENCE] = outputDf[BULK_COL_PERCENTAGE_DIFFERENCE].astype(str) + '%'
# Output
print(outputDf)
Post generated on 2024-05-15 07:44:05 GMT