Script 825: Percentage Difference Column

Purpose

Calculates the difference between SBA allocation and % of total pub cost.

To Elaborate

This Python script solves the problem of calculating the difference between the SBA allocation and the percentage of total publication cost. It takes input data from a DataFrame and performs the following tasks:

  • Checks for unique values in the “Percentage Difference” column.
  • Checks for missing values in the “SBA Allocation” and “% of Total Pub. Cost” columns.
  • Cleans and converts values in the “Percentage Difference” column to float.
  • Calculates the difference between the “SBA Allocation” and “% of Total Pub. Cost” columns, only if there are no missing values.
  • Rounds the “Percentage Difference” column to two decimal places.
  • Selects the required columns for output.
  • Renames the columns in the output DataFrame.
  • Formats the “Percentage Difference” column to display with a percentage symbol.
  • Outputs the final DataFrame.

Walking Through the Code

  1. The input data is loaded into a DataFrame named inputDf.
  2. Constants for column names are defined.
  3. Unique values in the “Percentage Difference” column are checked and printed.
  4. Missing values in the “SBA Allocation” and “% of Total Pub. Cost” columns are checked.
  5. A function named clean_and_convert is defined to remove non-numeric characters and convert values to float.
  6. The “Percentage Difference” column in inputDf is cleaned and converted using the clean_and_convert function.
  7. If there are no missing values in the “SBA Allocation” and “% of Total Pub. Cost” columns, the difference between the two columns is calculated and stored in the “Percentage Difference” column of inputDf.
  8. The “Percentage Difference” column in inputDf is rounded to two decimal places.
  9. Output column names are defined.
  10. The required columns are selected from inputDf and stored in outputDf.
  11. The columns in outputDf are renamed.
  12. The “Percentage Difference” column in outputDf is formatted to display with a percentage symbol.
  13. The final DataFrame outputDf is printed.

Vitals

  • Script ID : 825
  • Client ID / Customer ID: 1306926629 / 60270083
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Percentage Difference
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
  • Created by dwaidhas@marinsoftware.com on 2024-03-19 20:47
  • Last Updated by dwaidhas@marinsoftware.com on 2024-04-22 17:09
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
##
## name: Difference Column Percentages
## description:
##  Calculates the difference between SBA allocation and % of total pub cost
## 
## author: Dana Waidhas
## created: 2024-03-19
## 

# Load input data into DataFrame
inputDf = dataSourceDict["1"]

# primary data source and columns
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_SBA_STRATEGY = 'SBA Strategy'
RPT_COL_SBA_ALLOCATION = 'SBA Allocation'
RPT_COL_OF_TOTAL_PUB_COST = '% of Total Pub. Cost'
RPT_COL_PERCENTAGE_DIFFERENCE = 'Percentage Difference'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_PERCENTAGE_DIFFERENCE = 'Percentage Difference'

# user code start here
# Check unique values in 'Percentage Difference' column
unique_values = inputDf[RPT_COL_PERCENTAGE_DIFFERENCE].unique()
print("Unique values in 'Percentage Difference' column:", unique_values)

# Check for missing values in 'SBA Allocation' and '% of Total Pub. Cost' columns
missing_values_allocation = inputDf[RPT_COL_SBA_ALLOCATION].isnull().sum()
missing_values_pub_cost = inputDf[RPT_COL_OF_TOTAL_PUB_COST].isnull().sum()

if missing_values_allocation > 0 or missing_values_pub_cost > 0:
    print("Warning: There are missing values in either 'SBA Allocation' or '% of Total Pub. Cost' column. Please handle them before continuing.")

# Define a function to clean and convert values to float
def clean_and_convert(value):
    try:
        # Remove non-numeric characters and convert to float
        cleaned_value = float(''.join(filter(lambda x: x.isdigit() or x == '.' or x == '-', str(value))))
        return cleaned_value
    except ValueError:
        return float('nan')

# Remove non-numeric characters and convert 'Percentage Difference' column to float
inputDf[RPT_COL_PERCENTAGE_DIFFERENCE] = inputDf[RPT_COL_PERCENTAGE_DIFFERENCE].apply(clean_and_convert)

# Calculate difference only if there are no missing values
if missing_values_allocation == 0 and missing_values_pub_cost == 0:
    # Calculate difference
    inputDf[BULK_COL_PERCENTAGE_DIFFERENCE] = (inputDf[RPT_COL_OF_TOTAL_PUB_COST] - inputDf[RPT_COL_SBA_ALLOCATION]) * 100

# Round the 'Percentage Difference' column to two decimal places
inputDf[RPT_COL_PERCENTAGE_DIFFERENCE] = round(inputDf[RPT_COL_PERCENTAGE_DIFFERENCE], 2)

# Define output column names
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_PERCENTAGE_DIFFERENCE = 'Percentage Difference'

# Select required columns for output
outputDf = inputDf[[RPT_COL_CAMPAIGN, RPT_COL_ACCOUNT, RPT_COL_PERCENTAGE_DIFFERENCE]]

# Rename columns
outputDf.columns = [BULK_COL_CAMPAIGN, BULK_COL_ACCOUNT, BULK_COL_PERCENTAGE_DIFFERENCE]


# Format 'Percentage Difference' column to display with a percentage symbol
outputDf[BULK_COL_PERCENTAGE_DIFFERENCE] = outputDf[BULK_COL_PERCENTAGE_DIFFERENCE].astype(str) + '%'

# Output
print(outputDf)

Post generated on 2024-05-15 07:44:05 GMT

comments powered by Disqus