Script 1677: Script Group Upload
Purpose:
The Python script processes and merges financial data to calculate adjusted publication costs for the latest available date.
To Elaborate
The script is designed to process financial data from two sources: a primary data source containing account and cost information, and a reference data source with group and publication identifiers. The main objective is to calculate the adjusted publication cost for the latest available date by considering the current margin. The script cleans and prepares the data, ensuring consistency in data types, and then filters the data to focus on the most recent date. It performs calculations to adjust the publication cost based on the current margin and merges this information with reference data to associate publication identifiers. The final output is a structured DataFrame containing the date, group ID, and adjusted publication cost, which can be used for further analysis or reporting.
Walking Through the Code
- Data Preparation:
- The script begins by converting the ‘Date’ column in the primary data source to a datetime format and ensures that ‘Pub. Cost $’ and ‘Current Margin’ are numeric. This step is crucial for accurate calculations and filtering.
- The reference data source is cleaned by stripping extra spaces from column names and converting ‘Pub. ID’ to a string for consistency.
- Data Filtering:
- The script identifies the latest available date in the primary data source and filters the data to include only records from this date. This ensures that the calculations are based on the most recent data.
- Cost Adjustment Calculation:
- For the filtered data, the script calculates the adjusted publication cost by dividing the ‘Pub. Cost $’ by (1 - ‘Current Margin’). This calculation adjusts the cost to account for the margin, providing a more accurate representation of expenses.
- Data Merging:
- The script performs a left join between the filtered primary data and the reference data on the ‘Account’ and ‘Group’ columns, respectively. This step associates each account with its corresponding publication identifier.
- Output Preparation:
- An output DataFrame is initialized with specific columns for the date, group ID, and adjusted publication cost. The script populates this DataFrame with the merged data and resets the index for a clean presentation.
- User Changeable Parameters:
- Users can modify the data source identifiers and column names if the structure of the input data changes. Additionally, the calculation logic for adjusted publication cost can be adjusted if the business rules evolve.
Vitals
- Script ID : 1677
- Client ID / Customer ID: 1306928641 / 60270613
- Action Type: Revenue Upload
- Item Changed: None
- Output Columns: Date, Group ID, Full Margin Cost Conv
- Linked Datasource: M1 Report
- Reference Datasource: M1 Report
- Owner: Jesus Garza (jgarza@marinsoftware.com)
- Created by Jesus Garza on 2025-01-28 19:04
- Last Updated by Jesus Garza on 2025-02-10 18:00
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
##
## name: Script - Group Upload
## description:
##
##
## author: Jesus Garza
## created: 2025-01-28
##
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# Primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_ACCOUNT = 'Account'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT_ID = 'Account ID'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_CURRENT_MARGIN = 'Current Margin'
# Reference data source and columns
reportDf = dataSourceDict["2"] # Report DataFrame
RPT_COL_GROUP = 'Group'
RPT_COL_PUB_ID = 'Pub. ID'
RPT_COL_GROUP_ID = 'Group ID'
# Output columns and initial setup
REVENUE_COL_DATE = 'Date'
REVENUE_COL_GROUP_ID = 'Group ID'
REVENUE_COL_FULL_MARGIN_COST_CONV = 'Full Margin Cost Conv'
# Clean and prepare the primary data
inputDf['Date'] = pd.to_datetime(inputDf['Date'], errors='coerce') # Convert 'Date' column to datetime
inputDf['Pub. Cost $'] = pd.to_numeric(inputDf['Pub. Cost $'], errors='coerce') # Convert 'Pub. Cost $' to numeric
inputDf['Current Margin'] = pd.to_numeric(inputDf['Current Margin'], errors='coerce') # Convert 'Current Margin' to numeric
# Clean and prepare the reference data
reportDf.rename(columns=lambda x: x.strip(), inplace=True) # Strip any extra spaces in column names
reportDf[RPT_COL_PUB_ID] = reportDf[RPT_COL_PUB_ID].astype(str) # Ensure Pub. ID is string for consistency
# Determine the latest available date in the dataset
latest_date = inputDf['Date'].max()
# Filter data for the latest available date
latest_data = inputDf[inputDf['Date'] == latest_date].copy() # Use .copy() to avoid SettingWithCopyWarning
# Calculate the adjusted pub cost
latest_data['Adjusted Pub Cost'] = latest_data['Pub. Cost $'] / (1 - latest_data['Current Margin'])
latest_data['Adjusted Pub Cost'] = latest_data['Adjusted Pub Cost'].round(2) # Round to 2 decimal places
# Merge with the reference DataFrame to fetch the Pub. ID
# Perform a left join on the "Account" column in inputDf to the "Group" column in reportDf
merged_data = latest_data.merge(
reportDf[['Group', 'Pub. ID']],
how='left',
left_on='Account',
right_on='Group'
)
# Initialize outputDf with the required columns
outputDf = pd.DataFrame(columns=[REVENUE_COL_DATE, REVENUE_COL_GROUP_ID, REVENUE_COL_FULL_MARGIN_COST_CONV])
# Populate outputDf with data from merged_data
outputDf[REVENUE_COL_DATE] = merged_data['Date']
outputDf[REVENUE_COL_GROUP_ID] = merged_data['Pub. ID'] # Use Pub. ID for Group ID
outputDf[REVENUE_COL_FULL_MARGIN_COST_CONV] = merged_data['Adjusted Pub Cost']
# Reset the index for a clean output
outputDf = outputDf.reset_index(drop=True)
print(tableize(inputDf.head()))
Post generated on 2025-03-11 01:25:51 GMT