Script 1677: Script Group Upload
Purpose
The script processes and merges financial data from two sources to calculate and output adjusted publication costs by group.
To Elaborate
The Python script is designed to process financial data from two different data sources, focusing on calculating adjusted publication costs. It first cleans and prepares the data by converting date and numeric fields into appropriate formats. The script identifies the most recent date in the dataset and filters the data to include only records from this date. It then calculates the adjusted publication cost by considering the current margin. The script merges this processed data with a reference dataset to associate each account with a corresponding group ID. Finally, it outputs a structured DataFrame containing the date, group ID, and the calculated adjusted publication cost, ready for further analysis or reporting.
Walking Through the Code
- Data Preparation
- The script begins by cleaning the primary data source, converting the ‘Date’ column to a datetime format and ensuring numeric columns are properly formatted. It also cleans the reference data by stripping spaces from column names and ensuring consistency in data types.
- Data Filtering and Calculation
- It determines the latest available date in the dataset and filters the data to include only records from this date. The script calculates the adjusted publication cost by dividing the publication cost by the complement of the current margin, rounding the result to two decimal places.
- Data Merging and Output Preparation
- The script merges the filtered and calculated data with the reference data to associate each account with a group ID. It initializes an output DataFrame with specified columns and populates it with the merged data, resetting the index for a clean output.
Vitals
- Script ID : 1677
- Client ID / Customer ID: 1306928641 / 60270613
- Action Type: Revenue Upload
- Item Changed: None
- Output Columns: Date, Group ID, Full Margin Cost Conv
- Linked Datasource: M1 Report
- Reference Datasource: M1 Report
- Owner: Jesus Garza (jgarza@marinsoftware.com)
- Created by Jesus Garza on 2025-01-28 19:04
- Last Updated by Jesus Garza on 2025-02-10 18:00
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
##
## name: Script - Group Upload
## description:
##
##
## author: Jesus Garza
## created: 2025-01-28
##
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# Primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_ACCOUNT = 'Account'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT_ID = 'Account ID'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_CURRENT_MARGIN = 'Current Margin'
# Reference data source and columns
reportDf = dataSourceDict["2"] # Report DataFrame
RPT_COL_GROUP = 'Group'
RPT_COL_PUB_ID = 'Pub. ID'
RPT_COL_GROUP_ID = 'Group ID'
# Output columns and initial setup
REVENUE_COL_DATE = 'Date'
REVENUE_COL_GROUP_ID = 'Group ID'
REVENUE_COL_FULL_MARGIN_COST_CONV = 'Full Margin Cost Conv'
# Clean and prepare the primary data
inputDf['Date'] = pd.to_datetime(inputDf['Date'], errors='coerce') # Convert 'Date' column to datetime
inputDf['Pub. Cost $'] = pd.to_numeric(inputDf['Pub. Cost $'], errors='coerce') # Convert 'Pub. Cost $' to numeric
inputDf['Current Margin'] = pd.to_numeric(inputDf['Current Margin'], errors='coerce') # Convert 'Current Margin' to numeric
# Clean and prepare the reference data
reportDf.rename(columns=lambda x: x.strip(), inplace=True) # Strip any extra spaces in column names
reportDf[RPT_COL_PUB_ID] = reportDf[RPT_COL_PUB_ID].astype(str) # Ensure Pub. ID is string for consistency
# Determine the latest available date in the dataset
latest_date = inputDf['Date'].max()
# Filter data for the latest available date
latest_data = inputDf[inputDf['Date'] == latest_date].copy() # Use .copy() to avoid SettingWithCopyWarning
# Calculate the adjusted pub cost
latest_data['Adjusted Pub Cost'] = latest_data['Pub. Cost $'] / (1 - latest_data['Current Margin'])
latest_data['Adjusted Pub Cost'] = latest_data['Adjusted Pub Cost'].round(2) # Round to 2 decimal places
# Merge with the reference DataFrame to fetch the Pub. ID
# Perform a left join on the "Account" column in inputDf to the "Group" column in reportDf
merged_data = latest_data.merge(
reportDf[['Group', 'Pub. ID']],
how='left',
left_on='Account',
right_on='Group'
)
# Initialize outputDf with the required columns
outputDf = pd.DataFrame(columns=[REVENUE_COL_DATE, REVENUE_COL_GROUP_ID, REVENUE_COL_FULL_MARGIN_COST_CONV])
# Populate outputDf with data from merged_data
outputDf[REVENUE_COL_DATE] = merged_data['Date']
outputDf[REVENUE_COL_GROUP_ID] = merged_data['Pub. ID'] # Use Pub. ID for Group ID
outputDf[REVENUE_COL_FULL_MARGIN_COST_CONV] = merged_data['Adjusted Pub Cost']
# Reset the index for a clean output
outputDf = outputDf.reset_index(drop=True)
print(tableize(inputDf.head()))
Post generated on 2025-02-21 10:25:25 GMT