Script 1403: Wishlist Ingestion
Purpose
The script processes and merges data from two sources to produce a structured output for wishlist ingestion.
To Elaborate
The Python script is designed to merge and process data from two different sources: a primary data source and a Google Sheets document. The goal is to create a structured output that aligns with specific business requirements for wishlist ingestion. The script performs a merge operation based on common fields, ensuring that the resulting dataset includes relevant information such as dates, wishlist conversion metrics, and group identifiers. It also standardizes the format of certain fields, such as ensuring that the ‘Group ID’ is always represented as an integer. This structured output is then prepared for further analysis or reporting, facilitating the management and allocation of resources in a business context.
Walking Through the Code
- Data Input
- The script begins by loading data from two sources:
inputDf
from a primary data source andgSheetsDf
from a Google Sheets document. - These dataframes are stored in a dictionary called
dataSourceDict
.
- The script begins by loading data from two sources:
- Data Merging
- The script merges the two dataframes on the ‘Campaign’ and ‘Group’ columns from
inputDf
and corresponding columns ‘C’ and ‘D’ fromgSheetsDf
. - The merge operation is performed using a left join to ensure all records from the Google Sheets are retained.
- The script merges the two dataframes on the ‘Campaign’ and ‘Group’ columns from
- Data Transformation
- After merging, the script renames columns to more descriptive names using predefined constants.
- It ensures that the ‘Group ID’ column is always output as an integer by converting it and handling any errors by filling them with zero.
- Output Preparation
- The script selects specific columns for the final output, focusing on date, wishlist conversions, group ID, and comments.
- It sets a flag
skip_output_validations
toTrue
, indicating that output validations are bypassed.
- Output Display
- Finally, the script prints the structured output using a function
tableize
, which formats the dataframe for display.
- Finally, the script prints the structured output using a function
Vitals
- Script ID : 1403
- Client ID / Customer ID: 1306927811 / 60270355
- Action Type: Revenue Upload
- Item Changed: None
- Output Columns: Date, Group ID, Comments, Wishlists Conv
- Linked Datasource: FTP/Email Feed
- Reference Datasource: Google Sheets
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2024-09-24 11:04
- Last Updated by Tom McCaughey on 2024-09-24 12:18
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#Offline format script
# Define the column names as constants
RPT_COL_GROUP = 'Group'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_PUB_ID = 'Group ID'
RPT_COL_DATE = 'Date'
RPT_COL_WISHLISTS = 'WishlistUpload Conv'
RPT_COL_GSHEET_GROUP = 'Comments'
inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"] # gsheets dataframe (first sheet)
# Merge dataframes based on both 'Campaign' and 'Group'
mergedDf = pd.merge(
gSheetsDf,
inputDf,
left_on=['C', 'D'], # Assuming 'D' refers to 'Group' in gSheetsDf
right_on=['Campaign', 'Group'],
how='left'
)
mergedDf = mergedDf.rename(columns={
'A': RPT_COL_DATE,
'E': RPT_COL_WISHLISTS,
'D': RPT_COL_GSHEET_GROUP,
'Pub. ID': RPT_COL_PUB_ID # Assuming 'Group' from inputDf should be 'Group ID' in the output
})
# Ensure 'Group ID' is always output as an integer
mergedDf[RPT_COL_PUB_ID] = pd.to_numeric(mergedDf[RPT_COL_PUB_ID], errors='coerce').fillna(0).astype(int)
outputDf = mergedDf[[RPT_COL_DATE, RPT_COL_WISHLISTS, RPT_COL_PUB_ID, RPT_COL_GSHEET_GROUP]]
skip_output_validations = True
print(tableize(outputDf))
Post generated on 2024-11-27 06:58:46 GMT