Script 1403: Wishlist Ingestion

Purpose:

The script merges wishlist data from a Google Sheet with the primary input data to produce a structured output for wishlist ingestion.

To Elaborate

The Python script handles data ingestion for a wishlist system by merging two data sources: a Google Sheets feed and the primary input data. Rows are matched on campaign and group names, so each sheet row picks up the matching group's identifier from the input data. The script then renames the columns to the expected output names and converts the ‘Group ID’ field to an integer; missing or non-numeric IDs become 0. The result is a dataset whose fields are aligned and formatted for further processing or analysis in the larger system.
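
The integer conversion of ‘Group ID’ can be looked at in isolation. The following is a minimal sketch using made-up values (the series raw_ids is hypothetical and not part of the script); it applies the same pd.to_numeric chain that the script's code uses.

```python
import pandas as pd

# Hypothetical raw values; in the script, these come from the input data's 'Pub. ID' column.
raw_ids = pd.Series(["1001", "1002.0", "n/a", None])

# errors="coerce" turns unparseable values into NaN, fillna(0) replaces NaN with 0,
# and astype(int) then yields a clean integer column.
group_ids = pd.to_numeric(raw_ids, errors="coerce").fillna(0).astype(int)

print(group_ids.tolist())  # [1001, 1002, 0, 0]
```

One consequence of this choice is that a bad or missing ID is indistinguishable from a literal 0 in the output, so any row with Group ID 0 is worth a second look.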

Walking Through the Code

  1. Data Source Initialization:
    • The script begins by loading two dataframes from dataSourceDict: inputDf (key "1", the primary data) and gSheetsDf (key "2_1", the first sheet of the Google Sheets source).
  2. Data Merging:
    • The script left-joins gSheetsDf to inputDf, matching the sheet's ‘C’ and ‘D’ columns against the ‘Campaign’ and ‘Group’ columns of inputDf. Because it is a left join, every sheet row is kept, and rows without a match in inputDf get empty values for the inputDf columns.
  3. Column Renaming:
    • After merging, the script renames columns to the names held in its constants: ‘A’ becomes ‘Date’, ‘E’ becomes ‘WishlistUpload Conv’, ‘D’ becomes ‘Comments’, and ‘Pub. ID’ becomes ‘Group ID’. This standardizes the column names in the output.
  4. Data Type Conversion:
    • The ‘Group ID’ column is converted to an integer type. Values that cannot be parsed as numbers, including the empty values left by unmatched rows, are set to 0 first, so the conversion never fails.
  5. Output Preparation:
    • Four columns are selected for the final output: date, wishlists, group ID, and comments. The script sets skip_output_validations to True and prints the result with tableize.
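
The merging step above can be sketched on toy data. This is a minimal illustration, not the production script: both dataframes and all their values are made up, and the indicator=True argument is an addition for inspection that the script itself does not use.

```python
import pandas as pd

# Hypothetical stand-ins for the two sources; every value here is invented.
gsheets_df = pd.DataFrame({
    "A": ["2024-09-01", "2024-09-01"],   # date
    "C": ["Brand", "Generic"],           # campaign name
    "D": ["Shoes", "Hats"],              # group name
    "E": [5, 3],                         # wishlist count
})
input_df = pd.DataFrame({
    "Campaign": ["Brand"],
    "Group": ["Shoes"],
    "Pub. ID": ["1001"],
})

# Left join: keep every sheet row, attach input columns where campaign and group match.
merged = pd.merge(
    gsheets_df,
    input_df,
    left_on=["C", "D"],
    right_on=["Campaign", "Group"],
    how="left",
    indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
)

# Rows marked 'left_only' had no matching campaign/group in the input data.
unmatched = merged[merged["_merge"] == "left_only"]
print(merged[["C", "D", "Pub. ID", "_merge"]])
print(f"{len(unmatched)} sheet row(s) without a match")
```

Checking the unmatched rows before the integer conversion is one way to spot sheet entries whose campaign or group name does not line up with the input data.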

Vitals

  • Script ID: 1403
  • Client ID / Customer ID: 1306927811 / 60270355
  • Action Type: Revenue Upload
  • Item Changed: None
  • Output Columns: Date, Group ID, Comments, Wishlists Conv
  • Linked Datasource: FTP/Email Feed
  • Reference Datasource: Google Sheets
  • Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
  • Created by Tom McCaughey on 2024-09-24 11:04
  • Last Updated by Tom McCaughey on 2024-09-24 12:18

Python Code

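# Offline format script
# Note: dataSourceDict and tableize are not defined in this script;
# they are assumed to be supplied by the script runtime.
import pandas as pd  # harmless if the runtime already provides pd

# Define the column names as constants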
RPT_COL_GROUP = 'Group'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_PUB_ID = 'Group ID'
RPT_COL_DATE = 'Date'
RPT_COL_WISHLISTS = 'WishlistUpload Conv'
RPT_COL_GSHEET_GROUP = 'Comments'


inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"]  # gsheets dataframe (first sheet)

# Left-join the sheet rows to the input data on campaign and group names.
# Assumes sheet column 'C' holds the campaign and 'D' holds the group.
mergedDf = pd.merge(
    gSheetsDf,
    inputDf,
    left_on=['C', 'D'],
    right_on=[RPT_COL_CAMPAIGN, RPT_COL_GROUP],
    how='left'
)

# Rename sheet columns and the input's 'Pub. ID' to the output column names
mergedDf = mergedDf.rename(columns={
    'A': RPT_COL_DATE,
    'E': RPT_COL_WISHLISTS,
    'D': RPT_COL_GSHEET_GROUP,
    'Pub. ID': RPT_COL_PUB_ID  # 'Pub. ID' from inputDf becomes 'Group ID' in the output
})

# Ensure 'Group ID' is always output as an integer; non-numeric or missing
# values (such as sheet rows with no match in inputDf) become 0
mergedDf[RPT_COL_PUB_ID] = pd.to_numeric(mergedDf[RPT_COL_PUB_ID], errors='coerce').fillna(0).astype(int)

outputDf = mergedDf[[RPT_COL_DATE, RPT_COL_WISHLISTS, RPT_COL_PUB_ID, RPT_COL_GSHEET_GROUP]]

skip_output_validations = True

print(tableize(outputDf))
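
The script refers to dataSourceDict and tableize without defining them, so it presumably depends on its hosting runtime. To try the same logic elsewhere, a minimal sketch follows. The tableize function below is a hypothetical stand-in (the real helper's behavior is not documented here), the sample data is invented, and build_output is an illustrative wrapper name rather than part of the original script.

```python
import pandas as pd


def tableize(df):
    """Hypothetical stand-in for the runtime's tableize helper."""
    return df.to_string(index=False)


# Invented sample data keyed the same way the script reads its sources.
dataSourceDict = {
    "1": pd.DataFrame({"Campaign": ["Brand"], "Group": ["Shoes"], "Pub. ID": ["1001"]}),
    "2_1": pd.DataFrame({"A": ["2024-09-01"], "C": ["Brand"], "D": ["Shoes"], "E": [5]}),
}


def build_output(input_df, gsheets_df):
    """Merge, rename, convert, and select columns, mirroring the script's steps."""
    merged = pd.merge(
        gsheets_df,
        input_df,
        left_on=["C", "D"],
        right_on=["Campaign", "Group"],
        how="left",
    )
    merged = merged.rename(columns={
        "A": "Date",
        "E": "WishlistUpload Conv",
        "D": "Comments",
        "Pub. ID": "Group ID",
    })
    merged["Group ID"] = pd.to_numeric(merged["Group ID"], errors="coerce").fillna(0).astype(int)
    return merged[["Date", "WishlistUpload Conv", "Group ID", "Comments"]]


output_df = build_output(dataSourceDict["1"], dataSourceDict["2_1"])
print(tableize(output_df))
```

Wrapping the steps in a function keeps the transformation separate from how the data arrives, which makes it easier to exercise with small sample frames.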

Post generated on 2025-03-11 01:25:51 GMT
