Script 1403: Wishlist Ingestion

Purpose

The script processes and merges data from two sources to produce a structured output for wishlist ingestion.

To Elaborate

The Python script is designed to merge and process data from two different sources: a primary data source and a Google Sheets document. The goal is to create a structured output that aligns with specific business requirements for wishlist ingestion. The script performs a merge operation based on common fields, ensuring that the resulting dataset includes relevant information such as dates, wishlist conversion metrics, and group identifiers. It also standardizes the format of certain fields, such as ensuring that the ‘Group ID’ is always represented as an integer. This structured output is then prepared for further analysis or reporting, facilitating the management and allocation of resources in a business context.

Walking Through the Code

  1. Data Input
    • The script begins by loading data from two sources: inputDf from a primary data source and gSheetsDf from a Google Sheets document.
    • These dataframes are stored in a dictionary called dataSourceDict.
  2. Data Merging
    • The script merges the two dataframes on the ‘Campaign’ and ‘Group’ columns from inputDf and corresponding columns ‘C’ and ‘D’ from gSheetsDf.
    • The merge operation is performed using a left join to ensure all records from the Google Sheets are retained.
  3. Data Transformation
    • After merging, the script renames columns to more descriptive names using predefined constants.
    • It ensures that the ‘Group ID’ column is always output as an integer by converting it and handling any errors by filling them with zero.
  4. Output Preparation
    • The script selects specific columns for the final output, focusing on date, wishlist conversions, group ID, and comments.
    • It sets a flag skip_output_validations to True, indicating that output validations are bypassed.
  5. Output Display
    • Finally, the script prints the structured output using a function tableize, which formats the dataframe for display.

Vitals

  • Script ID : 1403
  • Client ID / Customer ID: 1306927811 / 60270355
  • Action Type: Revenue Upload
  • Item Changed: None
  • Output Columns: Date, Group ID, Comments, Wishlists Conv
  • Linked Datasource: FTP/Email Feed
  • Reference Datasource: Google Sheets
  • Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
  • Created by Tom McCaughey on 2024-09-24 11:04
  • Last Updated by Tom McCaughey on 2024-09-24 12:18
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#Offline format script
# Define the column names as constants
RPT_COL_GROUP = 'Group'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_PUB_ID = 'Group ID'
RPT_COL_DATE = 'Date'
RPT_COL_WISHLISTS = 'WishlistUpload Conv'
RPT_COL_GSHEET_GROUP = 'Comments'


inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"]  # gsheets dataframe (first sheet)

# Merge dataframes based on both 'Campaign' and 'Group'
mergedDf = pd.merge(
    gSheetsDf, 
    inputDf, 
    left_on=['C', 'D'],  # Assuming 'D' refers to 'Group' in gSheetsDf
    right_on=['Campaign', 'Group'], 
    how='left'
)

mergedDf = mergedDf.rename(columns={
    'A': RPT_COL_DATE,
    'E': RPT_COL_WISHLISTS,
    'D': RPT_COL_GSHEET_GROUP,
    'Pub. ID': RPT_COL_PUB_ID  # Assuming 'Group' from inputDf should be 'Group ID' in the output
})

# Ensure 'Group ID' is always output as an integer
mergedDf[RPT_COL_PUB_ID] = pd.to_numeric(mergedDf[RPT_COL_PUB_ID], errors='coerce').fillna(0).astype(int)

outputDf = mergedDf[[RPT_COL_DATE, RPT_COL_WISHLISTS, RPT_COL_PUB_ID, RPT_COL_GSHEET_GROUP]]

skip_output_validations = True

print(tableize(outputDf))

Post generated on 2024-11-27 06:58:46 GMT

comments powered by Disqus