Script 1447: Auto Tag Groups with Fund type Dimension using Gsheet

Purpose

The Python script automates the process of tagging groups with fund types by matching data from a primary data source with a reference Google Sheets document.

To Elaborate

The script addresses the need to automatically assign fund types to groups based on a matching process between a primary data source and a reference Google Sheets document. The primary data source contains information about various groups, including their associated funds, while the Google Sheets document serves as a reference for fund codes and their corresponding fund types. The script iterates through each entry in the primary data source, attempts to find a matching fund code in the Google Sheets document, and assigns the corresponding fund type to the output. If no match is found, it assigns a default value of “Unknown” to indicate the absence of a corresponding fund type. This process ensures that each group is accurately tagged with the correct fund type, facilitating structured budget allocation and reporting.

Walking Through the Code

  1. Data Initialization
    • The script begins by defining the primary data source (inputDf) and the reference data source (gSheetsDf) from a dictionary of data sources.
    • It specifies the relevant columns for both data sources, ensuring that the columns for fund and fund type are correctly identified.
  2. Output Preparation
    • The script initializes the output DataFrame (outputDf) by copying the fund column from the primary data source and setting the fund type column to an empty string.
  3. Matching Process
    • The script iterates over each row in the primary data source.
    • For each fund in the primary data source, it attempts to find a matching row in the Google Sheets document based on the fund code.
    • If a match is found, the corresponding fund type from the Google Sheets document is assigned to the output DataFrame.
    • If no match is found, the fund type is set to “Unknown” as a placeholder.
  4. Output Display
    • Finally, the script prints the first few rows of the output DataFrame to verify the results of the matching process.

Vitals

  • Script ID : 1447
  • Client ID / Customer ID: 1306922277 / 60268979
  • Action Type: Bulk Upload
  • Item Changed: AdGroup
  • Output Columns: Account, Campaign, Group, FUND Type
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
  • Created by dwaidhas@marinsoftware.com on 2024-10-18 18:16
  • Last Updated by dwaidhas@marinsoftware.com on 2024-10-28 19:06
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
##
## name:  Auto Tag Groups with Fund and Fund type Dimension using Gsheet
## description:
##  
## 
## author: 
## created: 2024-10-18
## 

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# Primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_GROUP = 'Group'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP_ID = 'Group ID'
RPT_COL_FUND = 'FUND'
RPT_COL_FUND_TYPE = 'FUND Type'

# Reference data source (Google Sheets) and columns
gSheetsDf = dataSourceDict["2_1"]  # Use the first sheet from GSheets
# Let's define the first column in gSheets as the column to match with RPT_COL_FUND
GSHEET_COL_FUND_CODE = gSheetsDf.columns[0]  # First column of gSheets

# Ensure you have the correct columns for FUND and FUND TYPE in gSheets
GSHEET_COL_FUND = gSheetsDf.columns[1]  # Second column is FUND
GSHEET_COL_FUND_TYPE = gSheetsDf.columns[2]  # Third column is FUND TYPE

# Output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_FUND = 'FUND'
BULK_COL_FUND_TYPE = 'FUND Type'

outputDf[BULK_COL_FUND] = inputDf[RPT_COL_FUND]
outputDf[BULK_COL_FUND_TYPE] = ""  # Initialize as empty

# Perform matching based on the first column (Fund Code) in gSheetsDf
for index, row in inputDf.iterrows():
    fund = row[RPT_COL_FUND]
    
    # Print the value of fund to check what you're trying to match
    print(f"Trying to match fund: {fund}")
    
    # Find matching row in the GSheet's first column (GSHEET_COL_FUND_CODE)
    matching_row = gSheetsDf[gSheetsDf[GSHEET_COL_FUND_CODE] == fund]
    
    # Log if we find any match
    if not matching_row.empty:
        fund_type = matching_row[GSHEET_COL_FUND_TYPE].values[0]
        print(f"Match found: {fund} -> {fund_type}")
        
        # If a match is found, assign the corresponding Fund Type (from Column 3) to the output
        outputDf.at[index, BULK_COL_FUND_TYPE] = fund_type
    else:
        # Log if no match is found
        print(f"No matching Fund Code found for: {fund}")
        outputDf.at[index, BULK_COL_FUND_TYPE] = "Unknown"  # Default or placeholder value

# Display the first few rows of outputDf
print(outputDf.head())

Post generated on 2024-11-27 06:58:46 GMT

comments powered by Disqus