Script 1447: Auto Tag Groups with Fund type Dimension using Gsheet
Purpose
The Python script automates the process of tagging groups with fund types by matching data from a primary data source with a reference Google Sheets document.
To Elaborate
The script addresses the need to automatically assign fund types to groups based on a matching process between a primary data source and a reference Google Sheets document. The primary data source contains information about various groups, including their associated funds, while the Google Sheets document serves as a reference for fund codes and their corresponding fund types. The script iterates through each entry in the primary data source, attempts to find a matching fund code in the Google Sheets document, and assigns the corresponding fund type to the output. If no match is found, it assigns a default value of “Unknown” to indicate the absence of a corresponding fund type. This process ensures that each group is accurately tagged with the correct fund type, facilitating structured budget allocation and reporting.
Walking Through the Code
- Data Initialization
- The script begins by defining the primary data source (
inputDf
) and the reference data source (gSheetsDf
) from a dictionary of data sources. - It specifies the relevant columns for both data sources, ensuring that the columns for fund and fund type are correctly identified.
- The script begins by defining the primary data source (
- Output Preparation
- The script initializes the output DataFrame (
outputDf
) by copying the fund column from the primary data source and setting the fund type column to an empty string.
- The script initializes the output DataFrame (
- Matching Process
- The script iterates over each row in the primary data source.
- For each fund in the primary data source, it attempts to find a matching row in the Google Sheets document based on the fund code.
- If a match is found, the corresponding fund type from the Google Sheets document is assigned to the output DataFrame.
- If no match is found, the fund type is set to “Unknown” as a placeholder.
- Output Display
- Finally, the script prints the first few rows of the output DataFrame to verify the results of the matching process.
Vitals
- Script ID : 1447
- Client ID / Customer ID: 1306922277 / 60268979
- Action Type: Bulk Upload
- Item Changed: AdGroup
- Output Columns: Account, Campaign, Group, FUND Type
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
- Created by dwaidhas@marinsoftware.com on 2024-10-18 18:16
- Last Updated by dwaidhas@marinsoftware.com on 2024-10-28 19:06
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
##
## name: Auto Tag Groups with Fund and Fund type Dimension using Gsheet
## description:
##
##
## author:
## created: 2024-10-18
##
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# Primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_GROUP = 'Group'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP_ID = 'Group ID'
RPT_COL_FUND = 'FUND'
RPT_COL_FUND_TYPE = 'FUND Type'
# Reference data source (Google Sheets) and columns
gSheetsDf = dataSourceDict["2_1"] # Use the first sheet from GSheets
# Let's define the first column in gSheets as the column to match with RPT_COL_FUND
GSHEET_COL_FUND_CODE = gSheetsDf.columns[0] # First column of gSheets
# Ensure you have the correct columns for FUND and FUND TYPE in gSheets
GSHEET_COL_FUND = gSheetsDf.columns[1] # Second column is FUND
GSHEET_COL_FUND_TYPE = gSheetsDf.columns[2] # Third column is FUND TYPE
# Output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_FUND = 'FUND'
BULK_COL_FUND_TYPE = 'FUND Type'
outputDf[BULK_COL_FUND] = inputDf[RPT_COL_FUND]
outputDf[BULK_COL_FUND_TYPE] = "" # Initialize as empty
# Perform matching based on the first column (Fund Code) in gSheetsDf
for index, row in inputDf.iterrows():
fund = row[RPT_COL_FUND]
# Print the value of fund to check what you're trying to match
print(f"Trying to match fund: {fund}")
# Find matching row in the GSheet's first column (GSHEET_COL_FUND_CODE)
matching_row = gSheetsDf[gSheetsDf[GSHEET_COL_FUND_CODE] == fund]
# Log if we find any match
if not matching_row.empty:
fund_type = matching_row[GSHEET_COL_FUND_TYPE].values[0]
print(f"Match found: {fund} -> {fund_type}")
# If a match is found, assign the corresponding Fund Type (from Column 3) to the output
outputDf.at[index, BULK_COL_FUND_TYPE] = fund_type
else:
# Log if no match is found
print(f"No matching Fund Code found for: {fund}")
outputDf.at[index, BULK_COL_FUND_TYPE] = "Unknown" # Default or placeholder value
# Display the first few rows of outputDf
print(outputDf.head())
Post generated on 2024-11-27 06:58:46 GMT