Script 1445: Tag Groups with FUND Dimension

Purpose

The Python script tags groups with a FUND dimension based on abbreviations found in group names.

To Elaborate

The script is designed to enhance a dataset by tagging each group with a corresponding FUND dimension. This is achieved by checking if any predefined abbreviations, sourced from a Google Sheets document, are present in the group names. If an abbreviation is found within a group name, the group is tagged with that abbreviation as its FUND. If no abbreviation is found, the group is tagged as “Unknown”. This process helps in categorizing and organizing data based on specific FUND codes, which can be crucial for financial reporting and analysis.

Walking Through the Code

  1. Data Initialization
    • The script begins by setting up the primary data source, inputDf, which contains columns like ‘Group’, ‘Account’, ‘Campaign’, and ‘Group ID’.
    • It also initializes an output DataFrame, outputDf, with columns for ‘Account’, ‘Campaign’, ‘Group’, and ‘FUND’, setting the ‘FUND’ column to an empty string initially.
  2. Reference Data Setup
    • A reference DataFrame, gSheetsDf, is loaded from a Google Sheets document. This DataFrame contains FUND codes in its first column, which are used as abbreviations to tag the groups.
  3. Function Definition
    • The script defines a function, tag_fund_from_group, which takes a group name as input and checks for the presence of any FUND abbreviation from the Google Sheets data. If an abbreviation is found, it returns that abbreviation; otherwise, it returns “Unknown”.
  4. Applying the Function
    • The function is applied to each group name in the inputDf DataFrame. The results are stored in the ‘FUND’ column of the outputDf, effectively tagging each group with the appropriate FUND code or “Unknown”.
  5. Output Preview
    • Finally, the script prints the first few rows of the updated outputDf to provide a preview of the tagged data.

Vitals

  • Script ID : 1445
  • Client ID / Customer ID: 1306922277 / 60268979
  • Action Type: Bulk Upload (Preview)
  • Item Changed: AdGroup
  • Output Columns: Account, Campaign, Group, FUND
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: dwaidhas@marinsoftware.com (dwaidhas@marinsoftware.com)
  • Created by dwaidhas@marinsoftware.com on 2024-10-18 16:56
  • Last Updated by dwaidhas@marinsoftware.com on 2024-11-14 17:09
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
##
## name: Tag Groups with FUND Dimension
## description:
##  
## 
## author: Dana Waidhas 
## created: 2024-10-18
## 

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_GROUP = 'Group'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP_ID = 'Group ID'
RPT_COL_FUND = 'FUND'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_FUND = 'FUND'
outputDf[BULK_COL_FUND] = ""

# Reference data source (Google Sheets) and columns
gSheetsDf = dataSourceDict["2_1"]  # First sheet from Google Sheets
GSHEET_COL_FUND_CODE = gSheetsDf.columns[0]  # Column A in Google Sheets (Abbreviations)

# Initialize FUND column in output DataFrame
outputDf[BULK_COL_FUND] = ""

# Function to check for abbreviation in Group name and tag FUND
def tag_fund_from_group(group_name):
    for abbreviation in gSheetsDf[GSHEET_COL_FUND_CODE]:
        if abbreviation in group_name:
            return abbreviation  # Tag FUND with matching abbreviation
    return "Unknown"  # Default if no abbreviation is found

# Apply the function to each Group in inputDf
outputDf[BULK_COL_FUND] = inputDf[RPT_COL_GROUP].apply(tag_fund_from_group)

# Preview the updated DataFrame
print(outputDf.head())

Post generated on 2024-11-27 06:58:46 GMT

comments powered by Disqus