Script 1621: CCT Incentives Automation

Purpose:

The Python script automates the merging and filtering of data from multiple sources to facilitate structured budget allocation for CCT incentives.

To Elaborate

The Python script named ‘CCT Incentives Automation’ is designed to streamline the process of managing and allocating budgets for CCT incentives by merging data from different sources. It takes data from a primary source and a reference source, merges them based on specific columns, and processes the merged data to ensure that only relevant and matching records are retained. The script aims to automate the structured budget allocation (SBA) process by ensuring that the data is accurately combined and filtered, thereby facilitating efficient management of incentives. The script also prepares the data for further analysis or reporting by selecting only the necessary columns for output, which helps in maintaining focus on key metrics and attributes related to CCT incentives.

Walking Through the Code

  1. Data Source Initialization:
    • The script begins by defining the primary and reference data sources using a dictionary (dataSourceDict). These data sources are represented as dataframes (inputDf and gSheetsDf).
  2. Data Merging:
    • The script merges the two dataframes (gSheetsDf and inputDf) using an inner join. This merge is performed on specific columns: ‘A’, ‘D’, ‘G’ from gSheetsDf and ‘CCT_sourcetemplate’, ‘Model’, ‘Incentive Number’ from inputDf. The inner join ensures that only rows with matching values in these columns are retained.
  3. Data Population:
    • After merging, the script populates the ‘Description Line 2’ column in the merged dataframe (mergedDf) with values from column ‘F’ of gSheetsDf.
  4. Data Filtering:
    • Although commented out, the script includes a step to filter out rows where the ‘Description Line 2’ values in the report and bulk columns do not match. This step is crucial for ensuring data consistency but is currently inactive.
  5. Column Selection:
    • The script selects specific columns from the merged dataframe to create the final output dataframe (outputDf). These columns are essential for the structured budget allocation process.
  6. Output Inspection:
    • The script concludes by printing the outputDf for inspection, with a flag (skip_output_validations) set to bypass any output validations.

Vitals

  • Script ID : 1621
  • Client ID / Customer ID: 1306926957 / 60270165
  • Action Type: Bulk Upload
  • Item Changed: Ad
  • Output Columns: Account, Campaign, Group, Creative ID, Description Line 2, CCT_sourcetemplate, Model, Headline
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
  • Created by Tom McCaughey on 2025-01-15 10:48
  • Last Updated by Tom McCaughey on 2025-01-29 09:58
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]

# reference data source and columns
gSheetsDf = dataSourceDict["2_1"]  # gSheets dataframe (first sheet)

RPT_COL_DESCRIPTION_LINE_2 = 'Description Line 2'
RPT_COL_CCT_SOURCETEMPLATE = 'CCT_sourcetemplate'
RPT_COL_MODEL = 'Model'
RPT_COL_HEADLINE = 'Headline'
RPT_COL_CREATIVE_ID = 'Creative ID'
RPT_COL_INCENTIVE_NUMBER = 'Incentive Number'
BULK_COL_DESCRIPTION_LINE_2 = 'Description Line 2'
BULK_COL_CCT_SOURCETEMPLATE = 'CCT_sourcetemplate'
BULK_COL_MODEL = 'Model'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_HEADLINE = 'Headline'
BULK_COL_CREATIVE_ID = 'Creative ID'
BULK_COL_INCENTIVE_NUMBER = 'Incentive Number'

# Merge the dataframes on specified columns
mergedDf = pd.merge(
    gSheetsDf, 
    inputDf, 
    left_on=['A', 'D', 'G'],  # Assuming 'D' refers to 'Group' in gSheetsDf
    right_on=['CCT_sourcetemplate', 'Model', 'Incentive Number'], 
    how='inner'  # Use 'inner' join to keep only matching rows
)

# Populate the Description Line 2 column with column F from gSheetsDf
mergedDf[BULK_COL_DESCRIPTION_LINE_2] = mergedDf['F']


# Filter out rows where RPT_COL_DESCRIPTION_LINE_2 matches BULK_COL_DESCRIPTION_LINE_2
#filteredDf = mergedDf[
#    mergedDf[RPT_COL_DESCRIPTION_LINE_2] != mergedDf[BULK_COL_DESCRIPTION_LINE_2]
#]

# Select only the required columns for the output
outputDf = mergedDf[
    [BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_GROUP, BULK_COL_CCT_SOURCETEMPLATE, BULK_COL_MODEL, BULK_COL_DESCRIPTION_LINE_2, BULK_COL_HEADLINE, BULK_COL_CREATIVE_ID, BULK_COL_INCENTIVE_NUMBER]
]

# Inspect the output
skip_output_validations = True
print(outputDf)

Post generated on 2025-03-11 01:25:51 GMT

comments powered by Disqus