Script 1621: CCT Incentives Automation

Purpose

The Python script automates the process of merging and filtering data from two sources to prepare a structured output for CCT incentives.

To Elaborate

The Python script is designed to automate the process of handling data related to CCT incentives by merging information from two different data sources. It specifically focuses on combining data from a primary input source and a reference Google Sheets source. The script performs an inner join on specified columns to ensure that only matching rows from both data sources are retained. After merging, it populates a specific column with data from the Google Sheets source and prepares a structured output by selecting only the necessary columns. This automation helps streamline the process of preparing data for CCT incentives, ensuring consistency and accuracy in the final output.

Walking Through the Code

  1. Data Preparation
    • The script begins by defining the primary and reference data sources, which are stored in a dictionary called dataSourceDict.
    • It retrieves data from these sources into dataframes inputDf and gSheetsDf.
  2. Data Merging
    • The script merges the two dataframes using an inner join on specified columns, ensuring that only rows with matching values in both dataframes are retained.
    • The merge operation is performed on columns that correspond to ‘CCT_sourcetemplate’, ‘Model’, and ‘Incentive Number’ from the input data and ‘A’, ‘D’, ‘G’ from the Google Sheets data.
  3. Data Population and Filtering
    • After merging, the script populates the ‘Description Line 2’ column in the merged dataframe with data from column ‘F’ of the Google Sheets dataframe.
    • Although there is a commented-out section for filtering rows based on a condition, it is not executed in the current script.
  4. Output Preparation
    • The script selects specific columns from the merged dataframe to prepare the final output dataframe, outputDf.
    • The selected columns include account, campaign, group, CCT_sourcetemplate, model, description line 2, headline, creative ID, and incentive number.
  5. Output Inspection
    • The script sets a flag to skip output validations and prints the final output dataframe for inspection.

Vitals

  • Script ID : 1621
  • Client ID / Customer ID: 1306926957 / 60270165
  • Action Type: Bulk Upload
  • Item Changed: Ad
  • Output Columns: Account, Campaign, Group, Creative ID, Description Line 2, CCT_sourcetemplate, Model, Headline
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
  • Created by Tom McCaughey on 2025-01-15 10:48
  • Last Updated by Tom McCaughey on 2025-01-29 09:58
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]

# reference data source and columns
gSheetsDf = dataSourceDict["2_1"]  # gSheets dataframe (first sheet)

RPT_COL_DESCRIPTION_LINE_2 = 'Description Line 2'
RPT_COL_CCT_SOURCETEMPLATE = 'CCT_sourcetemplate'
RPT_COL_MODEL = 'Model'
RPT_COL_HEADLINE = 'Headline'
RPT_COL_CREATIVE_ID = 'Creative ID'
RPT_COL_INCENTIVE_NUMBER = 'Incentive Number'
BULK_COL_DESCRIPTION_LINE_2 = 'Description Line 2'
BULK_COL_CCT_SOURCETEMPLATE = 'CCT_sourcetemplate'
BULK_COL_MODEL = 'Model'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_HEADLINE = 'Headline'
BULK_COL_CREATIVE_ID = 'Creative ID'
BULK_COL_INCENTIVE_NUMBER = 'Incentive Number'

# Merge the dataframes on specified columns
mergedDf = pd.merge(
    gSheetsDf, 
    inputDf, 
    left_on=['A', 'D', 'G'],  # Assuming 'D' refers to 'Group' in gSheetsDf
    right_on=['CCT_sourcetemplate', 'Model', 'Incentive Number'], 
    how='inner'  # Use 'inner' join to keep only matching rows
)

# Populate the Description Line 2 column with column F from gSheetsDf
mergedDf[BULK_COL_DESCRIPTION_LINE_2] = mergedDf['F']


# Filter out rows where RPT_COL_DESCRIPTION_LINE_2 matches BULK_COL_DESCRIPTION_LINE_2
#filteredDf = mergedDf[
#    mergedDf[RPT_COL_DESCRIPTION_LINE_2] != mergedDf[BULK_COL_DESCRIPTION_LINE_2]
#]

# Select only the required columns for the output
outputDf = mergedDf[
    [BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_GROUP, BULK_COL_CCT_SOURCETEMPLATE, BULK_COL_MODEL, BULK_COL_DESCRIPTION_LINE_2, BULK_COL_HEADLINE, BULK_COL_CREATIVE_ID, BULK_COL_INCENTIVE_NUMBER]
]

# Inspect the output
skip_output_validations = True
print(outputDf)

Post generated on 2025-02-21 10:25:25 GMT

comments powered by Disqus