Script 1509: Keyword Click-weighted Quality Calls

Purpose

This Python script attributes quality call conversions to keywords in proportion to each keyword's share of its ad group's clicks.

To Elaborate

The script allocates group-level quality call conversions to individual keywords using a click-weighted metric. It processes two sources: a primary data source with keyword-level performance data and a reference data source with group-level quality call conversion totals. For each keyword, the script computes the fraction of its ad group's clicks that the keyword received, then multiplies the group's total quality call conversions by that fraction. Each keyword is thus credited with a share of its group's quality call conversions proportional to its click contribution, giving a more granular view of which keywords drive quality calls.
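The core of the attribution can be sketched in a few lines of pandas. This is a minimal illustration with hypothetical data and simplified column names, not the production code (which also joins on campaign and date):

```python
import pandas as pd

# Hypothetical keyword-level data: two keywords in group 1, one in group 2
keywords = pd.DataFrame({
    'Group ID': [1, 1, 2],
    'Keyword ID': ['kw_a', 'kw_b', 'kw_c'],
    'Clicks': [30, 10, 5],
})

# Hypothetical group-level quality call conversion totals
groups = pd.DataFrame({
    'Group ID': [1, 2],
    'Quality Calls Conv.': [8.0, 2.0],
})

# Click ratio: each keyword's share of its own group's clicks
keywords['click_ratio'] = (
    keywords['Clicks'] / keywords.groupby('Group ID')['Clicks'].transform('sum')
)

# Distribute the group total across its keywords by click ratio
merged = keywords.merge(groups, on='Group ID', how='inner')
merged['Click Wtd Quality Call Conv.'] = (
    merged['Quality Calls Conv.'] * merged['click_ratio']
)
# kw_a: 8 * 30/40 = 6.0; kw_b: 8 * 10/40 = 2.0; kw_c: 2 * 5/5 = 2.0
print(merged[['Keyword ID', 'Click Wtd Quality Call Conv.']])
```

Note that the weighted values within a group always sum back to the group's total, so no conversions are created or lost by the split.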

Walking Through the Code

  1. Initialization and Setup
    • The script begins by determining whether it is running on a server or locally. If running locally, it loads a pickled data source dictionary (dataSourceDict) from a specified file path.
    • It sets up the environment by importing necessary libraries such as pandas, numpy, and others, and configures display settings for pandas.
  2. Data Preparation
    • The script extracts data from dataSourceDict into two DataFrames: inputDf for keyword-level data and reportDf for group-level data.
    • It drops the debug-only click-weighted conversion column from inputDf and renames reportDf's Pub. ID column to Group Pub. ID to avoid a column collision during the later merge.
  3. Calculation of Click Ratios
    • The script calculates the total number of clicks for each ad group and computes the click ratio for each keyword by dividing its clicks by the total clicks of its ad group.
  4. Merging and Calculation of Click-weighted Quality Call Conversions
    • It merges inputDf with reportDf to associate group-level quality call conversions with each keyword.
    • The script calculates the click-weighted quality call conversions for each keyword by multiplying the group’s quality call conversions by the keyword’s click ratio.
  5. Output Preparation
    • The script filters the results to keywords whose click-weighted quality call conversion is greater than zero.
    • It selects the required columns, prefixes each keyword's Pub. ID with 'kwd-' to form the Target IDs, and renames the columns to the Revenue Upload schema.
    • If running locally, it saves the output and debug data to CSV files for further analysis.
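One edge case implied by the step 5 filter: if every keyword in an ad group recorded zero clicks, the click ratio is 0/0, which pandas evaluates to NaN rather than raising an error; the click-weighted value is then NaN too, fails the greater-than-zero check, and the group's conversions go unattributed. A minimal illustration with hypothetical data:

```python
import pandas as pd

# Hypothetical zero-click group: both keywords have 0 clicks
df = pd.DataFrame({
    'Group ID': [1, 1],
    'Clicks': [0, 0],
    'Quality Calls Conv.': [3.0, 3.0],
})

# 0 / 0 division yields NaN rather than raising an error
df['click_ratio'] = df['Clicks'] / df.groupby('Group ID')['Clicks'].transform('sum')
df['wtd'] = df['Quality Calls Conv.'] * df['click_ratio']

# NaN > 0 is False, so a positive-value filter silently drops these rows
kept = df.loc[df['wtd'] > 0]
print(len(kept))  # 0
```

This behavior is reasonable for a revenue upload (there is no sensible keyword to credit), but it is worth knowing that such groups disappear from the output rather than erroring out.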

Vitals

  • Script ID : 1509
  • Client ID / Customer ID: 1306928013 / 60270419
  • Action Type: Revenue Upload
  • Item Changed: None
  • Output Columns: Date, Group ID, Target IDs, Click Wtd Quality Call Conv
  • Linked Datasource: M1 Report
  • Reference Datasource: M1 Report
  • Owner: Michael Huang (mhuang@marinsoftware.com)
  • Created by Michael Huang on 2024-11-09 08:47
  • Last Updated by Michael Huang on 2024-11-14 22:45
> See it in Action

Python Code

##
## name: Keyword Click-weighted Quality Calls
## description:
##  Attribute Quality Call Conv to Keywords via Click Weight
## 
## author: Michael S. Huang
## created: 2024-11-09
## 

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/mass_general_quality_call_attribution_20241115.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access dataSourceDict, which exists only as a server-provided global
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError means dataSourceDict is not defined, so we are not on the server
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
    # Chicago is GMT-5 during daylight saving time (GMT-6 standard). Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-5))

    # import pandas
    import pandas as pd
    import numpy as np

    # other imports
    import re
    import urllib

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed

    # pandas settings
    pd.set_option('display.max_columns', None)  # Display all columns
    pd.set_option('display.max_colwidth', None)  # Display full content of each column

else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_DATE = 'Date'
RPT_COL_KEYWORD_ID = 'Keyword ID'
RPT_COL_PUB_ID = 'Pub. ID'
RPT_COL_GROUP_PUB_ID = 'Group Pub. ID'
RPT_COL_CAMPAIGN_PUB_ID = 'Campaign Pub. ID'
RPT_COL_GROUP = 'Group'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_IMPR = 'Impr.'
RPT_COL_CLICKS = 'Clicks'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_CLICK_WTD_QUALITY_CALL_CONV = 'Click Wtd Quality Call Conv.'

# reference data source and columns
reportDf = dataSourceDict["2"]  # report dataframe
RPT_COL_GROUP = 'Group'
RPT_COL_DATE = 'Date'
RPT_COL_PUB_ID = 'Pub. ID'
RPT_COL_GROUP_ID = 'Group ID'
RPT_COL_CAMPAIGN_PUB_ID = 'Campaign Pub. ID'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_QUALITY_CALLS_CONV = 'Quality Calls Conv.'

# output columns and initial values
REVENUE_COL_DATE = 'Date'
REVENUE_COL_GROUP_ID = 'Group ID'
REVENUE_COL_TARGET_IDS = 'Target IDs'
REVENUE_COL_CLICK_WTD_QUALITY_CALL_CONV = 'Click Wtd Quality Call Conv'
outputDf[REVENUE_COL_DATE] = "<<YOUR VALUE>>"
outputDf[REVENUE_COL_GROUP_ID] = "<<YOUR VALUE>>"
outputDf[REVENUE_COL_TARGET_IDS] = "<<YOUR VALUE>>"
outputDf[REVENUE_COL_CLICK_WTD_QUALITY_CALL_CONV] = "<<YOUR VALUE>>"

COL_CLICK_RATIO = 'click_ratio'

####  user code starts here

print("keyword report shape", inputDf.shape)
print("keyword sample\n", tableize(inputDf.head(5)))
print("group report shape", reportDf.shape)
print("group sample\n", tableize(reportDf.head(5)))

## cleanup
# drop extra columns used for debugging
inputDf = inputDf.drop(columns=[RPT_COL_CLICK_WTD_QUALITY_CALL_CONV])
# rename colliding columns
reportDf = reportDf.rename(columns={RPT_COL_PUB_ID: RPT_COL_GROUP_PUB_ID})


## calculate click-weight for each adgroup
# Calculate the total clicks for each AdGroup
adgroup_clicks = inputDf.groupby([RPT_COL_CAMPAIGN_PUB_ID, RPT_COL_GROUP_PUB_ID, RPT_COL_DATE])[RPT_COL_CLICKS].transform('sum')
# Calculate the click ratio for each keyword
inputDf[COL_CLICK_RATIO] = inputDf[RPT_COL_CLICKS] / adgroup_clicks

## calculate click-weights
# Merge inputDf with reportDf to get the RPT_COL_QUALITY_CALLS_CONV for each GROUP_PUB_ID
mergedDf = inputDf.merge(reportDf[[RPT_COL_CAMPAIGN_PUB_ID, RPT_COL_GROUP_PUB_ID, RPT_COL_GROUP_ID, RPT_COL_DATE, RPT_COL_QUALITY_CALLS_CONV]],
                         on=[RPT_COL_CAMPAIGN_PUB_ID, RPT_COL_GROUP_PUB_ID, RPT_COL_DATE],
                         how='inner')

print("mergedDf shape", mergedDf.shape)
print("mergedDf sample\n", mergedDf.head(5))

# Calculate the keyword-level Call Quality Conv
mergedDf[RPT_COL_CLICK_WTD_QUALITY_CALL_CONV] = mergedDf[RPT_COL_QUALITY_CALLS_CONV] * mergedDf[COL_CLICK_RATIO]

debugDf = mergedDf.round(2)

# only output rows with non-zero click-weighted quality call conv
output_cols = [RPT_COL_DATE, RPT_COL_GROUP_PUB_ID, RPT_COL_PUB_ID, RPT_COL_CLICK_WTD_QUALITY_CALL_CONV]
outputDf = mergedDf.loc[mergedDf[RPT_COL_CLICK_WTD_QUALITY_CALL_CONV] > 0, output_cols].round(2)

# TargetID for keywords need prefix `kwd-`
#     outputCol['Target IDs'] = 'kwd-' + inputCol['Pub. ID']
outputDf[RPT_COL_PUB_ID] = 'kwd-' + outputDf[RPT_COL_PUB_ID].astype(str)

# rename columns for Revenue Upload
outputDf = outputDf.rename(columns={
    RPT_COL_DATE: REVENUE_COL_DATE,
    RPT_COL_GROUP_PUB_ID: REVENUE_COL_GROUP_ID,
    RPT_COL_PUB_ID: REVENUE_COL_TARGET_IDS,
    RPT_COL_CLICK_WTD_QUALITY_CALL_CONV: REVENUE_COL_CLICK_WTD_QUALITY_CALL_CONV
})

# DEBUG: limit to one group
# outputDf = outputDf.loc[outputDf[REVENUE_COL_GROUP_ID] == 151266388237]

print("outputDf.shape", outputDf.shape)

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")
else:
    print("outputDf", outputDf.head(5))

Post generated on 2024-11-27 06:58:46 GMT
