Script 1089: latency
Purpose
Python script to apply a latency curve adjustment to GEM data.
To Elaborate
This Python script is designed to ingest GEM data and apply a latency curve adjustment to the data. The latency curve adjustment is based on a set of parameters defined in a Google document. The script then outputs the adjusted data.
Walking Through the Code
- The script starts by defining constants for device types and match types.
- The current date is obtained using the
datetime.datetime.now()
function. - The primary data source is assigned to the variable
inputDf
. - An optional filter is applied to
inputDf
to only include rows related to a specific conversion type. - The ‘Creative ID’ column in
inputDf_filtered
is converted to a string and any trailing “.0” is removed from each value. - The ‘Group ID’ column in
inputDf_filtered
is converted to a string and any trailing “.0” is removed from each value. - The ‘Campaign ID’ column in
inputDf_filtered
is converted to a string and any trailing “.0” is removed from each value. - The ‘Product ID’ column in
inputDf_filtered
is converted to a string and any trailing “.0” is removed from each value. - The ‘Date’ column in
inputDf_filtered
is converted to a datetime series with timezone information. - The
today
variable is converted to a pandas Timestamp with the same timezone as the ‘Date’ column ininputDf_filtered
. - The difference in days between
today_timestamp
andinputDf_filtered['DateStamp']
is calculated and assigned toDate_diff
. - Parameters for the latency curve adjustment are defined.
- The latency factor is calculated based on the parameters and the
Date_diff
. - The latency factor is limited to a maximum of 1 if the
Date_diff
is greater than 29. - The latency factor is limited to a maximum of 1 if it is greater than 1.
- Output columns and initial values are defined.
- The output dataframe is populated with the adjusted data.
- Duplicates are removed from the output dataframe.
- Custom output columns are allowed if
skip_output_validations
is set to True. - The
tableize
function is used to print the first few rows of theinputDf_filtered
dataframe.
Vitals
- Script ID : 1089
- Client ID / Customer ID: 1306927637 / 64302
- Action Type: Revenue Upload
- Item Changed: None
- Output Columns: Date, Keyword ID, Creative ID, Group ID, Campaign ID, Product ID, Target IDs, Currency, Device, Comments, Google Conv.Code New (Click Date) Conv, Google Conv.Code New (Click Date) Rev, Google Conv.Code New - Latency Adjusted (Click Date) Conv, Google Conv.Code New - Latency Adjusted (Click Date) Rev
- Linked Datasource: FTP/Email Feed
- Reference Datasource: None
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2024-05-14 12:42
- Last Updated by Tom McCaughey on 2024-05-14 12:42
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
##
## name: latency test
## description: ingest GEM data and apply a latency curve adjustment as per https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit?usp=sharing
##
##
## author: Tom McCaughey
## created: 2024-04-30
##
DEVICE = {
'MOBILE': 'm',
'DESKTOP': 'c',
'TABLET': 't',
}
MATCH_TYPE = {
'EXACT': 'exact',
'PHRASE': 'phrase',
'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# primary data source and columns
inputDf = dataSourceDict["1"]
#optional filter to only include the source rows relating to the conversion type we want
inputDf_filtered = inputDf[inputDf['Conversion Type'] == 'GoogleConvCodeNew'].copy()
# Convert 'Creative ID' column to string
inputDf_filtered['Creative ID'] = inputDf_filtered['Creative ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Creative ID'] = inputDf_filtered['Creative ID'].str.replace('\.0$', '', regex=True)
# Convert 'Group ID' column to string
inputDf_filtered['Group ID'] = inputDf_filtered['Group ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Group ID'] = inputDf_filtered['Group ID'].str.replace('\.0$', '', regex=True)
# Convert 'Campaign ID' column to string
inputDf_filtered['Campaign ID'] = inputDf_filtered['Campaign ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Campaign ID'] = inputDf_filtered['Campaign ID'].str.replace('\.0$', '', regex=True)
# Convert 'Product ID' column to string
inputDf_filtered['Product ID'] = inputDf_filtered['Product ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Product ID'] = inputDf_filtered['Product ID'].str.replace('\.0$', '', regex=True)
# Ensure 'inputDf_filtered['Date']' is a datetime Series with timezone information
inputDf_filtered['DateStamp'] = pd.to_datetime(inputDf_filtered['Date']).dt.tz_localize(CLIENT_TIMEZONE)
# Convert 'today' to a pandas Timestamp with the same timezone as 'inputDf_filtered['Date']'
today_timestamp = pd.Timestamp(today, tz=CLIENT_TIMEZONE)
Date_diff = (today_timestamp - inputDf_filtered['DateStamp']).dt.days
#params for latency curve as per - https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit
latency_D = 1
latency_C = 1
latency_B = 1
calc_latency_factor = latency_D/(1+(Date_diff/latency_C)**(-latency_B))
#limit calc_latency_factor to a max of 1 based on days difference
calc_latency_factor[Date_diff > 29] = 1
calc_latency_factor[calc_latency_factor > 1] = 1
# output columns and initial values
REVENUE_COL_DATE = 'Date'
REVENUE_COL_CURRENCY = 'Currency'
REVENUE_COL_DEVICE = 'Device'
REVENUE_COL_KEYWORD_ID = 'Keyword ID'
REVENUE_COL_CREATIVE_ID = 'Creative ID'
##REVENUE_COL_KEYWORD = 'Keyword'
##REVENUE_COL_MATCH_TYPE = 'Match Type'
REVENUE_COL_GROUP_ID = 'Group ID'
REVENUE_COL_CAMPAIGN_ID = 'Campaign ID'
REVENUE_COL_PRODUCT_ID = 'Product ID'
##REVENUE_COL_SL_ID = 'SL ID'
##REVENUE_COL_ADEXTENSION_ID = 'AdExtension ID'
REVENUE_COL_TARGET_IDS = 'Target IDs'
REVENUE_COL_COMMENTS = 'Comments'
REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_CONV = 'Google Conv.Code New Conv'
REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_REV = 'Google Conv.Code New Rev'
REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_CONV = 'Google Conv.Code New - Latency Adjusted (Click Date) Conv'
REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_REV = 'Google Conv.Code New - Latency Adjusted (Click Date) Rev'
outputDf[REVENUE_COL_DATE] = inputDf_filtered['Date']
outputDf[REVENUE_COL_CURRENCY] = inputDf_filtered['Currency']
outputDf[REVENUE_COL_DEVICE] = inputDf_filtered['Device']
outputDf[REVENUE_COL_KEYWORD_ID] = ""
outputDf[REVENUE_COL_CREATIVE_ID] = inputDf_filtered['Creative ID']
##outputDf[REVENUE_COL_KEYWORD] = inputDf_filtered['Keyword']
##outputDf[REVENUE_COL_MATCH_TYPE] = inputDf['Match Type']
outputDf[REVENUE_COL_GROUP_ID] = inputDf_filtered['Group ID']
outputDf[REVENUE_COL_CAMPAIGN_ID] = inputDf_filtered['Campaign ID']
outputDf[REVENUE_COL_PRODUCT_ID] = inputDf_filtered['Product ID']
##outputDf[REVENUE_COL_SL_ID] = inputDf_filtered['SL ID']
##outputDf[REVENUE_COL_ADEXTENSION_ID] = inputDf_filtered['AdExtension ID']
outputDf[REVENUE_COL_TARGET_IDS] = inputDf_filtered['Target IDs']
outputDf[REVENUE_COL_COMMENTS] = "Date difference = " + Date_diff.astype(str) +", calc_latency_factor is " + calc_latency_factor.astype(str)
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_CONV] = inputDf_filtered['Conversions']
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_REV] = inputDf_filtered['Revenue']
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_CONV] = (inputDf_filtered['Conversions'] /calc_latency_factor)
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_REV] = (inputDf_filtered['Revenue'] /calc_latency_factor)
#remove duplicates
outputDf = outputDf.drop_duplicates()
#allow for custom output columns
skip_output_validations = True
# user code start here
print(tableize(inputDf_filtered.head()))
Post generated on 2024-05-15 07:44:05 GMT