Script 1091: latency
Purpose
Python script to apply a latency curve adjustment to GEM data.
To Elaborate
This Python script is designed to ingest GEM data and apply a latency curve adjustment to the data. The latency curve adjustment is based on a set of parameters and is used to adjust the conversion and revenue values in the data.
The key business rules of this script are as follows:
- The script filters the input data to only include rows related to a specific conversion type.
- The script converts certain columns in the input data to string type and removes “.0” from the end of each value.
- The script ensures that the ‘Date’ column in the input data is a datetime series with timezone information.
- The script calculates a latency factor based on the difference between the ‘Date’ column and the current date.
- The script limits the calculated latency factor to a maximum of 1 based on the days difference.
- The script creates output columns and assigns values to them based on the input data and the calculated latency factor.
- The script removes duplicate rows from the output data.
Walking Through the Code
- The script defines constants for device types and match types.
- The script initializes the ‘today’ variable with the current date.
- The script assigns the primary data source to the ‘inputDf’ variable.
- The script filters the ‘inputDf’ to only include rows related to the ‘NewMCCConversionsPurchase’ conversion type and assigns the filtered data to the ‘inputDf_filtered’ variable.
- The script converts the ‘Creative ID’, ‘Group ID’, ‘Campaign ID’, and ‘Product ID’ columns in the ‘inputDf_filtered’ to string type and removes “.0” from the end of each value.
- The script ensures that the ‘Date’ column in the ‘inputDf_filtered’ is a datetime series with timezone information and assigns the result to the ‘DateStamp’ column.
- The script converts the ‘today’ variable to a pandas Timestamp with the same timezone as the ‘DateStamp’ column and assigns it to the ‘today_timestamp’ variable.
- The script calculates the difference in days between ‘today_timestamp’ and ‘DateStamp’ and assigns it to the ‘Date_diff’ variable.
- The script defines parameters for the latency curve.
- The script calculates the latency factor based on the ‘Date_diff’ and the latency curve parameters and assigns it to the ‘calc_latency_factor’ variable.
- The script limits the ‘calc_latency_factor’ to a maximum of 1 based on the ‘Date_diff’.
- The script creates output columns and assigns values to them based on the input data and the ‘calc_latency_factor’.
- The script removes duplicate rows from the ‘outputDf’.
- The script sets the ‘skip_output_validations’ variable to True.
- The script prints the tableized version of the first few rows of the ‘inputDf_filtered’.
Vitals
- Script ID : 1091
- Client ID / Customer ID: 1306927641 / 64302
- Action Type: Revenue Upload
- Item Changed: None
- Output Columns: Date, Keyword ID, Creative ID, Group ID, Campaign ID, Product ID, Target IDs, Currency, Device, Comments, New - MCC Conversions - Purchase (Click Date) Conv, New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Conv, New - MCC Conversions - Purchase (Click Date) Rev, New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Rev
- Linked Datasource: FTP/Email Feed
- Reference Datasource: None
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2024-05-14 12:47
- Last Updated by Tom McCaughey on 2024-05-14 12:47
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
##
## name: latency test
## description: ingest GEM data and apply a latency curve adjustment as per https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit?usp=sharing
##
##
## author: Tom McCaughey
## created: 2024-04-30
##
DEVICE = {
'MOBILE': 'm',
'DESKTOP': 'c',
'TABLET': 't',
}
MATCH_TYPE = {
'EXACT': 'exact',
'PHRASE': 'phrase',
'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# primary data source and columns
inputDf = dataSourceDict["1"]
#optional filter to only include the source rows relating to the conversion type we want
inputDf_filtered = inputDf[inputDf['Conversion Type'] == 'NewMCCConversionsPurchase'].copy()
# Convert 'Creative ID' column to string
inputDf_filtered['Creative ID'] = inputDf_filtered['Creative ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Creative ID'] = inputDf_filtered['Creative ID'].str.replace('\.0$', '', regex=True)
# Convert 'Group ID' column to string
inputDf_filtered['Group ID'] = inputDf_filtered['Group ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Group ID'] = inputDf_filtered['Group ID'].str.replace('\.0$', '', regex=True)
# Convert 'Campaign ID' column to string
inputDf_filtered['Campaign ID'] = inputDf_filtered['Campaign ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Campaign ID'] = inputDf_filtered['Campaign ID'].str.replace('\.0$', '', regex=True)
# Convert 'Product ID' column to string
inputDf_filtered['Product ID'] = inputDf_filtered['Product ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Product ID'] = inputDf_filtered['Product ID'].str.replace('\.0$', '', regex=True)
# Ensure 'inputDf_filtered['Date']' is a datetime Series with timezone information
inputDf_filtered['DateStamp'] = pd.to_datetime(inputDf_filtered['Date']).dt.tz_localize(CLIENT_TIMEZONE)
# Convert 'today' to a pandas Timestamp with the same timezone as 'inputDf_filtered['Date']'
today_timestamp = pd.Timestamp(today, tz=CLIENT_TIMEZONE)
Date_diff = (today_timestamp - inputDf_filtered['DateStamp']).dt.days
#params for latency curve as per - https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit
latency_D = 1.696
latency_C = 19.710
latency_B = 0.791
calc_latency_factor = latency_D/(1+(Date_diff/latency_C)**(-latency_B))
#limit calc_latency_factor to a max of 1 based on days difference
calc_latency_factor[Date_diff > 29] = 1
calc_latency_factor[calc_latency_factor > 1] = 1
# output columns and initial values
REVENUE_COL_DATE = 'Date'
REVENUE_COL_CURRENCY = 'Currency'
REVENUE_COL_DEVICE = 'Device'
REVENUE_COL_KEYWORD_ID = 'Keyword ID'
REVENUE_COL_CREATIVE_ID = 'Creative ID'
##REVENUE_COL_KEYWORD = 'Keyword'
##REVENUE_COL_MATCH_TYPE = 'Match Type'
REVENUE_COL_GROUP_ID = 'Group ID'
REVENUE_COL_CAMPAIGN_ID = 'Campaign ID'
REVENUE_COL_PRODUCT_ID = 'Product ID'
##REVENUE_COL_SL_ID = 'SL ID'
##REVENUE_COL_ADEXTENSION_ID = 'AdExtension ID'
REVENUE_COL_TARGET_IDS = 'Target IDs'
REVENUE_COL_COMMENTS = 'Comments'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_CONV = 'New - MCC Conversions - Purchase Conv'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_REV = 'New - MCC Conversions - Purchase Rev'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_CONV = 'New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Conv'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_REV = 'New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Rev'
outputDf[REVENUE_COL_DATE] = inputDf_filtered['Date']
outputDf[REVENUE_COL_CURRENCY] = inputDf_filtered['Currency']
outputDf[REVENUE_COL_DEVICE] = inputDf_filtered['Device']
outputDf[REVENUE_COL_KEYWORD_ID] = ""
outputDf[REVENUE_COL_CREATIVE_ID] = inputDf_filtered['Creative ID']
##outputDf[REVENUE_COL_KEYWORD] = inputDf_filtered['Keyword']
##outputDf[REVENUE_COL_MATCH_TYPE] = inputDf['Match Type']
outputDf[REVENUE_COL_GROUP_ID] = inputDf_filtered['Group ID']
outputDf[REVENUE_COL_CAMPAIGN_ID] = inputDf_filtered['Campaign ID']
outputDf[REVENUE_COL_PRODUCT_ID] = inputDf_filtered['Product ID']
##outputDf[REVENUE_COL_SL_ID] = inputDf_filtered['SL ID']
##outputDf[REVENUE_COL_ADEXTENSION_ID] = inputDf_filtered['AdExtension ID']
outputDf[REVENUE_COL_TARGET_IDS] = inputDf_filtered['Target IDs']
outputDf[REVENUE_COL_COMMENTS] = "Date difference = " + Date_diff.astype(str) +", calc_latency_factor is " + calc_latency_factor.astype(str)
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_CONV] = inputDf_filtered['Conversions']
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_REV] = inputDf_filtered['Revenue']
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_CONV] = (inputDf_filtered['Conversions'] /calc_latency_factor)
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_REV] = (inputDf_filtered['Revenue'] /calc_latency_factor)
#remove duplicates
outputDf = outputDf.drop_duplicates()
#allow for custom output columns
skip_output_validations = True
# user code start here
print(tableize(inputDf_filtered.head()))
Post generated on 2024-05-15 07:44:05 GMT