Script 1091: latency
Purpose:
The Python script processes GEM data to apply a latency curve adjustment for conversion and revenue metrics based on a specified latency model.
To Elaborate
The Python script is designed to ingest data from a GEM source and apply a latency curve adjustment to conversion and revenue metrics. The adjustment is based on a predefined latency model, which is detailed in an external document. The script filters the data to rows for a specific conversion type, ‘NewMCCConversionsPurchase’. It then normalizes several ID columns, converting them to strings and stripping the trailing “.0” left over from numeric formatting. Using the latency model's parameters, the script calculates a latency factor from the number of days since each event occurred and divides the conversion and revenue values by it. The adjusted values are stored in an output DataFrame, which is deduplicated and prepared for further analysis or reporting.
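As a rough illustration of the core idea, the sketch below reduces the latency model to a single function (the standalone function, its name, and the example inputs are assumptions for illustration, not part of the script):
# Minimal sketch of the latency curve, using the parameter values published in the script
def latency_factor(days_since_event, D=1.297, C=2.179, B=0.457):
    # estimated share of eventual conversions already visible after this many days
    factor = D / (1 + (days_since_event / C) ** (-B))
    return min(factor, 1.0)  # capped at 1, mirroring the script's limit

# Dividing an observed metric by the factor grosses it up toward its expected final value
print(10 / latency_factor(2))   # recent data: scaled up noticeably
print(10 / latency_factor(45))  # mature data: factor is 1, value unchanged
The D, C, and B defaults correspond to the latency_D, latency_C, and latency_B parameters defined in the script.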
Walking Through the Code
- Data Ingestion and Filtering
- The script begins by ingesting data from a primary source and filters it to rows whose conversion type is ‘NewMCCConversionsPurchase’.
- It converts the ‘Creative ID’, ‘Group ID’, ‘Campaign ID’, and ‘Product ID’ columns to strings and cleans them by removing “.0” from the end of each value.
- Date Processing and Latency Calculation
- The script converts the ‘Date’ column to a datetime format with timezone information.
- It calculates the difference in days between the current date and the event date.
- Using predefined parameters (latency_D, latency_C, latency_B), it computes a latency factor that adjusts conversion and revenue metrics based on the days difference.
- Output Preparation
- The script prepares an output DataFrame, copying relevant columns from the filtered input data.
- It applies the latency adjustment by dividing conversion and revenue metrics by a latency factor capped at 1, so adjusted values are only ever scaled up, never down (see the standalone sketch after this list).
- The output DataFrame is cleaned of duplicates and includes comments detailing the date difference and calculated latency factor.
- User Customization
- The script allows for custom output columns and skips output validations, providing flexibility for user-specific requirements.
- Users can modify parameters related to the latency model to adjust the calculations as needed.
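The script itself runs inside the Marin Scripts environment, where dataSourceDict, outputDf, CLIENT_TIMEZONE, and tableize are supplied by the platform. The standalone pandas sketch below mirrors the same steps on toy data (the sample rows, the fixed timezone, the fixed run date, and the reduced column set are assumptions for illustration):
import datetime
import pandas as pd

CLIENT_TIMEZONE = 'America/New_York'  # assumption; the platform supplies the real value

toy = pd.DataFrame({
    'Conversion Type': ['NewMCCConversionsPurchase', 'OtherConversion'],
    'Date': ['2024-05-10', '2024-05-10'],
    'Creative ID': [12345.0, 67890.0],
    'Conversions': [10.0, 5.0],
    'Revenue': [250.0, 100.0],
})

# keep only the conversion type of interest
df = toy[toy['Conversion Type'] == 'NewMCCConversionsPurchase'].copy()

# IDs arrive as floats; cast to string and strip the trailing ".0"
df['Creative ID'] = df['Creative ID'].astype(str).str.replace(r'\.0$', '', regex=True)

# days elapsed between the (fixed, for reproducibility) run date and each event date
today = pd.Timestamp('2024-05-14', tz=CLIENT_TIMEZONE)
date_diff = (today - pd.to_datetime(df['Date']).dt.tz_localize(CLIENT_TIMEZONE)).dt.days

# latency curve parameters from the linked model
latency_D, latency_C, latency_B = 1.297, 2.179, 0.457
factor = latency_D / (1 + (date_diff / latency_C) ** (-latency_B))
factor[date_diff > 29] = 1   # events older than the latency window need no adjustment
factor[factor > 1] = 1       # never scale metrics down

# gross up observed metrics toward their expected final values
df['Adjusted Conv'] = df['Conversions'] / factor
df['Adjusted Rev'] = df['Revenue'] / factor
print(df[['Creative ID', 'Adjusted Conv', 'Adjusted Rev']])
With these toy numbers, the 4-day-old row's 10 observed conversions come out at a little over 13.5 latency-adjusted conversions, because only about 74% of its eventual conversions are expected to have landed yet.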
Vitals
- Script ID : 1091
- Client ID / Customer ID: 1306927641 / 64302
- Action Type: Revenue Upload
- Item Changed: None
- Output Columns: Date, Keyword ID, Creative ID, Group ID, Campaign ID, Product ID, Target IDs, Currency, Device, Comments, New - MCC Conversions - Purchase (Click Date) Conv, New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Conv, New - MCC Conversions - Purchase (Click Date) Rev, New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Rev
- Linked Datasource: FTP/Email Feed
- Reference Datasource: None
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2024-05-14 12:47
- Last Updated by Tom McCaughey on 2024-05-16 08:57
> See it in Action
Python Code
##
## name: latency test
## description: ingest GEM data and apply a latency curve adjustment as per https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit?usp=sharing
##
##
## author: Tom McCaughey
## created: 2024-04-30
##
DEVICE = {
'MOBILE': 'm',
'DESKTOP': 'c',
'TABLET': 't',
}
MATCH_TYPE = {
'EXACT': 'exact',
'PHRASE': 'phrase',
'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# primary data source and columns
inputDf = dataSourceDict["1"]
#optional filter to only include the source rows relating to the conversion type we want
inputDf_filtered = inputDf[inputDf['Conversion Type'] == 'NewMCCConversionsPurchase'].copy()
# Convert 'Creative ID' column to string
inputDf_filtered['Creative ID'] = inputDf_filtered['Creative ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Creative ID'] = inputDf_filtered['Creative ID'].str.replace(r'\.0$', '', regex=True)
# Convert 'Group ID' column to string
inputDf_filtered['Group ID'] = inputDf_filtered['Group ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Group ID'] = inputDf_filtered['Group ID'].str.replace(r'\.0$', '', regex=True)
# Convert 'Campaign ID' column to string
inputDf_filtered['Campaign ID'] = inputDf_filtered['Campaign ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Campaign ID'] = inputDf_filtered['Campaign ID'].str.replace(r'\.0$', '', regex=True)
# Convert 'Product ID' column to string
inputDf_filtered['Product ID'] = inputDf_filtered['Product ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Product ID'] = inputDf_filtered['Product ID'].str.replace(r'\.0$', '', regex=True)
# Ensure 'inputDf_filtered['Date']' is a datetime Series with timezone information
inputDf_filtered['DateStamp'] = pd.to_datetime(inputDf_filtered['Date']).dt.tz_localize(CLIENT_TIMEZONE)
# Convert 'today' to a pandas Timestamp with the same timezone as 'inputDf_filtered['Date']'
today_timestamp = pd.Timestamp(today, tz=CLIENT_TIMEZONE)
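# days elapsed between the run date and each event date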
Date_diff = (today_timestamp - inputDf_filtered['DateStamp']).dt.days
#params for latency curve as per - https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit
latency_D = 1.297
latency_C = 2.179
latency_B = 0.457
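# the factor is small for recent events and approaches latency_D for older ones (capped at 1 below),
# so dividing observed metrics by it scales up data that is still maturing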
calc_latency_factor = latency_D/(1+(Date_diff/latency_C)**(-latency_B))
#limit calc_latency_factor to a max of 1 based on days difference
calc_latency_factor[Date_diff > 29] = 1
calc_latency_factor[calc_latency_factor > 1] = 1
# output columns and initial values
REVENUE_COL_DATE = 'Date'
REVENUE_COL_CURRENCY = 'Currency'
REVENUE_COL_DEVICE = 'Device'
REVENUE_COL_KEYWORD_ID = 'Keyword ID'
REVENUE_COL_CREATIVE_ID = 'Creative ID'
##REVENUE_COL_KEYWORD = 'Keyword'
##REVENUE_COL_MATCH_TYPE = 'Match Type'
REVENUE_COL_GROUP_ID = 'Group ID'
REVENUE_COL_CAMPAIGN_ID = 'Campaign ID'
REVENUE_COL_PRODUCT_ID = 'Product ID'
##REVENUE_COL_SL_ID = 'SL ID'
##REVENUE_COL_ADEXTENSION_ID = 'AdExtension ID'
REVENUE_COL_TARGET_IDS = 'Target IDs'
REVENUE_COL_COMMENTS = 'Comments'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_CONV = 'New - MCC Conversions - Purchase Conv'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_REV = 'New - MCC Conversions - Purchase Rev'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_CONV = 'New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Conv'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_REV = 'New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Rev'
outputDf[REVENUE_COL_DATE] = inputDf_filtered['Date']
outputDf[REVENUE_COL_CURRENCY] = inputDf_filtered['Currency']
outputDf[REVENUE_COL_DEVICE] = inputDf_filtered['Device']
outputDf[REVENUE_COL_KEYWORD_ID] = ""
outputDf[REVENUE_COL_CREATIVE_ID] = inputDf_filtered['Creative ID']
##outputDf[REVENUE_COL_KEYWORD] = inputDf_filtered['Keyword']
##outputDf[REVENUE_COL_MATCH_TYPE] = inputDf['Match Type']
outputDf[REVENUE_COL_GROUP_ID] = inputDf_filtered['Group ID']
outputDf[REVENUE_COL_CAMPAIGN_ID] = inputDf_filtered['Campaign ID']
outputDf[REVENUE_COL_PRODUCT_ID] = inputDf_filtered['Product ID']
##outputDf[REVENUE_COL_SL_ID] = inputDf_filtered['SL ID']
##outputDf[REVENUE_COL_ADEXTENSION_ID] = inputDf_filtered['AdExtension ID']
outputDf[REVENUE_COL_TARGET_IDS] = inputDf_filtered['Target IDs']
outputDf[REVENUE_COL_COMMENTS] = "Date difference = " + Date_diff.astype(str) +", calc_latency_factor is " + calc_latency_factor.astype(str)
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_CONV] = inputDf_filtered['Conversions']
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_REV] = inputDf_filtered['Revenue']
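# divide observed conversions/revenue by the (<= 1) latency factor to gross them up toward expected final totals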
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_CONV] = (inputDf_filtered['Conversions'] /calc_latency_factor)
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_REV] = (inputDf_filtered['Revenue'] /calc_latency_factor)
#remove duplicates
outputDf = outputDf.drop_duplicates()
#allow for custom output columns
skip_output_validations = True
# user code start here
print(tableize(inputDf_filtered.head()))
Post generated on 2025-03-11 01:25:51 GMT