Script 1091: latency

Purpose

Python script to apply a latency curve adjustment to GEM data.

To Elaborate

This Python script is designed to ingest GEM data and apply a latency curve adjustment to the data. The latency curve adjustment is based on a set of parameters and is used to adjust the conversion and revenue values in the data.

The key business rules of this script are as follows:

  • The script filters the input data to only include rows related to a specific conversion type.
  • The script converts certain columns in the input data to string type and removes “.0” from the end of each value.
  • The script ensures that the ‘Date’ column in the input data is a datetime series with timezone information.
  • The script calculates a latency factor based on the difference between the ‘Date’ column and the current date.
  • The script limits the calculated latency factor to a maximum of 1 based on the days difference.
  • The script creates output columns and assigns values to them based on the input data and the calculated latency factor.
  • The script removes duplicate rows from the output data.

Walking Through the Code

  1. The script defines constants for device types and match types.
  2. The script initializes the ‘today’ variable with the current date.
  3. The script assigns the primary data source to the ‘inputDf’ variable.
  4. The script filters the ‘inputDf’ to only include rows related to the ‘NewMCCConversionsPurchase’ conversion type and assigns the filtered data to the ‘inputDf_filtered’ variable.
  5. The script converts the ‘Creative ID’, ‘Group ID’, ‘Campaign ID’, and ‘Product ID’ columns in the ‘inputDf_filtered’ to string type and removes “.0” from the end of each value.
  6. The script ensures that the ‘Date’ column in the ‘inputDf_filtered’ is a datetime series with timezone information and assigns the result to the ‘DateStamp’ column.
  7. The script converts the ‘today’ variable to a pandas Timestamp with the same timezone as the ‘DateStamp’ column and assigns it to the ‘today_timestamp’ variable.
  8. The script calculates the difference in days between ‘today_timestamp’ and ‘DateStamp’ and assigns it to the ‘Date_diff’ variable.
  9. The script defines parameters for the latency curve.
  10. The script calculates the latency factor based on the ‘Date_diff’ and the latency curve parameters and assigns it to the ‘calc_latency_factor’ variable.
  11. The script limits the ‘calc_latency_factor’ to a maximum of 1 based on the ‘Date_diff’.
  12. The script creates output columns and assigns values to them based on the input data and the ‘calc_latency_factor’.
  13. The script removes duplicate rows from the ‘outputDf’.
  14. The script sets the ‘skip_output_validations’ variable to True.
  15. The script prints the tableized version of the first few rows of the ‘inputDf_filtered’.

Vitals

  • Script ID : 1091
  • Client ID / Customer ID: 1306927641 / 64302
  • Action Type: Revenue Upload
  • Item Changed: None
  • Output Columns: Date, Keyword ID, Creative ID, Group ID, Campaign ID, Product ID, Target IDs, Currency, Device, Comments, New - MCC Conversions - Purchase (Click Date) Conv, New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Conv, New - MCC Conversions - Purchase (Click Date) Rev, New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Rev
  • Linked Datasource: FTP/Email Feed
  • Reference Datasource: None
  • Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
  • Created by Tom McCaughey on 2024-05-14 12:47
  • Last Updated by Tom McCaughey on 2024-05-14 12:47
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
##
## name: latency test
## description: ingest GEM data and apply a latency curve adjustment as per https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit?usp=sharing
##  
## 
## author: Tom McCaughey
## created: 2024-04-30
## 

DEVICE = {
  'MOBILE': 'm',
  'DESKTOP': 'c',
  'TABLET': 't',
}
MATCH_TYPE = {
  'EXACT': 'exact',
  'PHRASE': 'phrase',
  'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]

#optional filter to only include the source rows relating to the conversion type we want
inputDf_filtered = inputDf[inputDf['Conversion Type'] == 'NewMCCConversionsPurchase'].copy()

# Convert 'Creative ID' column to string
inputDf_filtered['Creative ID'] = inputDf_filtered['Creative ID'].astype(str)

# Remove ".0" from the end of each value
inputDf_filtered['Creative ID'] = inputDf_filtered['Creative ID'].str.replace('\.0$', '', regex=True)

# Convert 'Group ID' column to string
inputDf_filtered['Group ID'] = inputDf_filtered['Group ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Group ID'] = inputDf_filtered['Group ID'].str.replace('\.0$', '', regex=True)

# Convert 'Campaign ID' column to string
inputDf_filtered['Campaign ID'] = inputDf_filtered['Campaign ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Campaign ID'] = inputDf_filtered['Campaign ID'].str.replace('\.0$', '', regex=True)

# Convert 'Product ID' column to string
inputDf_filtered['Product ID'] = inputDf_filtered['Product ID'].astype(str)
# Remove ".0" from the end of each value
inputDf_filtered['Product ID'] = inputDf_filtered['Product ID'].str.replace('\.0$', '', regex=True)

# Ensure 'inputDf_filtered['Date']' is a datetime Series with timezone information
inputDf_filtered['DateStamp'] = pd.to_datetime(inputDf_filtered['Date']).dt.tz_localize(CLIENT_TIMEZONE)


# Convert 'today' to a pandas Timestamp with the same timezone as 'inputDf_filtered['Date']'
today_timestamp = pd.Timestamp(today, tz=CLIENT_TIMEZONE)
Date_diff = (today_timestamp - inputDf_filtered['DateStamp']).dt.days

#params for latency curve as per - https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit

latency_D = 1.696
latency_C = 19.710
latency_B = 0.791

calc_latency_factor = latency_D/(1+(Date_diff/latency_C)**(-latency_B))

#limit calc_latency_factor to a max of 1 based on days difference
calc_latency_factor[Date_diff > 29] = 1

calc_latency_factor[calc_latency_factor > 1] = 1

# output columns and initial values
REVENUE_COL_DATE = 'Date'
REVENUE_COL_CURRENCY = 'Currency'
REVENUE_COL_DEVICE = 'Device'
REVENUE_COL_KEYWORD_ID = 'Keyword ID'
REVENUE_COL_CREATIVE_ID = 'Creative ID'
##REVENUE_COL_KEYWORD = 'Keyword'
##REVENUE_COL_MATCH_TYPE = 'Match Type'
REVENUE_COL_GROUP_ID = 'Group ID'
REVENUE_COL_CAMPAIGN_ID = 'Campaign ID'
REVENUE_COL_PRODUCT_ID = 'Product ID'
##REVENUE_COL_SL_ID = 'SL ID'
##REVENUE_COL_ADEXTENSION_ID = 'AdExtension ID'
REVENUE_COL_TARGET_IDS = 'Target IDs'
REVENUE_COL_COMMENTS = 'Comments'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_CONV = 'New - MCC Conversions - Purchase Conv'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_REV = 'New - MCC Conversions - Purchase Rev'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_CONV = 'New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Conv'
REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_REV = 'New - MCC Conversions - Purchase - Latency Adjusted (Click Date) Rev'
outputDf[REVENUE_COL_DATE] = inputDf_filtered['Date']
outputDf[REVENUE_COL_CURRENCY] = inputDf_filtered['Currency']
outputDf[REVENUE_COL_DEVICE] = inputDf_filtered['Device']
outputDf[REVENUE_COL_KEYWORD_ID] = ""
outputDf[REVENUE_COL_CREATIVE_ID] = inputDf_filtered['Creative ID']
##outputDf[REVENUE_COL_KEYWORD] = inputDf_filtered['Keyword']
##outputDf[REVENUE_COL_MATCH_TYPE] = inputDf['Match Type']
outputDf[REVENUE_COL_GROUP_ID] = inputDf_filtered['Group ID']
outputDf[REVENUE_COL_CAMPAIGN_ID] = inputDf_filtered['Campaign ID']
outputDf[REVENUE_COL_PRODUCT_ID] = inputDf_filtered['Product ID']
##outputDf[REVENUE_COL_SL_ID] = inputDf_filtered['SL ID']
##outputDf[REVENUE_COL_ADEXTENSION_ID] = inputDf_filtered['AdExtension ID']
outputDf[REVENUE_COL_TARGET_IDS] = inputDf_filtered['Target IDs']
outputDf[REVENUE_COL_COMMENTS] = "Date difference = " + Date_diff.astype(str) +", calc_latency_factor is " + calc_latency_factor.astype(str)
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_CONV] = inputDf_filtered['Conversions']
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE_CLICK_DATE_REV] = inputDf_filtered['Revenue']
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_CONV] = (inputDf_filtered['Conversions'] /calc_latency_factor)
outputDf[REVENUE_COL_NEW__MCC_CONVERSIONS__PURCHASE__LATENCY_ADJUSTED_CLICK_DATE_REV] = (inputDf_filtered['Revenue'] /calc_latency_factor)

#remove duplicates
outputDf = outputDf.drop_duplicates()

#allow for custom output columns
skip_output_validations = True

# user code start here
print(tableize(inputDf_filtered.head()))

Post generated on 2024-05-15 07:44:05 GMT

comments powered by Disqus