Script 1089: latency

Purpose:

The Python script processes GEM data to apply a latency curve adjustment for conversion and revenue metrics based on a specified formula.

To Elaborate

The Python script ingests data from a primary feed, keeps only the rows for a specific conversion type, and applies a latency curve adjustment to the conversion and revenue metrics. The adjustment is driven by the number of days between the current date and each row's date. The script first casts several ID columns to strings and strips stray characters, then computes a latency factor from predefined curve parameters. Dividing the raw conversion and revenue values by this factor scales up recent, still-maturing metrics toward their projected final values. The adjusted values are written to an output DataFrame, duplicates are dropped, and output validations are bypassed to allow custom output columns.
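The latency curve can be sketched in isolation. The snippet below is a minimal illustration using the curve parameters published in the script; the `latency_factor` helper is hypothetical and not part of the script itself:

```python
# Latency curve parameters as published in the script
latency_D = 2.355
latency_C = 50.250
latency_B = 0.629

def latency_factor(days_old: int) -> float:
    """Estimated fraction of final conversions already observed after
    `days_old` days (valid for days_old >= 1)."""
    if days_old > 29:
        return 1.0  # entries older than 29 days are treated as fully mature
    factor = latency_D / (1 + (days_old / latency_C) ** (-latency_B))
    return min(factor, 1.0)  # cap at 1, matching the script

# The factor is small for recent dates, so dividing observed conversions
# by it inflates young rows toward their projected final value:
adjusted = 10 / latency_factor(1)  # 10 observed conversions -> roughly 54 projected
```

Because the factor approaches 1 as rows age, the adjusted and raw metrics converge once a row is about a month old.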

Walking Through the Code

  1. Data Ingestion and Filtering
    • The script begins by ingesting data from a primary source and filters it to include only rows related to the conversion type ‘GoogleConvCodeNew’.
    • It ensures that specific columns (‘Creative ID’, ‘Group ID’, ‘Campaign ID’, ‘Product ID’) are converted to strings and cleans them by removing any trailing “.0”.
  2. Date Processing and Latency Calculation
    • The script calculates the difference in days between the current date and the date of each entry in the filtered data.
    • It uses predefined parameters (latency_D, latency_C, latency_B) to compute a latency factor based on this date difference.
    • The latency factor is set to 1 for entries older than 29 days, and any computed value above 1 is capped at 1.
  3. Output Preparation
    • The script prepares an output DataFrame, populating it with relevant columns from the filtered data and applying the latency adjustment to conversion and revenue metrics.
    • It removes duplicate entries from the output DataFrame and allows for custom output columns, bypassing output validations.
  4. User Code Execution
    • Finally, the script prints a table representation of the first few rows of the filtered data for user inspection.
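The ID clean-up in step 1 guards against a common pandas pitfall: numeric ID columns read from a feed often arrive as floats, so casting them to strings leaves a trailing ".0". A small standalone reproduction (the DataFrame here is illustrative, not the script's real feed):

```python
import pandas as pd

# An ID column read as float64 gains a ".0" suffix when cast to string,
# which the script strips with an anchored regex
df = pd.DataFrame({'Creative ID': [12345.0, 67890.0]})
df['Creative ID'] = df['Creative ID'].astype(str).str.replace(r'\.0$', '', regex=True)
print(df['Creative ID'].tolist())  # ['12345', '67890']
```

Anchoring the pattern with `$` matters: it removes only a trailing ".0", leaving any legitimate ".0" elsewhere in an ID untouched.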

Vitals

  • Script ID : 1089
  • Client ID / Customer ID: 1306927637 / 64302
  • Action Type: Revenue Upload
  • Item Changed: None
  • Output Columns: Date, Keyword ID, Creative ID, Group ID, Campaign ID, Product ID, Target IDs, Currency, Device, Comments, Google Conv.Code New (Click Date) Conv, Google Conv.Code New (Click Date) Rev, Google Conv.Code New - Latency Adjusted (Click Date) Conv, Google Conv.Code New - Latency Adjusted (Click Date) Rev
  • Linked Datasource: FTP/Email Feed
  • Reference Datasource: None
  • Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
  • Created by Tom McCaughey on 2024-05-14 12:42
  • Last Updated by Tom McCaughey on 2024-05-16 08:54

Python Code

##
## name: latency test
## description: ingest GEM data and apply a latency curve adjustment as per https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit?usp=sharing
##  
## 
## author: Tom McCaughey
## created: 2024-04-30
## 

# datetime and pandas (pd) are assumed to be preloaded by the scripts
# platform; imported explicitly here so the listing stands alone
import datetime
import pandas as pd

DEVICE = {
  'MOBILE': 'm',
  'DESKTOP': 'c',
  'TABLET': 't',
}
MATCH_TYPE = {
  'EXACT': 'exact',
  'PHRASE': 'phrase',
  'BROAD': 'broad',
}
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]

#optional filter to only include the source rows relating to the conversion type we want
inputDf_filtered = inputDf[inputDf['Conversion Type'] == 'GoogleConvCodeNew'].copy()

# Convert ID columns to strings and strip any trailing ".0" left over
# from a float read of the feed
for id_col in ['Creative ID', 'Group ID', 'Campaign ID', 'Product ID']:
    inputDf_filtered[id_col] = (
        inputDf_filtered[id_col].astype(str).str.replace(r'\.0$', '', regex=True)
    )

# Ensure the 'Date' column becomes a timezone-aware datetime Series
inputDf_filtered['DateStamp'] = pd.to_datetime(inputDf_filtered['Date']).dt.tz_localize(CLIENT_TIMEZONE)


# Convert 'today' to a pandas Timestamp in the same timezone as 'DateStamp'
today_timestamp = pd.Timestamp(today, tz=CLIENT_TIMEZONE)
Date_diff = (today_timestamp - inputDf_filtered['DateStamp']).dt.days

#params for latency curve as per - https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit

latency_D = 2.355
latency_C = 50.250
latency_B = 0.629

calc_latency_factor = latency_D / (1 + (Date_diff / latency_C) ** (-latency_B))

# treat entries older than 29 days as fully mature, and cap the factor at 1
calc_latency_factor[Date_diff > 29] = 1
calc_latency_factor[calc_latency_factor > 1] = 1

# output columns and initial values
REVENUE_COL_DATE = 'Date'
REVENUE_COL_CURRENCY = 'Currency'
REVENUE_COL_DEVICE = 'Device'
REVENUE_COL_KEYWORD_ID = 'Keyword ID'
REVENUE_COL_CREATIVE_ID = 'Creative ID'
##REVENUE_COL_KEYWORD = 'Keyword'
##REVENUE_COL_MATCH_TYPE = 'Match Type'
REVENUE_COL_GROUP_ID = 'Group ID'
REVENUE_COL_CAMPAIGN_ID = 'Campaign ID'
REVENUE_COL_PRODUCT_ID = 'Product ID'
##REVENUE_COL_SL_ID = 'SL ID'
##REVENUE_COL_ADEXTENSION_ID = 'AdExtension ID'
REVENUE_COL_TARGET_IDS = 'Target IDs'
REVENUE_COL_COMMENTS = 'Comments'
REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_CONV = 'Google Conv.Code New Conv'
REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_REV = 'Google Conv.Code New Rev'
REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_CONV = 'Google Conv.Code New - Latency Adjusted (Click Date) Conv'
REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_REV = 'Google Conv.Code New - Latency Adjusted (Click Date) Rev'
outputDf[REVENUE_COL_DATE] = inputDf_filtered['Date']
outputDf[REVENUE_COL_CURRENCY] = inputDf_filtered['Currency']
outputDf[REVENUE_COL_DEVICE] = inputDf_filtered['Device']
outputDf[REVENUE_COL_KEYWORD_ID] = ""
outputDf[REVENUE_COL_CREATIVE_ID] = inputDf_filtered['Creative ID']
##outputDf[REVENUE_COL_KEYWORD] = inputDf_filtered['Keyword']
##outputDf[REVENUE_COL_MATCH_TYPE] = inputDf['Match Type']
outputDf[REVENUE_COL_GROUP_ID] = inputDf_filtered['Group ID']
outputDf[REVENUE_COL_CAMPAIGN_ID] = inputDf_filtered['Campaign ID']
outputDf[REVENUE_COL_PRODUCT_ID] = inputDf_filtered['Product ID']
##outputDf[REVENUE_COL_SL_ID] = inputDf_filtered['SL ID']
##outputDf[REVENUE_COL_ADEXTENSION_ID] = inputDf_filtered['AdExtension ID']
outputDf[REVENUE_COL_TARGET_IDS] = inputDf_filtered['Target IDs']
outputDf[REVENUE_COL_COMMENTS] = "Date difference = " + Date_diff.astype(str) + ", calc_latency_factor is " + calc_latency_factor.astype(str)
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_CONV] = inputDf_filtered['Conversions']
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_REV] = inputDf_filtered['Revenue']
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_CONV] = inputDf_filtered['Conversions'] / calc_latency_factor
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_REV] = inputDf_filtered['Revenue'] / calc_latency_factor

#remove duplicates
outputDf = outputDf.drop_duplicates()

#allow for custom output columns
skip_output_validations = True

# user code start here
print(tableize(inputDf_filtered.head()))

Post generated on 2025-03-11 01:25:51 GMT
