Script 1089: latency

Purpose

The Python script processes GEM data to apply a latency curve adjustment for conversion and revenue metrics.

To Elaborate

The script processes data from a GEM data source, focusing on rows for a particular conversion type, “GoogleConvCodeNew.” It applies a latency curve adjustment to conversion and revenue metrics based on a predefined mathematical model whose parameters are defined in an external document; the adjustment compensates for delays in conversion reporting. The script also normalizes certain columns, converting ID fields to strings and stripping spurious trailing “.0” suffixes, and computes the difference in days between the current date and each row’s date so the latency adjustment can be applied accurately. The adjusted metrics are then prepared for output, with duplicates removed to preserve data integrity.
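As a sketch of that model, the latency curve can be written as a standalone function. The standalone form and the function name are illustrative; the parameters (D = 2.355, C = 50.250, B = 0.629) and the 29-day cutoff are taken from the script itself:

```python
def latency_factor(days_ago, D=2.355, C=50.250, B=0.629):
    """Estimated fraction of eventual conversions already reported
    `days_ago` days after the click (days_ago must be > 0)."""
    if days_ago > 29:            # data older than 29 days is treated as complete
        return 1.0
    factor = D / (1 + (days_ago / C) ** (-B))
    return min(factor, 1.0)      # cap at 1 so metrics are never scaled downward

# Dividing raw metrics by the factor scales recent, incomplete rows up sharply
# while leaving nearly mature rows almost untouched.
adjusted_recent = 10 / latency_factor(1)    # 1-day-old row: large uplift
adjusted_mature = 10 / latency_factor(29)   # 29-day-old row: small uplift
```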

Walking Through the Code

  1. Data Filtering and Preparation
    • The script begins by filtering the input data to include only rows where the ‘Conversion Type’ is ‘GoogleConvCodeNew.’
    • It converts several ID columns (‘Creative ID’, ‘Group ID’, ‘Campaign ID’, ‘Product ID’) to strings and removes any trailing “.0” to ensure proper formatting.
  2. Date Handling and Latency Calculation
    • The ‘Date’ column is converted to a datetime format with timezone information.
    • The script calculates the difference in days between the current date and each entry’s date.
    • It uses predefined parameters (latency_D, latency_C, latency_B) to compute a latency factor; the factor is forced to 1 for entries older than 29 days, and any computed value above 1 is capped at 1.
  3. Output Preparation
    • The script prepares the output DataFrame by copying relevant columns from the filtered input data.
    • It calculates latency-adjusted conversion and revenue metrics by dividing the original metrics by the latency factor.
    • Duplicates are removed from the output to maintain data quality, and custom output columns are allowed by skipping validations.
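The “.0” stripping in step 1 handles IDs that were read in as floats; a minimal pandas sketch of the same cleanup, using made-up data:

```python
import pandas as pd

df = pd.DataFrame({'Campaign ID': [12345.0, 67890.0]})

# astype(str) renders 12345.0 as "12345.0"; the regex strips the trailing ".0"
df['Campaign ID'] = (df['Campaign ID'].astype(str)
                                      .str.replace(r'\.0$', '', regex=True))
```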
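Step 2's day-difference calculation requires both timestamps to carry the same timezone; a minimal sketch, where the timezone and dates are illustrative stand-ins for CLIENT_TIMEZONE and the feed data:

```python
import pandas as pd

tz = 'America/Los_Angeles'  # stands in for CLIENT_TIMEZONE
dates = pd.to_datetime(pd.Series(['2024-05-10', '2024-05-13'])).dt.tz_localize(tz)
today = pd.Timestamp('2024-05-14', tz=tz)

date_diff = (today - dates).dt.days  # whole days between today and each row
```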

Vitals

  • Script ID : 1089
  • Client ID / Customer ID: 1306927637 / 64302
  • Action Type: Revenue Upload
  • Item Changed: None
  • Output Columns: Date, Keyword ID, Creative ID, Group ID, Campaign ID, Product ID, Target IDs, Currency, Device, Comments, Google Conv.Code New (Click Date) Conv, Google Conv.Code New (Click Date) Rev, Google Conv.Code New - Latency Adjusted (Click Date) Conv, Google Conv.Code New - Latency Adjusted (Click Date) Rev
  • Linked Datasource: FTP/Email Feed
  • Reference Datasource: None
  • Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
  • Created by Tom McCaughey on 2024-05-14 12:42
  • Last Updated by Tom McCaughey on 2024-05-16 08:54

Python Code

##
## name: latency test
## description: ingest GEM data and apply a latency curve adjustment as per https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit?usp=sharing
##  
## 
## author: Tom McCaughey
## created: 2024-04-30
## 

DEVICE = {
  'MOBILE': 'm',
  'DESKTOP': 'c',
  'TABLET': 't',
}
MATCH_TYPE = {
  'EXACT': 'exact',
  'PHRASE': 'phrase',
  'BROAD': 'broad',
}
# datetime, pandas (pd), CLIENT_TIMEZONE, dataSourceDict, outputDf, and tableize
# are provided by the scripts runtime environment (not defined in this script)
today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]

#optional filter to only include the source rows relating to the conversion type we want
inputDf_filtered = inputDf[inputDf['Conversion Type'] == 'GoogleConvCodeNew'].copy()

# Convert the ID columns to strings and strip any trailing ".0"
# left over from a float cast, to ensure proper formatting
for id_col in ['Creative ID', 'Group ID', 'Campaign ID', 'Product ID']:
    inputDf_filtered[id_col] = inputDf_filtered[id_col].astype(str) \
                                                       .str.replace(r'\.0$', '', regex=True)

# Ensure 'inputDf_filtered['Date']' is a datetime Series with timezone information
inputDf_filtered['DateStamp'] = pd.to_datetime(inputDf_filtered['Date']).dt.tz_localize(CLIENT_TIMEZONE)


# Convert 'today' to a pandas Timestamp with the same timezone as 'inputDf_filtered['Date']'
today_timestamp = pd.Timestamp(today, tz=CLIENT_TIMEZONE)
Date_diff = (today_timestamp - inputDf_filtered['DateStamp']).dt.days

#params for latency curve as per - https://docs.google.com/document/d/1MactW0CdcDjQ7-vDbaex2KOGStiB_vnypaFsXI2dg-o/edit

latency_D = 2.355
latency_C = 50.250
latency_B = 0.629

calc_latency_factor = latency_D / (1 + (Date_diff / latency_C) ** (-latency_B))
# note: rows dated today (Date_diff == 0) make the power term infinite and the
# factor 0, so the adjusted metrics below divide by zero for same-day data

#limit calc_latency_factor to a max of 1 based on days difference
calc_latency_factor[Date_diff > 29] = 1

calc_latency_factor[calc_latency_factor > 1] = 1

# output columns and initial values
REVENUE_COL_DATE = 'Date'
REVENUE_COL_CURRENCY = 'Currency'
REVENUE_COL_DEVICE = 'Device'
REVENUE_COL_KEYWORD_ID = 'Keyword ID'
REVENUE_COL_CREATIVE_ID = 'Creative ID'
##REVENUE_COL_KEYWORD = 'Keyword'
##REVENUE_COL_MATCH_TYPE = 'Match Type'
REVENUE_COL_GROUP_ID = 'Group ID'
REVENUE_COL_CAMPAIGN_ID = 'Campaign ID'
REVENUE_COL_PRODUCT_ID = 'Product ID'
##REVENUE_COL_SL_ID = 'SL ID'
##REVENUE_COL_ADEXTENSION_ID = 'AdExtension ID'
REVENUE_COL_TARGET_IDS = 'Target IDs'
REVENUE_COL_COMMENTS = 'Comments'
REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_CONV = 'Google Conv.Code New Conv'
REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_REV = 'Google Conv.Code New Rev'
REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_CONV = 'Google Conv.Code New - Latency Adjusted (Click Date) Conv'
REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_REV = 'Google Conv.Code New - Latency Adjusted (Click Date) Rev'
outputDf[REVENUE_COL_DATE] = inputDf_filtered['Date']
outputDf[REVENUE_COL_CURRENCY] = inputDf_filtered['Currency']
outputDf[REVENUE_COL_DEVICE] = inputDf_filtered['Device']
outputDf[REVENUE_COL_KEYWORD_ID] = ""
outputDf[REVENUE_COL_CREATIVE_ID] = inputDf_filtered['Creative ID']
##outputDf[REVENUE_COL_KEYWORD] = inputDf_filtered['Keyword']
##outputDf[REVENUE_COL_MATCH_TYPE] = inputDf['Match Type']
outputDf[REVENUE_COL_GROUP_ID] = inputDf_filtered['Group ID']
outputDf[REVENUE_COL_CAMPAIGN_ID] = inputDf_filtered['Campaign ID']
outputDf[REVENUE_COL_PRODUCT_ID] = inputDf_filtered['Product ID']
##outputDf[REVENUE_COL_SL_ID] = inputDf_filtered['SL ID']
##outputDf[REVENUE_COL_ADEXTENSION_ID] = inputDf_filtered['AdExtension ID']
outputDf[REVENUE_COL_TARGET_IDS] = inputDf_filtered['Target IDs']
outputDf[REVENUE_COL_COMMENTS] = "Date difference = " + Date_diff.astype(str) + ", calc_latency_factor is " + calc_latency_factor.astype(str)
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_CONV] = inputDf_filtered['Conversions']
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW_CLICK_DATE_REV] = inputDf_filtered['Revenue']
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_CONV] = (inputDf_filtered['Conversions'] /calc_latency_factor)
outputDf[REVENUE_COL_GOOGLE_CONVCODE_NEW__LATENCY_ADJUSTED_CLICK_DATE_REV] = (inputDf_filtered['Revenue'] /calc_latency_factor)

#remove duplicates
outputDf = outputDf.drop_duplicates()

#allow for custom output columns
skip_output_validations = True

# user code start here
print(tableize(inputDf_filtered.head()))

Post generated on 2024-11-27 06:58:46 GMT
