Script 1757: Script Rev Adjmt Keywords

Purpose:
The Python script adjusts revenue and conversion data at the keyword level based on latency factors to provide more accurate estimates.
To Elaborate
The script refines keyword-level revenue and conversion data by accounting for reporting latency: conversions and revenue for recent dates are underreported because conversions continue to trickle in for days after the click. It parses a string of latency parameters into a dictionary, uses them to compute a latency factor (the estimated fraction of eventual conversions reported so far), and divides the reported conversion and revenue figures by that factor to estimate their eventual values. The script verifies that the input data is non-empty and contains the required columns before processing, and adds columns for target IDs and comments that document the adjustments made. The result is a more accurate estimate of keyword performance over a 28-day window, which is useful for budgeting and performance analysis in digital marketing campaigns.
Walking Through the Code
- Parsing Latency Parameters:
- The script begins by defining a function to parse a string containing latency parameters, converting them into a dictionary for easy access.
- These parameters are used to calculate the latency factor, which adjusts the revenue and conversion data.
- Data Validation and Preparation:
- It checks if the input dataset is empty and verifies the presence of required columns.
- The 'Date' column is converted to a datetime format, and rows with invalid dates are dropped to preserve data integrity.
- Calculating Latency Factor:
- The script calculates the difference in days between the stated revenue date and the current date.
- Using the parsed latency parameters, it computes a latency factor; the factor is capped at 1, and is forced to exactly 1 once the date difference exceeds 29 days, at which point the data is treated as fully reported.
- Adjusting Revenue and Conversion Data:
- Adjusted conversions and revenue are calculated by dividing the original values by the latency factor, rounding to two decimal places for precision.
- Adding Additional Columns:
- A new 'Target IDs' column is created by prefixing the publisher ID with "kwd-", and the output 'Group ID' column is populated from 'Group Pub. ID'.
- A 'Comments' column is added to document the date difference and latency factor behind each adjustment.
- Final Output Preparation:
- The script defines the final output columns, including keyword and match type information, and creates a dataframe with these columns, ensuring no duplicate entries.
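The steps above boil down to a small amount of math. Here is a minimal, self-contained sketch using the default latency parameters quoted in the script (production runs apply the same formula column-wise with pandas):

```python
# Default latency parameters from the script's parameter string
D, C, B, H = 2.770, 716.803, 0.502, 0.552

def latency_factor(date_diff_days):
    """Estimated fraction of eventual 28-day conversions reported so far.
    Assumes date_diff_days >= 1; forced to 1 once data is older than 29 days."""
    if date_diff_days > 29:
        return 1.0  # data treated as fully reported
    factor = H + D / (1 + (date_diff_days / C) ** (-B))
    return min(factor, 1.0)  # never scale reported figures downward

def adjust(value, date_diff_days):
    """Divide a reported figure by the latency factor to estimate its
    eventual 28-day value, rounded to 2 decimals as in the script."""
    return round(value / latency_factor(date_diff_days), 2)

# A keyword that reported 10 conversions and $250 revenue 7 days ago:
# only ~80% of its conversions are expected to be in yet, so both
# figures are scaled up by roughly 1/0.8.
est_conv = adjust(10.0, 7)    # ~12.5 estimated 28-day conversions
est_rev = adjust(250.0, 7)    # ~313 estimated 28-day revenue
```

Note that smaller date differences yield smaller factors (less of the data has arrived), so the most recent rows receive the largest upward adjustment.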
Vitals
- Script ID : 1757
- Client ID / Customer ID: 314475863 / 63970
- Action Type: Revenue Upload
- Item Changed: None
- Output Columns: Date, Group ID, Comments, Target IDs, Keyword, Match Type, pub. 28d est. Conv, pub. 28d est. Rev
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Jesus Garza (jgarza@marinsoftware.com)
- Created by Jesus Garza on 2025-02-20 19:00
- Last Updated by Jesus Garza on 2025-02-26 13:58
Python Code
##
## name: Script - Rev Adjmt Keywords
## description: Adjusts revenue and conversion data at the keyword level based on latency factors.
##
## author: Jesus Garza
## created: 2025-02-20
##

# Imports (the scripting environment may already provide these)
import datetime

import pandas as pd

# Function to parse name-value string for latency parameters
def parse_name_value_string(input_string):
    """
    Parses a string with name=value pairs and returns a dictionary.

    Args:
        input_string: The string to parse (e.g., "D=4.3 C=52.2 B=0.42 H=6").

    Returns:
        A dictionary where keys are the names and values are the parsed values.
    """
    variables = {}
    if not input_string:  # Handle empty string
        return variables

    pairs = input_string.split()  # Split by spaces
    for pair in pairs:
        try:
            name, value = pair.split("=")
            try:
                value = float(value)  # Convert value to float if possible
            except ValueError:
                pass  # Keep as string if float conversion fails
            variables[name] = value  # Assign value to dictionary
        except ValueError:
            print(f"Warning: Invalid name=value pair: '{pair}'. Skipping.")
            continue  # Continue processing next pair

    return variables

# Example: Read latency parameters from an input string
latency_param_string = "D=2.770 C=716.803 B=0.502 H=0.552"
parsed_vars = parse_name_value_string(latency_param_string)

# Assign parsed latency parameters
D = parsed_vars.get("D", 2.770)  # Default if missing
C = parsed_vars.get("C", 716.803)
B = parsed_vars.get("B", 0.502)
H = parsed_vars.get("H", 0.552)

# Define today's date
today = datetime.datetime.now().date()

# Primary data source
inputDf = dataSourceDict["1"]

# Ensure the input dataset is not empty
if inputDf.empty:
    raise ValueError("Error: Input dataset is empty or not found.")

# Define column names for keyword-level processing
RPT_COL_DATE = 'Date'
RPT_COL_KEYWORD_ID = 'Keyword ID'
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_GROUP_PUB_ID = 'Group Pub. ID'  # Using Group Pub. ID instead of Group ID
RPT_COL_PUB_ID = 'Pub. ID'
RPT_COL_PUB_CONV = 'Pub. Conv.'
RPT_COL_PUB_REVENUE = 'Pub. Revenue'

# Output column names
REVENUE_COL_DATE = 'Date'
REVENUE_COL_KEYWORD_ID = 'Keyword ID'
REVENUE_COL_KEYWORD = 'Keyword'
REVENUE_COL_MATCH_TYPE = 'Match Type'
REVENUE_COL_GROUP_ID = 'Group ID'  # Now using Group Pub. ID
REVENUE_COL_TARGET_IDS = 'Target IDs'  # New column
REVENUE_COL_PUB_28D_EST_CONV = 'pub. 28d est. Conv'
REVENUE_COL_PUB_28D_EST_REV = 'pub. 28d est. Rev'
REVENUE_COL_COMMENTS = 'Comments'

# Ensure required columns exist in the dataset
required_columns = [
    RPT_COL_DATE, RPT_COL_KEYWORD_ID, RPT_COL_KEYWORD, RPT_COL_MATCH_TYPE,
    RPT_COL_GROUP_PUB_ID, RPT_COL_PUB_ID, RPT_COL_PUB_CONV, RPT_COL_PUB_REVENUE
]
missing_columns = [col for col in required_columns if col not in inputDf.columns]
if missing_columns:
    raise ValueError(f"Missing required columns: {missing_columns}")

# Convert 'Date' column to datetime format
inputDf[RPT_COL_DATE] = pd.to_datetime(inputDf[RPT_COL_DATE], errors='coerce')

# Drop any rows where Date is NaT (invalid datetime)
inputDf = inputDf.dropna(subset=[RPT_COL_DATE])

# Calculate 'date_diff' (days between stated revenue date and today)
inputDf["date_diff"] = (pd.Timestamp(today) - inputDf[RPT_COL_DATE]).dt.days

# Calculate latency factor using parsed parameters
inputDf["calc_latency_factor"] = H + (D / (1 + (inputDf["date_diff"] / C) ** (-B)))

# Cap the latency factor at 1, and force it to 1 for date differences
# greater than 29 days (data treated as fully reported)
inputDf.loc[inputDf["date_diff"] > 29, "calc_latency_factor"] = 1
inputDf.loc[inputDf["calc_latency_factor"] > 1, "calc_latency_factor"] = 1

# Compute adjusted conversions and revenue (rounded to 2 decimal places)
inputDf[REVENUE_COL_PUB_28D_EST_CONV] = (inputDf[RPT_COL_PUB_CONV] / inputDf["calc_latency_factor"]).round(2)
inputDf[REVENUE_COL_PUB_28D_EST_REV] = (inputDf[RPT_COL_PUB_REVENUE] / inputDf["calc_latency_factor"]).round(2)

# Add new 'Target IDs' column with "kwd-" prefix
inputDf[REVENUE_COL_TARGET_IDS] = "kwd-" + inputDf[RPT_COL_PUB_ID].astype(str)

# Populate 'Group ID' from 'Group Pub. ID'
inputDf[REVENUE_COL_GROUP_ID] = inputDf[RPT_COL_GROUP_PUB_ID]

# Add Comments column documenting the adjustment inputs
inputDf[REVENUE_COL_COMMENTS] = (
    "Date difference = " + inputDf["date_diff"].astype(str) +
    ", calc_latency_factor is " + inputDf["calc_latency_factor"].round(4).astype(str)
)

# Define final output columns (including Keyword & Match Type)
output_columns = [
    REVENUE_COL_DATE,
    REVENUE_COL_KEYWORD_ID,
    REVENUE_COL_KEYWORD,
    REVENUE_COL_MATCH_TYPE,
    REVENUE_COL_GROUP_ID,
    REVENUE_COL_TARGET_IDS,
    REVENUE_COL_PUB_28D_EST_CONV,
    REVENUE_COL_PUB_28D_EST_REV,
    REVENUE_COL_COMMENTS  # Added Comments column
]

# Create final output dataframe
outputDf = inputDf[output_columns].drop_duplicates()
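To exercise the core column-wise math outside the Marin environment, the same transforms can be run on a hypothetical two-row DataFrame (the data below is made up; in production the DataFrame comes from dataSourceDict["1"], which the platform supplies):

```python
import datetime

import pandas as pd

# Default latency parameters from the script's parameter string
D, C, B, H = 2.770, 716.803, 0.502, 0.552
today = pd.Timestamp(datetime.date.today())

# Hypothetical input: one recent row, one row past the 29-day cutoff
df = pd.DataFrame({
    "Date": [today - pd.Timedelta(days=7), today - pd.Timedelta(days=45)],
    "Pub. Conv.": [10.0, 10.0],
    "Pub. Revenue": [250.0, 250.0],
})

# Same column-wise math as the script
df["date_diff"] = (today - df["Date"]).dt.days
df["calc_latency_factor"] = H + (D / (1 + (df["date_diff"] / C) ** (-B)))
df.loc[df["date_diff"] > 29, "calc_latency_factor"] = 1
df.loc[df["calc_latency_factor"] > 1, "calc_latency_factor"] = 1
df["pub. 28d est. Conv"] = (df["Pub. Conv."] / df["calc_latency_factor"]).round(2)
df["pub. 28d est. Rev"] = (df["Pub. Revenue"] / df["calc_latency_factor"]).round(2)

# The 7-day-old row is scaled up (~80% of conversions reported so far);
# the 45-day-old row is unchanged because its factor is forced to 1.
```

This makes the effect of the 29-day rule visible: mature rows pass through untouched, while recent rows are inflated to their estimated eventual values.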
Post generated on 2025-03-11 01:25:51 GMT