Script 1749: Script Rev Adjmt Product Groups

Purpose:

The script adjusts revenue and conversion data based on latency factors to provide more accurate estimates.

To Elaborate

The Python script adjusts revenue and conversion data to account for reporting latency, the lag between when conversions occur and when they are fully reported. Recent rows understate their eventual totals, which matters for businesses that rely on timely, accurate financial data to make decisions. The script parses a string of latency parameters, calculates a latency factor (the estimated share of activity reported so far) for each row, and divides the reported conversions and revenue by that factor. The adjusted data is then prepared for output so that it reflects better estimates of the eventual 28-day totals. The script also handles potential data issues, such as missing columns or invalid dates, to keep the processing robust.
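
As a rough worked example (using the default latency parameters hard-coded in the script below, D=2.770, C=716.803, B=0.502, H=0.552, and a hypothetical row that is 3 days old), the latency factor comes out to roughly 0.72, so the reported figures are scaled up by about 39%:

# Illustrative only: single-row version of the adjustment, using the
# script's default parameters and a hypothetical 3-day-old row
D, C, B, H = 2.770, 716.803, 0.502, 0.552
date_diff = 3
latency_factor = H + (D / (1 + (date_diff / C) ** (-B)))   # ≈ 0.72
latency_factor = min(latency_factor, 1)                    # never scale values down
adjusted_revenue = round(100.00 / latency_factor, 2)       # reported $100 -> ≈ $139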

Walking Through the Code

  1. Parsing Latency Parameters:
    • The script begins by defining a function parse_name_value_string to parse a string of name-value pairs representing latency parameters.
    • It converts these values into a dictionary, handling potential errors in parsing.
  2. Setting Default Parameters:
    • The parsed parameters are assigned to variables D, C, B, and H, with default values provided if any parameter is missing.
  3. Data Preparation:
    • The script checks if the primary data source is empty and raises an error if so.
    • It ensures that required columns are present in the dataset, converting the ‘Date’ column to a datetime format and dropping invalid dates.
  4. Calculating Latency Factor:
    • A new column date_diff is calculated to determine the number of days between the reported date and the current date.
    • The script computes a calc_latency_factor from the parsed parameters, caps it at 1 overall, and forces it to exactly 1 for entries more than 29 days old (see the sketch after this list).
  5. Adjusting Data:
    • Adjusted conversions and revenue are calculated by dividing the original values by the latency factor, rounding to two decimal places.
    • Additional columns are added, such as ‘Target IDs’ with a “pla-” prefix and ‘Comments’ detailing the date difference and latency factor.
  6. Finalizing Output:
    • The script handles infinite values resulting from division by zero.
    • It defines the final output columns, renames necessary columns, and removes duplicates to create the final output dataframe.
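
The computation in steps 4 and 5 can be condensed into the following minimal, self-contained sketch (a synthetic two-row dataset with the script's default parameter values; not the production code itself):

import pandas as pd

# parse_name_value_string("D=2.770 C=716.803 B=0.502 H=0.552") yields
# {"D": 2.770, "C": 716.803, "B": 0.502, "H": 0.552}; hard-coded here for brevity
D, C, B, H = 2.770, 716.803, 0.502, 0.552
today = pd.Timestamp.now().normalize()

df = pd.DataFrame({
    "Date": [today - pd.Timedelta(days=3), today - pd.Timedelta(days=45)],  # synthetic rows
    "Pub. Conv.": [10.0, 10.0],
    "Pub. Revenue": [250.0, 250.0],
})

df["date_diff"] = (today - df["Date"]).dt.days
df["calc_latency_factor"] = H + (D / (1 + (df["date_diff"] / C) ** (-B)))
df.loc[df["date_diff"] > 29, "calc_latency_factor"] = 1           # treat >29-day rows as fully reported
df.loc[df["calc_latency_factor"] > 1, "calc_latency_factor"] = 1  # never scale values down
df["pub. 28d est. Conv"] = (df["Pub. Conv."] / df["calc_latency_factor"]).round(2)
df["pub. 28d est. Rev"] = (df["Pub. Revenue"] / df["calc_latency_factor"]).round(2)
# Row 0 (3 days old): factor ≈ 0.72, so estimates scale up by ~39%.
# Row 1 (45 days old): factor forced to 1, so reported values pass through unchanged.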

Vitals

  • Script ID : 1749
  • Client ID / Customer ID: 314475863 / 63970
  • Action Type: Revenue Upload
  • Item Changed: None
  • Output Columns: Date, Product ID, Group ID, Comments, Target IDs, pub. 28d est. Conv, pub. 28d est. Rev
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Jesus Garza (jgarza@marinsoftware.com)
  • Created by Jesus Garza on 2025-02-20 00:34
  • Last Updated by ascott@marinsoftware.com on 2025-02-26 14:33

Python Code

##
## name: Script - Rev Adjmt v2
## description: Adjusts revenue and conversion data based on latency factors.
##  
## author: Jesus Garza
## created: 2025-02-19
##

# datetime and pandas are assumed to be available in the scripting environment;
# imported explicitly here so the script is self-contained
import datetime
import pandas as pd

# Function to parse name-value string for latency parameters
def parse_name_value_string(input_string):
    """
    Parses a string with name=value pairs and returns a dictionary.

    Args:
        input_string: The string to parse (e.g., "D=4.3 C=52.2 B=0.42 H=6").

    Returns:
        A dictionary where keys are the names and values are the parsed values.
        Returns an empty dictionary if the input string is invalid.
        Prints a warning message if there's an issue parsing a specific pair.
    """

    variables = {}

    if not input_string:  # Handle empty string
        return variables

    pairs = input_string.split()  # Split by spaces

    for pair in pairs:
        try:
            name, value = pair.split("=")
            try:
                value = float(value)  # Convert value to float if possible
            except ValueError:
                pass  # Keep as string if float conversion fails
            variables[name] = value  # Assign value to dictionary
        except ValueError:
            print(f"Warning: Invalid name=value pair: '{pair}'. Skipping.")
            continue  # Continue processing next pair

    return variables

# Example: Read latency parameters from an input string
latency_param_string = "D=2.770 C=716.803 B=0.502 H=0.552"
parsed_vars = parse_name_value_string(latency_param_string)

# Assign parsed latency parameters
D = parsed_vars.get("D", 2.770)  # Default if missing
C = parsed_vars.get("C", 716.803)
B = parsed_vars.get("B", 0.502)
H = parsed_vars.get("H", 0.552)

# Define today's date
today = datetime.datetime.now().date()

# Primary data source
inputDf = dataSourceDict["1"]

# Ensure the input dataset is not empty
if inputDf.empty:
    raise ValueError("Error: Input dataset is empty or not found.")

# Define column names
RPT_COL_DATE = 'Date'
RPT_COL_GROUP_ID = 'Group ID'
RPT_COL_GROUP_PUB_ID = 'Group Pub. ID'
RPT_COL_PRODUCT_GROUP_ID = 'Product Group ID'  # This will be mapped to Product ID
RPT_COL_PUB_CONV = 'Pub. Conv.'
RPT_COL_PUB_REVENUE = 'Pub. Revenue'
RPT_COL_PUB_ID = 'Pub. ID'

# Output column names
REVENUE_COL_DATE = 'Date'
REVENUE_COL_GROUP_ID = 'Group ID'
REVENUE_COL_PRODUCT_ID = 'Product ID'  # Mapping Product Group ID to Product ID
REVENUE_COL_TARGET_IDS = 'Target IDs'
REVENUE_COL_PUB_28D_EST_CONV = 'pub. 28d est. Conv'
REVENUE_COL_PUB_28D_EST_REV = 'pub. 28d est. Rev'
REVENUE_COL_COMMENTS = 'Comments'

# Ensure required columns exist in the dataset
required_columns = [RPT_COL_DATE, RPT_COL_GROUP_ID, RPT_COL_GROUP_PUB_ID, RPT_COL_PRODUCT_GROUP_ID, RPT_COL_PUB_CONV, RPT_COL_PUB_REVENUE, RPT_COL_PUB_ID]
missing_columns = [col for col in required_columns if col not in inputDf.columns]

if missing_columns:
    raise ValueError(f"Missing required columns: {missing_columns}")

# Convert 'Date' column to datetime format
inputDf[RPT_COL_DATE] = pd.to_datetime(inputDf[RPT_COL_DATE], errors='coerce')

# Drop any rows where Date is NaT (invalid datetime)
inputDf = inputDf.dropna(subset=[RPT_COL_DATE])

# Calculate 'date_diff' (days between stated revenue and today)
inputDf["date_diff"] = (pd.Timestamp(today) - inputDf[RPT_COL_DATE]).dt.days

# Calculate latency factor using parsed parameters
inputDf["calc_latency_factor"] = H + (D / (1 + (inputDf["date_diff"] / C) ** (-B)))

# Ensure latency factor does not exceed 1 for date differences greater than 29 days
inputDf.loc[inputDf["date_diff"] > 29, "calc_latency_factor"] = 1
inputDf.loc[inputDf["calc_latency_factor"] > 1, "calc_latency_factor"] = 1

# Compute adjusted conversions and revenue (rounded to 2 decimal places)
inputDf[REVENUE_COL_PUB_28D_EST_CONV] = (inputDf[RPT_COL_PUB_CONV] / inputDf["calc_latency_factor"]).round(2)
inputDf[REVENUE_COL_PUB_28D_EST_REV] = (inputDf[RPT_COL_PUB_REVENUE] / inputDf["calc_latency_factor"]).round(2)

# Add new 'Target IDs' column with "pla-" prefix
inputDf[REVENUE_COL_TARGET_IDS] = "pla-" + inputDf[RPT_COL_PUB_ID].astype(str)

# Map 'Group ID' to 'Group Pub. ID'
inputDf[REVENUE_COL_GROUP_ID] = inputDf[RPT_COL_GROUP_PUB_ID]

# Add Comments column
inputDf[REVENUE_COL_COMMENTS] = (
    "Date difference = " + inputDf["date_diff"].astype(str) + 
    ", calc_latency_factor is " + inputDf["calc_latency_factor"].round(4).astype(str)
)

# Handle infinite values (in case of division by zero); use NaN rather than None
# so replace() does not fall back to pad-filling in some pandas versions
inputDf.replace([float("inf"), -float("inf")], float("nan"), inplace=True)

# Define final output columns
output_columns = [
    REVENUE_COL_DATE,
    REVENUE_COL_GROUP_ID,
    REVENUE_COL_PRODUCT_ID,  # Mapping Product Group ID → Product ID
    REVENUE_COL_TARGET_IDS,
    REVENUE_COL_PUB_28D_EST_CONV,
    REVENUE_COL_PUB_28D_EST_REV,
    REVENUE_COL_COMMENTS  # Added Comments column
]

# Create final output dataframe with renamed column
outputDf = inputDf.rename(columns={RPT_COL_PRODUCT_GROUP_ID: REVENUE_COL_PRODUCT_ID})[output_columns].drop_duplicates()

