Script 579: Salesforce Item ID SBA Pacing

Purpose

The Python script optimizes budget allocation for marketing campaigns to minimize lost impression share due to budget constraints.

To Elaborate

The script addresses the challenge of efficiently allocating marketing budgets across various campaigns to minimize lost impression share due to budget limitations. It focuses on campaigns that are active and within their budget period, considering historical spending and potential future spending. The script calculates the full potential spend by adjusting historical spend based on lost impression share due to budget constraints. It then aggregates data by Salesforce Item ID and filters out campaigns that are inactive, have no spend in the lookback period, or are outside the budget period. The script calculates the remaining budget and days in the budget period, and allocates the budget accordingly, ensuring that the daily budget meets a minimum threshold. The goal is to ensure that the pacing compliance percentage is ideally 100% each day, indicating optimal budget utilization.

Walking Through the Code

  1. Initialization and Configuration:
    • The script begins by setting a configurable parameter for the minimum daily budget (MINIMUM_DAILY_BUDGET).
    • It checks if the code is running on a server or locally, and loads data from a pickle file if running locally.
  2. Data Preparation:
    • The script sets up the input DataFrame (inputDf) and prepares it for processing by setting the index and grouping by Salesforce Item ID.
    • It calculates the full potential spend by adjusting historical spend based on lost impression share due to budget constraints.
  3. Data Aggregation:
    • The script removes date segmentation and calculates budget period spend by summing up costs within the budget start and end dates.
    • It aggregates data by various columns, including campaign status, daily budget, and clicks, using a specified aggregation specification.
  4. Campaign Filtering:
    • The script filters out campaigns that are inactive, have no spend in the lookback period, or are outside the budget period.
    • It calculates the number of eligible campaigns for budget allocation.
  5. Budget Allocation:
    • The script calculates the budget allocation ratio by capping full potential spend and comparing it to total spend within the same Salesforce Item ID budget group.
    • It calculates the remaining budget for each Salesforce Item ID budget group and the remaining days in the budget period.
  6. Daily Budget Calculation:
    • The script calculates the recommended daily budget for each campaign by dividing the allocated budget by the number of days left in the budget period.
    • It applies a minimum daily budget rule to ensure the daily budget does not fall below the specified threshold.
  7. Output Generation:
    • The script generates an output DataFrame with updated budget allocations and pacing compliance percentages.
    • It checks for changes in the recommended daily budget and other metrics, and outputs only the changed campaigns.

Vitals

  • Script ID : 579
  • Client ID / Customer ID: 1306926667 / 60270093
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Daily Budget, SBA Allocation, SBA Budget Pacing, SBA Rec. Daily Budget
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: ascott@marinsoftware.com (ascott@marinsoftware.com)
  • Created by ascott@marinsoftware.com on 2023-12-12 18:47
  • Last Updated by Michael Huang on 2024-03-07 12:25
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
#
# AVB Marketing - Budget Pacing - Minimize Lost IS (Budget)
#
# Allocates according to:
# * Only for campaigns within Budget Start/End Date
# * Remaining budget for each Salesforce Item ID Budget Group
# * Remaining days in Budget Period (Budget Start/End Date)
# * Historical spend and spend potential
# * Campaigns with spend in lookback period
# * Minimum daily budget
#
# Author: Michael S. Huang
#
# Created: 2023-09-30
#

##### Configurable Param #####

MINIMUM_DAILY_BUDGET = 10

##############################

########### START - Local Mode Config ###########
# Step 1: Uncomment download_preview_input flag and run Preview successfully with the Datasources you want
download_preview_input=False
# Step 2: In MarinOne, go to Scripts -> Preview -> Logs, download 'dataSourceDict' pickle file, and update pickle_path below
# pickle_path = ''
pickle_path = '/Users/mhuang/Downloads/pickle/avb_marketing_20240303_datasource_dict.pkl'
# Step 3: Copy this script into local IDE with Python virtual env loaded with pandas and numpy.
# Step 4: Run locally with below code to init dataSourceDict

# determine if code is running on server or locally
def is_executing_on_server():
    try:
        # Attempt to access a known restricted builtin
        dict_items = dataSourceDict.items()
        return True
    except NameError:
        # NameError: dataSourceDict object is missing (indicating not on server)
        return False

local_dev = False

if is_executing_on_server():
    print("Code is executing on server. Skip init.")
elif len(pickle_path) > 3:
    print("Code is NOT executing on server. Doing init.")
    local_dev = True
    # load dataSourceDict via pickled file
    import pickle
    dataSourceDict = pickle.load(open(pickle_path, 'rb'))

    # print shape and first 5 rows for each entry in dataSourceDict
    for key, value in dataSourceDict.items():
        print(f"Shape of dataSourceDict[{key}]: {value.shape}")
        # print(f"First 5 rows of dataSourceDict[{key}]:\n{value.head(5)}")

    # set outputDf same as inputDf
    inputDf = dataSourceDict["1"]
    outputDf = inputDf.copy()

    # setup timezone
    import datetime
    # LA Timezone is GMT-8. Adjust as needed.
    CLIENT_TIMEZONE = datetime.timezone(datetime.timedelta(hours=-8))

    # import pandas
    import pandas as pd
    import numpy as np

    # other imports
    import re
    import urllib

    # import Marin util functions
    from marin_scripts_utils import tableize, select_changed
else:
    print("Running locally but no pickle path defined. dataSourceDict not loaded.")
    exit(1)
########### END - Local Mode Setup ###########



RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_DATE = 'Date'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_STRATEGY = 'Strategy'
RPT_COL_DAILY_BUDGET = 'Daily Budget'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_PUB_COST = 'Pub. Cost $'
RPT_COL_CLICKS = 'Clicks'
RPT_COL_CONV = 'Conv.'
RPT_COL_IMPR_SHARE = 'Impr. share %'
RPT_COL_LOST_IMPR_SHARE_BUDGET = 'Lost Impr. Share (Budget) %'
RPT_COL_LOST_IMPR_SHARE_RANK = 'Lost Impr. Share (Rank) %'
RPT_COL_SALESFORCE_ITEM_ID = 'Salesforce Item Id'
RPT_COL_SBA_CAMPAIGN_BUDGET = 'SBA Campaign Budget'
RPT_COL_SBA_ALLOCATION = 'SBA Allocation'
RPT_COL_SBA_REC_DAILY_BUDGET = 'SBA Rec. Daily Budget'
RPT_COL_SBA_BUDGET_PACING = 'SBA Budget Pacing'
RPT_COL_SBA_TRAFFIC = 'SBA Traffic'
RPT_COL_PROGRAM_END_DATE = 'Program End Date'
RPT_COL_BUDGET_START_DATE = 'Budget Start Date'
RPT_COL_BUDGET_END_DATE = 'Budget End Date'

BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_DAILY_BUDGET = 'Daily Budget'
BULK_COL_SBA_ALLOCATION = 'SBA Allocation'
BULK_COL_SBA_BUDGET_PACING = 'SBA Budget Pacing'
BULK_COL_SBA_REC_DAILY_BUDGET = 'SBA Rec. Daily Budget'

COL_SPEND_FULL_POTENTIAL = 'spend_lookback_full_potential'
COL_SPEND_FULL_POTENTIAL_CAPPED = 'spend_lookback_full_potential_capped'
COL_SPEND_BUDGET_PERIOD = 'spend_budget_period'
COL_SBA_ALLOCATION_NEW_FLOAT = RPT_COL_SBA_ALLOCATION + '_new_float'
COL_SBA_ALLOCATION_NEW = RPT_COL_SBA_ALLOCATION + '_new'
COL_SALESFORCE_ITEM_ID_BUDGET_REMAINING = 'Salesforce_Item_ID_budget_remaining'
COL_SBA_BUDGET_PACING_NEW = RPT_COL_SBA_BUDGET_PACING + '_new'
COL_CAMPAIGN_BUDGET_REMAINING = 'campaign_budget_remaining'
COL_DAILY_BUDGET_NEW = RPT_COL_DAILY_BUDGET + '_new'
COL_SBA_REC_DAILY_BUDGET_NEW = RPT_COL_SBA_REC_DAILY_BUDGET + '_new'
COL_BUDGET_PERIOD_DAYS_REMAINING= 'budget_period_days_remaining'
COL_BUDGET_PERIOD_DAYS_TOTAL= 'budget_period_days_total'
COL_PACING_CALC = 'pacing_calc'

today = datetime.datetime.now(CLIENT_TIMEZONE).date()
today_pd = pd.to_datetime(today)

print("today", today_pd)
print("inputDf shape", inputDf.shape)
print(inputDf.info())

EMPTY_OUTPUT = pd.DataFrame(columns=[BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_DAILY_BUDGET, BULK_COL_SBA_REC_DAILY_BUDGET, BULK_COL_SBA_BUDGET_PACING, BULK_COL_SBA_ALLOCATION])

def process(inputDf):

    inputDf = inputDf.set_index([RPT_COL_SALESFORCE_ITEM_ID])
    group_by_salesforce_item_ID = inputDf.groupby(RPT_COL_SALESFORCE_ITEM_ID)

    # ## Calculate Full-Potential Spend
    # * Adjust Historical Spend by _Lost Impression Share due to Budget_ (see [Formula](https://docs.google.com/document/d/1EbCQ5z9Up8TZ6GISEeCaRSB3Fc15vCPCfeIydree23M/edit#bookmark=id.5fsx7jlseze6))
    # 

    adj_ratio = 1 + (inputDf[RPT_COL_LOST_IMPR_SHARE_BUDGET] / (1 - inputDf[RPT_COL_LOST_IMPR_SHARE_BUDGET]))

    inputDf[COL_SPEND_FULL_POTENTIAL] = round(inputDf[RPT_COL_PUB_COST] * adj_ratio, 2)

    # ## Remove Date Segmentation
    # * Calculate Budget Period Spend
                    
    # give new column values to sum up
    inputDf[COL_SPEND_BUDGET_PERIOD] = inputDf[RPT_COL_PUB_COST]

    # SUM Series but only include dates within Start/End Dates (inclusive)
    def date_range_sum(x, date, start_date, end_date):
        # print('budget_period_sum got: ', start_date.info(), end_date.info(), x.info())
        rows_within_budget_period = x[ (date >= start_date) & (date <= end_date) & (x.values > 0)]
        return rows_within_budget_period.sum()

    groupby_cols = [ \
        RPT_COL_SALESFORCE_ITEM_ID, \
        RPT_COL_STRATEGY, \
        RPT_COL_PUBLISHER_NAME, \
        RPT_COL_ACCOUNT, \
        RPT_COL_CAMPAIGN, \
    ]

    # index by groupby columns so can do inputDf.loc[x.index] in agg spec below
    inputDf = inputDf.reset_index() \
                    .set_index(groupby_cols) \
                    .sort_index()

    agg_spec = {
        RPT_COL_CAMPAIGN_STATUS: 'last', \
        RPT_COL_DAILY_BUDGET: 'last', \
        RPT_COL_SBA_CAMPAIGN_BUDGET: 'last', \
        RPT_COL_SBA_ALLOCATION: 'last', \
        RPT_COL_SBA_REC_DAILY_BUDGET: 'last', \
        RPT_COL_SBA_BUDGET_PACING: 'last', \
        RPT_COL_SBA_TRAFFIC: 'last', \
        RPT_COL_BUDGET_START_DATE: 'last', \
        RPT_COL_BUDGET_END_DATE: 'last', \
        RPT_COL_CLICKS: 'sum', \
        RPT_COL_CONV: 'sum', \
        RPT_COL_PUB_COST: 'sum', \
        COL_SPEND_BUDGET_PERIOD: lambda x: date_range_sum(x, \
                                                            inputDf.loc[x.index[0], RPT_COL_DATE], \
                                                            inputDf.loc[x.index[0], RPT_COL_BUDGET_START_DATE], \
                                                            inputDf.loc[x.index[0], RPT_COL_BUDGET_END_DATE] \
                                                            ), \
        COL_SPEND_FULL_POTENTIAL: 'sum', \
        RPT_COL_PROGRAM_END_DATE: 'last', \
    }


    df_campaign_agg = inputDf \
                            .groupby(groupby_cols) \
                            .agg(agg_spec) \
                            .reset_index() \
                            .set_index(RPT_COL_SALESFORCE_ITEM_ID)


    # ## Only allocate budget for recently trafficking campaigns
    # * Exclude Campaigns that are:
    # ** not ACTIVE 
    # ** without spend in lookback period
    # ** Program Date is in the past
    # ** current date not within Budget Start/End Dates

    inactive_campaigns = (df_campaign_agg[RPT_COL_CAMPAIGN_STATUS] != 'Active') & (df_campaign_agg[RPT_COL_PUB_COST] == 0)
    expired_campaigns = df_campaign_agg[RPT_COL_PROGRAM_END_DATE].notnull() & (df_campaign_agg[RPT_COL_PROGRAM_END_DATE] < pd.to_datetime(today))
    in_budget_period =  df_campaign_agg[RPT_COL_BUDGET_START_DATE].notnull() & \
                        df_campaign_agg[RPT_COL_BUDGET_END_DATE].notnull() & \
                        (today_pd >= df_campaign_agg[RPT_COL_BUDGET_START_DATE]) & \
                        (today_pd <= df_campaign_agg[RPT_COL_BUDGET_END_DATE])

    campaign_count = df_campaign_agg.size

    df_campaign_agg = df_campaign_agg.loc[ ~(inactive_campaigns | expired_campaigns) & in_budget_period]

    eligible_campaign_count = df_campaign_agg.size

    print(f"Out of {campaign_count} Campaigns, {eligible_campaign_count} are eligible for budget allocation")

    if eligible_campaign_count == 0:
        print("No campaigns to allocate. Stopping.")
        return EMPTY_OUTPUT, EMPTY_OUTPUT # Early return from the function, effectively ending the script

    # ## Calculate Budget Allocation Ratio 
    # * Cap full potential spend at 2X (don't spend twice as much as before)
    # * Compare full potential spend for each campaign to total spend within same SALESFORCE_ITEM_ID budget group

    df_campaign_agg[COL_SPEND_FULL_POTENTIAL_CAPPED] = df_campaign_agg \
        .apply(lambda row: min(row[COL_SPEND_FULL_POTENTIAL], 2 * row[RPT_COL_PUB_COST]), axis=1)


    # use transform to calculate sum for each SALESFORCE_ITEM_ID and make it available on every row
    # note: no need to build aggregate DataFrame and JOIN back to original

    df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT] = 100.0 * \
            df_campaign_agg[COL_SPEND_FULL_POTENTIAL_CAPPED] / \
            df_campaign_agg.groupby(RPT_COL_SALESFORCE_ITEM_ID)[COL_SPEND_FULL_POTENTIAL_CAPPED].transform('sum')

    df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT] = df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT].fillna(0)

    df_campaign_agg[COL_SBA_ALLOCATION_NEW] = round(df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT],0).astype(str) + '%'


    # 
    # ## Calculate Remaining Budget
    # * For each SALESFORCE_ITEM_ID budget group, calculate how much Budget is left by substracting SBA Monthly budget from MTD SALESFORCE_ITEM_ID spend

    df_campaign_agg[COL_SALESFORCE_ITEM_ID_BUDGET_REMAINING] =  \
            df_campaign_agg[RPT_COL_SBA_CAMPAIGN_BUDGET] - \
            df_campaign_agg.groupby(by=[RPT_COL_SALESFORCE_ITEM_ID])[COL_SPEND_BUDGET_PERIOD].sum()


    # ## Calculate total and remaining days

    df_campaign_agg[COL_BUDGET_PERIOD_DAYS_TOTAL] = (df_campaign_agg[RPT_COL_BUDGET_END_DATE] - df_campaign_agg[RPT_COL_BUDGET_START_DATE]).dt.days + 1

    def calculate_remaining_days(row, today_pd):
        start_date = row[RPT_COL_BUDGET_START_DATE]
        end_date = row[RPT_COL_BUDGET_END_DATE]
        days = None
        if pd.isnull(start_date) or pd.isnull(end_date):
            days = np.nan  # or some default value if null dates are not expected
        if today_pd < start_date:
            days = (end_date - start_date).days + 1
        elif today_pd > end_date:
            days = 0
        else:
            days = (end_date - today_pd).days + 1

        return days

    df_campaign_agg[COL_BUDGET_PERIOD_DAYS_REMAINING] = df_campaign_agg.apply(
        lambda row: calculate_remaining_days(row, today_pd),
        axis=1
    )

    df_campaign_agg[COL_CAMPAIGN_BUDGET_REMAINING] = round(df_campaign_agg[COL_SALESFORCE_ITEM_ID_BUDGET_REMAINING] * df_campaign_agg[COL_SBA_ALLOCATION_NEW_FLOAT] / 100.0, 1)


    # ## Calculate SBA Daily Budget
    # 
    # * Calcualte next day Daily Budget by dividing allocated budget by number of Days left in the current budget period
    def calculate_daily_budget(row, today_pd):
        if today_pd < row[RPT_COL_BUDGET_START_DATE] or today_pd > row[RPT_COL_BUDGET_END_DATE]:
            return 0
        else:
            budget = round(row[COL_CAMPAIGN_BUDGET_REMAINING] / row[COL_BUDGET_PERIOD_DAYS_REMAINING], 0)

            # Apply Minimum Rule
            budget = max(MINIMUM_DAILY_BUDGET, budget)

            return budget


    df_campaign_agg[COL_SBA_REC_DAILY_BUDGET_NEW] = df_campaign_agg.apply(
        lambda row: calculate_daily_budget(row, today_pd),
        axis=1
    )




    # ### Traffic Budget
    df_campaign_agg[COL_DAILY_BUDGET_NEW] = np.nan

    # campaigns to traffic
    to_traffic = df_campaign_agg[RPT_COL_SBA_TRAFFIC].notnull() & \
                (df_campaign_agg[RPT_COL_SBA_TRAFFIC].astype(str).str.lower() == 'traffic')
    print("Not weekend. Traffic count", to_traffic.sum())

    # copy budgets over
    df_campaign_agg[COL_DAILY_BUDGET_NEW] = df_campaign_agg[COL_SBA_REC_DAILY_BUDGET_NEW]
    # then blank out budget for non-traffic campaigns
    df_campaign_agg.loc[~to_traffic, COL_DAILY_BUDGET_NEW] = np.nan


    # ## Calculate Salesforece Item ID -level Pacing compliance percentage. Ideally should be 100% each day.

    # calc ratio to prorate monthly budget
    ratio_of_elapsed_days_in_period = (df_campaign_agg[COL_BUDGET_PERIOD_DAYS_TOTAL] - df_campaign_agg[COL_BUDGET_PERIOD_DAYS_REMAINING]) / df_campaign_agg[COL_BUDGET_PERIOD_DAYS_TOTAL]

    # divide budget period spend by prorated total budget
    mask = df_campaign_agg[RPT_COL_SBA_CAMPAIGN_BUDGET] > 0
    df_campaign_agg[COL_PACING_CALC] = round(100.0 * \
                                        df_campaign_agg.groupby(by=[RPT_COL_SALESFORCE_ITEM_ID])[COL_SPEND_BUDGET_PERIOD].sum() / \
                                        (ratio_of_elapsed_days_in_period * df_campaign_agg[RPT_COL_SBA_CAMPAIGN_BUDGET]), \
                                        0).astype(str) + '%'
    df_campaign_agg.loc[mask, COL_SBA_BUDGET_PACING_NEW] = df_campaign_agg.loc[mask, COL_PACING_CALC]



    # Debug DF with full details
    df_SALESFORCE_ITEM_ID_budget = group_by_salesforce_item_ID[[RPT_COL_SBA_CAMPAIGN_BUDGET]].transform('max').dropna().drop_duplicates()
    print("Salesforce Item ID budgets", df_SALESFORCE_ITEM_ID_budget.head().to_string())


    # ## Generate outputDf

    # Check for changes
    changed = df_campaign_agg[COL_SBA_REC_DAILY_BUDGET_NEW].notnull() & \
        ( \
        (df_campaign_agg[RPT_COL_SBA_REC_DAILY_BUDGET] != df_campaign_agg[COL_SBA_REC_DAILY_BUDGET_NEW]) | \
        (df_campaign_agg[RPT_COL_DAILY_BUDGET] != df_campaign_agg[COL_DAILY_BUDGET_NEW]) | \
        (df_campaign_agg[RPT_COL_SBA_ALLOCATION] != df_campaign_agg[COL_SBA_ALLOCATION_NEW]) | \
        (df_campaign_agg[RPT_COL_SBA_BUDGET_PACING] != df_campaign_agg[COL_SBA_BUDGET_PACING_NEW]) \
        )

    print("Changed rows:", changed.sum())

    # Debug
    debugDf = df_campaign_agg.loc[changed] \
        .reset_index() \
        .sort_values(by=[RPT_COL_SALESFORCE_ITEM_ID, COL_DAILY_BUDGET_NEW, COL_SBA_REC_DAILY_BUDGET_NEW], ascending=False) 

    # print("debugDf", tableize(debugDf))

    # Only emit output for changed campaigns
    if changed.sum() > 0:

        # construct outputDf
        outputDf = df_campaign_agg.loc[changed, [RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, COL_DAILY_BUDGET_NEW, COL_SBA_REC_DAILY_BUDGET_NEW, COL_SBA_BUDGET_PACING_NEW, COL_SBA_ALLOCATION_NEW]] \
                        .copy() \
                        .rename(columns={ \
                                COL_DAILY_BUDGET_NEW: BULK_COL_DAILY_BUDGET, \
                                COL_SBA_REC_DAILY_BUDGET_NEW: BULK_COL_SBA_REC_DAILY_BUDGET, \
                                COL_SBA_BUDGET_PACING_NEW: BULK_COL_SBA_BUDGET_PACING, \
                                COL_SBA_ALLOCATION_NEW: BULK_COL_SBA_ALLOCATION, \
                            }) \
                        .reset_index() \
                        .sort_values(by=[RPT_COL_SALESFORCE_ITEM_ID, BULK_COL_DAILY_BUDGET, BULK_COL_SBA_REC_DAILY_BUDGET], ascending=False) \
                        .drop(RPT_COL_SALESFORCE_ITEM_ID, axis=1)

        print("outputDf shape", outputDf.shape)
        print("outputDf", tableize(outputDf.head()))
    else:
        print("No changes detected, returning an empty dataframe")
        outputDf = EMPTY_OUTPUT

    return outputDf, debugDf

## fix type issues

# change back to percent string
if inputDf[RPT_COL_SBA_ALLOCATION].dtype == "float":
    inputDf[RPT_COL_SBA_ALLOCATION] = round(inputDf[RPT_COL_SBA_ALLOCATION] * 100.0, 0).astype(str) + '%'
if inputDf[BULK_COL_SBA_BUDGET_PACING].dtype == "float":
    inputDf[BULK_COL_SBA_BUDGET_PACING] = round(inputDf[BULK_COL_SBA_BUDGET_PACING] * 100.0, 0).astype(str) + '%'

# coerce Date, Program End Date, Budget Start Date, and Budget End Date into Date type
inputDf[RPT_COL_DATE] = pd.to_datetime(inputDf[RPT_COL_DATE], errors='coerce')
inputDf[RPT_COL_PROGRAM_END_DATE] = pd.to_datetime(inputDf[RPT_COL_PROGRAM_END_DATE], errors='coerce')
inputDf[RPT_COL_BUDGET_START_DATE] = pd.to_datetime(inputDf[RPT_COL_BUDGET_START_DATE], errors='coerce')
inputDf[RPT_COL_BUDGET_END_DATE] = pd.to_datetime(inputDf[RPT_COL_BUDGET_END_DATE], errors='coerce')


## do actual processing
outputDf, debugDf = process(inputDf)

## local debug
if local_dev:
    output_filename = 'outputDf.csv'
    outputDf.to_csv(output_filename, index=False)
    print(f"Local Dev: Output written to: {output_filename}")

    debug_filename = 'debugDf.csv'
    debugDf.to_csv(debug_filename, index=False)
    print(f"Local Dev: Debug written to: {debug_filename}")

Post generated on 2024-11-27 06:58:46 GMT

comments powered by Disqus