Script 1585: Script Auto Tag Line of Business

Purpose:

The Python script extracts the Line of Business (LOB) tag from campaign names by identifying the value between the first and second underscores.

To Elaborate

The script is designed to automate the extraction of the Line of Business (LOB) tag from campaign names within a dataset. This is achieved by parsing the campaign names and identifying the segment between the first and second underscores. The extracted LOB tags are then added to a new column in the output dataset. The script ensures that only campaigns with a valid LOB tag are included in the final output by removing any rows where the extraction fails. This process helps in categorizing campaigns based on their LOB, which can be crucial for reporting and analysis purposes.

Walking Through the Code

  1. Data Preparation:
    • The script begins by importing necessary modules and defining the primary data source, which is a DataFrame containing campaign data.
    • It specifies the columns of interest from the input data, such as ‘Campaign’, ‘Account’, and ‘Campaign ID’.
  2. LOB Extraction Function:
    • A function extract_lob is defined to extract the LOB tag from a campaign name. It splits the campaign name by underscores and returns the second element if available, otherwise returns None.
  3. Output DataFrame Creation:
    • A new DataFrame outputDf is created by copying relevant columns from the input DataFrame.
    • The column names are adjusted to match the desired output format.
  4. Applying the Extraction Function:
    • The extract_lob function is applied to the ‘Campaign’ column of the output DataFrame to populate the ‘Line of Business_Dim’ column with extracted LOB tags.
  5. Data Cleaning:
    • Rows where the LOB extraction resulted in None are removed from the output DataFrame to ensure only valid entries are retained.
  6. Debugging:
    • The script includes print statements to display the first few rows of both the input and output DataFrames for verification purposes.

Vitals

  • Script ID : 1585
  • Client ID / Customer ID: 1306927959 / 60270403
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Line of Business_Dim, Campaign ID
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Grégory Pantaine (gpantaine@marinsoftware.com)
  • Created by Grégory Pantaine on 2024-12-19 11:54
  • Last Updated by Grégory Pantaine on 2025-01-16 17:31
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
##
## name: Script Autotag Campaign LOB
## description:
##  
## 
## author: 
## created: 2024-12-23
## 
##
## name: Script Autotag Campaign LOB
## description: Extracts Line of Business tag from campaign names (value between first and second underscore)
## 
## author: 
## created: 2024-12-23
## 

import datetime

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_CAMPAIGN_TYPE = 'Campaign Type'
RPT_COL_IMPR = 'Impr.'
RPT_COL_CLICKS = 'Clicks'
RPT_COL_CTR = 'CTR %'
RPT_COL_DAILY_BUDGET = 'Daily Budget'
RPT_COL_LAST_MODIFIED_DATE = 'Last Modified Date'
RPT_COL_RECOMMENDED_DAILY_BUDGET = 'Recommended Daily Budget'
RPT_COL_PUB_COST = 'Pub. Cost £'
RPT_COL_CAMPAIGN_CREATION_DATE = 'Campaign Creation Date'
RPT_COL_PUB_ID = 'Pub. ID'
RPT_COL_CAMPAIGN_OVERRIDE = 'Campaign Override'
RPT_COL_NETWORK = 'Network'
RPT_COL_NETWORK_TARGETING = 'Network Targeting'
RPT_COL_PUBLISHER = 'Publisher'
RPT_COL_PUBLISHER_BID_CAP = 'Publisher Bid Cap'
RPT_COL_PUBLISHER_BID_PORTFOLIO_NAME = 'Publisher Bid Portfolio Name'
RPT_COL_PUBLISHER_BID_STRATEGY = 'Publisher Bid Strategy'
RPT_COL_PUBLISHER_BID_STRATEGY_LEVEL = 'Publisher Bid Strategy Level'
RPT_COL_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_PUBLISHER_TARGET_IS = 'Publisher Target IS %'
RPT_COL_PUBLISHER_TARGET_IS_TYPE = 'Publisher Target IS Type'
RPT_COL_PUBLISHER_TARGET_ROAS = 'Publisher Target ROAS'
RPT_COL_PUBLISHER_TARGET_CPA = 'Publisher Target CPA'
RPT_COL_COUNTRY_DIM = 'Country_Dim'
RPT_COL_FRIENDLYNAME_DIM = 'FriendlyName_Dim'
RPT_COL_LINE_OF_BUSINESS_DIM = 'Line of Business_Dim'
RPT_COL_PROGRAM_DIM = 'Program_Dim'
RPT_COL_CAMPAIGNID = 'Campaign ID'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_LINE_OF_BUSINESS_DIM = 'Line of Business_Dim'
BULK_COL_CAMPAIGNID = 'Campaign ID'
# Function to extract value between first and second underscore
def extract_lob(campaign_name):
    try:
        parts = campaign_name.split('_')
        if len(parts) >= 2:
            return parts[1]
        return None  # Return None if format doesn't match
    except:
        return None  # Return None for any errors

# Create output dataframe with required columns
outputDf = inputDf[[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_CAMPAIGNID]].copy()
outputDf.columns = [BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_CAMPAIGNID]

# Apply the extraction function to get LOB
outputDf[BULK_COL_LINE_OF_BUSINESS_DIM] = outputDf[BULK_COL_CAMPAIGN].apply(extract_lob)

# Remove rows where LOB extraction failed (where value is None)
outputDf = outputDf.dropna(subset=[BULK_COL_LINE_OF_BUSINESS_DIM])

# Debug print first few rows
print(tableize(inputDf.head()))
print("\nOutput DataFrame:")
print(tableize(outputDf.head()))

Post generated on 2025-03-11 01:25:51 GMT

comments powered by Disqus