Script 1593: SCRIPT Extract REF Marker

Purpose:

The Python script extracts the ‘REF Marker’ from the ‘Landing Page’ column and assigns it to the ‘REF Marker’ dimension in a structured data format.

To Elaborate

The script is designed to process a dataset containing various columns related to client advertising data, specifically focusing on extracting a reference marker from URLs in the ‘Landing Page’ column. The ‘REF Marker’ is a parameter within the URL that is identified by either “ref=” or “ref_=”. The script extracts this value and assigns it to a new column in the output dataset. If the ‘REF Marker’ is not found, it assigns a default value indicating the absence of a keyword-level URL. This process helps in organizing and analyzing advertising data by ensuring that the reference markers are correctly extracted and stored for further analysis.

Walking Through the Code

  1. Data Source and Column Setup:
    • The script begins by defining the primary data source and relevant columns from the input dataset. These columns include client information, currency, keyword, match type, group, campaign, account, landing page, and others.
  2. Extract REF Marker Function:
    • A function named extract_ref_marker is defined to extract the ‘REF Marker’ from the ‘Landing Page’ URL. It checks for the presence of “ref=” or “ref_=” and extracts the value following these markers. If neither is found, it returns a default message indicating no keyword-level URL.
  3. Process Function:
    • The process function creates an output DataFrame by copying specific columns from the input DataFrame. It applies the extract_ref_marker function to the ‘Landing Page’ column to populate the ‘REF Marker’ column in the output DataFrame.
    • Additionally, it adds a constant value “YES” to the ‘kwdimcheck’ column in the output DataFrame.
  4. Unit Test Function:
    • A placeholder unit test function test_process is included to verify the functionality of the process function, although it currently does not perform any specific tests.
  5. Execution:
    • The script concludes by executing the process function on the input DataFrame to generate the output DataFrame containing the extracted ‘REF Marker’ values.

Vitals

  • Script ID : 1593
  • Client ID / Customer ID: 1306928459 / 60270435
  • Action Type: Bulk Upload
  • Item Changed: Keyword
  • Output Columns: Account, Campaign, Group, Keyword, Match Type, kwdimcheck, REF Marker
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Jeremy Brown (jbrown@marinsoftware.com)
  • Created by Jeremy Brown on 2024-12-20 15:07
  • Last Updated by Jeremy Brown on 2025-01-30 12:11
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
##
## name: SCRIPT: Extract REF Marker
## description:
## Extracts the REF Marker from Landing Page and assign the value to the 'REF Marker' Dimension
## 
## author: Jeremy Brown
## created: 2024-12-20
## 

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# Primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CLIENT = 'Client'
RPT_COL_CURRENCY = 'Currency'
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_GROUP = 'Group'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_LANDING_PAGE = 'Landing Page'
RPT_COL_REF_MARKER = 'REF Marker'
RPT_COL_KWDIMCHECK = 'kwdimcheck'
RPT_COL_STATUS = 'Status'
RPT_COL_IMPR = 'Impr.'

# Output column
BULK_COL_REF_MARKER = 'REF Marker'
BULK_COL_KWDIMCHECK = 'kwdimcheck'

# Function to extract the 'ref=' value from the 'Landing Page' column
def extract_ref_marker(landing_page):
    if not landing_page or pd.isna(landing_page):
        return "No Keyword-Level URL"

    # Find the ref parameter, whether "ref=" or "ref_="
    ref_start = landing_page.find("ref=")
    if ref_start == -1:
        ref_start = landing_page.find("ref_=")

    if ref_start == -1:
        return "No Keyword-Level URL"

    # Determine the correct length of the ref parameter
    ref_length = 4 if landing_page[ref_start + 3] == "=" else 5

    # Extract the value after 'ref=' or 'ref_='
    ref_value = landing_page[ref_start + ref_length:]

    # Stop at the next '&' if it exists
    ampersand_index = ref_value.find("&")
    if ampersand_index != -1:
        ref_value = ref_value[:ampersand_index]

    return ref_value

# Function to process the input DataFrame and populate the output DataFrame
def process(inputDf):
    # Create the output DataFrame with required columns
    outputDf = inputDf[[
        RPT_COL_CLIENT,
        RPT_COL_KEYWORD,
        RPT_COL_MATCH_TYPE,
        RPT_COL_GROUP,
        RPT_COL_CAMPAIGN,
        RPT_COL_ACCOUNT
    ]].copy()

    # Apply the extract_ref_marker function to the 'Landing Page' column
    outputDf[BULK_COL_REF_MARKER] = inputDf[RPT_COL_LANDING_PAGE].apply(extract_ref_marker)

    # Add 'kwdimcheck' column with a constant value "YES"
    outputDf[BULK_COL_KWDIMCHECK] = "YES"

    # Print the data changed for debugging
    print("Data after processing:")
    print(outputDf)

    return outputDf

# Unit test function for process
def test_process():
    print("###UNITTEST START####")
    
    try:
        # Placeholder for unit tests
        print("####PASS####")
    except Exception as e:
        print(f"####FAIL#### {e}")

# Trigger the main process
outputDf = process(inputDf)

Post generated on 2025-03-11 01:25:51 GMT

comments powered by Disqus