Script 1593: SCRIPT Extract REF Marker
Purpose:
The Python script extracts the ‘REF Marker’ from the ‘Landing Page’ column and assigns it to the ‘REF Marker’ dimension in a structured data format.
To Elaborate
The script is designed to process a dataset containing various columns related to client advertising data, specifically focusing on extracting a reference marker from URLs in the ‘Landing Page’ column. The ‘REF Marker’ is a parameter within the URL that is identified by either “ref=” or “ref_=”. The script extracts this value and assigns it to a new column in the output dataset. If the ‘REF Marker’ is not found, it assigns a default value indicating the absence of a keyword-level URL. This process helps in organizing and analyzing advertising data by ensuring that the reference markers are correctly extracted and stored for further analysis.
Walking Through the Code
- Data Source and Column Setup:
- The script begins by defining the primary data source and relevant columns from the input dataset. These columns include client information, currency, keyword, match type, group, campaign, account, landing page, and others.
- Extract REF Marker Function:
- A function named
extract_ref_marker
is defined to extract the ‘REF Marker’ from the ‘Landing Page’ URL. It checks for the presence of “ref=” or “ref_=” and extracts the value following these markers. If neither is found, it returns a default message indicating no keyword-level URL.
- A function named
- Process Function:
- The
process
function creates an output DataFrame by copying specific columns from the input DataFrame. It applies theextract_ref_marker
function to the ‘Landing Page’ column to populate the ‘REF Marker’ column in the output DataFrame. - Additionally, it adds a constant value “YES” to the ‘kwdimcheck’ column in the output DataFrame.
- The
- Unit Test Function:
- A placeholder unit test function
test_process
is included to verify the functionality of theprocess
function, although it currently does not perform any specific tests.
- A placeholder unit test function
- Execution:
- The script concludes by executing the
process
function on the input DataFrame to generate the output DataFrame containing the extracted ‘REF Marker’ values.
- The script concludes by executing the
Vitals
- Script ID : 1593
- Client ID / Customer ID: 1306928459 / 60270435
- Action Type: Bulk Upload
- Item Changed: Keyword
- Output Columns: Account, Campaign, Group, Keyword, Match Type, kwdimcheck, REF Marker
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Jeremy Brown (jbrown@marinsoftware.com)
- Created by Jeremy Brown on 2024-12-20 15:07
- Last Updated by Jeremy Brown on 2025-01-30 12:11
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
##
## name: SCRIPT: Extract REF Marker
## description:
## Extracts the REF Marker from Landing Page and assign the value to the 'REF Marker' Dimension
##
## author: Jeremy Brown
## created: 2024-12-20
##
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# Primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CLIENT = 'Client'
RPT_COL_CURRENCY = 'Currency'
RPT_COL_KEYWORD = 'Keyword'
RPT_COL_MATCH_TYPE = 'Match Type'
RPT_COL_GROUP = 'Group'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_LANDING_PAGE = 'Landing Page'
RPT_COL_REF_MARKER = 'REF Marker'
RPT_COL_KWDIMCHECK = 'kwdimcheck'
RPT_COL_STATUS = 'Status'
RPT_COL_IMPR = 'Impr.'
# Output column
BULK_COL_REF_MARKER = 'REF Marker'
BULK_COL_KWDIMCHECK = 'kwdimcheck'
# Function to extract the 'ref=' value from the 'Landing Page' column
def extract_ref_marker(landing_page):
if not landing_page or pd.isna(landing_page):
return "No Keyword-Level URL"
# Find the ref parameter, whether "ref=" or "ref_="
ref_start = landing_page.find("ref=")
if ref_start == -1:
ref_start = landing_page.find("ref_=")
if ref_start == -1:
return "No Keyword-Level URL"
# Determine the correct length of the ref parameter
ref_length = 4 if landing_page[ref_start + 3] == "=" else 5
# Extract the value after 'ref=' or 'ref_='
ref_value = landing_page[ref_start + ref_length:]
# Stop at the next '&' if it exists
ampersand_index = ref_value.find("&")
if ampersand_index != -1:
ref_value = ref_value[:ampersand_index]
return ref_value
# Function to process the input DataFrame and populate the output DataFrame
def process(inputDf):
# Create the output DataFrame with required columns
outputDf = inputDf[[
RPT_COL_CLIENT,
RPT_COL_KEYWORD,
RPT_COL_MATCH_TYPE,
RPT_COL_GROUP,
RPT_COL_CAMPAIGN,
RPT_COL_ACCOUNT
]].copy()
# Apply the extract_ref_marker function to the 'Landing Page' column
outputDf[BULK_COL_REF_MARKER] = inputDf[RPT_COL_LANDING_PAGE].apply(extract_ref_marker)
# Add 'kwdimcheck' column with a constant value "YES"
outputDf[BULK_COL_KWDIMCHECK] = "YES"
# Print the data changed for debugging
print("Data after processing:")
print(outputDf)
return outputDf
# Unit test function for process
def test_process():
print("###UNITTEST START####")
try:
# Placeholder for unit tests
print("####PASS####")
except Exception as e:
print(f"####FAIL#### {e}")
# Trigger the main process
outputDf = process(inputDf)
Post generated on 2025-03-11 01:25:51 GMT