Script 1591: Script Auto Tag Marketo ID

Purpose:

The script extracts and processes Marketo IDs from campaign names in a dataset.

To Elaborate

The Python script is designed to process a dataset containing marketing campaign information and extract specific identifiers, known as Marketo IDs, from the campaign names. The primary goal is to identify and extract these IDs, which are assumed to be located after the last underscore in the campaign name. If the extracted portion contains numbers, the script further truncates it to the first four digits. This process helps in organizing and structuring the data for further analysis or reporting, ensuring that only valid entries with successfully extracted Marketo IDs are retained in the final output.

Walking Through the Code

  1. Data Preparation:
    • The script begins by defining the primary data source and relevant columns from the dataset. It creates an output DataFrame with selected columns for further processing.
  2. Marketo ID Extraction:
    • A function extract_marketo_id is defined to handle the extraction of Marketo IDs from campaign names. It splits the campaign name by underscores and attempts to extract the last segment. If this segment contains numbers, it truncates the result to the first four digits.
  3. Data Processing:
    • The script applies the extract_marketo_id function to the campaign names in the dataset, storing the results in a new column. It then removes any rows where the extraction failed, ensuring that only valid data is retained.
  4. Output:
    • The script prepares the final output DataFrame, which includes the extracted Marketo IDs, and prints a preview of the input and output data for verification purposes.

Vitals

  • Script ID : 1591
  • Client ID / Customer ID: 1306927959 / 60270403
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, Marketo ID_Dim, Campaign ID
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Grégory Pantaine (gpantaine@marinsoftware.com)
  • Created by Grégory Pantaine on 2024-12-20 14:22
  • Last Updated by Grégory Pantaine on 2025-01-16 17:27
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
import datetime

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_CAMPAIGN_ID = 'Campaign ID'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_MARKETO_ID_DIM = 'Marketo ID_Dim'
RPT_COL_CAMPAIGNID = 'Campaign ID'

# output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_MARKETO_ID_DIM = 'Marketo ID_Dim'
BULK_COL_CAMPAIGNID = 'Campaign ID'

# Function to extract and truncate value after the last underscore
def extract_marketo_id(campaign_name):
    try:
        parts = campaign_name.split('_')
        if parts:
            marketo_id = parts[-1]
            # Extract first 4 digits if the marketo_id contains numbers
            digits = ''.join(filter(str.isdigit, marketo_id))
            return digits[:4] if digits else marketo_id  # Return first 4 digits if available
        return None
    except:
        return None

# Create output dataframe with required columns
outputDf = inputDf[[RPT_COL_ACCOUNT, RPT_COL_CAMPAIGN, RPT_COL_CAMPAIGNID]].copy()
outputDf.columns = [BULK_COL_ACCOUNT, BULK_COL_CAMPAIGN, BULK_COL_CAMPAIGNID]

# Apply the extraction function to get Marketo ID
outputDf[BULK_COL_MARKETO_ID_DIM] = outputDf[BULK_COL_CAMPAIGN].apply(extract_marketo_id)

# Remove rows where Marketo ID extraction failed (where value is None)
outputDf = outputDf.dropna(subset=[BULK_COL_MARKETO_ID_DIM])

# Debug print first few rows
print(tableize(inputDf.head()))
print("\nOutput DataFrame:")
print(tableize(outputDf.head()))

Post generated on 2025-03-11 01:25:51 GMT

comments powered by Disqus