Script 1205: Script Group Country Code

Purpose:

The script extracts and assigns a country code from a group name into a designated column based on specific parsing rules.

To Elaborate

The Python script parses group names and extracts country codes, which it writes to a column labeled ‘Country Code’. The country code is the substring that appears before the first hyphen (‘-’) in the group name, with surrounding whitespace trimmed. Group names starting with “BEFR” or “BENL” are handled as special cases: their country code is set explicitly to “BE”. If the trimmed substring is not exactly two characters long, or if the group name begins with a hyphen so that nothing precedes it, the script assigns “N/A” to indicate that no valid country code could be determined. The prefix checks are case-sensitive, and a group name with no hyphen at all is evaluated as a whole. This makes it possible to organize and categorize data by the geographic identifiers embedded in group names.
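The parsing rules above can be sketched as a small standalone function. This is an illustrative restatement rather than the script itself: the function name country_code and the sample group names are invented for the example.

```python
import re

def country_code(group_name: str) -> str:
    """Illustrative restatement of the parsing rules (hypothetical helper name)."""
    # Special case: Belgian groups are prefixed BEFR or BENL (case-sensitive)
    if group_name.startswith(("BEFR", "BENL")):
        return "BE"
    # Capture everything before the first hyphen, anchored at the start
    match = re.search(r"^([^-]+)", group_name)
    if match:
        prefix = match.group(1).strip()
        if len(prefix) == 2:
            return prefix
    return "N/A"

# Hypothetical group names, invented for illustration
examples = ["FR - Brand", "BEFR - Brand", "BENL-Generic", "USA - Brand", "DE", "-NL"]
results = {name: country_code(name) for name in examples}
for name, code in results.items():
    print(f"{name!r} -> {code}")
```

Two edge cases are worth noting in this sketch: a name with no hyphen at all ("DE") is treated as a prefix in its entirety, and a name that starts with a hyphen ("-NL") yields "N/A" because nothing precedes the hyphen.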

Walking Through the Code

  1. Configurable Parameters:
    • The script begins by defining a configurable parameter PLACEMENT_KEY, set to ‘ - ’. The code shown on this page never references it; the extraction function matches on a bare hyphen (‘-’) instead.
    • The primary data source, inputDf, is read from dataSourceDict["1"].
    • Constants name the report and bulk columns: Account, Campaign, Group, and Country Code.
  2. Function Definition:
    • A function get_country_code_from_group_name is defined to extract the country code from a given group name.
    • The function handles special cases for group names starting with “BEFR” or “BENL” by returning “BE”.
    • It uses a regular expression to capture the substring before the first hyphen, trims whitespace from it, and returns it only if it is exactly two characters long. Otherwise the function returns “N/A”.
  3. Data Processing:
    • The script copies all rows from the input DataFrame inputDf to outputDf.
    • It applies the get_country_code_from_group_name function to each group name in the input DataFrame, storing the result in the ‘Country Code’ column of outputDf.
  4. Data Cleaning:
    • Leading and trailing whitespace is stripped from the group names in outputDf. Whitespace inside a name is left unchanged.
  5. Output:
    • Finally, the script prints a tableized version of the output DataFrame, displaying the processed data with the extracted country codes.
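The data processing and cleaning steps can be tried outside the platform with a small pandas sketch. It assumes a hand-built DataFrame in place of dataSourceDict["1"], uses pandas' own to_string in place of the platform's tableize helper, and the account, campaign, and group values are invented for illustration.

```python
import re
import pandas as pd

def get_country_code(group_name):
    # Same rules as the script: BEFR/BENL -> BE, else a 2-char prefix before '-'
    if group_name.startswith(("BEFR", "BENL")):
        return "BE"
    match = re.search(r"^([^-]+)", group_name)
    if match and len(match.group(1).strip()) == 2:
        return match.group(1).strip()
    return "N/A"

# Hypothetical stand-in for dataSourceDict["1"]
input_df = pd.DataFrame({
    "Account": ["Acct A", "Acct A", "Acct B"],
    "Campaign": ["C1", "C2", "C3"],
    "Group": ["FR - Brand ", "BENL - Generic", "Europe - Brand"],
})

# Copy all rows, derive the new column, then strip the group names
output_df = input_df.copy()
output_df["Country Code"] = input_df["Group"].apply(get_country_code)
output_df["Group"] = output_df["Group"].str.strip()

print(output_df.to_string(index=False))
```

Because the output is built from a copy, the stripping step changes only output_df; the input frame keeps its original group names.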

Vitals

  • Script ID: 1205
  • Client ID / Customer ID: 1306927811 / 60270355
  • Action Type: Bulk Upload
  • Item Changed: AdGroup
  • Output Columns: Account, Campaign, Group, Country Code
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Grégory Pantaine (gpantaine@marinsoftware.com)
  • Created by Grégory Pantaine on 2024-06-21 11:54
  • Last Updated by Grégory Pantaine on 2024-06-21 12:01

Python Code

## name: Script - Group - Country Code
## description:
## Parse Group Name and pick out the country code into a dimension 'Country Code'.
## Country code appears before the first '-' in the group name.
## 
## Copied by Grégory Pantaine
## created: 2024-06-21

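import re  # needed for re.search in get_country_code_from_group_name

########### Configurable Params - START ##########
PLACEMENT_KEY = ' - '  # not referenced elsewhere in this script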

# Primary data source and columns
inputDf = dataSourceDict["1"]

# Output columns and initial values

RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_COUNTRY_CODE = 'Country Code'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_COUNTRY_CODE = 'Country Code'

# Function to extract country code from group name
def get_country_code_from_group_name(group_name):
    # Special cases for BEFR and BENL
    if group_name.startswith("BEFR") or group_name.startswith("BENL"):
        return "BE"
    
    # Regular expression pattern to match the country code before the first '-'
    regex_pattern = r"^([^-]+)"

    # Search for the country code using the pattern
    match = re.search(regex_pattern, group_name)
    if match:
        country_code = match.group(1).strip()
        if len(country_code) == 2:
            return country_code  # Return the matched country code
        else:
            return "N/A"
    else:
        return "N/A"  # Return "N/A" if no match is found

# Copy all input rows to output
outputDf = inputDf.copy()

# Extract country code from each group name
outputDf[BULK_COL_COUNTRY_CODE] = inputDf[RPT_COL_GROUP].apply(get_country_code_from_group_name)

# Remove extra whitespace from group names
outputDf[RPT_COL_GROUP] = outputDf[RPT_COL_GROUP].str.strip()

# Print the tableized version of the output DataFrame
print(tableize(outputDf))

Post generated on 2025-03-11 01:25:51 GMT
