Script 1349: Dimension Update Subregion
Purpose
Automates the process of determining and populating subregion dimensions based on campaign name values in a dataset.
To Elaborate
The Python script is designed to automate the task of assigning subregion dimensions to campaigns based on their names. This is particularly useful for organizations that manage marketing campaigns across various geographical regions and need to categorize these campaigns accordingly. The script reads a dataset containing campaign information, identifies campaigns that do not yet have a subregion assigned, and uses predefined rules to determine the appropriate subregion based on the campaign name. This helps in maintaining consistent and accurate regional categorization, which is crucial for reporting and analysis purposes.
Walking Through the Code
-
Data Preparation: The script begins by loading the primary data source into a DataFrame named
inputDf
. It identifies key columns such as ‘Campaign’, ‘Account’, and ‘Subregion’, which are essential for processing. -
Subregion Determination Function: A function named
determine_subregion
is defined to map campaign names to subregions. This function uses a series of conditional checks to match the beginning of campaign names with specific subregion identifiers, returning the corresponding subregion name. -
Filtering Data: The script filters the input DataFrame to create a new DataFrame,
filteredDf
, which only includes rows where the subregion is not already populated. This ensures that only campaigns needing subregion assignment are processed. -
Applying Subregion Logic: The
determine_subregion
function is applied to each campaign name in the filtered DataFrame. The results are stored in a new DataFrame,outputDf
, which contains the updated subregion information. -
Output: Finally, the script prints the first few rows of the updated DataFrame to display the results of the subregion assignment process.
Vitals
- Script ID : 1349
- Client ID / Customer ID: 1306913420 / 60268008
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, Subregion
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Kyle Perkins (kyle.perkins@genesys.com)
- Created by Kyle Perkins on 2024-08-28 17:36
- Last Updated by Kyle Perkins on 2024-08-28 17:56
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
##
## name: Dimension Update - Subregion
## description:
## Automates populating subregion dimensions based on campaign name values.
##
## author:
## created: 2024-08-28
##
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
# Primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_SUBREGION = 'Subregion' # Reference to the subregion column
# Output columns and initial values
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_SUBREGION = 'Subregion' # Output subregion column
# Define the function that mimics the Excel formula to determine subregion
def determine_subregion(campaign_name):
campaign_name = campaign_name.lower() # Convert to lowercase to ignore case
if campaign_name.startswith("apac_anz"):
return "APAC - ANZ"
elif campaign_name.startswith(("apac_all", "apac_en")):
return "APAC - ALL"
elif campaign_name.startswith("apac_asia"):
return "APAC - ASIA"
elif "apac_cn" in campaign_name:
return "APAC - CN"
elif campaign_name.startswith(("apac_in", "apac-in")):
return "APAC - IN"
elif campaign_name.startswith("apac_jp"):
return "APAC - JP"
elif campaign_name.startswith("apac_kr"):
return "APAC - KR"
elif campaign_name.startswith("apac_sea"):
return "APAC - SEA"
elif campaign_name.startswith("apac_taiwan"):
return "APAC - Taiwan"
elif campaign_name.startswith(("apac_th", "apac_thai")):
return "APAC - THAI"
elif campaign_name.startswith("emea_cz"):
return "EMEA - CZ"
elif campaign_name.startswith(("emea_de", "mk_de", "de -", "de|")):
return "EMEA - DE"
elif campaign_name.startswith("emea_dk"):
return "EMEA - DK"
elif campaign_name.startswith("emea_es"):
return "EMEA - ES"
elif campaign_name.startswith("emea_fi"):
return "EMEA - FI"
elif campaign_name.startswith(("emea_fr", "fr -")):
return "EMEA - FR"
elif campaign_name.startswith("emea_gr"):
return "EMEA - GR"
elif campaign_name.startswith("emea_il"):
return "EMEA - IL"
elif campaign_name.startswith("emea_it"):
return "EMEA - IT"
elif campaign_name.startswith("emea_me"):
return "EMEA - ME"
elif campaign_name.startswith("emea_nl"):
return "EMEA - NL"
elif campaign_name.startswith("emea_no"):
return "EMEA - NO"
elif campaign_name.startswith("emea_pl"):
return "EMEA - PL"
elif campaign_name.startswith("emea_roemea"):
return "EMEA - RoEMEA"
elif campaign_name.startswith("emea_ro"):
return "EMEA - RO"
elif campaign_name.startswith("emea_ru"):
return "EMEA - RU"
elif campaign_name.startswith("emea_se"):
return "EMEA - SE"
elif campaign_name.startswith("emea_tr"):
return "EMEA - TR"
elif campaign_name.startswith(("emea_uk", "mk_uk", "mpx_uk", "uk -")):
return "EMEA - UK"
elif campaign_name.startswith("emea_za"):
return "EMEA - ZA"
elif campaign_name.startswith(("latam_all", "latam_an", "latam_andean")):
return "LATAM - NOLA"
elif campaign_name.startswith(("latam_co", "latam_cr", "latam_ec", "latam_peru", "latam_nola")):
return "LATAM - NOLA"
elif campaign_name.startswith(("latam_sola", "latam_ar", "latam_chile", "latam_cl", "latam_sc")):
return "LATAM - SOLA"
elif campaign_name.startswith("latam_br"):
return "LATAM - BR"
elif campaign_name.startswith("latam_mx"):
return "LATAM - MX"
elif campaign_name.startswith(("latam_es", "latam_pt")):
return "LATAM - ES (Legacy)" if "latam_es" in campaign_name else "LATAM - PT (Legacy)"
elif campaign_name.startswith(("na_na", "na_ca", "dx_us", "bold_us")):
return "NORAM - NA"
else:
return ""
# Filter inputDf to skip rows where subregion is already populated and copy rows to new filteredDf dataframe
filteredDf = inputDf[~inputDf[RPT_COL_SUBREGION].notna()].copy()
# Initialize outputDf so that it only includes rows that have been filtered
outputDf = filteredDf
# Apply the function to each row in the filtered dataframe and set result as the output dataframe value
outputDf[BULK_COL_SUBREGION] = filteredDf[RPT_COL_CAMPAIGN].apply(determine_subregion)
# Printing the result
print(tableize(outputDf.head()))
Post generated on 2024-11-27 06:58:46 GMT