Script 355: Email Import Memira SE
Purpose
The Python script processes and merges data from two sources to update and format URLs based on specific conditions for Memira SE’s email import.
To Elaborate
The Python script is designed to handle data integration and transformation tasks for Memira SE’s email import process. It merges data from two different sources, specifically a Google Sheets document and another data source, based on common fields. The script then renames columns to align with predefined constants and checks for specific URL parameters within the data. If certain conditions are met, such as the presence of ‘utm_marin’ in URLs, it updates the URLs accordingly. This ensures that the final dataset is correctly formatted and ready for further use, such as in marketing campaigns or reporting. The script also handles missing data by replacing NaN values with empty strings in specific columns, ensuring data consistency and completeness.
Walking Through the Code
- Data Import and Initial Setup
- The script begins by importing data from two sources: a Google Sheets document and another data source, storing them in
gSheetsDf
andinputDf
respectively. - It merges these dataframes on the ‘Ad’ and ‘Keyword’ columns, using a left join to ensure all entries from the Google Sheets are retained.
- The script begins by importing data from two sources: a Google Sheets document and another data source, storing them in
- Column Renaming
- After merging, the script renames several columns to match predefined constants, which likely represent standardized column names for further processing.
- Handling Missing Data
- The script replaces NaN values in the ‘Destination URL’ and ‘Suggested URL’ columns with empty strings to prevent errors during URL processing.
- URL Processing
- It checks if the ‘Destination URL’ or ‘Suggested URL’ contains the string ‘utm_marin’. Based on this condition, it sets a new column, ‘FinalDestURL’, to either the ‘Destination URL’, ‘Suggested URL’, or a default URL (‘https://www.memira.se/’).
- Final Adjustments
- The ‘FinalDestURL’ column is renamed to match the constant
BULK_COL_CUSTOM_PARAMETERS
. - Any remaining NaN values in this column are replaced with empty strings to ensure data integrity.
- The ‘FinalDestURL’ column is renamed to match the constant
- Output
- The processed dataframe is stored in
outputDf
and printed in a tabular format for review or further use.
- The processed dataframe is stored in
Vitals
- Script ID : 355
- Client ID / Customer ID: 1306925575 / 60269545
- Action Type: Bulk Upload
- Item Changed: Ad
- Output Columns: Account, Campaign, Group, Creative ID, Media sub-type, Media Type, Unique ID, Display URL, Custom Parameters, Status
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2023-10-13 10:49
- Last Updated by Tom McCaughey on 2024-01-10 14:24
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
# Define the column names as constants
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_CREATIVE_ID = 'Creative ID'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_DISPLAY_URL = 'Display URL'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
BULK_COL_UNIQUE_ID = 'Unique ID'
BULK_COL_STATUS = 'Status'
inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"] # gsheets dataframe (first sheet)
# Merge dataframes based on 'Ad' and 'Keyword'
mergedDf = pd.merge(gSheetsDf, inputDf, left_on='F', right_on='Keyword', how='left')
mergedDf = mergedDf.rename(columns={
'A': BULK_COL_UNIQUE_ID,
'B': BULK_COL_CAMPAIGN,
'E': BULK_COL_GROUP,
'F': BULK_COL_CREATIVE_ID,
'D': BULK_COL_MEDIA_SUBTYPE,
'C': BULK_COL_MEDIA_TYPE,
'G': BULK_COL_DISPLAY_URL,
'H': BULK_COL_STATUS
})
# If you want to replace NaN values with an empty string in the URL columns
mergedDf['Destination URL'].fillna('', inplace=True)
mergedDf['Suggested URL'].fillna('', inplace=True)
DestURL_contains_utm_marin = mergedDf['Destination URL'].str.contains('utm_marin', case=False, na=False)
SuggestURL_contains_utm_marin = mergedDf['Suggested URL'].str.contains('utm_marin', case=False, na=False)
# Set FinalDestURL based on conditions
mergedDf['FinalDestURL'] = np.where(DestURL_contains_utm_marin, mergedDf['Destination URL'],
np.where(SuggestURL_contains_utm_marin, mergedDf['Suggested URL'], 'https://www.memira.se/'))
# Rename the FinalDestURL column to BULK_COL_CUSTOM_PARAMETERS
mergedDf = mergedDf.rename(columns={'FinalDestURL': BULK_COL_CUSTOM_PARAMETERS})
# If you want to replace NaN values with an empty string in the BULK_COL_CUSTOM_PARAMETERS column
mergedDf[BULK_COL_CUSTOM_PARAMETERS].fillna('', inplace=True)
outputDf = mergedDf
print(tableize(outputDf))
Post generated on 2024-11-27 06:58:46 GMT