Script 355: Email Import Memira SE
Purpose:
The Python script processes and merges data from two sources to generate a structured dataset for email import, ensuring URLs are correctly formatted and assigned.
To Elaborate
The Python script is designed to handle and process data from two different sources, specifically focusing on merging and transforming this data to prepare it for email import tasks. The script merges data from a Google Sheets source and another input source based on specific columns (‘Ad’ and ‘Keyword’). It then renames columns to align with predefined constants and checks for specific URL parameters. If certain conditions are met, it assigns a final destination URL, ensuring that URLs are correctly formatted and assigned. This process is crucial for maintaining data integrity and ensuring that the email import process runs smoothly, with all necessary information correctly structured and formatted.
Walking Through the Code
- Data Input and Merging
- The script begins by loading data from two sources: a Google Sheets dataframe and another input dataframe.
- It merges these dataframes on the ‘Ad’ and ‘Keyword’ columns, using a left join to ensure all entries from the Google Sheets are retained.
- Column Renaming
- After merging, the script renames several columns to match predefined constants, ensuring consistency and clarity in the dataset.
- URL Handling
- The script checks for the presence of ‘utm_marin’ in the ‘Destination URL’ and ‘Suggested URL’ columns.
- Based on these checks, it assigns a ‘FinalDestURL’, prioritizing URLs containing ‘utm_marin’ and defaulting to a specific URL if neither contains it.
- Final Adjustments
- The ‘FinalDestURL’ column is renamed to a constant representing custom parameters.
- Any NaN values in this column are replaced with an empty string to prevent errors in subsequent processing.
- Output
- The processed dataframe is assigned to
outputDf
, and the script prints the tableized version of this dataframe for review.
- The processed dataframe is assigned to
Vitals
- Script ID : 355
- Client ID / Customer ID: 1306925575 / 60269545
- Action Type: Bulk Upload
- Item Changed: Ad
- Output Columns: Account, Campaign, Group, Creative ID, Media sub-type, Media Type, Unique ID, Display URL, Custom Parameters, Status
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2023-10-13 10:49
- Last Updated by Tom McCaughey on 2024-01-10 14:24
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
# Define the column names as constants
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_CREATIVE_ID = 'Creative ID'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_DISPLAY_URL = 'Display URL'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
BULK_COL_UNIQUE_ID = 'Unique ID'
BULK_COL_STATUS = 'Status'
inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"] # gsheets dataframe (first sheet)
# Merge dataframes based on 'Ad' and 'Keyword'
mergedDf = pd.merge(gSheetsDf, inputDf, left_on='F', right_on='Keyword', how='left')
mergedDf = mergedDf.rename(columns={
'A': BULK_COL_UNIQUE_ID,
'B': BULK_COL_CAMPAIGN,
'E': BULK_COL_GROUP,
'F': BULK_COL_CREATIVE_ID,
'D': BULK_COL_MEDIA_SUBTYPE,
'C': BULK_COL_MEDIA_TYPE,
'G': BULK_COL_DISPLAY_URL,
'H': BULK_COL_STATUS
})
# If you want to replace NaN values with an empty string in the URL columns
mergedDf['Destination URL'].fillna('', inplace=True)
mergedDf['Suggested URL'].fillna('', inplace=True)
DestURL_contains_utm_marin = mergedDf['Destination URL'].str.contains('utm_marin', case=False, na=False)
SuggestURL_contains_utm_marin = mergedDf['Suggested URL'].str.contains('utm_marin', case=False, na=False)
# Set FinalDestURL based on conditions
mergedDf['FinalDestURL'] = np.where(DestURL_contains_utm_marin, mergedDf['Destination URL'],
np.where(SuggestURL_contains_utm_marin, mergedDf['Suggested URL'], 'https://www.memira.se/'))
# Rename the FinalDestURL column to BULK_COL_CUSTOM_PARAMETERS
mergedDf = mergedDf.rename(columns={'FinalDestURL': BULK_COL_CUSTOM_PARAMETERS})
# If you want to replace NaN values with an empty string in the BULK_COL_CUSTOM_PARAMETERS column
mergedDf[BULK_COL_CUSTOM_PARAMETERS].fillna('', inplace=True)
outputDf = mergedDf
print(tableize(outputDf))
Post generated on 2025-03-11 01:25:51 GMT