Script 505: Memira DK Offline Import
Purpose:
The Python script processes and merges data from two sources to generate a structured dataset for offline import, ensuring URLs are correctly formatted based on specific conditions.
To Elaborate
The script is designed to handle data integration and transformation tasks for Memira DK’s offline import process. It merges data from two different sources, specifically a Google Sheets document and another data source, based on common fields such as ‘Ad’ and ‘Keyword’. The primary goal is to ensure that the resulting dataset is correctly structured and formatted for further use. A key business rule implemented in the script is the conditional selection of URLs: if a URL contains the substring ‘utm_marin’, it is prioritized; otherwise, a default URL is used. This ensures that the URLs in the final dataset meet specific tracking requirements. The script also handles missing values by replacing them with empty strings, ensuring data consistency.
Walking Through the Code
- Data Preparation:
- The script begins by loading data from two sources into dataframes:
inputDf
andgSheetsDf
. - These dataframes are merged on the ‘Ad’ and ‘Keyword’ columns to create a unified dataset,
mergedDf
.
- The script begins by loading data from two sources into dataframes:
- Data Transformation:
- The merged dataframe is renamed to align with predefined column constants, ensuring consistency in column naming.
- Missing values in the ‘Destination URL’ and ‘Suggested URL’ columns are replaced with empty strings to avoid null-related issues.
- URL Handling:
- The script checks if the ‘Destination URL’ or ‘Suggested URL’ contains the substring ‘utm_marin’.
- Based on this check, it sets the ‘FinalDestURL’ to either the ‘Destination URL’, ‘Suggested URL’, or a default URL (‘https://www.memira.dk/’).
- Final Adjustments:
- The ‘FinalDestURL’ column is renamed to match the
BULK_COL_CUSTOM_PARAMETERS
constant. - Any remaining missing values in the
BULK_COL_CUSTOM_PARAMETERS
column are filled with empty strings.
- The ‘FinalDestURL’ column is renamed to match the
- Output:
- The processed dataframe,
outputDf
, is prepared for output, and its contents are printed in a tabular format using thetableize
function.
- The processed dataframe,
Vitals
- Script ID : 505
- Client ID / Customer ID: 1306925579 / 60269545
- Action Type: Bulk Upload
- Item Changed: Ad
- Output Columns: Account, Campaign, Group, Creative ID, Media sub-type, Media Type, Unique ID, Custom Parameters, Display URL
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2023-11-07 13:13
- Last Updated by Tom McCaughey on 2024-04-23 12:52
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#Offline format script
# Define the column names as constants
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_CREATIVE_ID = 'Creative ID'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_DISPLAY_URL = 'Display URL'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
BULK_COL_UNIQUE_ID = 'Unique ID'
inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"] # gsheets dataframe (first sheet)
# Merge dataframes based on 'Ad' and 'Keyword'
mergedDf = pd.merge(gSheetsDf, inputDf, left_on='F', right_on='Keyword', how='left')
mergedDf = mergedDf.rename(columns={
'A': BULK_COL_UNIQUE_ID,
'B': BULK_COL_CAMPAIGN,
'E': BULK_COL_GROUP,
'F': BULK_COL_CREATIVE_ID,
'D': BULK_COL_MEDIA_SUBTYPE,
'C': BULK_COL_MEDIA_TYPE,
'G': BULK_COL_DISPLAY_URL
})
# If you want to replace NaN values with an empty string in the URL columns
mergedDf['Destination URL'].fillna('', inplace=True)
mergedDf['Suggested URL'].fillna('', inplace=True)
DestURL_contains_utm_marin = mergedDf['Destination URL'].str.contains('utm_marin', case=False, na=False)
SuggestURL_contains_utm_marin = mergedDf['Suggested URL'].str.contains('utm_marin', case=False, na=False)
# Set FinalDestURL based on conditions
mergedDf['FinalDestURL'] = np.where(DestURL_contains_utm_marin, mergedDf['Destination URL'],
np.where(SuggestURL_contains_utm_marin, mergedDf['Suggested URL'], 'https://www.memira.dk/'))
# Rename the FinalDestURL column to BULK_COL_CUSTOM_PARAMETERS
mergedDf = mergedDf.rename(columns={'FinalDestURL': BULK_COL_CUSTOM_PARAMETERS})
# If you want to replace NaN values with an empty string in the BULK_COL_CUSTOM_PARAMETERS column
mergedDf[BULK_COL_CUSTOM_PARAMETERS].fillna('', inplace=True)
outputDf = mergedDf
print(tableize(outputDf))
Post generated on 2025-03-11 01:25:51 GMT