Script 505: Memira DK Offline Import

Purpose:

The Python script processes and merges data from two sources to generate a structured dataset for offline import, ensuring URLs are correctly formatted based on specific conditions.

To Elaborate

The script is designed to handle data integration and transformation tasks for Memira DK’s offline import process. It merges data from two different sources, specifically a Google Sheets document and another data source, based on common fields such as ‘Ad’ and ‘Keyword’. The primary goal is to ensure that the resulting dataset is correctly structured and formatted for further use. A key business rule implemented in the script is the conditional selection of URLs: if a URL contains the substring ‘utm_marin’, it is prioritized; otherwise, a default URL is used. This ensures that the URLs in the final dataset meet specific tracking requirements. The script also handles missing values by replacing them with empty strings, ensuring data consistency.

Walking Through the Code

Data Preparation:
- The script begins by loading data from two sources into dataframes: inputDf and gSheetsDf.
- These dataframes are merged on the ‘Ad’ and ‘Keyword’ columns to create a unified dataset, mergedDf.
Data Transformation:
- The merged dataframe is renamed to align with predefined column constants, ensuring consistency in column naming.
- Missing values in the ‘Destination URL’ and ‘Suggested URL’ columns are replaced with empty strings to avoid null-related issues.
URL Handling:
- The script checks if the ‘Destination URL’ or ‘Suggested URL’ contains the substring ‘utm_marin’.
- Based on this check, it sets the ‘FinalDestURL’ to either the ‘Destination URL’, ‘Suggested URL’, or a default URL (‘https://www.memira.dk/’).
Final Adjustments:
- The ‘FinalDestURL’ column is renamed to match the BULK_COL_CUSTOM_PARAMETERS constant.
- Any remaining missing values in the BULK_COL_CUSTOM_PARAMETERS column are filled with empty strings.
Output:
- The processed dataframe, outputDf, is prepared for output, and its contents are printed in a tabular format using the tableize function.

Vitals

Script ID : 505
Client ID / Customer ID: 1306925579 / 60269545
Action Type: Bulk Upload
Item Changed: Ad
Output Columns: Account, Campaign, Group, Creative ID, Media sub-type, Media Type, Unique ID, Custom Parameters, Display URL
Linked Datasource: M1 Report
Reference Datasource: Google Sheets
Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
Created by Tom McCaughey on 2023-11-07 13:13
Last Updated by Tom McCaughey on 2024-04-23 12:52

> See it in Action

Python Code

#Offline format script
# Define the column names as constants
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_CREATIVE_ID = 'Creative ID'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_DISPLAY_URL = 'Display URL'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
BULK_COL_UNIQUE_ID = 'Unique ID'

inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"]  # gsheets dataframe (first sheet)

# Merge dataframes based on 'Ad' and 'Keyword'
mergedDf = pd.merge(gSheetsDf, inputDf, left_on='F', right_on='Keyword', how='left')

mergedDf = mergedDf.rename(columns={
    'A': BULK_COL_UNIQUE_ID,
    'B': BULK_COL_CAMPAIGN,
    'E': BULK_COL_GROUP,
    'F': BULK_COL_CREATIVE_ID,
    'D': BULK_COL_MEDIA_SUBTYPE,
    'C': BULK_COL_MEDIA_TYPE,
    'G': BULK_COL_DISPLAY_URL
})

# If you want to replace NaN values with an empty string in the URL columns
mergedDf['Destination URL'].fillna('', inplace=True)
mergedDf['Suggested URL'].fillna('', inplace=True)

DestURL_contains_utm_marin = mergedDf['Destination URL'].str.contains('utm_marin', case=False, na=False)
SuggestURL_contains_utm_marin = mergedDf['Suggested URL'].str.contains('utm_marin', case=False, na=False)

# Set FinalDestURL based on conditions
mergedDf['FinalDestURL'] = np.where(DestURL_contains_utm_marin, mergedDf['Destination URL'],
                                    np.where(SuggestURL_contains_utm_marin, mergedDf['Suggested URL'], 'https://www.memira.dk/'))

# Rename the FinalDestURL column to BULK_COL_CUSTOM_PARAMETERS
mergedDf = mergedDf.rename(columns={'FinalDestURL': BULK_COL_CUSTOM_PARAMETERS})

# If you want to replace NaN values with an empty string in the BULK_COL_CUSTOM_PARAMETERS column
mergedDf[BULK_COL_CUSTOM_PARAMETERS].fillna('', inplace=True)

outputDf = mergedDf

print(tableize(outputDf))

Post generated on 2025-03-11 01:25:51 GMT

07 Nov 2023

« Script 503: Memira DK Email import Script 507: Memira No Email import »

MarinOne Scripts Creator's Corner