Script 509: Memira NO Offline Import

Purpose

The script processes and merges data from two sources to generate a structured output for offline import.

To Elaborate

The Python script is designed to merge and process data from two different sources, specifically a Google Sheets document and another data source, to create a structured output suitable for offline import. The script focuses on aligning and renaming columns to match a predefined format, handling missing values, and setting specific URL parameters based on certain conditions. The primary goal is to ensure that the data is correctly formatted and enriched with necessary parameters for further use, such as in marketing or advertising platforms. The script also includes logic to handle specific URL conditions, ensuring that the final output meets the required specifications for the intended use case.

Walking Through the Code

  1. Data Import and Initial Setup
    • The script begins by importing data from two sources: a Google Sheets document and another unspecified data source.
    • It uses these data sources to create two dataframes, inputDf and gSheetsDf.
  2. Data Merging
    • The script merges these dataframes based on specific columns (‘Ad’ and ‘Keyword’), using a left join to ensure all entries from the Google Sheets are retained.
  3. Column Renaming
    • After merging, the script renames several columns to match predefined constants, ensuring consistency in the output format.
  4. Handling Missing Values
    • The script replaces any NaN values in the ‘Destination URL’ and ‘Suggested URL’ columns with empty strings to prevent errors in subsequent processing.
  5. URL Parameter Logic
    • It checks for the presence of ‘utm_marin’ in the ‘Destination URL’ and ‘Suggested URL’ columns.
    • Based on these checks, it sets a ‘FinalDestURL’ column, prioritizing URLs containing ‘utm_marin’ and defaulting to a specific URL if neither contains it.
  6. Final Adjustments
    • The ‘FinalDestURL’ column is renamed to match a predefined constant for custom parameters.
    • Any remaining NaN values in this column are replaced with empty strings to ensure clean output.
  7. Output Preparation
    • The processed dataframe is assigned to outputDf, which is then printed in a tabular format for review.

Vitals

  • Script ID : 509
  • Client ID / Customer ID: 1306925577 / 60269545
  • Action Type: Bulk Upload
  • Item Changed: Ad
  • Output Columns: Account, Campaign, Group, Creative ID, Media sub-type, Media Type, Display URL, Custom Parameters, Unique ID
  • Linked Datasource: M1 Report
  • Reference Datasource: Google Sheets
  • Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
  • Created by Tom McCaughey on 2023-11-07 16:14
  • Last Updated by Tom McCaughey on 2024-05-03 08:29
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#Offline format script
# Define the column names as constants
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_CREATIVE_ID = 'Creative ID'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_DISPLAY_URL = 'Display URL'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
BULK_COL_UNIQUE_ID = 'Unique ID'

inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"]  # gsheets dataframe (first sheet)

# Merge dataframes based on 'Ad' and 'Keyword'
mergedDf = pd.merge(gSheetsDf, inputDf, left_on='F', right_on='Keyword', how='left')

mergedDf = mergedDf.rename(columns={
    'A': BULK_COL_UNIQUE_ID,
    'B': BULK_COL_CAMPAIGN,
    'E': BULK_COL_GROUP,
    'F': BULK_COL_CREATIVE_ID,
    'D': BULK_COL_MEDIA_SUBTYPE,
    'C': BULK_COL_MEDIA_TYPE,
    'G': BULK_COL_DISPLAY_URL
})

# If you want to replace NaN values with an empty string in the URL columns
mergedDf['Destination URL'].fillna('', inplace=True)
mergedDf['Suggested URL'].fillna('', inplace=True)

DestURL_contains_utm_marin = mergedDf['Destination URL'].str.contains('utm_marin', case=False, na=False)
SuggestURL_contains_utm_marin = mergedDf['Suggested URL'].str.contains('utm_marin', case=False, na=False)

# Set FinalDestURL based on conditions
mergedDf['FinalDestURL'] = np.where(DestURL_contains_utm_marin, mergedDf['Destination URL'],
                                    np.where(SuggestURL_contains_utm_marin, mergedDf['Suggested URL'], 'https://www.memira.se/'))

# Rename the FinalDestURL column to BULK_COL_CUSTOM_PARAMETERS
mergedDf = mergedDf.rename(columns={'FinalDestURL': BULK_COL_CUSTOM_PARAMETERS})

# If you want to replace NaN values with an empty string in the BULK_COL_CUSTOM_PARAMETERS column
mergedDf[BULK_COL_CUSTOM_PARAMETERS].fillna('', inplace=True)

outputDf = mergedDf

print(tableize(outputDf))

Post generated on 2024-11-27 06:58:46 GMT

comments powered by Disqus