Script 415: Offline Import Memira
Purpose:
The Python script merges and processes data from two sources to generate a structured dataset for advertising campaigns.
To Elaborate
The Python script is designed to merge and process data from two different sources, specifically focusing on advertising campaign information. It takes data from a Google Sheets document and another data source, merging them based on specific columns related to advertisements and keywords. The script then renames certain columns to align with predefined constants, ensuring consistency in the dataset. It also handles missing values in URL columns by replacing them with empty strings and checks for specific URL parameters to determine the final destination URL. This processed data is then prepared for output, providing a structured dataset that can be used for further analysis or reporting on advertising campaigns.
Walking Through the Code
- Data Input and Initialization
- The script begins by loading data from two sources into dataframes:
inputDf
from a generic data source andgSheetsDf
from a Google Sheets document. - These dataframes are merged based on the ‘Ad’ and ‘Keyword’ columns, using a left join to ensure all entries from the Google Sheets data are retained.
- The script begins by loading data from two sources into dataframes:
- Column Renaming
- After merging, the script renames several columns to match predefined constants, ensuring that the dataset has a consistent structure and naming convention.
- Handling Missing Values
- The script replaces any NaN values in the ‘Destination URL’ and ‘Suggested URL’ columns with empty strings to prevent errors in subsequent processing.
- URL Processing
- It checks if the ‘Destination URL’ or ‘Suggested URL’ contains the parameter ‘utm_marin’. Based on this check, it sets the ‘FinalDestURL’ to either the ‘Destination URL’, ‘Suggested URL’, or a default URL (‘https://www.memira.se/’).
- Final Adjustments
- The ‘FinalDestURL’ column is renamed to match the constant
BULK_COL_CUSTOM_PARAMETERS
. - Any remaining NaN values in this column are replaced with empty strings to ensure data integrity.
- The ‘FinalDestURL’ column is renamed to match the constant
- Output Preparation
- The processed dataframe,
outputDf
, is prepared for output, and the results are printed in a tabular format using thetableize
function.
- The processed dataframe,
Vitals
- Script ID : 415
- Client ID / Customer ID: 1306925575 / 60269545
- Action Type: Bulk Upload
- Item Changed: Ad
- Output Columns: Account, Campaign, Group, Creative ID, Media sub-type, Media Type, Unique ID, Custom Parameters, Display URL
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2023-10-25 15:50
- Last Updated by Tom McCaughey on 2024-01-10 14:24
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#Offline format script
# Define the column names as constants
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_CREATIVE_ID = 'Creative ID'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_DISPLAY_URL = 'Display URL'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
BULK_COL_UNIQUE_ID = 'Unique ID'
inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"] # gsheets dataframe (first sheet)
# Merge dataframes based on 'Ad' and 'Keyword'
mergedDf = pd.merge(gSheetsDf, inputDf, left_on='F', right_on='Keyword', how='left')
mergedDf = mergedDf.rename(columns={
'A': BULK_COL_UNIQUE_ID,
'B': BULK_COL_CAMPAIGN,
'E': BULK_COL_GROUP,
'F': BULK_COL_CREATIVE_ID,
'D': BULK_COL_MEDIA_SUBTYPE,
'C': BULK_COL_MEDIA_TYPE,
'G': BULK_COL_DISPLAY_URL
})
# If you want to replace NaN values with an empty string in the URL columns
mergedDf['Destination URL'].fillna('', inplace=True)
mergedDf['Suggested URL'].fillna('', inplace=True)
DestURL_contains_utm_marin = mergedDf['Destination URL'].str.contains('utm_marin', case=False, na=False)
SuggestURL_contains_utm_marin = mergedDf['Suggested URL'].str.contains('utm_marin', case=False, na=False)
# Set FinalDestURL based on conditions
mergedDf['FinalDestURL'] = np.where(DestURL_contains_utm_marin, mergedDf['Destination URL'],
np.where(SuggestURL_contains_utm_marin, mergedDf['Suggested URL'], 'https://www.memira.se/'))
# Rename the FinalDestURL column to BULK_COL_CUSTOM_PARAMETERS
mergedDf = mergedDf.rename(columns={'FinalDestURL': BULK_COL_CUSTOM_PARAMETERS})
# If you want to replace NaN values with an empty string in the BULK_COL_CUSTOM_PARAMETERS column
mergedDf[BULK_COL_CUSTOM_PARAMETERS].fillna('', inplace=True)
outputDf = mergedDf
print(tableize(outputDf))
Post generated on 2025-03-11 01:25:51 GMT