Script 505: Memira DK Offline Import
Purpose
The Python script processes and merges data from two sources to prepare a structured dataset for offline import, ensuring URLs are correctly assigned based on specific conditions.
To Elaborate
The Python script is designed to handle data integration and transformation tasks for Memira DK’s offline import process. It merges data from two different sources, specifically a Google Sheets document and another input dataset, aligning them based on specific columns. The script then renames columns to match predefined constants and processes URL data to ensure that the final dataset contains the correct URLs. If certain URL parameters are present, it assigns the appropriate URL to a new column, ensuring that the data is ready for further use or analysis. This process is crucial for maintaining data consistency and accuracy in the offline import process.
Walking Through the Code
- Data Import and Initial Setup
- The script begins by importing data from two sources: a Google Sheets document and another input dataset.
- It uses these datasets to create two DataFrames,
inputDf
andgSheetsDf
.
- Data Merging
- The script merges these DataFrames on specific columns (‘Ad’ and ‘Keyword’), ensuring that related data from both sources is combined into a single DataFrame,
mergedDf
.
- The script merges these DataFrames on specific columns (‘Ad’ and ‘Keyword’), ensuring that related data from both sources is combined into a single DataFrame,
- Column Renaming
- It renames several columns in the merged DataFrame to match predefined constants, ensuring consistency in column naming.
- URL Processing
- The script checks for the presence of a specific parameter (‘utm_marin’) in two URL columns.
- Based on this check, it assigns a value to a new column,
FinalDestURL
, using a default URL if necessary.
- Final Adjustments
- The
FinalDestURL
column is renamed to match a predefined constant. - Any missing values in this column are replaced with an empty string to ensure data integrity.
- The
- Output Preparation
- The processed DataFrame,
outputDf
, is prepared for output, ready for further use or analysis.
- The processed DataFrame,
Vitals
- Script ID : 505
- Client ID / Customer ID: 1306925579 / 60269545
- Action Type: Bulk Upload
- Item Changed: Ad
- Output Columns: Account, Campaign, Group, Creative ID, Media sub-type, Media Type, Unique ID, Custom Parameters, Display URL
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2023-11-07 13:13
- Last Updated by Tom McCaughey on 2024-04-23 12:52
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#Offline format script
# Define the column names as constants
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_CREATIVE_ID = 'Creative ID'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_DISPLAY_URL = 'Display URL'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
BULK_COL_UNIQUE_ID = 'Unique ID'
inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"] # gsheets dataframe (first sheet)
# Merge dataframes based on 'Ad' and 'Keyword'
mergedDf = pd.merge(gSheetsDf, inputDf, left_on='F', right_on='Keyword', how='left')
mergedDf = mergedDf.rename(columns={
'A': BULK_COL_UNIQUE_ID,
'B': BULK_COL_CAMPAIGN,
'E': BULK_COL_GROUP,
'F': BULK_COL_CREATIVE_ID,
'D': BULK_COL_MEDIA_SUBTYPE,
'C': BULK_COL_MEDIA_TYPE,
'G': BULK_COL_DISPLAY_URL
})
# If you want to replace NaN values with an empty string in the URL columns
mergedDf['Destination URL'].fillna('', inplace=True)
mergedDf['Suggested URL'].fillna('', inplace=True)
DestURL_contains_utm_marin = mergedDf['Destination URL'].str.contains('utm_marin', case=False, na=False)
SuggestURL_contains_utm_marin = mergedDf['Suggested URL'].str.contains('utm_marin', case=False, na=False)
# Set FinalDestURL based on conditions
mergedDf['FinalDestURL'] = np.where(DestURL_contains_utm_marin, mergedDf['Destination URL'],
np.where(SuggestURL_contains_utm_marin, mergedDf['Suggested URL'], 'https://www.memira.dk/'))
# Rename the FinalDestURL column to BULK_COL_CUSTOM_PARAMETERS
mergedDf = mergedDf.rename(columns={'FinalDestURL': BULK_COL_CUSTOM_PARAMETERS})
# If you want to replace NaN values with an empty string in the BULK_COL_CUSTOM_PARAMETERS column
mergedDf[BULK_COL_CUSTOM_PARAMETERS].fillna('', inplace=True)
outputDf = mergedDf
print(tableize(outputDf))
Post generated on 2024-11-27 06:58:46 GMT