Script 509: Memira NO Offline Import
Purpose:
The Python script processes and merges data from two sources to generate a structured dataset for offline import, ensuring URLs are correctly assigned based on specific conditions.
To Elaborate
The Python script is designed to handle data integration and transformation tasks for offline import processes. It takes data from two sources, merges them based on specific columns, and then performs a series of transformations to ensure the data is in the correct format for further use. A key aspect of the script is its handling of URL data, where it checks for specific parameters within URLs and assigns a final URL based on these conditions. This ensures that the data is consistent and meets the requirements for structured budget allocation (SBA) processes. The script also includes steps to handle missing data by replacing NaN values with empty strings, ensuring that the final dataset is clean and ready for import.
Walking Through the Code
- Data Import and Initial Setup
- The script begins by importing data from two sources into dataframes:
inputDf
andgSheetsDf
. - These dataframes are merged based on the ‘Ad’ and ‘Keyword’ columns, using a left join to ensure all entries from
gSheetsDf
are retained.
- The script begins by importing data from two sources into dataframes:
- Data Transformation
- The merged dataframe is renamed to align with predefined column names, ensuring consistency in the dataset.
- The script checks for NaN values in the ‘Destination URL’ and ‘Suggested URL’ columns, replacing them with empty strings to prevent errors in subsequent operations.
- URL Handling
- The script identifies URLs containing the ‘utm_marin’ parameter in both ‘Destination URL’ and ‘Suggested URL’ columns.
- Based on the presence of this parameter, it assigns a ‘FinalDestURL’, defaulting to a specific URL if neither contains the parameter.
- Final Adjustments and Output
- The ‘FinalDestURL’ column is renamed to ‘Custom Parameters’ to match the expected output format.
- Any remaining NaN values in the ‘Custom Parameters’ column are replaced with empty strings.
- The final dataframe,
outputDf
, is prepared for output, ensuring it is ready for the next steps in the offline import process.
Vitals
- Script ID : 509
- Client ID / Customer ID: 1306925577 / 60269545
- Action Type: Bulk Upload
- Item Changed: Ad
- Output Columns: Account, Campaign, Group, Creative ID, Media sub-type, Media Type, Display URL, Custom Parameters, Unique ID
- Linked Datasource: M1 Report
- Reference Datasource: Google Sheets
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2023-11-07 16:14
- Last Updated by Tom McCaughey on 2024-05-03 08:29
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#Offline format script
# Define the column names as constants
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_CREATIVE_ID = 'Creative ID'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_DISPLAY_URL = 'Display URL'
BULK_COL_CUSTOM_PARAMETERS = 'Custom Parameters'
BULK_COL_UNIQUE_ID = 'Unique ID'
inputDf = dataSourceDict["1"]
gSheetsDf = dataSourceDict["2_1"] # gsheets dataframe (first sheet)
# Merge dataframes based on 'Ad' and 'Keyword'
mergedDf = pd.merge(gSheetsDf, inputDf, left_on='F', right_on='Keyword', how='left')
mergedDf = mergedDf.rename(columns={
'A': BULK_COL_UNIQUE_ID,
'B': BULK_COL_CAMPAIGN,
'E': BULK_COL_GROUP,
'F': BULK_COL_CREATIVE_ID,
'D': BULK_COL_MEDIA_SUBTYPE,
'C': BULK_COL_MEDIA_TYPE,
'G': BULK_COL_DISPLAY_URL
})
# If you want to replace NaN values with an empty string in the URL columns
mergedDf['Destination URL'].fillna('', inplace=True)
mergedDf['Suggested URL'].fillna('', inplace=True)
DestURL_contains_utm_marin = mergedDf['Destination URL'].str.contains('utm_marin', case=False, na=False)
SuggestURL_contains_utm_marin = mergedDf['Suggested URL'].str.contains('utm_marin', case=False, na=False)
# Set FinalDestURL based on conditions
mergedDf['FinalDestURL'] = np.where(DestURL_contains_utm_marin, mergedDf['Destination URL'],
np.where(SuggestURL_contains_utm_marin, mergedDf['Suggested URL'], 'https://www.memira.se/'))
# Rename the FinalDestURL column to BULK_COL_CUSTOM_PARAMETERS
mergedDf = mergedDf.rename(columns={'FinalDestURL': BULK_COL_CUSTOM_PARAMETERS})
# If you want to replace NaN values with an empty string in the BULK_COL_CUSTOM_PARAMETERS column
mergedDf[BULK_COL_CUSTOM_PARAMETERS].fillna('', inplace=True)
outputDf = mergedDf
print(tableize(outputDf))
Post generated on 2025-03-11 01:25:51 GMT