Script 345: mediatype tagging
Purpose:
The Python script categorizes media types and sub-types based on specific account and group criteria in a dataset.
To Elaborate
The Python script is designed to classify media types and sub-types within a dataset based on predefined criteria related to account names and group identifiers. It processes input data to assign specific media categories such as ‘Native’, ‘Social’, ‘SEM’, ‘Display’, ‘Online Video’, ‘TV’, ‘Email’, ‘OOH’, ‘Radio’, ‘Print’, ‘SMS’, ‘Quiz’, and ‘Affiliate’. These classifications are determined by checking for specific keywords within the ‘Account’, ‘Campaign’, ‘Group’, and ‘Publisher Name’ columns. The script aims to streamline the tagging process by automating the assignment of media types and sub-types, ensuring consistency and accuracy in media categorization for reporting or further analysis.
Walking Through the Code
- Initialization:
- The script begins by defining temporary columns for media type and sub-type with default values set to ‘Unknown Media Type’ and ‘Unknown Media sub-type’.
- These temporary columns are initialized with
numpy.nan
to prepare for data processing.
- Data Processing:
- The script uses conditional logic to update the temporary media type and sub-type columns based on specific keywords found in the ‘Account’, ‘Campaign’, ‘Group’, and ‘Publisher Name’ columns.
- For example, if the ‘Account’ column contains ‘Taboola’, the media type and sub-type are set to ‘Native’.
- Similar logic is applied for other platforms like ‘TikTok’, ‘Snapchat’, ‘Pinterest’, ‘Facebook’, ‘BING’, ‘Google’, and various other media categories.
- Output Preparation:
- After processing, the script copies the newly assigned media types and sub-types from the temporary columns to the output DataFrame.
- This ensures that the output reflects the updated media categorizations for further use or analysis.
- User Changeable Parameters:
- Users can modify the criteria for media type and sub-type assignment by changing the keywords and conditions within the
loc
method calls. - This allows customization based on different datasets or business requirements.
- Users can modify the criteria for media type and sub-type assignment by changing the keywords and conditions within the
Vitals
- Script ID : 345
- Client ID / Customer ID: 1306925581 / 60269545
- Action Type: Bulk Upload
- Item Changed: AdGroup
- Output Columns: Account, Campaign, Group, Media sub-type, Media Type
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2023-10-12 15:29
- Last Updated by Anton Antonov on 2024-09-09 15:07
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MEDIA_SUBTYPE = 'Media sub-type'
RPT_COL_MEDIA_TYPE = 'Media Type'
RPT_COL_ACCOUNT_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_ACCOUNT_PUBLISHER = 'Publisher'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_ACCOUNT_PUBLISHER_NAME = 'Publisher Name'
BULK_COL_ACCOUNT_PUBLISHER = 'Publisher'
#outputDf[BULK_COL_MARKET] = "<<YOUR VALUE>>"
TMP_MediaType = 'Unknown Media Type'
TMP_SubType = 'Unknown Media sub-type'
# blank out tmp field
inputDf[TMP_MediaType] = numpy.nan
inputDf[TMP_SubType] = numpy.nan
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
print(tableize(inputDf))
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Taboola', na=False)) , TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Taboola', na=False)) , TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('TikTok', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('TikTok', na=False)) , TMP_SubType ] = 'TikTok'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Snapchat', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Snapchat', na=False)) , TMP_SubType ] = 'Snapchat'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Pinterest', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Pinterest', na=False)) , TMP_SubType ] = 'Pinterest'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Facebook', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Facebook', na=False)) , TMP_SubType ] = 'Meta'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('BING', na=False)) , TMP_MediaType ] = 'SEM'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('BING', na=False)) , TMP_SubType ] = 'SEM-Bing'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) , TMP_MediaType ] = 'SEM'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) , TMP_SubType ] = 'SEM-Google'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[GDN\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[GDN\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[Demand Gen\]', na=False)), TMP_MediaType ] = 'Display & Online video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[Demand Gen\]', na=False)), TMP_SubType ] = 'Demand Gen'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)), TMP_MediaType ] = 'Online video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('Bumper', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_MediaType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_SubType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_SubType ] = 'Premium display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_MediaType ] = 'Online Audio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_SubType ] = 'Audio-Podcast'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_MediaType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_SubType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_MediaType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_SubType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('QX.se', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('QX.se', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_MediaType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_SubType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_MediaType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_SubType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_MediaType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_SubType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_MediaType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_SubType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_MediaType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_SubType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_SubType ] = 'Premium display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_MediaType ] = 'Online Audio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_SubType ] = 'Audio-Podcast'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Standalone\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Standalone\]', na=False)), TMP_SubType ] = 'Email-Standalone'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[SMS\]', na=False)), TMP_MediaType ] = 'SMS'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[SMS\]', na=False)), TMP_SubType ] = 'SMS'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Quiz\]', na=False)), TMP_MediaType ] = 'Quiz'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Quiz\]', na=False)), TMP_SubType ] = 'Quiz-Primetime'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Standard display\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Standard display\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Affiliate', na=False)), TMP_MediaType ] = 'Affiliate'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate]', na=False)), TMP_SubType ] = 'Affiliate'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-cashback]', na=False)), TMP_SubType ] = 'Affiliate-cashback'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-campaign]', na=False)), TMP_SubType ] = 'Affiliate-campaign'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-partner]', na=False)), TMP_SubType ] = 'Affiliate-partner'
print(tableize(inputDf))
#print(inputDf.index.duplicated())
# copy new strategy to output
outputDf.loc[:,BULK_COL_MEDIA_TYPE] = inputDf.loc[:, TMP_MediaType]
outputDf.loc[:,BULK_COL_MEDIA_SUBTYPE] = inputDf.loc[:, TMP_SubType]
#outputDf = outputDf[inputDf[TMP_MediaType].notnull() & inputDf[TMP_SubType].notnull() & ~inputDf[BULK_COL_CAMPAIGN].str.contains('"')]
print(tableize(outputDf))
Post generated on 2025-03-11 01:25:51 GMT