Script 341: MediaType Autotagging
Purpose:
The Python script automates the tagging of media types and subtypes based on specific account and campaign criteria in a dataset.
To Elaborate
The script is designed to automate the process of categorizing media data by assigning specific media types and subtypes to entries in a dataset. This is achieved by examining the content of certain columns, such as account names and campaign details, and applying predefined rules to determine the appropriate tags. The script is particularly useful for marketing and advertising data, where different platforms and campaign types need to be consistently categorized for reporting and analysis purposes. By automating this tagging process, the script helps ensure accuracy and efficiency in data handling, reducing the potential for human error and saving time in data preparation tasks.
Walking Through the Code
- Initialization and Setup:
- The script begins by defining temporary column names for media type and subtype, initializing them with ‘Unknown’ values.
- It then clears these temporary fields in the input DataFrame by setting them to
NaN
.
- Tagging Logic:
- The core of the script involves a series of conditional statements that check for specific keywords within the ‘Account’ and ‘Campaign’ columns.
- For each condition met, the script assigns a corresponding media type and subtype to the temporary columns. For example, if ‘Taboola’ is found in the ‘Account’ column, the media type and subtype are set to ‘Native’.
- This logic is repeated for various platforms and campaign identifiers, such as ‘TikTok’, ‘Snapchat’, ‘Google’, and ‘Adform’, among others.
- Output Preparation:
- After processing the input data, the script copies the newly assigned media types and subtypes from the temporary columns to the output DataFrame.
- The final output is prepared by ensuring that only entries with non-null media type and subtype are included, ready for further analysis or reporting.
Vitals
- Script ID : 341
- Client ID / Customer ID: 1306925575 / 60269545
- Action Type: Bulk Upload
- Item Changed: AdGroup
- Output Columns: Account, Campaign, Group, Media sub-type, Media Type
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2023-10-12 09:26
- Last Updated by Anton Antonov on 2024-09-09 14:20
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MEDIA_SUBTYPE = 'Media sub-type'
RPT_COL_MEDIA_TYPE = 'Media Type'
RPT_COL_ACCOUNT_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_ACCOUNT_PUBLISHER = 'Publisher'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_ACCOUNT_PUBLISHER_NAME = 'Publisher Name'
BULK_COL_ACCOUNT_PUBLISHER = 'Publisher'
#outputDf[BULK_COL_MARKET] = "<<YOUR VALUE>>"
TMP_MediaType = 'Unknown Media Type'
TMP_SubType = 'Unknown Media sub-type'
# blank out tmp field
inputDf[TMP_MediaType] = numpy.nan
inputDf[TMP_SubType] = numpy.nan
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
print(tableize(inputDf))
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Taboola', na=False)) , TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Taboola', na=False)) , TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('TikTok', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('TikTok', na=False)) , TMP_SubType ] = 'TikTok'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Snapchat', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Snapchat', na=False)) , TMP_SubType ] = 'Snapchat'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Pinterest', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Pinterest', na=False)) , TMP_SubType ] = 'Pinterest'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Facebook', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Facebook', na=False)) , TMP_SubType ] = 'Meta'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('BING', na=False)) , TMP_MediaType ] = 'SEM'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('BING', na=False)) , TMP_SubType ] = 'SEM-Bing'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) , TMP_MediaType ] = 'SEM'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) , TMP_SubType ] = 'SEM-Google'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[GDN\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[GDN\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[Demand Gen\]', na=False)), TMP_MediaType ] = 'Display & Online video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[Demand Gen\]', na=False)), TMP_SubType ] = 'Demand Gen'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)), TMP_MediaType ] = 'Online video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('Bumper', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_MediaType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_SubType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_SubType ] = 'Premium display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_MediaType ] = 'Online Audio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_SubType ] = 'Audio-Podcast'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_MediaType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_SubType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_MediaType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_SubType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('QX.se', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('QX.se', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_MediaType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_SubType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_MediaType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_SubType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_MediaType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_SubType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_MediaType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_SubType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_MediaType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_SubType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_SubType ] = 'Premium display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_MediaType ] = 'Online Audio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_SubType ] = 'Audio-Podcast'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Standalone\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Standalone\]', na=False)), TMP_SubType ] = 'Email-Standalone'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[SMS\]', na=False)), TMP_MediaType ] = 'SMS'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[SMS\]', na=False)), TMP_SubType ] = 'SMS'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Quiz\]', na=False)), TMP_MediaType ] = 'Quiz'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Quiz\]', na=False)), TMP_SubType ] = 'Quiz-Primetime'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Standard display\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Standard display\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Affiliate', na=False)), TMP_MediaType ] = 'Affiliate'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate]', na=False)), TMP_SubType ] = 'Affiliate'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-cashback]', na=False)), TMP_SubType ] = 'Affiliate-cashback'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-campaign]', na=False)), TMP_SubType ] = 'Affiliate-campaign'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-partner]', na=False)), TMP_SubType ] = 'Affiliate-partner'
print(tableize(inputDf))
#print(inputDf.index.duplicated())
# copy new strategy to output
outputDf.loc[:,BULK_COL_MEDIA_TYPE] = inputDf.loc[:, TMP_MediaType]
outputDf.loc[:,BULK_COL_MEDIA_SUBTYPE] = inputDf.loc[:, TMP_SubType]
#outputDf = outputDf[inputDf[TMP_MediaType].notnull() & inputDf[TMP_SubType].notnull() & ~inputDf[BULK_COL_CAMPAIGN].str.contains('"')]
print(tableize(outputDf))
Post generated on 2025-03-11 01:25:51 GMT