Script 347: mediatype tagging
Purpose:
The Python script categorizes media types and sub-types based on specific keywords found in account, campaign, and group data.
To Elaborate
The script is designed to automate the classification of media types and sub-types for marketing data. It processes input data to identify specific keywords within account, campaign, and group columns, and assigns corresponding media types and sub-types. This categorization helps in organizing and analyzing marketing data more effectively, ensuring that each entry is tagged with the correct media classification. The script is particularly useful for marketing teams who need to manage large datasets and require a consistent method for tagging media types across various platforms and campaigns.
Walking Through the Code
- Initialization:
- The script begins by defining temporary columns for media type and sub-type with default values set to ‘Unknown Media Type’ and ‘Unknown Media sub-type’.
- These temporary columns are initialized with
numpy.nan
to prepare for data processing.
- Data Processing:
- The script uses a series of conditional statements to check for specific keywords in the ‘Account’, ‘Campaign’, and ‘Group’ columns.
- For each condition met, it assigns a predefined media type and sub-type to the corresponding temporary columns.
- This involves checking for platform names like ‘Taboola’, ‘TikTok’, ‘Snapchat’, and others, and assigning categories such as ‘Native’, ‘Social’, ‘SEM’, etc.
- Output Preparation:
- After processing, the script copies the categorized media types and sub-types from the temporary columns to the output DataFrame.
- This ensures that the output data is ready for further analysis or reporting, with each entry correctly tagged according to the predefined rules.
Vitals
- Script ID : 347
- Client ID / Customer ID: 1306925577 / 60269545
- Action Type: Bulk Upload
- Item Changed: AdGroup
- Output Columns: Account, Campaign, Group, Media sub-type, Media Type
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2023-10-12 15:45
- Last Updated by Anton Antonov on 2024-09-09 15:11
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MEDIA_SUBTYPE = 'Media sub-type'
RPT_COL_MEDIA_TYPE = 'Media Type'
RPT_COL_ACCOUNT_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_ACCOUNT_PUBLISHER = 'Publisher'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_ACCOUNT_PUBLISHER_NAME = 'Publisher Name'
BULK_COL_ACCOUNT_PUBLISHER = 'Publisher'
#outputDf[BULK_COL_MARKET] = "<<YOUR VALUE>>"
TMP_MediaType = 'Unknown Media Type'
TMP_SubType = 'Unknown Media sub-type'
# blank out tmp field
inputDf[TMP_MediaType] = numpy.nan
inputDf[TMP_SubType] = numpy.nan
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
print(tableize(inputDf))
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Taboola', na=False)) , TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Taboola', na=False)) , TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('TikTok', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('TikTok', na=False)) , TMP_SubType ] = 'TikTok'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Snapchat', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Snapchat', na=False)) , TMP_SubType ] = 'Snapchat'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Pinterest', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Pinterest', na=False)) , TMP_SubType ] = 'Pinterest'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Facebook', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Facebook', na=False)) , TMP_SubType ] = 'Meta'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('BING', na=False)) , TMP_MediaType ] = 'SEM'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('BING', na=False)) , TMP_SubType ] = 'SEM-Bing'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) , TMP_MediaType ] = 'SEM'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) , TMP_SubType ] = 'SEM-Google'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[GDN\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[GDN\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[Demand Gen\]', na=False)), TMP_MediaType ] = 'Display & Online video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[Demand Gen\]', na=False)), TMP_SubType ] = 'Demand Gen'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)), TMP_MediaType ] = 'Online video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('Bumper', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_MediaType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_SubType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_SubType ] = 'Premium display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_MediaType ] = 'Online Audio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_SubType ] = 'Audio-Podcast'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_MediaType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_SubType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_MediaType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_SubType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('QX.se', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('QX.se', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_MediaType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_SubType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_MediaType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_SubType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_MediaType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_SubType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_MediaType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_SubType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_MediaType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_SubType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_SubType ] = 'Premium display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_MediaType ] = 'Online Audio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_SubType ] = 'Audio-Podcast'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Standalone\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Standalone\]', na=False)), TMP_SubType ] = 'Email-Standalone'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[SMS\]', na=False)), TMP_MediaType ] = 'SMS'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[SMS\]', na=False)), TMP_SubType ] = 'SMS'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Quiz\]', na=False)), TMP_MediaType ] = 'Quiz'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Quiz\]', na=False)), TMP_SubType ] = 'Quiz-Primetime'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Standard display\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Standard display\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Affiliate', na=False)), TMP_MediaType ] = 'Affiliate'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate]', na=False)), TMP_SubType ] = 'Affiliate'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-cashback]', na=False)), TMP_SubType ] = 'Affiliate-cashback'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-campaign]', na=False)), TMP_SubType ] = 'Affiliate-campaign'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-partner]', na=False)), TMP_SubType ] = 'Affiliate-partner'
print(tableize(inputDf))
#print(inputDf.index.duplicated())
# copy new strategy to output
outputDf.loc[:,BULK_COL_MEDIA_TYPE] = inputDf.loc[:, TMP_MediaType]
outputDf.loc[:,BULK_COL_MEDIA_SUBTYPE] = inputDf.loc[:, TMP_SubType]
#outputDf = outputDf[inputDf[TMP_MediaType].notnull() & inputDf[TMP_SubType].notnull() & ~inputDf[BULK_COL_CAMPAIGN].str.contains('"')]
print(tableize(outputDf))
Post generated on 2025-03-11 01:25:51 GMT