Script 341: MediaType Autotagging
Purpose
Python script for media type autotagging.
To Elaborate
This Python script solves the problem of automatically assigning media types and subtypes to a dataset based on specific business rules. The script takes an input dataframe and adds two new columns, TMP_MediaType
and TMP_SubType
, which will contain the assigned media type and subtype values respectively. The script then iterates through the input dataframe and assigns the appropriate media type and subtype values based on the values in the RPT_COL_ACCOUNT_PUBLISHER_NAME
, RPT_COL_ACCOUNT
, and RPT_COL_GROUP
columns. The assigned values are based on specific conditions and rules defined in the script. Finally, the script copies the assigned media type and subtype values to the BULK_COL_MEDIA_TYPE
and BULK_COL_MEDIA_SUBTYPE
columns in the output dataframe.
Walking Through the Code
- Define column constants for the input and output dataframes.
- Create two temporary variables,
TMP_MediaType
andTMP_SubType
, and assign them initial values of ‘Unknown Media Type’ and ‘Unknown Media sub-type’ respectively. - Add the
TMP_MediaType
andTMP_SubType
columns to the input dataframe and set their values to NaN. - Get the current date and assign it to the
today
variable. - Print the input dataframe using the
tableize
function. - Use the
loc
function to assign the media type and subtype values ‘Native’ to rows where theRPT_COL_ACCOUNT_PUBLISHER_NAME
column contains ‘Taboola’. - Use the
loc
function to assign the media type and subtype values ‘Social’ and ‘TikTok’ respectively to rows where theRPT_COL_ACCOUNT_PUBLISHER_NAME
column contains ‘TikTok’. - Use the
loc
function to assign the media type and subtype values ‘Social’ and ‘Meta’ respectively to rows where theRPT_COL_ACCOUNT_PUBLISHER_NAME
column contains ‘Facebook’. - Use the
loc
function to assign the media type and subtype values ‘SEM’ and ‘SEM-Bing’ respectively to rows where theRPT_COL_ACCOUNT_PUBLISHER_NAME
column contains ‘BING’. - Use the
loc
function to assign the media type and subtype values ‘SEM’ and ‘SEM-Google’ respectively to rows where theRPT_COL_ACCOUNT_PUBLISHER_NAME
column contains ‘Google’. - Use the
loc
function to assign the media type and subtype values ‘Display’ and ‘Standard display’ respectively to rows where theRPT_COL_ACCOUNT_PUBLISHER_NAME
column contains ‘Google’ and theRPT_COL_CAMPAIGN
column contains ‘[GDN]’. - Use the
loc
function to assign the media type and subtype values ‘Display & Online video’ and ‘Demand Gen’ respectively to rows where theRPT_COL_ACCOUNT_PUBLISHER_NAME
column contains ‘Google’ and theRPT_COL_CAMPAIGN
column contains ‘[Demand Gen]’. - Use the
loc
function to assign the media type and subtype values ‘Online video’ and ‘VIDEO-Instream’ respectively to rows where theRPT_COL_ACCOUNT_PUBLISHER_NAME
column contains ‘Google’ and theRPT_COL_CAMPAIGN
column contains ‘[YT]’. - Use the
loc
function to assign the media subtype value ‘VIDEO-Bumper’ to rows where theRPT_COL_ACCOUNT_PUBLISHER_NAME
column contains ‘Google’, theRPT_COL_CAMPAIGN
column contains ‘[YT]’, and theRPT_COL_CAMPAIGN
column contains ‘Bumper’. - Repeat steps 11-14 for different media types and subtypes based on specific conditions and rules.
- Print the input dataframe using the
tableize
function. - Copy the media type and subtype values from the input dataframe to the output dataframe.
- Print the output dataframe using the
tableize
function.
Vitals
- Script ID : 341
- Client ID / Customer ID: 1306925575 / 60269545
- Action Type: Bulk Upload
- Item Changed: AdGroup
- Output Columns: Account, Campaign, Group, Media sub-type, Media Type
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
- Created by Tom McCaughey on 2023-10-12 09:26
- Last Updated by Tom McCaughey on 2024-04-16 10:38
> See it in Action
Python Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MEDIA_SUBTYPE = 'Media sub-type'
RPT_COL_MEDIA_TYPE = 'Media Type'
RPT_COL_ACCOUNT_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_ACCOUNT_PUBLISHER = 'Publisher'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_ACCOUNT_PUBLISHER_NAME = 'Publisher Name'
BULK_COL_ACCOUNT_PUBLISHER = 'Publisher'
#outputDf[BULK_COL_MARKET] = "<<YOUR VALUE>>"
TMP_MediaType = 'Unknown Media Type'
TMP_SubType = 'Unknown Media sub-type'
# blank out tmp field
inputDf[TMP_MediaType] = numpy.nan
inputDf[TMP_SubType] = numpy.nan
today = datetime.datetime.now(CLIENT_TIMEZONE).date()
print(tableize(inputDf))
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Taboola', na=False)) , TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Taboola', na=False)) , TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('TikTok', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('TikTok', na=False)) , TMP_SubType ] = 'TikTok'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Facebook', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Facebook', na=False)) , TMP_SubType ] = 'Meta'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('BING', na=False)) , TMP_MediaType ] = 'SEM'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('BING', na=False)) , TMP_SubType ] = 'SEM-Bing'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) , TMP_MediaType ] = 'SEM'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) , TMP_SubType ] = 'SEM-Google'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[GDN\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[GDN\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[Demand Gen\]', na=False)), TMP_MediaType ] = 'Display & Online video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[Demand Gen\]', na=False)), TMP_SubType ] = 'Demand Gen'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)), TMP_MediaType ] = 'Online video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('Bumper', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_SubType ] = 'Premium display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_MediaType ] = 'Online Audio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_SubType ] = 'Audio-Podcast'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_MediaType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_SubType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_MediaType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_SubType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_MediaType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_SubType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_MediaType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_SubType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_MediaType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_SubType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_MediaType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_SubType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_MediaType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_SubType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_MediaType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_SubType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_SubType ] = 'Premium display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_MediaType ] = 'Online Audio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_SubType ] = 'Audio-Podcast'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Standalone\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Standalone\]', na=False)), TMP_SubType ] = 'Email-Standalone'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[SMS\]', na=False)), TMP_MediaType ] = 'SMS'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[SMS\]', na=False)), TMP_SubType ] = 'SMS'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Quiz\]', na=False)), TMP_MediaType ] = 'Quiz'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Quiz\]', na=False)), TMP_SubType ] = 'Quiz-Primetime'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Affiliate', na=False)), TMP_MediaType ] = 'Affiliate'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate]', na=False)), TMP_SubType ] = 'Affiliate'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-cashback]', na=False)), TMP_SubType ] = 'Affiliate-cashback'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-campaign]', na=False)), TMP_SubType ] = 'Affiliate-campaign'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-partner]', na=False)), TMP_SubType ] = 'Affiliate-partner'
print(tableize(inputDf))
#print(inputDf.index.duplicated())
# copy new strategy to output
outputDf.loc[:,BULK_COL_MEDIA_TYPE] = inputDf.loc[:, TMP_MediaType]
outputDf.loc[:,BULK_COL_MEDIA_SUBTYPE] = inputDf.loc[:, TMP_SubType]
#outputDf = outputDf[inputDf[TMP_MediaType].notnull() & inputDf[TMP_SubType].notnull() & ~inputDf[BULK_COL_CAMPAIGN].str.contains('"')]
print(tableize(outputDf))
Post generated on 2024-05-15 07:44:05 GMT