Script 341: MediaType Autotagging

Purpose

Python script for media type autotagging.

To Elaborate

This Python script solves the problem of automatically assigning media types and subtypes to a dataset based on specific business rules. The script takes an input dataframe and adds two new columns, TMP_MediaType and TMP_SubType, which will contain the assigned media type and subtype values respectively. The script then iterates through the input dataframe and assigns the appropriate media type and subtype values based on the values in the RPT_COL_ACCOUNT_PUBLISHER_NAME, RPT_COL_ACCOUNT, and RPT_COL_GROUP columns. The assigned values are based on specific conditions and rules defined in the script. Finally, the script copies the assigned media type and subtype values to the BULK_COL_MEDIA_TYPE and BULK_COL_MEDIA_SUBTYPE columns in the output dataframe.

Walking Through the Code

  1. Define column constants for the input and output dataframes.
  2. Create two temporary variables, TMP_MediaType and TMP_SubType, and assign them initial values of ‘Unknown Media Type’ and ‘Unknown Media sub-type’ respectively.
  3. Add the TMP_MediaType and TMP_SubType columns to the input dataframe and set their values to NaN.
  4. Get the current date and assign it to the today variable.
  5. Print the input dataframe using the tableize function.
  6. Use the loc function to assign the media type and subtype values ‘Native’ to rows where the RPT_COL_ACCOUNT_PUBLISHER_NAME column contains ‘Taboola’.
  7. Use the loc function to assign the media type and subtype values ‘Social’ and ‘TikTok’ respectively to rows where the RPT_COL_ACCOUNT_PUBLISHER_NAME column contains ‘TikTok’.
  8. Use the loc function to assign the media type and subtype values ‘Social’ and ‘Meta’ respectively to rows where the RPT_COL_ACCOUNT_PUBLISHER_NAME column contains ‘Facebook’.
  9. Use the loc function to assign the media type and subtype values ‘SEM’ and ‘SEM-Bing’ respectively to rows where the RPT_COL_ACCOUNT_PUBLISHER_NAME column contains ‘BING’.
  10. Use the loc function to assign the media type and subtype values ‘SEM’ and ‘SEM-Google’ respectively to rows where the RPT_COL_ACCOUNT_PUBLISHER_NAME column contains ‘Google’.
  11. Use the loc function to assign the media type and subtype values ‘Display’ and ‘Standard display’ respectively to rows where the RPT_COL_ACCOUNT_PUBLISHER_NAME column contains ‘Google’ and the RPT_COL_CAMPAIGN column contains ‘[GDN]’.
  12. Use the loc function to assign the media type and subtype values ‘Display & Online video’ and ‘Demand Gen’ respectively to rows where the RPT_COL_ACCOUNT_PUBLISHER_NAME column contains ‘Google’ and the RPT_COL_CAMPAIGN column contains ‘[Demand Gen]’.
  13. Use the loc function to assign the media type and subtype values ‘Online video’ and ‘VIDEO-Instream’ respectively to rows where the RPT_COL_ACCOUNT_PUBLISHER_NAME column contains ‘Google’ and the RPT_COL_CAMPAIGN column contains ‘[YT]’.
  14. Use the loc function to assign the media subtype value ‘VIDEO-Bumper’ to rows where the RPT_COL_ACCOUNT_PUBLISHER_NAME column contains ‘Google’, the RPT_COL_CAMPAIGN column contains ‘[YT]’, and the RPT_COL_CAMPAIGN column contains ‘Bumper’.
  15. Repeat steps 11-14 for different media types and subtypes based on specific conditions and rules.
  16. Print the input dataframe using the tableize function.
  17. Copy the media type and subtype values from the input dataframe to the output dataframe.
  18. Print the output dataframe using the tableize function.

Vitals

  • Script ID : 341
  • Client ID / Customer ID: 1306925575 / 60269545
  • Action Type: Bulk Upload
  • Item Changed: AdGroup
  • Output Columns: Account, Campaign, Group, Media sub-type, Media Type
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Tom McCaughey (tmccaughey@marinsoftware.com)
  • Created by Tom McCaughey on 2023-10-12 09:26
  • Last Updated by Tom McCaughey on 2024-04-16 10:38
> See it in Action

Python Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
RPT_COL_ACCOUNT = 'Account'
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_GROUP = 'Group'
RPT_COL_MEDIA_SUBTYPE = 'Media sub-type'
RPT_COL_MEDIA_TYPE = 'Media Type'
RPT_COL_ACCOUNT_PUBLISHER_NAME = 'Publisher Name'
RPT_COL_ACCOUNT_PUBLISHER = 'Publisher'
BULK_COL_ACCOUNT = 'Account'
BULK_COL_CAMPAIGN = 'Campaign'
BULK_COL_GROUP = 'Group'
BULK_COL_MEDIA_SUBTYPE = 'Media sub-type'
BULK_COL_MEDIA_TYPE = 'Media Type'
BULK_COL_ACCOUNT_PUBLISHER_NAME = 'Publisher Name'
BULK_COL_ACCOUNT_PUBLISHER = 'Publisher'

#outputDf[BULK_COL_MARKET] = "<<YOUR VALUE>>"

TMP_MediaType = 'Unknown Media Type'
TMP_SubType = 'Unknown Media sub-type'
# blank out tmp field
inputDf[TMP_MediaType] = numpy.nan
inputDf[TMP_SubType] = numpy.nan

today = datetime.datetime.now(CLIENT_TIMEZONE).date()
print(tableize(inputDf))


inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Taboola', na=False)) , TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Taboola', na=False)) , TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('TikTok', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('TikTok', na=False)) , TMP_SubType ] = 'TikTok'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Facebook', na=False)) , TMP_MediaType ] = 'Social'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Facebook', na=False)) , TMP_SubType ] = 'Meta'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('BING', na=False)) , TMP_MediaType ] = 'SEM'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('BING', na=False)) , TMP_SubType ] = 'SEM-Bing'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) , TMP_MediaType ] = 'SEM'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) , TMP_SubType ] = 'SEM-Google'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[GDN\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[GDN\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[Demand Gen\]', na=False)), TMP_MediaType ] = 'Display & Online video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[Demand Gen\]', na=False)), TMP_SubType ] = 'Demand Gen'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)), TMP_MediaType ] = 'Online video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT_PUBLISHER_NAME].str.contains('Google', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('\[YT\]', na=False)) & (inputDf[RPT_COL_CAMPAIGN].str.contains('Bumper', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_SubType ] = 'Premium display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_MediaType ] = 'Online Audio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_SubType ] = 'Audio-Podcast'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_MediaType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_SubType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_MediaType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_SubType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_MediaType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_SubType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_MediaType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Adform', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_SubType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_MediaType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('OOH', na=False)), TMP_SubType ] = 'OOH'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_MediaType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Radio', na=False)), TMP_SubType ] = 'Radio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_MediaType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('TV', na=False)), TMP_SubType ] = 'TV'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_MediaType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Print', na=False)), TMP_SubType ] = 'Print'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY\]', na=False)), TMP_SubType ] = 'Standard display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_MediaType ] = 'Display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[DISPLAY-Premium\]', na=False)), TMP_SubType ] = 'Premium display'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_MediaType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Native\]', na=False)), TMP_SubType ] = 'Native'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_MediaType ] = 'Online Audio'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[AUDIO - Podcast\]', na=False)), TMP_SubType ] = 'Audio-Podcast'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Instream\]', na=False)), TMP_SubType ] = 'VIDEO-Instream'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_MediaType ] = 'Online Video'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Offline', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[VIDEO-Bumper\]', na=False)), TMP_SubType ] = 'VIDEO-Bumper'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Newsletter\]', na=False)), TMP_SubType ] = 'Email-Newsletter'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Standalone\]', na=False)), TMP_MediaType ] = 'Email'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Email-Standalone\]', na=False)), TMP_SubType ] = 'Email-Standalone'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[SMS\]', na=False)), TMP_MediaType ] = 'SMS'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[SMS\]', na=False)), TMP_SubType ] = 'SMS'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Quiz\]', na=False)), TMP_MediaType ] = 'Quiz'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Quiz\]', na=False)), TMP_SubType ] = 'Quiz-Primetime'

inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('Affiliate', na=False)), TMP_MediaType ] = 'Affiliate'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate]', na=False)), TMP_SubType ] = 'Affiliate'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-cashback]', na=False)), TMP_SubType ] = 'Affiliate-cashback'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-campaign]', na=False)), TMP_SubType ] = 'Affiliate-campaign'
inputDf.loc[ (inputDf[RPT_COL_ACCOUNT].str.contains('Email', na=False)) & (inputDf[RPT_COL_GROUP].str.contains('\[Affiliate-partner]', na=False)), TMP_SubType ] = 'Affiliate-partner'

print(tableize(inputDf))

#print(inputDf.index.duplicated())

# copy new strategy to output
outputDf.loc[:,BULK_COL_MEDIA_TYPE] = inputDf.loc[:, TMP_MediaType]
outputDf.loc[:,BULK_COL_MEDIA_SUBTYPE] = inputDf.loc[:, TMP_SubType]

#outputDf = outputDf[inputDf[TMP_MediaType].notnull() & inputDf[TMP_SubType].notnull() & ~inputDf[BULK_COL_CAMPAIGN].str.contains('"')]
print(tableize(outputDf))

Post generated on 2024-05-15 07:44:05 GMT

comments powered by Disqus