Script 1301: Extract Studio and Feed Category and Language Target
Purpose:
The Python script processes a DataFrame to extract and categorize information about campaigns, including studio names, feed categories, and language targets.
To Elaborate
The script reads a DataFrame of campaign data and builds a new DataFrame with structured details derived from each campaign name. The studio name is the text before the first hyphen. The feed category is set by substring rules: a name containing “- Movie -” is a Movie, one containing “- TV Show - Seasons -” is Seasons, one containing “- TV Show -” is a TV Show, and anything else is left blank. The language target is the first two characters after the last hyphen, and the code writes that same value to a ‘Campaign Language’ column as well. Finally, every row receives the constant value “YES” in the ‘c-check’ column. The result organizes campaign data for further analysis or reporting.
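The naming rules described above can be tried on a single campaign name. The following is a minimal sketch, not the script itself: the helper name `parse_campaign` and the sample campaign names are hypothetical, chosen only to illustrate the rules.

```python
def parse_campaign(campaign: str) -> dict:
    """Apply the studio, feed category and language rules to one campaign name."""
    parts = campaign.split("-")

    # Studio name: everything before the first hyphen.
    studio_name = parts[0].strip()

    # Feed category: the more specific "Seasons" pattern is tested
    # before the generic "TV Show" pattern, otherwise it would never match.
    if "- Movie -" in campaign:
        feed_category = "Movie"
    elif "- TV Show - Seasons -" in campaign:
        feed_category = "Seasons"
    elif "- TV Show -" in campaign:
        feed_category = "TV Show"
    else:
        feed_category = ""

    # Language target: first two characters after the last hyphen.
    language_target = parts[-1].strip()[:2]

    return {
        "studio_name": studio_name,
        "Feed Category": feed_category,
        "Language Target": language_target,
    }


# Hypothetical campaign names, for illustration only.
print(parse_campaign("Acme Studios - TV Show - Seasons - EN_US"))
print(parse_campaign("NoHyphens"))
```

One consequence worth noting: a name with no hyphen at all yields the whole name as the studio and its first two characters as the language target, because splitting returns a single part.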
Walking Through the Code
- Initialization:
  - The script begins by defining the primary data source, `inputDf`, which is a DataFrame containing campaign data.
  - It initializes an empty DataFrame, `df_out`, with specific columns to store the processed data.
- Processing Each Row:
  - The script iterates over each row in the `inputDf`.
  - For each campaign, it extracts the studio name by splitting the campaign string at hyphens and taking the first part.
  - It determines the feed category by checking for specific substrings in the campaign name, such as “- Movie -” or “- TV Show -”, and assigns a category accordingly.
  - It extracts the language target by taking the first two characters after the last hyphen in the campaign name.
  - A constant value “YES” is assigned to the ‘c-check’ column for each row.
- Constructing and Appending Rows:
  - A new row is constructed as a dictionary with the extracted and processed information.
  - This new row is appended to the `df_out` DataFrame.
- Output:
  - The processed DataFrame, `df_out`, is returned as the output of the function.
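The row-by-row steps above can also be sketched with an ordered rule table and a plain list of row dictionaries. This is an illustrative alternative under stated assumptions, not the script's own method: `FEED_RULES`, `build_rows` and the sample records are hypothetical names and data. It works on plain dictionaries so it runs without pandas; the resulting list could be passed to `pd.DataFrame(rows)` in a single call.

```python
# Ordered (substring, label) rules; the first match wins, so the more
# specific "Seasons" pattern must come before the generic "TV Show" one.
FEED_RULES = [
    ("- Movie -", "Movie"),
    ("- TV Show - Seasons -", "Seasons"),
    ("- TV Show -", "TV Show"),
]


def build_rows(records):
    """Return one output dictionary per input record (hypothetical helper)."""
    rows = []
    for record in records:
        campaign = record["Campaign"]
        parts = campaign.split("-")

        # First matching rule, or an empty string when nothing matches.
        feed_category = next(
            (label for pattern, label in FEED_RULES if pattern in campaign), ""
        )
        language = parts[-1].strip()[:2]

        rows.append({
            "Campaign": campaign,
            "Account": record["Account"],
            "Feed Category": feed_category,
            "studio_name": parts[0].strip(),
            "Language Target": language,
            "Campaign Language": language,
            "c-check": "YES",  # constant for every row
        })
    return rows


# Hypothetical input records, for illustration only.
sample = [
    {"Campaign": "Acme - TV Show - FR", "Account": "Account A"},
    {"Campaign": "Acme - Brand", "Account": "Account A"},
]
for row in build_rows(sample):
    print(row)
```

Keeping the rules in a list makes the match order explicit and lets a new category be added without touching the loop.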
Vitals
- Script ID : 1301
- Client ID / Customer ID: 1306912147 / 69058
- Action Type: Bulk Upload
- Item Changed: Campaign
- Output Columns: Account, Campaign, studio_name, Feed Category, Language Target, c-check
- Linked Datasource: M1 Report
- Reference Datasource: None
- Owner: Jeremy Brown (jbrown@marinsoftware.com)
- Created by Jeremy Brown on 2024-07-29 16:51
- Last Updated by Jeremy Brown on 2024-07-30 12:48
Python Code
##
## name: SCRIPT REPORT: Extract Studio and Feed Category and Language Target
## description:
##
## author: Jeremy Brown
## created: 2024-07-29
##

# Imports needed when running outside the hosting environment.
# CLIENT_TIMEZONE and dataSourceDict are expected to be supplied by that environment.
import datetime

import pandas as pd

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_STUDIO_NAME = 'studio_name'
RPT_COL_FEED_CATEGORY = 'Feed Category'
RPT_COL_LANGUAGE_TARGET = 'Language Target'
RPT_COL_CAMPAIGN_LANGUAGE = 'Campaign Language'
RPT_COL_CCHECK = 'c-check'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_IMPR = 'Impr.'


def process(inputDf):
    """
    Process the input DataFrame to populate output DataFrame with specific columns
    and create new 'studio_name', 'Feed Category', 'Language Target', 'Campaign Language' and 'c-check'
    based on the 'Campaign'.
    """
    # Initialize an empty DataFrame for output with required columns
    df_out = pd.DataFrame(columns=["Campaign", "Account", "Feed Category", "studio_name", "Language Target", "Campaign Language", "c-check"])

    # Iterate through each row in the input DataFrame
    for idx, row in inputDf.iterrows():
        campaign = row["Campaign"]

        # Task 1: Extract the studio name (text before the first hyphen)
        studio_name = campaign.split('-')[0].strip()

        # Task 2: Determine the Feed Category
        # The "Seasons" check must precede the generic "TV Show" check
        if "- Movie -" in campaign:
            feed_category = "Movie"
        elif "- TV Show - Seasons -" in campaign:
            feed_category = "Seasons"
        elif "- TV Show -" in campaign:
            feed_category = "TV Show"
        else:
            feed_category = ""  # Default value if none of the conditions match

        # Task 3: Extract the first 2 characters after the last hyphen
        language_target = campaign.split('-')[-1].strip()[:2]

        # Task 4: Insert "YES" into 'c-check'
        c_check = "YES"

        # Construct the new row as a dictionary
        new_row = {
            "Campaign": row["Campaign"],
            "Account": row["Account"],
            "Feed Category": feed_category,
            "studio_name": studio_name,
            "Language Target": language_target,
            "Campaign Language": language_target,
            "c-check": c_check
        }

        # Append the new row to the output DataFrame
        df_out = pd.concat([df_out, pd.DataFrame([new_row])], ignore_index=True)

    # Print the data changed for debugging purposes
    print("Data changed:")
    print(df_out)

    return df_out


# Trigger the main process
outputDf = process(inputDf)
Post generated on 2025-03-11 01:25:51 GMT