Script 1301: Extract Studio and Feed Category and Language Target

Purpose:

The Python script processes a DataFrame to extract and categorize information about campaigns, including studio names, feed categories, and language targets.

To Elaborate

The script processes a DataFrame of campaign data and builds a new DataFrame of structured details derived from each campaign name: the studio name, the feed category, and the language target. The feed category is chosen by matching patterns in the campaign name, for example whether the name marks the campaign as a movie or a TV show. The language target is taken from the first two characters after the last hyphen, and a constant value is assigned to the ‘c-check’ column. The structured output organizes campaign data for further analysis or reporting.

Walking Through the Code

  1. Initialization:
    • The script begins by defining the primary data source, inputDf, which is a DataFrame containing campaign data.
    • It initializes an empty DataFrame, df_out, with specific columns to store the processed data.
  2. Processing Each Row:
    • The script iterates over each row in the inputDf.
    • For each campaign, it extracts the studio name by splitting the campaign string at hyphens and taking the first part.
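    • It determines the feed category by checking for specific substrings in the campaign name, in this order: “- Movie -” gives Movie, “- TV Show - Seasons -” gives Seasons, and “- TV Show -” gives TV Show. If none match, the category is left empty.
    • It extracts the language target by taking the first two characters after the last hyphen in the campaign name. The same value is written to both the ‘Language Target’ and ‘Campaign Language’ columns.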
    • A constant value “YES” is assigned to the ‘c-check’ column for each row.
  3. Constructing and Appending Rows:
    • A new row is constructed as a dictionary with the extracted and processed information.
    • This new row is appended to the df_out DataFrame.
  4. Output:
    • The processed DataFrame, df_out, is returned as the output of the function.
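
The per-row rules above can be sketched in isolation as a small function. This is a minimal illustration, not the script itself: the helper name parse_campaign and the sample campaign names are hypothetical, chosen only to show how the split and substring checks behave.

```python
def parse_campaign(campaign: str) -> dict:
    """Derive the per-campaign fields using the rules from the walkthrough."""
    segments = campaign.split("-")

    # Studio name: text before the first hyphen, trimmed
    studio_name = segments[0].strip()

    # Feed category: the more specific "Seasons" pattern is tested
    # before the general "TV Show" pattern, otherwise it could never match
    if "- Movie -" in campaign:
        feed_category = "Movie"
    elif "- TV Show - Seasons -" in campaign:
        feed_category = "Seasons"
    elif "- TV Show -" in campaign:
        feed_category = "TV Show"
    else:
        feed_category = ""

    # Language target: first two characters after the last hyphen
    language_target = segments[-1].strip()[:2]

    return {
        "studio_name": studio_name,
        "Feed Category": feed_category,
        "Language Target": language_target,
        "c-check": "YES",
    }


# Hypothetical campaign names, not taken from the real report
for name in [
    "Acme Pictures - Movie - EN",
    "Acme Pictures - TV Show - Seasons - FR",
    "Acme Pictures - TV Show - DE",
]:
    print(parse_campaign(name))
```

One consequence of splitting on every hyphen is worth keeping in mind: a studio name that itself contains a hyphen is cut at that hyphen, and a campaign name with no hyphen at all yields the whole name as the studio and its first two characters as the language target.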

Vitals

  • Script ID: 1301
  • Client ID / Customer ID: 1306912147 / 69058
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, studio_name, Feed Category, Language Target, c-check
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Jeremy Brown (jbrown@marinsoftware.com)
  • Created by Jeremy Brown on 2024-07-29 16:51
  • Last Updated by Jeremy Brown on 2024-07-30 12:48

Python Code

##
## name: SCRIPT REPORT: Extract Studio and Feed Category and Language Target
## description:
## 
## author: Jeremy Brown
## created: 2024-07-29
## 

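# The script environment may already provide these modules; they are imported
# explicitly here so the names used below are defined.
import datetime
import pandas as pd

today = datetime.datetime.now(CLIENT_TIMEZONE).date()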

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_STUDIO_NAME = 'studio_name'
RPT_COL_FEED_CATEGORY = 'Feed Category'
RPT_COL_LANGUAGE_TARGET = 'Language Target'
RPT_COL_CAMPAIGN_LANGUAGE = 'Campaign Language'
RPT_COL_CCHECK = 'c-check'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_IMPR = 'Impr.'

def process(inputDf):
    """
    Process the input DataFrame to populate output DataFrame with specific columns
    and create new 'studio_name', 'Feed Category', 'Language Target', 'Campaign Language' and 'c-check'
    based on the 'Campaign'.
    """
    # Initialize an empty DataFrame for output with required columns
    df_out = pd.DataFrame(columns=["Campaign", "Account", "Feed Category", "studio_name", "Language Target", "Campaign Language", "c-check"])
    
    # Iterate through each row in the input DataFrame
    for idx, row in inputDf.iterrows():
        campaign = row["Campaign"]
        
        # Task 1: Extract the studio name
        studio_name = campaign.split('-')[0].strip()
        
        # Task 2: Determine the Feed Category
        if "- Movie -" in campaign:
            feed_category = "Movie"
        elif "- TV Show - Seasons -" in campaign:
            feed_category = "Seasons"
        elif "- TV Show -" in campaign:
            feed_category = "TV Show"
        else:
            feed_category = ""  # Default value if none of the conditions match
        
        # Task 3: Extract the first 2 characters after the last hyphen
        language_target = campaign.split('-')[-1].strip()[:2]
        
        # Task 4: Insert "YES" into 'c-check'
        c_check = "YES"
        
        # Construct the new row as a dictionary
        new_row = {
            "Campaign": row["Campaign"],
            "Account": row["Account"],
            "Feed Category": feed_category,
            "studio_name": studio_name,
            "Language Target": language_target,
            "Campaign Language": language_target,
            "c-check": c_check
        }
        
        # Append the new row to the output DataFrame
        df_out = pd.concat([df_out, pd.DataFrame([new_row])], ignore_index=True)
    
    # Print the data changed for debugging purposes
    print("Data changed:")
    print(df_out)
    
    return df_out

# Trigger the main process
outputDf = process(inputDf)
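
For larger reports, the row-by-row loop above could be replaced with pandas string operations on the whole 'Campaign' column. The following is a sketch under stated assumptions rather than the script's actual implementation: process_vectorized and the sample data are hypothetical, and every 'Campaign' value is assumed to be a string.

```python
import pandas as pd


def process_vectorized(input_df: pd.DataFrame) -> pd.DataFrame:
    """Column-wise version of the same extraction rules (hypothetical helper)."""
    campaign = input_df["Campaign"]
    segments = campaign.str.split("-")

    # Start with the empty default, then apply rules from lowest to highest
    # priority so that later assignments overwrite earlier ones. This mirrors
    # the if/elif order: Movie, then Seasons, then TV Show.
    feed = pd.Series("", index=input_df.index)
    feed = feed.mask(campaign.str.contains("- TV Show -", regex=False), "TV Show")
    feed = feed.mask(campaign.str.contains("- TV Show - Seasons -", regex=False), "Seasons")
    feed = feed.mask(campaign.str.contains("- Movie -", regex=False), "Movie")

    # First two characters after the last hyphen
    language = segments.str[-1].str.strip().str[:2]

    out = pd.DataFrame({
        "Campaign": campaign,
        "Account": input_df["Account"],
        "Feed Category": feed,
        "studio_name": segments.str[0].str.strip(),
        "Language Target": language,
        "Campaign Language": language,
        "c-check": "YES",
    })
    # Match the loop version, which renumbers rows from zero
    return out.reset_index(drop=True)


# Hypothetical sample data, not taken from the real report
sample = pd.DataFrame({
    "Campaign": [
        "Acme Pictures - Movie - EN",
        "Acme Pictures - TV Show - Seasons - FR",
        "Acme Pictures - TV Show - DE",
    ],
    "Account": ["A1", "A1", "A2"],
})
print(process_vectorized(sample))
```

The design trade-off is that concatenating onto df_out inside the loop copies the accumulated frame on every iteration, while the column-wise form builds the output once. The loop is easier to read and extend with new per-row rules, so it remains a reasonable choice for small reports.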

Post generated on 2025-03-11 01:25:51 GMT
