Script 1301: Extract Studio and Feed Category and Language Target

Purpose

The Python script processes a DataFrame to extract and categorize information about campaigns, such as studio names, feed categories, and language targets.

To Elaborate

The script takes a DataFrame of campaign data and derives three attributes from each campaign name: the studio name, the feed category, and the language target. It also copies the language target into a ‘Campaign Language’ column and sets the ‘c-check’ column to “YES” for every row. The result is one structured output row per campaign, with these attributes in separate columns for analysis and reporting.

Walking Through the Code

  1. Initialization: The script reads the primary data source from dataSourceDict["1"] into a DataFrame named inputDf. Inside process, it initializes an empty DataFrame df_out with the output columns Campaign, Account, Feed Category, studio_name, Language Target, Campaign Language, and c-check.

  2. Processing Each Row: The script iterates over each row in the input DataFrame. For each campaign, it extracts the studio name by splitting the campaign string at hyphens and taking the first segment.

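  3. Determining Feed Category: The script looks for specific substrings in the campaign name, in this order: “- Movie -” gives “Movie”, “- TV Show - Seasons -” gives “Seasons”, and “- TV Show -” gives “TV Show”. The “Seasons” check must come before the “TV Show” check because every “Seasons” campaign also contains “- TV Show -”. If nothing matches, the feed category is an empty string.

  4. Extracting Language Target: The script takes the segment after the last hyphen in the campaign name, trims surrounding whitespace, and keeps the first two characters. The same value is written to both ‘Language Target’ and ‘Campaign Language’.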

  5. Assigning Default Values: It assigns “YES” to the ‘c-check’ column for each campaign.

  6. Constructing and Appending Rows: A new row is constructed with the extracted and determined values, then appended to the output DataFrame df_out.

  7. Output: The script prints df_out for debugging and returns it from process. The returned DataFrame is assigned to outputDf.
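
Steps 2 through 5 can be sketched for a single campaign name as a plain function. This is a minimal illustration, not part of the script: the function name parse_campaign and the sample campaign names are hypothetical, while the parsing rules follow the walkthrough above.

```python
def parse_campaign(campaign: str) -> dict:
    """Apply the per-row extraction rules to one campaign name (hypothetical helper)."""
    parts = campaign.split("-")

    # Studio name: the first hyphen-separated segment, trimmed.
    studio_name = parts[0].strip()

    # Feed category: the specific "Seasons" pattern is tested before
    # the general "TV Show" pattern, which it also contains.
    if "- Movie -" in campaign:
        feed_category = "Movie"
    elif "- TV Show - Seasons -" in campaign:
        feed_category = "Seasons"
    elif "- TV Show -" in campaign:
        feed_category = "TV Show"
    else:
        feed_category = ""

    # Language target: first two characters of the trimmed last segment.
    language_target = parts[-1].strip()[:2]

    return {
        "studio_name": studio_name,
        "Feed Category": feed_category,
        "Language Target": language_target,
        "Campaign Language": language_target,
        "c-check": "YES",
    }


# Sample names are made up for illustration.
movie = parse_campaign("Acme Studios - Movie - Some Title - EN US")
seasons = parse_campaign("Acme Studios - TV Show - Seasons - FR")
other = parse_campaign("Acme Studios - Brand - DE")
```

Two edge cases follow from these rules. A campaign name with no hyphen has a single segment, so the language target becomes the first two characters of the whole name. A studio name that itself contains a hyphen is cut at that hyphen.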

Vitals

  • Script ID: 1301
  • Client ID / Customer ID: 1306912147 / 69058
  • Action Type: Bulk Upload
  • Item Changed: Campaign
  • Output Columns: Account, Campaign, studio_name, Feed Category, Language Target, c-check
  • Linked Datasource: M1 Report
  • Reference Datasource: None
  • Owner: Jeremy Brown (jbrown@marinsoftware.com)
  • Created by Jeremy Brown on 2024-07-29 16:51
  • Last Updated by Jeremy Brown on 2024-07-30 12:48

Python Code

##
## name: SCRIPT REPORT: Extract Studio and Feed Category and Language Target
## description:
## 
## author: Jeremy Brown
## created: 2024-07-29
## 

today = datetime.datetime.now(CLIENT_TIMEZONE).date()

# primary data source and columns
inputDf = dataSourceDict["1"]
RPT_COL_CAMPAIGN = 'Campaign'
RPT_COL_ACCOUNT = 'Account'
RPT_COL_STUDIO_NAME = 'studio_name'
RPT_COL_FEED_CATEGORY = 'Feed Category'
RPT_COL_LANGUAGE_TARGET = 'Language Target'
RPT_COL_CAMPAIGN_LANGUAGE = 'Campaign Language'
RPT_COL_CCHECK = 'c-check'
RPT_COL_CAMPAIGN_STATUS = 'Campaign Status'
RPT_COL_IMPR = 'Impr.'

def process(inputDf):
    """
    Process the input DataFrame to populate output DataFrame with specific columns
    and create new 'studio_name', 'Feed Category', 'Language Target', 'Campaign Language' and 'c-check'
    based on the 'Campaign'.
    """
    # Initialize an empty DataFrame for output with required columns
    df_out = pd.DataFrame(columns=["Campaign", "Account", "Feed Category", "studio_name", "Language Target", "Campaign Language", "c-check"])
    
    # Iterate through each row in the input DataFrame
    for idx, row in inputDf.iterrows():
        campaign = row["Campaign"]
        
        # Task 1: Extract the studio name
        studio_name = campaign.split('-')[0].strip()
        
        # Task 2: Determine the Feed Category
        if "- Movie -" in campaign:
            feed_category = "Movie"
        elif "- TV Show - Seasons -" in campaign:
            feed_category = "Seasons"
        elif "- TV Show -" in campaign:
            feed_category = "TV Show"
        else:
            feed_category = ""  # Default value if none of the conditions match
        
        # Task 3: Extract the first 2 characters after the last hyphen
        language_target = campaign.split('-')[-1].strip()[:2]
        
        # Task 4: Insert "YES" into 'c-check'
        c_check = "YES"
        
        # Construct the new row as a dictionary
        new_row = {
            "Campaign": row["Campaign"],
            "Account": row["Account"],
            "Feed Category": feed_category,
            "studio_name": studio_name,
            "Language Target": language_target,
            "Campaign Language": language_target,
            "c-check": c_check
        }
        
        # Append the new row to the output DataFrame
        df_out = pd.concat([df_out, pd.DataFrame([new_row])], ignore_index=True)
    
    # Print the data changed for debugging purposes
    print("Data changed:")
    print(df_out)
    
    return df_out

# Trigger the main process
outputDf = process(inputDf)
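
Calling pd.concat once per row copies the growing output on every iteration, which can get slow on large reports. The sketch below is one possible alternative that uses pandas string methods on the whole ‘Campaign’ column instead. It is not the script's own method: the function name process_vectorized and the sample data are hypothetical. It also assumes campaign names can be cast to strings, so a missing value becomes the text "nan" where the original loop would raise an error.

```python
import pandas as pd


def process_vectorized(input_df: pd.DataFrame) -> pd.DataFrame:
    """Column-wise sketch of the same extraction rules (hypothetical helper)."""
    campaign = input_df["Campaign"].astype(str)
    segments = campaign.str.split("-")

    out = pd.DataFrame({
        "Campaign": input_df["Campaign"],
        "Account": input_df["Account"],
    })

    # Assign the lowest-priority category first so that later, higher-priority
    # assignments override it, matching the if/elif order of the loop version.
    out["Feed Category"] = ""
    out.loc[campaign.str.contains("- TV Show -", regex=False), "Feed Category"] = "TV Show"
    out.loc[campaign.str.contains("- TV Show - Seasons -", regex=False), "Feed Category"] = "Seasons"
    out.loc[campaign.str.contains("- Movie -", regex=False), "Feed Category"] = "Movie"

    out["studio_name"] = segments.str[0].str.strip()
    out["Language Target"] = segments.str[-1].str.strip().str[:2]
    out["Campaign Language"] = out["Language Target"]
    out["c-check"] = "YES"

    return out.reset_index(drop=True)


# Made-up sample data for illustration.
sample = pd.DataFrame({
    "Campaign": [
        "Acme Studios - Movie - Some Title - EN US",
        "Acme Studios - TV Show - Seasons - FR",
        "Acme Studios - TV Show - Some Show - DE",
    ],
    "Account": ["A1", "A1", "A2"],
})
result = process_vectorized(sample)
```

The column order of the result matches the columns df_out is initialized with in process.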

Post generated on 2024-11-27 06:58:46 GMT
