Chapter 9 Marketing Media Mix Models#

9.1 Introduction#

In today’s rapidly evolving marketing landscape, the ability to quantify the impact of each dollar spent across various channels is more crucial than ever. Media Mix Modeling (MMM) stands at the forefront of data-driven marketing strategies, offering a sophisticated analytical approach that dissects the performance of each marketing channel in isolation and in synergy. By employing statistical models that incorporate sales data, marketing expenditures, and external factors, MMM provides a comprehensive overview of marketing performance, allowing marketers to understand the effectiveness of their advertising expenditures.

The essence of MMM lies in its ability to not only retrospectively analyze past performance but also to predict how adjustments in the marketing mix can influence future outcomes. This dual capability makes it an invaluable tool for marketers aiming to optimize their strategies in alignment with business objectives, whether it’s increasing overall sales, enhancing brand awareness, or achieving a higher return on investment (ROI).

9.1.1 The Importance of MMM in the Modern Marketing Landscape#

In an era marked by the proliferation of digital channels and increasingly discerning consumers, MMM offers several advantages. It helps organizations navigate the complexity of the modern marketing ecosystem, where traditional media channels coexist with digital platforms, each offering varying levels of engagement and effectiveness. Moreover, MMM addresses the challenge of attribution, determining which channels are truly driving conversions and to what extent. This enables more effective reallocation of marketing budgets, away from underperforming channels and towards those with a proven impact on sales.

9.1.2 The Objective of MMM#

The primary objective of MMM is to decode the contribution of different marketing channels to sales and understand how these channels interact with one another. This involves dissecting the marketing budget to identify the most and least efficient areas of spending. By modeling the relationship between marketing spend and business outcomes, MMM allows for the simulation of different scenarios to forecast future performance. This forward-looking approach aids in budget optimization and strategic planning, enabling marketers to adjust their strategies in anticipation of changing market dynamics and consumer behaviors.

Ultimately, MMM guides marketers through the intricacies of planning and optimization in a multi-channel landscape. It stands as a testament to the power of data-driven decision-making, underscoring the shift from intuition-based to evidence-based marketing strategies. As we dive deeper into the nuances of MMM, we will explore its framework, provide a step-by-step guide on building a model using Python, and demonstrate how these insights can be leveraged to achieve marketing excellence.

9.2 The Basics of Media Mix Modeling#

In the dynamic world of marketing, understanding and optimizing the impact of various advertising efforts on sales and brand recognition is an ongoing challenge. Media Mix Modeling (MMM) emerges as a powerful tool for marketers aiming to decode the efficacy of each dollar spent across marketing channels. MMM is a statistical approach that quantifies the relationship between marketing inputs and their outcomes, usually sales or brand awareness, allowing marketers to forecast the impact of future marketing strategies.

9.2.1 What is Media Mix Modeling?#

Fundamentally, MMM utilizes regression analysis to estimate the incremental effects of diverse marketing initiatives on sales or other critical performance metrics. By examining historical data, MMM offers insight into how the various components of the marketing mix, such as TV and digital advertising, promotions, and pricing strategies, contribute to overall performance. This analysis helps marketers understand the effectiveness of each channel and optimize their budgets accordingly.

9.2.2 The Role of MMM in Data-Driven Marketing Decisions#

In an era where data plays a crucial role, MMM stands out as a critical tool in the marketer’s arsenal. It provides a comprehensive view of marketing effectiveness, capturing the synergies and dynamics between different media channels. By incorporating external factors such as economic conditions, competitor actions, and seasonal trends, MMM offers a nuanced understanding of marketing performance. This holistic approach enables marketers to make informed decisions, allocate resources efficiently, and maximize return on investment (ROI).

9.2.3 Key Components of MMM#

Understanding the components that MMM analyzes is essential to grasp its functionality and utility:

  • Media Channels: MMM evaluates both traditional (e.g., TV, radio, print) and digital (e.g., social media, search engine marketing, email) advertising channels.

  • External Factors: These include any variables that might impact sales or marketing effectiveness outside of the company’s control, such as economic indicators, weather conditions, and competitive marketing activities.

  • Baseline Sales: This refers to the sales level that would be expected without any marketing activities. Understanding the baseline allows for the measurement of the incremental impact of marketing efforts.

9.2.4 Brief History and Evolution of MMM#

The roots of MMM trace back to the early days of econometric modeling in the 1960s, but it was the advent of more sophisticated computing power and data collection methods in the late 20th century that truly unlocked its potential. Originally focused on traditional media channels, the rise of digital marketing has significantly expanded MMM’s scope. Today, MMM must navigate a far more complex media landscape, integrating a multitude of digital data sources while addressing challenges like consumer privacy concerns and the diminishing effectiveness of cookies.

In conclusion, Media Mix Modeling is an indispensable tool for marketers seeking to navigate the complexities of the modern advertising landscape. By offering a detailed analysis of how various marketing channels and external factors impact sales, MMM empowers businesses to optimize their marketing spend, enhance their strategic planning, and ultimately drive greater ROI. As we delve deeper into this topic, we’ll explore the theoretical underpinnings of MMM and how it integrates with broader marketing and econometric theories to provide actionable insights.

9.3 Understanding the Framework of MMM#

Media Mix Modeling (MMM) operates at the intersection of marketing science and statistical analysis, offering a framework that quantifies the impact of marketing strategies on business outcomes. This section explores the theoretical foundations of MMM, its statistical underpinnings, and how it accounts for the complex interplay between various media channels and external factors.

9.3.1 Theoretical Foundation of MMM#

The theoretical underpinning of MMM is rooted in econometrics, a branch of economics that applies statistical methods to economic data to provide empirical content to economic relationships. MMM applies these principles to marketing, aiming to understand and quantify the relationships between marketing inputs (expenditures across various channels) and outputs (sales, brand awareness).

MMM is grounded in the theory of diminishing returns, a fundamental concept in economics which holds that as investment in a particular area increases, the marginal gains from that investment eventually decrease. In marketing, this implies that while initial investments in a channel may yield significant returns, the effectiveness of additional spending eventually starts to diminish.

9.3.2 Statistical Methods Used in MMM#

MMM primarily relies on regression analysis, a statistical technique for estimating the relationships among variables. The general idea is to use historical data to estimate how changes in the marketing mix and external factors affect sales or other dependent variables. The model might take a simple linear form or a more complex one, incorporating interactions between variables to reflect the synergistic effects between different marketing channels.

  • Linear Regression: The most basic form of MMM, where sales are predicted as a function of marketing spend in different channels. It assumes a direct, linear relationship between marketing spend and sales.

  • Multiple Regression: To capture the influence of multiple factors, MMM often uses multiple regression, which can account for various marketing channels and external factors simultaneously.

  • Bayesian Methods and Time Series Analysis: For more sophisticated MMMs, Bayesian methods can incorporate prior knowledge into the analysis, and time series analysis can account for the temporal dynamics and lag effects of marketing efforts.

9.3.3 Accounting for Diminishing Returns and Synergy#

A critical aspect of MMM is its ability to account for diminishing returns and the synergistic effects between different marketing channels. Diminishing returns are modeled through nonlinear transformations of spend (for example, logarithmic or saturation functions) that capture how the impact of additional spending shrinks as spend grows. For synergy, MMM includes interaction terms between different media investments to capture how the combination of two or more channels can have a greater impact than the sum of their individual effects; a brief illustrative sketch follows the list below.

  • Diminishing Returns: Represented by a logarithmic function or a polynomial where, after a certain point, increases in advertising spend yield progressively smaller increases in sales.

  • Synergy: Interaction terms in the regression model represent synergy. For example, an interaction term between online and TV advertising might show that the combined effect of these channels is greater than the sum of their separate impacts.
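
To make these ideas concrete, here is a minimal sketch on synthetic data (the column names tv_spend and search_spend, the DataFrame demo, and the coefficients are all invented purely for illustration and are not part of the chapter's dataset). It fits a multiple regression in which diminishing returns are captured with a logarithmic transform of spend and synergy with an interaction term between two channels:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic weekly data for illustration only: spend in two channels plus sales
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "tv_spend": rng.uniform(0, 100_000, 156),
    "search_spend": rng.uniform(0, 50_000, 156),
})
demo["sales"] = (
    500_000
    + 80_000 * np.log1p(demo["tv_spend"])       # diminishing returns on TV
    + 40_000 * np.log1p(demo["search_spend"])   # diminishing returns on search
    + 2e-5 * demo["tv_spend"] * demo["search_spend"]  # synergy between the two
    + rng.normal(0, 50_000, 156)
)

# Multiple regression with log-transformed spend (diminishing returns)
# and an interaction term (synergy between TV and search)
mmm_demo = smf.ols(
    "sales ~ np.log1p(tv_spend) + np.log1p(search_spend) + tv_spend:search_spend",
    data=demo,
).fit()
print(mmm_demo.params)

The log transform makes each additional dollar of spend progressively less productive, while the tv_spend:search_spend term lets the two channels reinforce each other. The full model built in Section 9.4 applies the same ideas with adstocked spend and pairwise products of the media channels.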

By integrating these complex relationships, MMM provides a nuanced understanding of marketing effectiveness. This analytical framework helps marketers navigate the multifaceted media landscape, making informed decisions about where to allocate their budgets for maximum impact. In the next section, we’ll explore how to translate these theoretical and statistical considerations into practice by building a Media Mix Model in Python, step by step.

9.4 Building a Media Mix Model in Python: A Step-by-Step Guide#

In this section, we will walk through the process of building a basic Media Mix Model (MMM) using Python, leveraging the dataset provided in Meta’s Robyn R package. This dataset offers a simple yet illustrative example of how media expenditures across five different media channels (TV, OOH, print, Facebook, and search) impact revenue.

9.4.1 Data Description#

The dataset we will be working with is the one used by Meta’s Robyn package. It comprises weekly media expenditures and revenue data spanning from November 23, 2015, to November 11, 2019, encompassing a total of 208 weeks. The variables included in the dataset are as follows:

  • date: The beginning date of each week

  • revenue: The business revenue generated during the week

  • tv_s: Television advertising expenditure for the week

  • ooh_s: Out-of-home (OOH) channel expenditure for the week

  • print_s: Print media advertising expenditure for the week

  • facebook_i: Facebook impressions count

  • search_clicks_p: Number of clicks through search results

  • search_s: Expenditure on paid search advertising

  • competitor_sales_b: Competitor’s sales figures

  • facebook_s: Facebook advertising expenditure

import os
os.chdir("marketing-analytics-with-python")
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import math

# compress the warning messages
import warnings
warnings.filterwarnings("ignore")

# Load the dataset
data = pd.read_csv(r"data/Chapter9_mmm.csv")
# trim the column names 
data.columns = data.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
# convert date column to datetime format
data['date'] = pd.to_datetime(data['date'])
print(data.shape)
data.head()
(208, 10)
date revenue tv_s ooh_s print_s facebook_i search_clicks_p search_s competitor_sales_b facebook_s
0 2015-11-23 2.754372e+06 167687.6 0 95463.666667 7.290385e+07 0.000000 0 8125009 228213.987444
1 2015-11-30 2.584277e+06 214600.9 0 0.000000 1.658110e+07 29511.715457 31000 7901549 34258.573511
2 2015-12-07 2.547387e+06 0.0 248022 3404.000000 4.995477e+07 36132.358958 28400 8300197 127691.261335
3 2015-12-14 2.875220e+06 625877.3 0 132600.000000 3.164930e+07 36804.210958 31900 8122883 84014.720306
4 2015-12-21 2.215953e+06 0.0 520005 0.000000 8.802269e+06 28401.744069 27100 7105985 20687.478156

9.4.2 Data Processing#

The purpose of this exercise is to analyze how advertising expenditures by media channels affect business sales revenue. We will include only the media expenditure data, the target feature ‘revenue’, and the ‘date’ column.

Before we prepare the data for model training, we use the ‘sweetviz’ package to run an automated EDA. From the generated report, we can see that this dataset has no missing values.

data = data.drop(columns=['facebook_i', 'search_clicks_p', 'competitor_sales_b'])
import sweetviz as sv

# Create a report based on the data
report = sv.analyze(data)

# Save the report to an HTML file
report.show_html('results/Chapter9_eda_report.html')

9.4.3 Feature Engineering#

As we discussed earlier, we need to consider the lagged effects of media advertising expenditures and the interaction effects across media channels.

  1. The adstock effect is often used to account for the carryover effects of advertising on sales. In MMM, the adstock effect is typically captured by including a lagged advertising variable in the model. This lagged advertising variable represents the impact of advertising on future sales, allowing marketers to capture the cumulative impact of advertising over time rather than just the immediate impact of an ad.

The formula for calculating adstock effect is typically expressed as: $\( A_t = X_{t} + \text{adstock\_rate} * A_{t-1} \)$ Where:

\(A_{t}\) represents the adstock at time \(t\)

\(X_{t}\) is the advertising expenditure at time \(t\)

\(\text{adstock\_rate}\) is the percentage of the effect that remains at each time period after the initial advertisement.

We develop a Python function ‘apply_adstock’ as below based on the formula and apply it to transform the media spend features with the adstock effects.

# Function to apply adstock (decay) effect
def apply_adstock(x, decay_rate=0.5):
    adstocked_x = []
    previous = 0
    for value in x:
        adstocked_value = value + (previous * decay_rate)
        adstocked_x.append(adstocked_value)
        previous = adstocked_value
    return pd.Series(adstocked_x)
# adstocking the media spend
media_spend_list = ['tv_s', 'ooh_s', 'print_s', 'search_s', 'facebook_s']

df_adstock = data.copy()
for media in media_spend_list:
    df_adstock[media] = apply_adstock(df_adstock[media])

Note that when we transform the media spend features with adstock effects, the first several rows of the transformed data do not include the adstock contributions of earlier weeks, because those weeks are not available in the dataset. We therefore exclude those rows from the training dataset. The number of weeks to exclude is determined by the decay rate and the threshold level below which we consider the residual adstock effect negligible. The general form of the exponential decay formula is $\( V_t = V_0 \times {(\text{decay}\ \text{rate})}^t \)$

where:

\(V_t\) is the value at time,

\(V_0\) is the initial value,

\(\text{decay}\ \text{rate}\) is the rate at which the value decays per time period, and

\(t\) is the number of time periods.

After \(n\) time periods, \(V_n = b \times V_0\), where \(b\) is the threshold level. Solving for \(n\) gives: $\( n = \frac{\ln(b)}{\ln(\text{decay}\ \text{rate})} \)$

import math

decay_rate = 0.5  # Example decay rate
threshold = 0.10  # 10% threshold

t = round(math.log(threshold) / math.log(decay_rate))
print(t)
3

With a decay rate of 0.5 and a threshold level of 0.10, we need to exclude the first three rows of the dataset to remove the noise introduced by the adstock transformation.

# exclude the first t rows
df_adstock = df_adstock.iloc[t:]
print(df_adstock.shape) 
(205, 7)
  2. Interaction terms can be captured by multiplying features or by adding polynomial features. In our use case, we take the product of each pair of media spend features.

# add interaction terms of medias in media_spend_list
for i, media1 in enumerate(media_spend_list):
    for j, media2 in enumerate(media_spend_list):
        if i < j:
            df_adstock[media1 + '_x_' + media2] = df_adstock[media1] * df_adstock[media2]
  3. Time-related features: Based on the ‘date’ column, we can create a ‘month’ feature to capture seasonality.

# create month feature from 'date'
df_adstock['month'] = pd.to_datetime(df_adstock['date']).dt.month
# create dummy variables for 'month'
df_adstock = pd.get_dummies(df_adstock, columns=['month'], drop_first=True)
df_adstock.head()
date revenue tv_s ooh_s print_s search_s facebook_s tv_s_x_ooh_s tv_s_x_print_s tv_s_x_search_s ... month_3 month_4 month_5 month_6 month_7 month_8 month_9 month_10 month_11 month_12
3 2015-12-14 2.875220e+06 700488.475000 124011.0000 146234.958333 53850.000 184951.742781 8.686828e+10 1.024359e+11 3.772130e+10 ... False False False False False False False False False True
4 2015-12-21 2.215953e+06 350244.237500 582010.5000 73117.479167 54025.000 113163.349546 2.038458e+11 2.560898e+10 1.892194e+10 ... False False False False False False False False False True
5 2015-12-28 2.569922e+06 424311.918750 291005.2500 275976.072917 55312.500 216691.829975 1.234770e+11 1.170999e+11 2.346975e+10 ... False False False False False False False False False True
6 2016-01-04 2.171507e+06 217308.859375 570665.6250 137988.036458 55456.250 217036.281974 1.240107e+11 2.998602e+10 1.205113e+10 ... False False False False False False False False False False
7 2016-01-11 2.464132e+06 130796.129687 285332.8125 245853.684896 59728.125 232412.625582 3.732043e+10 3.215671e+10 7.812208e+09 ... False False False False False False False False False False

5 rows × 28 columns

9.4.4 Model Training#

# Predictor variables with adstock and interaction terms
X = df_adstock.drop(columns=['date', 'revenue'])
y = df_adstock['revenue']  # Response variable
X = sm.add_constant(X)  # Adds a constant term to the predictor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, train_size=0.8)
# build more complex models
import xgboost
 
from sklearn.ensemble import AdaBoostRegressor
from xgboost.sklearn import XGBRegressor 
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn import neighbors
from sklearn.linear_model import LinearRegression

ada = AdaBoostRegressor()
xgb = XGBRegressor()
rfr = RandomForestRegressor()
dtr = DecisionTreeRegressor()
gbr = GradientBoostingRegressor()
knn = neighbors.KNeighborsRegressor() 
lr = LinearRegression()
names = ["LinearRegression", "AdaBoostRegressor","XGBRegressor","RandomForestRegressor","DecisionTreeRegressor",
        "GradientBoostingRegressor","KNeighborsRegressor"]
models = [lr, ada, xgb, rfr, dtr, gbr, knn]
from sklearn.base import clone

def train_model(model_name, model, X_train, y_train, X_test, y_test):
    # fit a fresh clone so the stored model is not overwritten by later fits
    model = clone(model).fit(X_train, y_train)
    
    # Predict the test data
    y_pred = model.predict(X_test)

    # Calculate the R-squared, MAE, and RMSE
    r2 = r2_score(y_test, y_pred).round(3)
    mae = mean_absolute_error(y_test, y_pred).round(3)
    rmse = mean_squared_error(y_test, y_pred, squared=False).round(3)
        
    return model_name, model, r2, mae, rmse
results = []

for model_name, model in zip(names, models):
    model_name, model, r2, mae, rmse = train_model(model_name, model, X_train, y_train, X_test, y_test)
    results.append((model_name, model, r2, mae, rmse))
results[0]
('LinearRegression', LinearRegression(), 0.766, 229173.994, 303033.456)
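
To compare all seven models at a glance, the metrics collected in results can be gathered into a DataFrame; this is a small helper sketch, and results_df is a name introduced here rather than something used elsewhere in the chapter.

# Summarize the results as a DataFrame sorted by R-squared (illustrative helper)
results_df = pd.DataFrame(
    [(name, r2, mae, rmse) for name, _, r2, mae, rmse in results],
    columns=["model", "r2", "mae", "rmse"],
).sort_values("r2", ascending=False)
print(results_df)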

9.4.5 Search for the Best Model#

import pandas as pd
import math
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

def apply_adstock(x, decay_rate=0.5):
    adstocked_x = []
    previous = 0
    for value in x:
        adstocked_value = value + (previous * decay_rate)
        adstocked_x.append(adstocked_value)
        previous = adstocked_value
    return pd.Series(adstocked_x)

def apply_synergy(data, media_spend_list = ['tv_s', 'ooh_s', 'print_s', 'search_s', 'facebook_s']):

    df = data.copy() 
    # add interaction terms of medias in media_spend_list
    for i, media1 in enumerate(media_spend_list):
        for j, media2 in enumerate(media_spend_list):
            if i < j:
                df[media1 + '_x_' + media2] = df[media1] * df[media2]
    return df 

def trim_threshold(data, decay_rate=0.5, threshold=0.05):
    t = round(math.log(threshold) / math.log(decay_rate))
    return data.iloc[t:]

def add_ts_features(data):
    data['month'] = pd.to_datetime(data['date']).dt.month
    data = pd.get_dummies(data, columns=['month'], drop_first=True)
    # drop date column
    data.drop(columns=['date'], inplace=True)
    return data

def train_data(data, target='revenue',train_size=0.8):
    X = data.drop(columns=[target])
    y = data[target]  # Response variable
    X = sm.add_constant(X)  # Adds a constant term to the predictor             
                
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, train_size=train_size)
    return X_train, X_test, y_train, y_test

def data_pipe(data, media_spend_list= ['tv_s', 'ooh_s', 'print_s', 'search_s', 'facebook_s'], decay_rate=0.5, threshold=0.1, target='revenue', train_size=0.8): 
    df = data.copy()
    for media in media_spend_list:
        df[media] = apply_adstock(df[media], decay_rate=decay_rate) 
    df = apply_synergy(df, media_spend_list)
    df = add_ts_features(df)
    df = trim_threshold(df, decay_rate=decay_rate, threshold=threshold)
    X_train, X_test, y_train, y_test = train_data(df, target=target,train_size=train_size)
    return X_train, X_test, y_train, y_test    
decay_rates = [round(0.05*i, 2) for i in range(1, 20)]
results = []
for decay_rate in decay_rates:
    X_train, X_test, y_train, y_test = data_pipe(data, media_spend_list= ['tv_s', 'ooh_s', 'print_s', 'search_s', 'facebook_s'], decay_rate=decay_rate, threshold=0.1, target='revenue', train_size=0.8)

    for model_name, model in zip(names, models):
        model_name, model, r2, mae, rmse = train_model(model_name, model, X_train, y_train, X_test, y_test)
        results.append((decay_rate, model_name, model, r2, mae, rmse))
results[0]
(0.05, 'LinearRegression', LinearRegression(), 0.785, 209114.346, 289760.622)
# get all R-squared from results
r2_scores = [result[3] for result in results]
# model with the highest R-squared
best_model_index = r2_scores.index(max(r2_scores))
# get the best model in terms of R-Square
best_model = results[best_model_index]
best_model_decay_rate = best_model[0]
best_model_name = best_model[1]
best_model_r2_score = best_model[3]
best_model_trained = best_model[2]

print("Decay Rate for the best model:", best_model_decay_rate)
print("Name of the best model:", best_model_name)
print("R2-Score of the best model:", best_model_r2_score) 
Decay Rate for the best model: 0.9
Name of the best model: LinearRegression
R2-Score of the best model: 0.876
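
Since the winning model is a linear regression, we can also glance at its fitted coefficients to see how each adstocked media channel relates to revenue. This quick sketch assumes the best model is the LinearRegression reported above and therefore exposes scikit-learn's coef_ and feature_names_in_ attributes:

# Inspect the per-channel coefficients of the best (linear) model
coef = pd.Series(best_model_trained.coef_, index=best_model_trained.feature_names_in_)
print(coef.loc[media_spend_list])

Keep in mind that the interaction and seasonality terms also carry part of the media effect, so these raw coefficients are only a rough read on channel contribution.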

9.4.6 Media Expenditure Optimization#

def objective(spend, base_data=data, model=best_model_trained, media_spend_list=['tv_s', 'ooh_s', 'print_s', 'search_s', 'facebook_s'], decay_rate=best_model_decay_rate, threshold=0.1):
    """
    Objective function to be minimized.
    
    Parameters:
    - spend: Array of spend by media channel, applied as a constant weekly spend.
    - model: Trained regression model (here the best linear model found above).
    - base_data: DataFrame with the raw 'date', 'revenue', and media spend columns.
    
    Returns:
    - Negative predicted revenue to be minimized.
    """
    # Copy base_data to avoid modifying the original
    df = base_data.copy()

    # Overwrite each media channel with the candidate spend, then apply adstock
    for i, media in enumerate(media_spend_list):
        df[media] = spend[i]
        df[media] = apply_adstock(df[media], decay_rate=decay_rate)
    df = apply_synergy(df, media_spend_list)
    df = add_ts_features(df)
    df = trim_threshold(df, decay_rate=decay_rate, threshold=threshold)
    
    # Rebuild the same design matrix used in training (constant + features, no target)
    X_sim = sm.add_constant(df.drop(columns=['revenue']))
    
    # Predict revenue with the updated media spends
    predicted_revenue = model.predict(X_sim)
    
    # Return negative revenue because we are minimizing
    return -np.sum(predicted_revenue)
data.head()
date revenue tv_s ooh_s print_s search_s facebook_s
0 2015-11-23 2.754372e+06 167687.6 0 95463.666667 0 228213.987444
1 2015-11-30 2.584277e+06 214600.9 0 0.000000 31000 34258.573511
2 2015-12-07 2.547387e+06 0.0 248022 3404.000000 28400 127691.261335
3 2015-12-14 2.875220e+06 625877.3 0 132600.000000 31900 84014.720306
4 2015-12-21 2.215953e+06 0.0 520005 0.000000 27100 20687.478156
spend = data[media_spend_list].copy()
spend.head()
tv_s ooh_s print_s search_s facebook_s
0 167687.6 0 95463.666667 0 228213.987444
1 214600.9 0 0.000000 31000 34258.573511
2 0.0 248022 3404.000000 28400 127691.261335
3 625877.3 0 132600.000000 31900 84014.720306
4 0.0 520005 0.000000 27100 20687.478156
from scipy.optimize import minimize

# Total budget constraint
total_budget = data[media_spend_list].sum().sum() # Example total budget

# Constraints function ensuring the sum of spends equals the total budget
constraints = ({'type': 'eq', 'fun': lambda spend: total_budget - sum(spend)})

# Bounds for each channel (e.g., min and max spend for each)
bounds = [(0, total_budget) for _ in media_spend_list]  # Example bounds

decay_rate = best_model_decay_rate
threshold = 0.1
t = round(math.log(threshold) / math.log(decay_rate))
# Initial guess (e.g., evenly distribute the total budget across channels)
initial_guess = [total_budget / len(media_spend_list) for _ in media_spend_list]

# Run the optimization
result = minimize(fun=objective, x0=initial_guess, constraints=constraints, bounds=bounds, method='SLSQP')

# Check the optimal spends found
if result.success:
    optimal_spends = result.x
    print("Optimal spends by media channel:", optimal_spends)
    # Calculate the predicted revenue with the optimal spends
    optimal_revenue = -result.fun
    # Print the predicted revenue with the optimal spends
    print(f"Predicted revenue with optimal media spend: {optimal_revenue:,}")
    # print the original media spend by channel
    print("Original media spend by channel:", spend.sum().values)
    # print the predicted revenue with the original media spend
    original_revenue = -objective(spend.sum().values)
    print(f"Predicted revenue with original media spend: {original_revenue:,}")
    # print the actual revenue with the original media spend
    print(f"Actual revenue with original media spend: {data['revenue'].sum():,}")
    # print the difference between the optimal and original predicted revenue
    print("Revenue difference:", optimal_revenue - original_revenue)
else:
    print("Optimization was not successful. Reason:", result.message)