Your eCommerce product performance reports are probably misleading you | by Hattie Biddlecombe | Oct, 2024

October 18, 2024

Why single metrics in isolation fall short and how Weighted Composite Scoring can transform your business insights

A stickman stands at the top of a tall ladder, peering over a wall. Another stickman with a shorter ladder can’t see over the wall. Beyond the wall are answers to a business’s true product value. The rungs of the ladders represent different metrics, allowing the taller ladder to provide more visibility with additional metrics.

In the world of e-commerce, relying on individual metrics to assess product and brand performance can be misleading. Metrics viewed in isolation can paint a distorted picture. They can lead you to overinvest in products that look profitable but are actually draining your business’s resources or, conversely, to undervalue items with untapped potential.

To stay ahead, you need a holistic view — one that evaluates product and brand performance across several key metrics like ‘gross revenue’, ‘conversion rate’, ‘gross margin’, ‘customer acquisition cost’, ‘repeat purchase rate’, ‘fulfillment costs’ and ‘return rate’.

Below is a typical example of the kind of eCommerce data many of my clients work with. To protect client confidentiality and privacy, the data shown here is synthetic, generated using AI. Although it includes a variety of important metrics, teams often focus only on the metric most relevant to their goals, which can obscure the bigger picture. For instance, sorting by sales_gross_amount makes ‘Towel 17’ appear to be the top performer:

Table 1: eCommerce products sorted by gross sales amount

However, when we sort by a custom score that considers all the metrics equally, we find that ‘Cushion 152’ emerges as the best-performing product, while ‘Towel 17’ drops significantly to position 213 out of 500 products:

Table 2: eCommerce products sorted by weighted composite score

Side note: In practice, I probably wouldn’t use this many metrics simultaneously, as it can overcomplicate decision-making. However, I wanted to give you a complete picture of the different factors you could consider. Also, you may have noticed that I haven’t included Add to Basket as one of the metrics in the table. While it’s a useful early-stage indicator of customer interest, it doesn’t always translate into final sales or long-term product performance. However, some may still find value in tracking this metric.

To avoid these pitfalls of single-metric assessment and to get a more accurate evaluation of product and brand performance across multiple metrics, we use a method called Weighted Composite Scoring.

A Weighted Composite Score combines multiple metrics into a single, insightful metric that provides a comprehensive view of each product’s value across various dimensions. Think of it like your final grade in school — each subject may be assessed on a different scale, but ultimately they are combined into one overall score.

This composite score can also be weighted to emphasise specific metrics, allowing you to align with particular business goals such as prioritising profitability over growth or reducing return rates.
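To make the idea concrete, here is a toy sketch with three made-up products whose metrics are assumed to be already on a common scale (the product names, metric names and numbers are invented purely for illustration). Shifting weight from revenue towards margin changes which product comes out on top.

```python
# Toy illustration: three hypothetical products, metrics already scaled to a common range
products = {
    "Product A": {"revenue": 0.9, "margin": -0.2},
    "Product B": {"revenue": 0.3, "margin": 0.6},
    "Product C": {"revenue": -0.1, "margin": 0.4},
}

def composite(metrics: dict, weights: dict) -> float:
    """Weighted average of already-scaled metrics, normalised by the total absolute weight."""
    total = sum(abs(w) for w in weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total

def best_product(weights: dict) -> str:
    """Return the product with the highest composite score under the given weights."""
    return max(products, key=lambda p: composite(products[p], weights))

growth_focus = {"revenue": 0.8, "margin": 0.2}  # hypothetical growth-oriented weights
profit_focus = {"revenue": 0.2, "margin": 0.8}  # hypothetical profit-oriented weights

print(best_product(growth_focus))  # Product A
print(best_product(profit_focus))  # Product B
```

The underlying data never changes; only the weights do, which is why it pays to agree on weights that reflect the actual business goal.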

Next, let’s explore how to implement a Weighted Composite Score using Python:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

product_df = pd.read_csv('product_data.csv')  # This is a set of artificially generated data
product_df.head()
```

There are many scaling techniques you can apply, but for this dataset, Z-Score Normalisation is the most effective scaling method. Here’s why:

Balances different scales: Z-Score Normalisation converts each metric to have a mean of 0 and a standard deviation of 1. This levels the playing field for metrics that vary significantly in scale — whether it’s thousands in revenue or single-digit conversion rates. Ultimately, this makes it easy to compare products across different dimensions.
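Less tied to extreme values: Min-Max scaling pins the range to the single smallest and largest values, so one extreme product can define the whole scale. Z-scores instead express each value as a distance from the mean in standard deviations. They are not immune to outliers, though, because the mean and standard deviation are themselves pulled by extreme values.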
Identifies above / below average performance: Z-scores allow us to see whether a value is above or below the mean, using positive or negative values (as you can see in Table 4 below). As we’ll see, this insight will be useful later on for understanding how individual products perform relative to the mean.
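For reference, the Z-score of a value is (value - mean) / standard deviation, computed per column. The sketch below, on a small made-up revenue column, checks that a hand-rolled version matches scikit-learn's StandardScaler. One detail worth knowing: StandardScaler uses the population standard deviation (ddof=0), whereas pandas' `Series.std()` defaults to the sample version (ddof=1), so the two differ slightly unless you pass `ddof=0`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical revenue figures for five products (mean = 1300)
revenue = pd.Series([1200.0, 450.0, 3100.0, 800.0, 950.0], name="sales_gross_amount")

# Z-score by hand: subtract the mean, divide by the population standard deviation
z_manual = (revenue - revenue.mean()) / revenue.std(ddof=0)

# The same transformation via scikit-learn (it expects 2-D input, hence to_frame)
z_sklearn = StandardScaler().fit_transform(revenue.to_frame()).ravel()

print(np.allclose(z_manual, z_sklearn))  # True
# Positive Z-scores are above the mean revenue, negative ones below it
print((z_manual > 0).tolist())  # [False, False, True, False, False]
```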

Refining with Min-Max Scaling

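We then applied Min-Max scaling after Z-Score Normalisation to map every value into a consistent range between -1 and 1. With all metrics on the same bounded scale, they are easier to compare fairly, and each one contributes on a comparable footing to the final analysis. One caveat: Z-Score Normalisation only shifts and rescales each column, so applying Min-Max scaling afterwards produces exactly the same values as applying Min-Max scaling to the raw data directly, and 0 in the final range no longer necessarily corresponds to the mean. If you want the sign to keep meaning above or below average, dividing each Z-score by its column’s largest absolute value (which is what scikit-learn’s MaxAbsScaler does) also keeps values within -1 to 1 while leaving the mean at 0.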

The code below demonstrates how to apply the scaling methods to our dataframe:

```python
# Select numeric columns and create corresponding scaled column names
numeric_cols = product_df.select_dtypes(include=['float64', 'int64']).columns
scaled_cols = ['scaled_' + col for col in numeric_cols]

# Apply Z-Score Normalisation and then Min-Max scaling in one go
scaler = MinMaxScaler(feature_range=(-1, 1))
product_df[scaled_cols] = scaler.fit_transform(StandardScaler().fit_transform(product_df[numeric_cols]))
product_df.head()
```

*Table 4: Product dataframe showing scaled metrics*
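It is worth checking how the two chained scalers interact. Z-Score Normalisation only shifts and rescales each column, so the Min-Max step that follows yields the same numbers as Min-Max scaling applied directly to the raw data. The sketch below verifies this on randomly generated data (invented for illustration). It also shows an alternative, scikit-learn's MaxAbsScaler, which divides each Z-score by the column's largest absolute value so that 0 keeps meaning "average".

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
# Hypothetical raw metrics: one skewed revenue-like column and one rate-like column
X = np.column_stack([rng.lognormal(8, 1, size=200), rng.uniform(0.01, 0.05, size=200)])

z = StandardScaler().fit_transform(X)

# Z-score followed by Min-Max, as in the code above...
chained = MinMaxScaler(feature_range=(-1, 1)).fit_transform(z)
# ...versus Min-Max applied straight to the raw data
direct = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
print(np.allclose(chained, direct))  # True: the Z-score step does not change the result

# Alternative: squeeze Z-scores into [-1, 1] without moving the mean away from 0
sign_preserving = MaxAbsScaler().fit_transform(z)
print(np.array_equal(np.sign(sign_preserving), np.sign(z)))  # True: above/below average is kept
```

Either variant puts every metric in the same [-1, 1] range. The difference is whether the sign of a scaled value still tells you "above or below the mean".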

Next, we want to let end users assign weights to the metrics, so they can give greater importance to the ones that matter most for their business priorities or objectives. Different departments may prioritise different metrics depending on their focus. For example, the Marketing team might be more interested in customer acquisition and conversion, where conversion rate, customer acquisition cost (CAC), and repeat purchase rate are key indicators of success.

Metrics like fulfillment costs, CAC, and return rate represent negative factors for a product’s performance. By applying negative weights, we ensure that higher values in these metrics lower the overall composite score, reflecting their adverse impact:

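```python
# Example user-provided weights (these can be set dynamically from user input)
user_weights = {
    'scaled_conversion_rate': 0.14,
    'scaled_sales_gross_amount': 0.14,
    'scaled_gross_margin': 0.14,
    'scaled_customer_acquisition_cost': -0.14,  # negative weight: higher CAC lowers the score
    'scaled_fulfillment_costs_per_unit': -0.14,  # negative weight: higher costs lower the score
    'scaled_return_rate': -0.14,  # negative weight: more returns lower the score
    'scaled_repeat_purchase_rate': 0.14,
}

# Calculate the weighted composite score.
# Normalise by the sum of the *absolute* weights: the signed sum (0.14 here) shrinks
# towards zero, or hits zero, as negative weights cancel out positive ones.
# The product ranking is the same either way, since both denominators are positive here.
total_weight = sum(abs(w) for w in user_weights.values())
product_df['weighted_composite_score'] = sum(
    product_df[col] * weight for col, weight in user_weights.items()
) / total_weight
```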
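If you want to reuse the calculation with different weight sets (for example, one per department), it can be wrapped in a small helper. The function name `weighted_composite_score`, the `marketing_weights` preset and the toy dataframe below are hypothetical illustrations, not part of the original analysis. The helper checks that every weighted column exists and normalises by the total absolute weight, so negative weights cannot cancel out the denominator.

```python
import pandas as pd

def weighted_composite_score(df: pd.DataFrame, weights: dict[str, float]) -> pd.Series:
    """Hypothetical helper: weighted average of scaled metric columns.

    Negative weights penalise 'bad' metrics (e.g. costs, returns). Dividing by the
    sum of absolute weights keeps the score within the same [-1, 1] range as the inputs.
    """
    missing = set(weights) - set(df.columns)
    if missing:
        raise KeyError(f"Columns not found in dataframe: {sorted(missing)}")
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        raise ValueError("At least one weight must be non-zero")
    weighted_sum = sum(df[col] * w for col, w in weights.items())
    return weighted_sum / total

# Hypothetical Marketing preset: conversion, CAC and repeat purchases only
marketing_weights = {
    'scaled_conversion_rate': 0.4,
    'scaled_customer_acquisition_cost': -0.3,
    'scaled_repeat_purchase_rate': 0.3,
}

# Toy scaled data for two hypothetical products
toy_df = pd.DataFrame({
    'scaled_conversion_rate': [0.5, -0.2],
    'scaled_customer_acquisition_cost': [0.8, -0.6],
    'scaled_repeat_purchase_rate': [0.1, 0.4],
}, index=['Product X', 'Product Y'])

print(weighted_composite_score(toy_df, marketing_weights))
```

Here Product Y scores about 0.22 and Product X about -0.01: X converts better, but its high acquisition cost drags it down under Marketing's priorities. Switching to another department's preset only means passing a different dictionary.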

Weighting Metrics with Regression Analysis

Just as a side note, a more data-driven approach to assigning weights in a composite score is to use regression analysis. This method assigns weights based on each metric’s actual influence on key outcomes, such as overall profitability or customer retention. By doing so, the most impactful metrics naturally carry more weight in the final composite score.
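A minimal sketch of that idea: fit a linear regression of an outcome on the scaled metrics and use the coefficients as weights. It assumes you have an outcome column to explain (here a hypothetical `profit` target). The data is synthetic with known coefficients, so you can see the regression recover them. The coefficients are only comparable because the metrics share a common scale. In practice you would also want to check for strongly correlated metrics, which make coefficients unstable, and validate on held-out data before trusting the weights.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 500
# Synthetic scaled metrics for 500 hypothetical products
X = pd.DataFrame({
    'scaled_gross_margin': rng.uniform(-1, 1, n),
    'scaled_return_rate': rng.uniform(-1, 1, n),
    'scaled_repeat_purchase_rate': rng.uniform(-1, 1, n),
})
# Hypothetical outcome: driven mostly by margin, hurt by returns, plus a small noise term
profit = (3.0 * X['scaled_gross_margin']
          - 1.5 * X['scaled_return_rate']
          + 0.5 * X['scaled_repeat_purchase_rate']
          + rng.normal(0, 0.1, n))

model = LinearRegression().fit(X, profit)

# Use the fitted coefficients as weights; negative signs for harmful metrics fall out automatically
regression_weights = dict(zip(X.columns, model.coef_))
print({col: round(float(w), 2) for col, w in regression_weights.items()})
```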

As you can see in the table below (also shown at the beginning of this blog), when we order by scaled_sales_gross_amount, ‘Towel 17’ is in the top position:

However, when we order by our new weighted_composite_score, ‘Cushion 152’ takes the top position, whereas ‘Towel 17’ falls all the way down to position 213 out of 500:

Looking at the positive and negative scaled scores, we can clearly see in Table 1 that while Towel 17 excels in sales and profitability, it struggles with repeat purchases and has a high return rate, both potential indicators of quality or customer satisfaction issues. Addressing these challenges could significantly improve both profitability and customer loyalty.

In Table 2, we can see that Cushion 152 performs exceptionally well in terms of profitability (high gross margin and low costs), with solid conversion rates and a low return rate. While it doesn’t have the highest sales, it stands out as a top performer overall due to its efficiency and customer satisfaction. I would recommend that this website increase this product’s visibility through targeted marketing campaigns and feature it more prominently on the site to drive additional sales.
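Position numbers like "213 out of 500" come from ranking the dataframe on each column. Here is a minimal sketch using pandas' `rank`. It assumes a `product_name` identifier column (a hypothetical name; the real dataset's identifier may differ) and uses a tiny made-up dataframe in place of `product_df`.

```python
import pandas as pd

# Tiny made-up dataframe standing in for product_df
df = pd.DataFrame({
    'product_name': ['Item A', 'Item B', 'Item C', 'Item D'],
    'scaled_sales_gross_amount': [0.95, 0.30, 0.60, -0.40],
    'weighted_composite_score': [-0.10, 0.45, 0.20, 0.05],
}).set_index('product_name')

# Position 1 = best; method='min' gives tied products the same position
positions = pd.DataFrame({
    'by_sales': df['scaled_sales_gross_amount'].rank(ascending=False, method='min').astype(int),
    'by_composite': df['weighted_composite_score'].rank(ascending=False, method='min').astype(int),
})
# Positive change = the product moves up when judged on the composite score
positions['change'] = positions['by_sales'] - positions['by_composite']
print(positions.sort_values('by_composite'))
```

In this toy data, Item A leads on sales but drops three places to last on the composite score, the same kind of reversal seen with Towel 17.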

I also analysed the brands in the dataset and, once again, a different picture emerges when we view the data through the lens of a Weighted Composite Score.

At first glance, EcoLiving appears to be the top performer based solely on sales_gross_amount. However, our Weighted Composite Score, which balances all key metrics equally, reveals that PureDecor is the most valuable brand overall. This approach allows us to identify the brand delivering the greatest all-around value, rather than focusing on a single metric or dimension of performance:

*Table 5: eCommerce brands sorted by weighted composite score*
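The post doesn't spell out how product scores were rolled up to brand level, so what follows is only one plausible approach, sketched under assumptions. It uses a hypothetical `brand` column and takes each brand's score as the mean of its products' composite scores, with total gross sales alongside for comparison. A revenue-weighted mean, or recomputing the composite score on brand-level aggregates, would be equally defensible choices.

```python
import pandas as pd

# Made-up product-level results standing in for product_df
df = pd.DataFrame({
    'brand': ['Brand X', 'Brand X', 'Brand Y', 'Brand Y', 'Brand Z'],
    'sales_gross_amount': [9000, 7000, 3000, 2500, 4000],
    'weighted_composite_score': [-0.20, 0.05, 0.40, 0.30, 0.10],
})

# Aggregate to one row per brand, then rank brands by their mean composite score
brand_summary = (
    df.groupby('brand')
      .agg(total_sales=('sales_gross_amount', 'sum'),
           mean_composite_score=('weighted_composite_score', 'mean'),
           n_products=('weighted_composite_score', 'count'))
      .sort_values('mean_composite_score', ascending=False)
)
print(brand_summary)
```

In this toy data, Brand X has by far the highest total sales but the lowest mean composite score, mirroring the EcoLiving versus PureDecor contrast above.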

In conclusion, a Weighted Composite Score is a simple yet highly effective way to analyse complex datasets, and it can easily be integrated into your existing reporting tools.

For my clients, this approach has had a significant impact — it has prevented unnecessary cuts to products & brands that were mistakenly thought to be underperforming. It has also helped reallocate resources away from products & brands that were draining budgets without delivering proportional value.

Weighted Composite Scoring can be applied to any area where multiple important metrics need to be balanced. For example, it can help optimise web content, enhance SEO strategies & improve customer segmentation, making it a transformative tool across multiple areas of your business.

If you’d like a hand with implementing a weighted scoring system or just want to chat about your data woes, feel free to reach out to me via email, my website, or LinkedIn.

Unless otherwise noted, all images are by the author