Aggregation vs. Pooling

Aggregation and pooling are subjects of debate in the marketing modeling community: everyone agrees that both are important to get right, yet there is wide disagreement as to what is right. Coefficient Generator™ provides facilities for testing various methods of aggregating and pooling along the product and geography dimensions.

Aggregation and pooling are often confused. Product aggregation means adding or otherwise combining the data for multiple products into one product. Product aggregation issues include the question of brand aggregate versus item-level models, as well as the issue of which UPCs to combine into “items”.

Product pooling means keeping the products separate, but forcing one or more of their coefficients to be the same or similar. Pooling issues revolve around the proper balance of model richness versus reliability. Shrinkage estimators can be viewed as lying somewhere between pooling and not pooling, which is why shrunk coefficients fall between the pooled and “unpooled” coefficients.
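The relationship between unpooled, pooled and shrunk coefficients can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are invented, the pooled model shares one slope across products while keeping separate intercepts, and the shrinkage rule is a simple fixed-weight average rather than any particular estimator used by Coefficient Generator™.

```python
def slope_parts(xs, ys):
    """Return (Sxy, Sxx), the building blocks of a least-squares slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy, sxx

# Hypothetical data: (log price, log sales) for two products.
data = {
    "A": ([0.0, 0.1, 0.2, 0.3], [5.0, 4.8, 4.6, 4.4]),
    "B": ([0.0, 0.1, 0.2, 0.3], [4.0, 3.7, 3.4, 3.1]),
}

parts = {name: slope_parts(xs, ys) for name, (xs, ys) in data.items()}

# Unpooled: each product gets its own price coefficient.
unpooled = {name: sxy / sxx for name, (sxy, sxx) in parts.items()}

# Pooled: one common price coefficient, separate intercepts per product.
pooled = (sum(sxy for sxy, _ in parts.values())
          / sum(sxx for _, sxx in parts.values()))

# Shrunk: a weighted average of the two (the weight is an arbitrary choice).
weight = 0.5
shrunk = {name: weight * b + (1 - weight) * pooled
          for name, b in unpooled.items()}

for name in data:
    print(name, unpooled[name], pooled, shrunk[name])
```

With these numbers each product's shrunk coefficient sits between its own unpooled coefficient and the common pooled one. In practice the weight would normally depend on how reliably each product's coefficient is estimated.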

Geography aggregation and pooling issues are completely analogous to those for products, but instead of UPCs/items/brands, geography aggregation and pooling are concerned with stores/chains/markets, etc. Aggregation and pooling can also apply to time periods, coupons, merchandising conditions, and many other potential marketing “dimensions”.

General Recommendations

Aggregation and pooling issues are best resolved by striking the proper balance between possibly conflicting objectives:

Disaggregate data --> Accurate coefficients

Aggregate data --> Manageable database

Unpooled observations --> Detailed coefficients

Pooled observations --> Reliable coefficients

Basically, the idea is to easily derive reliable coefficients by aggregating as much as possible without sacrificing accuracy and pooling as much as possible without losing important detail. Generally this means aggregating together products and geographies which are homogeneous in marketing activities and for which separate response estimates are not required. Further combination should be through pooling, again without combining things (products, geographies, coupons, advertising copy executions, etc.) requiring separate response estimates.

Homogeneity in marketing activities implies that the products or geographies are subject to the same pattern of pricing, merchandising, couponing and advertising. For example, suppose a brand consists of 20 UPCs which fall into 4 “price groups”. All marketing is done either for the entire brand or for an entire price group; a single UPC is never put on promotion by itself. In this case, if estimates are required by brand, it would probably be best to aggregate UPCs into price groups and then pool price groups by brand. See Cooper and Nakanishi (1993) for a lucid discussion of when it is acceptable to aggregate products and geographies.
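As a rough sketch of the aggregation step in this example, the Python below combines UPC-level records into price-group records. The record layout, the UPC and price-group names, and the choice of a volume-weighted average price (dollars divided by units) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical UPC-week records: (upc, price_group, week, units, dollars).
records = [
    ("upc1", "PG1", 1, 10, 20.0),
    ("upc2", "PG1", 1, 30, 60.0),
    ("upc3", "PG2", 1, 5, 15.0),
    ("upc1", "PG1", 2, 20, 30.0),
    ("upc2", "PG1", 2, 60, 90.0),
    ("upc3", "PG2", 2, 5, 15.0),
]

# Sum units and dollars over the UPCs within each price group and week.
totals = defaultdict(lambda: [0, 0.0])
for _upc, group, week, units, dollars in records:
    totals[(group, week)][0] += units
    totals[(group, week)][1] += dollars

# One row per price group and week, with a volume-weighted average price.
price_group_rows = {
    key: {"units": units, "price": dollars / units}
    for key, (units, dollars) in sorted(totals.items())
}

for key, row in price_group_rows.items():
    print(key, row)
```

The resulting price-group rows would then serve as the separate observations that are pooled by brand.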

Note that homogeneity in marketing activities need not necessarily imply homogeneity in response to those marketing activities. The general consensus is that homogeneity in marketing activities is more important than homogeneity in response (see Allenby and Rossi, 1991), though credible arguments have been made for both (see Blattberg and Neslin, 1990).

Discussion

There are several reasons why accuracy can be lost when aggregating, but most of them can be traced to problems encountered when combining products, geographies (or time periods) which are heterogeneous in marketing activities. Problems caused by improper aggregation are generally lumped together and referred to as “aggregation bias”.

For example, if weekly data is combined into months and there is exactly one 2-week promotion per month, all sales variation due to the promotion will be lost. However, aggregating into 2-week periods, one promoted and one non-promoted, may be acceptable because the time periods are homogeneous in marketing activities.
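The effect can be seen in a small Python sketch. The weekly figures are invented, and the sketch assumes that each four-week “month” starts with its two promoted weeks, so that 2-week periods line up with the promotion.

```python
# Hypothetical weekly data for three four-week "months":
# (promotion flag, unit sales). Weeks 1-2 of each month are promoted.
weeks = [(1, 150), (1, 150), (0, 100), (0, 100)] * 3

def aggregate(rows, size):
    """Combine consecutive weeks into blocks of `size` weeks.

    The promotion variable becomes the share of promoted weeks in the
    block; sales are summed.
    """
    out = []
    for i in range(0, len(rows), size):
        block = rows[i:i + size]
        promo_share = sum(p for p, _ in block) / len(block)
        total_sales = sum(s for _, s in block)
        out.append((promo_share, total_sales))
    return out

monthly = aggregate(weeks, 4)
biweekly = aggregate(weeks, 2)

print(monthly)   # every month has the same promotion share and sales
print(biweekly)  # promoted and non-promoted periods remain distinct
```

Every monthly observation is identical, so the monthly data contain no variation from which a promotion response could be estimated. The 2-week observations alternate between fully promoted and non-promoted periods, so the contrast survives.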

If responses are nonlinear, it is easy to see that aggregation can cause problems. For example, a 20% discount may work more than twice as well as a 10% discount. Aggregation would distort this nonlinearity unless homogeneity is maintained by aggregating only observations with similar discount levels.
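A short Python sketch makes the distortion concrete. The exponential response curve and its coefficient are purely hypothetical; they were chosen only so that a 20% discount works more than twice as well as a 10% discount.

```python
import math

RESPONSE = 5.0  # hypothetical response coefficient

def lift(discount):
    """Proportional sales increase at a given discount (assumed curve)."""
    return math.exp(RESPONSE * discount) - 1.0

# Two equally sized observations with heterogeneous discounts: 20% and 0%.
mixed_average_discount = (0.20 + 0.00) / 2
mixed_average_lift = (lift(0.20) + lift(0.00)) / 2

# Two observations that both carry a 10% discount.
homogeneous_average_discount = (0.10 + 0.10) / 2
homogeneous_average_lift = (lift(0.10) + lift(0.10)) / 2

print(mixed_average_discount, mixed_average_lift)
print(homogeneous_average_discount, homogeneous_average_lift)
```

Both aggregates show an average discount of 10%, but the mixed aggregate shows a larger average lift. A model fitted to the aggregated data cannot tell the two situations apart, which is how aggregating over dissimilar discount levels distorts the estimated response.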

When stores are aggregated together, problems have been encountered estimating responses which are multiplicative at the store level. This is really a form of model misspecification that occurs when using “% of stores” type variables in a multiplicative model. The response is being modeled as one multiplicative function, but it is really the sum of two multiplicative functions (promoted stores and non-promoted stores). This leads to predictable, large positive biases in promotion response estimates. Again, the problem can be eliminated by aggregating only stores with homogeneous marketing activities. This is the essence of Marketing Analytics’ Store Group model.
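The mechanism behind this bias can be sketched in Python under simplifying assumptions: all stores are identical, a promotion multiplies a store's sales by a fixed factor, and only up to 30% of stores are ever promoted at once. The numbers are invented, and the sketch illustrates the misspecification only; it is not the Store Group model itself.

```python
import math

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

TRUE_MULTIPLIER = 3.0  # hypothetical: promotion triples a store's sales
BASE_SALES = 100.0     # hypothetical non-promoted sales per store

# Share of stores on promotion in each aggregated observation.
shares = [0.0, 0.1, 0.2, 0.3]

# Aggregate sales are the SUM of promoted and non-promoted store sales.
market_sales = [BASE_SALES * ((1 - f) + f * TRUE_MULTIPLIER) for f in shares]

# Misspecified multiplicative model: log(sales) = a + b * share_promoted.
b = ols_slope(shares, [math.log(s) for s in market_sales])
implied_multiplier = math.exp(b)  # predicted effect of promoting all stores

# Homogeneous store groups: compare promoted stores with non-promoted ones.
group_multiplier = (BASE_SALES * TRUE_MULTIPLIER) / BASE_SALES

print(implied_multiplier, group_multiplier)
```

Because the logarithm of a sum is concave in the share of stores promoted, the fitted coefficient overstates the effect of promoting every store, while the comparison of homogeneous groups recovers the assumed multiplier.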

For an introduction to aggregation and pooling in general, see Blattberg and Neslin (1990) and Hanssens et al. (1990). For information on aggregation bias and solutions, see Wittink et al. (1994), Link (1995), and Bender and Link (1994). For a discussion of when pooling is appropriate, see Bass and Wittink (1975).