The Challenge of Set-top Box Data in Programmatic TV

August 27, 2014 (updated September 4, 2014) | clypd Blog, Technology

In our last post, we introduced the importance of data management platforms (DMPs) in the television industry. This month, we'll discuss the role of set-top box (STB) data in DMPs and programmatic TV.

As we know, data is a core tenet of programmatic TV. Layering data sources on top of media activity is essential to understanding audience composition and to making the best data-driven decisions.

In the linear TV world, of the many data sources available, perhaps none is more important, or more particular, than the second-by-second viewership activity from the set-top box. STB data can be used to measure all tuning activity, including viewing that Nielsen does not measure. This long-tail inventory, consumed primarily on cable networks, accounts for more than 40% of TV viewership. The challenge lies in the different rules, technologies, and protocols that must be reconciled to use STB data in a consistent, coherent manner.

What is set-top box data?

There are dozens of STB manufacturers and models, and just as many ways to collect and measure viewership data. However, each manufacturer/model combination gathers the same basic information, with the box's hardware address (MAC address) used as its unique ID.

The common STB data captured includes:

  • MAC address: a unique identifier for the set-top box
  • Channel ID: the channel identifier
  • Time on: the local time the box tuned to the channel
  • Time off: the local time the box tuned away from the channel
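To make the four fields above concrete, here is a minimal Python sketch of a single tuning record. The class name `TuneEvent`, the field types, and the sample values are illustrative assumptions, not any manufacturer's actual format, which varies by model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TuneEvent:
    """One STB tuning record (hypothetical layout, not a vendor format)."""
    mac_address: str    # unique hardware ID of the set-top box
    channel_id: str     # operator-specific channel identifier
    time_on: datetime   # local time the box tuned to the channel
    time_off: datetime  # local time the box tuned away from the channel

    def duration_seconds(self) -> float:
        """Seconds the box stayed on the channel for this tune."""
        return (self.time_off - self.time_on).total_seconds()


event = TuneEvent(
    "00:1A:2B:3C:4D:5E",
    "ch-204",
    datetime(2014, 8, 27, 20, 0, 0),
    datetime(2014, 8, 27, 20, 7, 30),
)
print(event.duration_seconds())  # 450.0
```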

All of this data is considered Personally Identifiable Information (PII), and sharing it outside the cable plant is strictly regulated. Yet it is vital to the programmatic industry for targeting and for verifying impressions with specific audiences.

We will go into detail on how to effectively share PII externally in a forthcoming post about privacy compliance in television.

How does set-top box data travel to its destination?

Data is stored on the STB until it can be sent to a server for collection. The method for sending to the server is different for each manufacturer and/or model, but generally, it is sent upstream in one of three ways:

  • Polled: the head-end server sends a “poll message” to each box on the node and asks for its stored data
  • Buffer-full transmission: the STB transmits on an allocated upstream frequency when its upstream data buffer is full, or on a daily basis
  • Broadband transmission: boxes with broadband capability send their data over the broadband channel in almost real time

This means that, depending on the delivery method, data may arrive seconds or days after the event it describes. This inconsistency is important to recognize and account for as a platform works to normalize the data.
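One common way to cope with this latency spread is to order records by event time rather than arrival time, and to close each reporting window only after an allowed-lateness period. The sketch below assumes each record carries both an event timestamp and a server receipt timestamp; the names `Delivered` and `split_by_cutoff`, and the six-hour allowance, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Delivered:
    """A tuning record plus the time the server actually received it."""
    event_time: datetime     # when the tune happened on the box
    received_time: datetime  # when the collection server got the record


def split_by_cutoff(records, window_end, allowed_lateness):
    """Order one window's records by event time and separate late arrivals.

    A record is on time if it was received no later than
    window_end + allowed_lateness. Returns (on_time, late), both sorted
    by event_time rather than by arrival order.
    """
    cutoff = window_end + allowed_lateness
    ordered = sorted(records, key=lambda r: r.event_time)
    on_time = [r for r in ordered if r.received_time <= cutoff]
    late = [r for r in ordered if r.received_time > cutoff]
    return on_time, late


day_end = datetime(2014, 8, 28, 0, 0)
records = [
    # broadband: received seconds after the event
    Delivered(datetime(2014, 8, 27, 21, 0), datetime(2014, 8, 27, 21, 0, 5)),
    # daily upload shortly after midnight
    Delivered(datetime(2014, 8, 27, 9, 0), datetime(2014, 8, 28, 2, 0)),
    # buffered data that arrived days later
    Delivered(datetime(2014, 8, 27, 12, 0), datetime(2014, 8, 30, 3, 0)),
]
on_time, late = split_by_cutoff(records, day_end, timedelta(hours=6))
print(len(on_time), len(late))  # 2 1
```

Late records need not be discarded; a platform could hold them for a later restatement of the affected window.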

Lack of standardization in the collection of set-top box data

Because of the various collection methods, as well as the many set-top box manufacturers, there is little standardization in how the data is reported, measured, and delivered.

Consider, for example, these inconsistencies in the data, which stem from each manufacturer's business rules:

What defines an impression? How long must an STB remain on a particular channel, a period we call dwell time, before its activity is counted? 20 seconds? 10 seconds? We try to institute a household quarter-hour impression count per network.
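Here is a minimal sketch of a dwell-time threshold feeding a household quarter-hour count. It rests on stated assumptions: quarter-hours are aligned to the clock (:00, :15, :30, :45), a household earns at most one impression per network per quarter-hour, and a single tune must overlap the quarter-hour for at least the dwell threshold. The 10-second default simply echoes the example above, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

QUARTER = timedelta(minutes=15)


def quarter_hour_start(t: datetime) -> datetime:
    """Floor a timestamp to the start of its clock quarter-hour."""
    return t.replace(minute=t.minute - t.minute % 15, second=0, microsecond=0)


def quarter_hour_impressions(tunes, min_dwell=timedelta(seconds=10)):
    """Return the set of (household_id, network, quarter_hour_start) impressions.

    tunes: iterable of (household_id, network, start, end) tuples.
    A household earns one impression for a network in a quarter-hour if any
    single tune overlaps that quarter-hour for at least min_dwell.
    """
    impressions = set()
    for household, network, start, end in tunes:
        qh = quarter_hour_start(start)
        while qh < end:
            overlap = min(end, qh + QUARTER) - max(start, qh)
            if overlap >= min_dwell:
                impressions.add((household, network, qh))
            qh += QUARTER
    return impressions


tunes = [
    # 10 s in the 20:00 quarter-hour, 60 s in the 20:15 quarter-hour
    ("hh1", "NetA", datetime(2014, 8, 27, 20, 14, 50), datetime(2014, 8, 27, 20, 16, 0)),
    # same household, network, and quarter-hour: no extra impression
    ("hh1", "NetA", datetime(2014, 8, 27, 20, 20, 0), datetime(2014, 8, 27, 20, 25, 0)),
    # 5 s channel surf: below the dwell threshold
    ("hh2", "NetB", datetime(2014, 8, 27, 20, 0, 0), datetime(2014, 8, 27, 20, 0, 5)),
]
print(len(quarter_hour_impressions(tunes)))  # 2
```

One design choice worth noting: dwell time is checked per tune, so several short visits to a channel within one quarter-hour are not summed. An operator's business rules could just as reasonably sum them.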

Counting all impressions. Each STB in a household should count as a unique impression, but this practice is inconsistent across cable operators.
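The effect of the counting rule is easy to see in code. This hypothetical helper counts the same qualifying views either once per box or once per household. Which rule applies is exactly what varies between cable operators.

```python
from collections import Counter


def count_impressions(views, per_stb=True):
    """Count impressions per network for one reporting interval.

    views: iterable of (household_id, stb_id, network) qualifying views.
    per_stb=True counts every box in a household separately;
    per_stb=False collapses all boxes in a household into one impression.
    """
    if per_stb:
        keys = {(hh, stb, net) for hh, stb, net in views}
    else:
        keys = {(hh, net) for hh, _stb, net in views}
    # The network is the last element of each key in both cases.
    return Counter(key[-1] for key in keys)


views = [
    ("hh1", "stb-living-room", "NetA"),
    ("hh1", "stb-bedroom", "NetA"),  # second box in the same home
    ("hh2", "stb-1", "NetA"),
]
print(count_impressions(views, per_stb=True)["NetA"])   # 3
print(count_impressions(views, per_stb=False)["NetA"])  # 2
```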

Zombie activity. At what point should an STB be considered idle, so that activity from that box is ignored? This is important for filtering out any STB-driven or forced channel changes.
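One simple idle heuristic, assumed here purely for illustration, caps any tune that runs longer than a fixed time without a channel change. The 4-hour limit and the function name `trim_zombie_tunes` are hypothetical; the right threshold is one of the business rules that differ between manufacturers.

```python
from datetime import datetime, timedelta


def trim_zombie_tunes(tunes, max_idle=timedelta(hours=4)):
    """Cap tunes that run longer than max_idle with no channel change.

    tunes: list of (stb_id, channel_id, start, end) tuples.
    Any portion of a tune beyond start + max_idle is treated as idle
    ("zombie") activity and dropped; shorter tunes pass through unchanged.
    """
    trimmed = []
    for stb, channel, start, end in tunes:
        trimmed.append((stb, channel, start, min(end, start + max_idle)))
    return trimmed


tunes = [
    # box left on overnight: trimmed to end at 03:00
    ("stb-1", "ch-204", datetime(2014, 8, 27, 23, 0), datetime(2014, 8, 28, 9, 0)),
    # ordinary one-hour tune: unchanged
    ("stb-2", "ch-110", datetime(2014, 8, 27, 20, 0), datetime(2014, 8, 27, 21, 0)),
]
for stb, _channel, _start, end in trim_zombie_tunes(tunes):
    print(stb, end)
```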

Once this data is normalized, it needs to be married with off-box information, such as a unified time source and universal channel identifiers.
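Here is a sketch of that step. It assumes each record can be traced to a head-end with a known UTC offset and to an operator channel lineup. Both lookup tables (`HEADEND_UTC_OFFSET`, `CHANNEL_LINEUP`) and their contents are hypothetical stand-ins for real reference data. A fixed offset ignores daylight-saving changes; a production system would use an IANA time zone (for example via Python's `zoneinfo`) instead.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical reference data; real values would come from the operator's
# head-end configuration and a channel lineup source.
HEADEND_UTC_OFFSET = {"atl-01": timedelta(hours=-4)}  # fixed EDT offset, no DST handling
CHANNEL_LINEUP = {("atl-01", "204"): "NetA"}


def normalize(headend, channel_id, local_time):
    """Return (network, utc_time) for a box-reported local tune time."""
    tz = timezone(HEADEND_UTC_OFFSET[headend])
    utc_time = local_time.replace(tzinfo=tz).astimezone(timezone.utc)
    # Unmapped channels are flagged rather than guessed.
    network = CHANNEL_LINEUP.get((headend, channel_id), "UNKNOWN")
    return network, utc_time


network, utc_time = normalize("atl-01", "204", datetime(2014, 8, 27, 20, 0))
print(network, utc_time.isoformat())  # NetA 2014-08-28T00:00:00+00:00
```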

Marriage of STB data with household identification data

Where these rules are applied, whether at the STB level or at the server level, is essential to interpreting the data. The detailed nuances of the collection rules should be carried to the external world along with privacy-compliant characteristics of the household.

Companies performing audience modeling must capture the characteristics of the household along with a unique household identifier.

Here are some household characteristics that companies take into account:

  • A unique encrypted ID (UID) that cannot be reversed to recover the original MAC address
  • Geographic data, such as zip code
  • Attributes of the household: a set of household descriptors (e.g., presence of children, household income, education) that can be used to better understand the audience and to build an audience model for historical and predictive analytics
  • Subscriber information: details such as service tier, DVR and VOD availability, and broadband service level
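The list calls for an ID that cannot be reversed to the MAC address. The sketch below swaps in keyed hashing (HMAC-SHA256 from Python's standard library) instead of encryption: encryption is reversible by design for anyone holding the key, and a plain unkeyed hash of a MAC address can be brute-forced because the MAC address space is small. The function name and the secret key are assumptions; the key would need to stay inside the cable plant.

```python
import hashlib
import hmac


def household_uid(mac_address: str, secret_key: bytes) -> str:
    """Derive a pseudonymous household ID from a MAC address.

    HMAC-SHA256 is a keyed one-way function: without secret_key, an outsider
    cannot recompute the UID from a guessed MAC address.
    """
    # Normalize formatting so the same box always yields the same UID.
    normalized = mac_address.replace(":", "").replace("-", "").lower()
    return hmac.new(secret_key, normalized.encode("ascii"), hashlib.sha256).hexdigest()


key = b"example-secret-held-by-operator"  # hypothetical key
a = household_uid("00:1A:2B:3C:4D:5E", key)
b = household_uid("00-1a-2b-3c-4d-5e", key)
print(a == b, len(a))  # True 64
```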

This information helps relate audience attributes to TV viewing habits, which enables media planning, targeting, and delivery reconciliation.
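As a hedged illustration of relating attributes to viewing, the sketch below joins impression records to household attributes by UID. It then reports, for each network, the share of impressions that come from households with a given attribute. The data shapes and the `children` attribute are assumptions made for the example.

```python
from collections import defaultdict


def attribute_share(impressions, households, attribute):
    """Share of each network's impressions from households with `attribute`.

    impressions: iterable of (uid, network) pairs.
    households: dict mapping uid -> dict of boolean household attributes.
    Impressions whose uid has no household record are skipped.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for uid, network in impressions:
        attrs = households.get(uid)
        if attrs is None:
            continue  # no household record for this UID
        totals[network] += 1
        if attrs.get(attribute, False):
            matches[network] += 1
    return {net: matches[net] / totals[net] for net in totals}


households = {
    "u1": {"children": True},
    "u2": {"children": False},
    "u3": {"children": True},
}
impressions = [("u1", "NetA"), ("u2", "NetA"), ("u3", "NetA"), ("u3", "NetB"), ("u9", "NetB")]
print(attribute_share(impressions, households, "children"))
# {'NetA': 0.6666666666666666, 'NetB': 1.0}
```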

That said, no STB data collection is without issues. Even with 100% coverage of all TV media consumption (a pipe dream, given the operational, handling, and storage costs of doing that), various issues would still have to be addressed, including:

When is a person watching? All that is known is that the STB is powered on; that someone is actually in front of the TV is not evident from the STB data itself.

Who is the person on the sofa? Which member of the household is watching the TV? Is it the household matriarch or the 3-year-old child?

What is the person watching? The STB data cannot tell whether the TV is actually on, or whether the viewer is watching something other than the reported channel, such as a DVR or Blu-ray player.

Is the measured audience representative of the universe? The STBs in the panel are not known to be exactly representative of the total viewing universe, and any assumptions made must take this into account.

To answer any of these questions, the data must be joined with other third-party data, often with a sprinkling of data science in a DMP, to accurately determine the true reach of any campaign.
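The representativeness question is commonly addressed by weighting the panel against third-party universe estimates. The sketch below uses post-stratification, a standard survey-weighting technique chosen here for illustration. The article does not say which method a DMP should use, and the strata and counts are hypothetical. Weighting corrects only for the variables used as strata; it does not reveal who is on the sofa.

```python
def poststratification_weights(panel_counts, universe_counts):
    """Weight panel households so each stratum matches its universe total.

    panel_counts: stratum -> STB households observed in the panel.
    universe_counts: stratum -> estimated households in the full universe,
    e.g. from a third-party source. Each panel household in stratum s then
    stands for universe_counts[s] / panel_counts[s] homes.
    """
    return {
        s: universe_counts[s] / n
        for s, n in panel_counts.items()
        if n > 0  # a stratum with no panel homes cannot be weighted
    }


def projected_reach(viewers_by_stratum, weights):
    """Project panel households reached onto the full universe."""
    return sum(n * weights[s] for s, n in viewers_by_stratum.items())


# Hypothetical figures: urban homes are over-represented in this panel.
weights = poststratification_weights(
    panel_counts={"urban": 600, "rural": 400},
    universe_counts={"urban": 60_000, "rural": 90_000},
)
print(weights)  # {'urban': 100.0, 'rural': 225.0}
print(projected_reach({"urban": 30, "rural": 10}, weights))  # 5250.0
```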

With that in mind, even with its flaws, if STB data is correctly captured, measured, mapped, and then modeled, a large sample of it can more accurately capture the true viewing characteristics of the universe, especially in this era of ever-changing media consumption habits.

To learn more about clypd and the intricacies of television data, stay tuned for the next installment of this monthly blog series.

Michael is the Director of Product Solutions at clypd and is based in Atlanta.
