
“Forget big data, focus on good data.” This is one of the top 2018 trends cited in a recent report by Outsell.

But how do you know which data is good?

Big data became a buzzword a few years ago, and now we’re close to being able to use aspects of it for advertising. Through this transition, marketers have grown skeptical of third-party data, because they can rely more on their first-party data to drive their advertising buys.

But first-party data does have its limits, and third-party data is still a necessary component to facilitate advertising campaigns at scale. After all, first-party data doesn’t reveal much about interests beyond the specific brand to which the data belongs.

As in many emerging fields, standards tend not to exist during the period of innovation. Right now, there aren’t any industry standards to guide marketers on online data.

Some initiatives have been proposed, but there are no universally accepted auditing measures or official authorization bodies that certify data. With that said, there are certain best practices that help to ensure data quality.

1. High-quality data sources

Understanding where data is sourced is a good way to tell good-quality data from bad.

Marketers should be asking about the data provider’s methodology for incorporating data attributes and ensuring that data is being compiled from trusted and verified sources.

2. Consumer privacy

Does your data provider take a proactive approach toward protecting consumer privacy?

While there aren’t any universally agreed upon standards for third-party data, privacy standards have been in place for a long time, and it’s important to use data that upholds consumer privacy guidelines.

Of the many principles for ensuring privacy-compliant third-party data collection, two important ones are:

  • Avoiding PII (Personally Identifiable Information) and enabling consumer opt-out from data collection.
  • Avoiding sensitive data. Information about topics such as medical conditions, sexual orientation and race can be abused and is best omitted from the widely available taxonomies of data providers.
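
As a rough illustration of the two principles above, a provider-side filter might look something like the sketch below. The field names (`user_id`, `segments`), the `PII_FIELDS` set and the `SENSITIVE_SEGMENTS` set are all hypothetical, chosen only for this example; a real provider's schema and taxonomy will differ.

```python
# Hypothetical field and segment names, used only for illustration.
PII_FIELDS = {"name", "email", "phone"}
SENSITIVE_SEGMENTS = {"medical_condition", "sexual_orientation", "race"}


def scrub_profile(profile, opted_out_ids):
    """Return a privacy-scrubbed copy of a profile, or None if the user opted out."""
    # Respect consumer opt-out: drop the whole profile.
    if profile["user_id"] in opted_out_ids:
        return None
    # Remove personally identifiable fields.
    cleaned = {key: value for key, value in profile.items() if key not in PII_FIELDS}
    # Remove segments that fall into sensitive categories.
    cleaned["segments"] = [
        segment
        for segment in profile.get("segments", [])
        if segment not in SENSITIVE_SEGMENTS
    ]
    return cleaned


profile = {
    "user_id": "u1",
    "email": "someone@example.com",
    "segments": ["auto_intender", "medical_condition"],
}
scrubbed = scrub_profile(profile, opted_out_ids={"u2"})
```

Marketers are unlikely to run such a filter themselves, but asking a provider whether an equivalent step exists is one way to make the privacy question concrete.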

3. AI and machine learning (complemented by a human touch)

Speaking of buzzwords, AI and machine learning are definitely both in that category. But they aren’t all hype. In fact, they’re important components of data quality.

Because there is such a large volume of data available, humans can’t possibly parse through all of it. Data companies rely on machine learning algorithms to create usable information.

With that said, machine learning is nowhere near as sophisticated as human intelligence. Marketers should be inquiring about the level of human involvement with the data to help verify its worth.

4. QA data on a regular basis

Having an effective QA system in place is also essential for ensuring data quality and accuracy. While the QA process is often complex and multi-layered, providers should be incorporating both QA of the data classification process and QA of the logic behind the classification.

This process prevents inaccuracies. For example, it ensures that the same user is not classified in multiple income segments during the same time period.
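
The income-segment example can be sketched as a simple consistency check. This is a minimal sketch, assuming each classification arrives as a `(user_id, period, segment)` tuple; that record shape and the segment names are assumptions made for illustration, not a description of any provider's actual pipeline.

```python
from collections import defaultdict


def find_conflicting_income_segments(records):
    """Map (user_id, period) to its segments when a user has more than one."""
    seen = defaultdict(set)
    for user_id, period, segment in records:
        seen[(user_id, period)].add(segment)
    # Keep only the user/period pairs assigned to multiple income segments.
    return {key: sorted(segs) for key, segs in seen.items() if len(segs) > 1}


# Hypothetical classifications: u1 lands in two segments in January 2018.
records = [
    ("u1", "2018-01", "income_50k_75k"),
    ("u1", "2018-01", "income_100k_plus"),
    ("u2", "2018-01", "income_50k_75k"),
    ("u1", "2018-02", "income_50k_75k"),
]
conflicts = find_conflicting_income_segments(records)
```

Here `conflicts` contains only the `("u1", "2018-01")` entry, since `u1` appearing in a single segment in a later period is not a contradiction.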

It’s also important to ensure that a process is in place to prevent contradictory information from entering the segment, such as a user being classified by an external source as having a certain income that contradicts the user’s own self-declared income. Tracking the same users on an ongoing basis to ensure this consistency is essential.
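
A cross-source check of this kind might look like the following sketch. It assumes two hypothetical lookups, one keyed by user with self-declared income segments and one with segments from an external source; how a provider actually stores or reconciles these is not specified here.

```python
def find_source_contradictions(self_declared, external):
    """Return users whose external income segment differs from their self-declared one."""
    contradictions = {}
    for user_id, declared in self_declared.items():
        reported = external.get(user_id)
        # Only flag users present in both sources with differing values.
        if reported is not None and reported != declared:
            contradictions[user_id] = {"self_declared": declared, "external": reported}
    return contradictions


# Hypothetical data: u1 disagrees across sources, u2 agrees, u3 has no external record.
self_declared = {"u1": "income_50k_75k", "u2": "income_100k_plus", "u3": "income_50k_75k"}
external = {"u1": "income_100k_plus", "u2": "income_100k_plus"}
flagged = find_source_contradictions(self_declared, external)
```

Running the same check on each data refresh is one way to track the same users over time, so that a contradiction is caught before it enters the segment.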

Finally, leveraging both automated algorithms and manual/human QA resources is also critical for maintaining quality control.

5. Fraud

Fraud is a huge problem. In fact, recent reports estimate that fraud losses could total around $6.5 billion in 2017.

Most of this waste comes from ads served on fraudulent sites or from fake clicks by bots, but poor-quality data also contributes to this figure.

A helpful exercise is to check how many of the cookies you receive from your data provider are later identified in your programmatic bids.
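
One way to run this check is to compute a simple match rate. The sketch below assumes you can export two lists of cookie IDs, one received from the provider and one observed in bid requests; the function name and the sample IDs are hypothetical.

```python
def cookie_match_rate(provider_cookies, bid_request_cookies):
    """Share of provider-supplied cookie IDs that later appear in bid requests."""
    provider = set(provider_cookies)
    if not provider:
        return 0.0
    matched = provider & set(bid_request_cookies)
    return len(matched) / len(provider)


# Hypothetical IDs: two of the four provider cookies show up in bid requests.
provider_cookies = ["a", "b", "c", "d"]
bid_request_cookies = ["b", "d", "x"]
rate = cookie_match_rate(provider_cookies, bid_request_cookies)
```

What counts as an acceptable rate is a judgment call that depends on the campaign and the inventory, but a consistently low share may be worth raising with the provider.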