What Fraud in Online Research Really Looks Like, and How We’re Fighting It
Fraud is a persistent problem in the market research industry, and maintaining the status quo to mitigate fraudulent behavior is not going to cut it. The same technologies that are making our industry more effective and efficient, including in the detection of fraud, are also making it easier for fraudsters to engage in disingenuous behavior. In this three-part article series, we will address: what fraud looks like in online research; how to combat it beyond traditional detection techniques; and how advanced technologies like artificial intelligence (AI) can be used for fraud reduction.
Part 1: What Fraud in Online Research Looks Like
A useful framework for thinking about fraud in online research is a trio borrowed from U.S. criminal law: means, motive, and opportunity.
Motive: Exactly as you would suspect
It’s a story as old as time. In general, people cheat for some kind of perceived gain. In market research, fraudsters are principally looking for financial gain. This space is targeted because there are potentially higher payouts without a need to actually buy anything, unlike other allied sectors of the online economy (e.g., online advertising and customer acquisition).
Opportunity: Technology & depersonalization make it easier
Fraudsters are always looking to scale their fraud: the big payoff. Just duping one study, and its associated small payout, is pointless for the “professional.” The real fraudsters want to replicate success hundreds and even thousands of times in order to cash in. Market research provides them with this opportunity. And as reliance on data increases, the opportunity grows with it.
The technology and methods that make it easier than ever for us to collect and analyze data have a side effect: depersonalization. This phenomenon removes the difficulty of lying right to a person’s face; fraudsters can commit fraud with no human interaction whatsoever. And they can do it faster and more efficiently than ever before.
Detecting and defeating fraud like this is harder than ever and, even worse, many are paying insufficient attention to this movement. A metaphor we use frequently at P2Sample is that fraud is like a hardened city criminal moving into a country neighborhood where nobody locks their doors. Opportunity, therefore, is increased because barriers are low.
Means: Techniques and where fraud usually happens
Let’s be clear: there is invariably a human being behind the process of fraud. Yes, automation is making their job a bit easier, but the fraudulent behavior itself is not automated. The “bots” (essentially automation) we often talk about during a discussion on fraud are not 100% “bots.”
The humans behind the automated “bot” behavior find the most effective use of this technology in the registration process. We see a range of bot-type behavior that goes beyond the same individual simply creating multiple accounts and filling out the same information with a dynamic IP address. Some sneakier fraud behavior manifests itself as:
- Scripted behavior surrounding the creation of automated emails and randomized user information (first name, address, etc.) which appears clean. Scripts can simply scrape information from the internet about real people who live close to those IP addresses (or use hacked computers), thereby fooling geo-based security systems, as well as any tools that run this information through third-party databases to verify it’s real.
- Automatic creation of email accounts with patterns that are random, yet generated by algorithms smart enough that even a human has a hard time telling whether it’s a fake email. (We can assume an email with asdfqwerty@ is fake, but you can create real-looking emails with some nice automation.) Also, those emails can be automatically created on Gmail, allowing the double opt-in for every account to be fully automated.
But in the context of the survey itself, we find mainly non-automated behavior. We have been studying and monitoring the tools fraudsters use over time and have had numerous conversations with people in the community. None of these tools provides any sort of automated survey completion. They may offer some scripting/automation to get around fraud detection by using account information that is scripted and well-matched, but nothing is automated at the actual survey level. It is usually just someone from a lower-income country who spends hours filling out that same survey, punching in the same answers, sometimes with the help of tools.
It’s important to remember that everyone is facing fraud; it is built into the supply chain. We see this in the affiliate marketing and performance marketing sectors that many panel companies use to augment panels. For some, fraud is baked right into the margins and CPAs. But the situation, though daunting, is not hopeless. There is no perfect solution, but there are ways to make your house harder to break into than your neighbor’s when it comes to fraud mitigation.
Part 2: Traditional tools won’t cut it
Detecting fraud at the registration stage is tough. The accounts usually look quite “clean” from a device/geo point of view because that part is scripted. Third-party fingerprinting and fraud detection tools do not detect this type of fraud, nor do most “home-made” solutions.
As fraud advances, common techniques to eliminate it will no longer work on their own. While each has its place and can help detect fraud, they need to be coupled with more advanced solutions to have higher success rates. Some of these traditional approaches include:
- Captcha: common solution that introduces artificial blocks which require human intervention. However, this typically only works to ‘kill’ a fully automated process. Humans can solve the Captcha request easily and then hand over the rest of the process to a machine. By inserting these requests at random (not just at the start of a survey, for example), Captcha can have greater success in finding fraud.
- Honeypots: use of elements, such as hidden form fields, that only machines would see and humans wouldn’t. This approach presupposes non-human automation at this stage.
- Open End Questions: this technique can offer fraud detection by finding statistically unrealistic results, caused by the same answer given by this one individual over and over. Unlike the ease of solving Captcha requests by a human, it is more difficult for most people (especially non-native speakers) to create real-looking, genuine, non-spammy open-ended answers on a large scale. Detection through open end questions can therefore work well, especially if there is good, solid pattern recognition behind it. If the pattern technology is subpar, good people could end up being blocked and/or fraud will still be let in.
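Two of the checks above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any production system: the hidden field name `website_url`, the `max_repeats` threshold, and the simple text normalization are all assumptions made for the example. Real pattern recognition on open-ended answers would need to be considerably more robust (fuzzy matching, language handling, and so on).

```python
import re
from collections import Counter

def honeypot_tripped(form_data, honeypot_field="website_url"):
    """Return True if a hidden field (hypothetical name) was filled in.

    Humans never see the field, so any value suggests an automated submit.
    """
    return bool((form_data.get(honeypot_field) or "").strip())

def normalize(answer):
    """Lowercase and collapse punctuation/whitespace so near-identical answers match."""
    return re.sub(r"[^a-z0-9]+", " ", answer.lower()).strip()

def repeated_open_ends(responses, max_repeats=3):
    """Return normalized answers that occur more than max_repeats times.

    responses: iterable of (respondent_id, answer_text) pairs.
    max_repeats is an illustrative threshold, not a recommended value.
    """
    counts = Counter(normalize(text) for _, text in responses)
    return {text: n for text, n in counts.items() if text and n > max_repeats}

# Example: five "different" respondents submit the same answer.
responses = [(i, "Good product, I like it!") for i in range(5)]
responses.append((99, "The battery life is too short for travel."))
flagged = repeated_open_ends(responses)
```

Exact-match counting like this only catches the crudest copy-and-paste behavior, and a threshold set too low would block genuine respondents who happen to give short, common answers. That is the trade-off between blocking good people and letting fraud in described above.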
Part 3: AI and machine learning for fraud detection
So how can we overcome these problems? By marrying traditional solutions with brand new techniques, like artificial intelligence (AI). AI isn’t all robots and Hollywood films. In fact, it can be very useful on the fraud detection front.
What AI is, and isn’t.
When it comes to fraud detection, what a lot of people call AI is not really AI. Fraud systems that make algorithmic decisions, albeit complex, are not truly based in AI. Because the terminology is popular, many use the term for marketing purposes, with no real basis in the technology itself. AI is getting computers to act or do “smart” things that they are not explicitly programmed to do. In short, as John McCarthy – the scientist who coined the phrase “artificial intelligence” – said, AI is “making a machine behave in ways that would be called intelligent if a human were so behaving.” There are several practices that fall under the AI umbrella when it comes to fraud detection including machine learning or self-learning, which is the ability for a machine to constantly adapt to new fraudulent patterns.
Dell Technologies has developed a useful graphic representation of AI.
Fraud detection essentially involves trying to identify actions that have a high probability of being fraudulent based on historical fraud patterns. Unpack this and we need three things:
- A method: the decision-making, heuristic or algorithmic, to decide whether a transaction is real or fraudulent.
- Historical data: a good fraud detection model requires large amounts of data to help allow accurate classification.
- Domain experience: traditionally overlooked, this piece is critically important in detecting fraud. This gives fraud detection techniques – even machine learning – a place to start.
At P2Sample we are addressing these three core issues head-on.
The method: We use AI, and machine learning specifically, to study patterns in real time. What exactly does this mean? Here’s a simple example: Market research surveys are usually looking for fairly targeted users. To qualify for a study, you usually need to meet a fairly specific set of criteria. This is where pattern behavior comes into play. The right system can analyze billions of data components very quickly. It then detects anomalies such as large surges of: certain users going into a study; users with specific demographics; users in a certain IP range; or completions in a time frame on a specific study. All those things can be detected by looking at anomalies within patterns.
For example, 500 females participating in a survey is fine. But 500 females joining from Alpharetta, Georgia, who own a dog and earn $250k can send up a red flag. Next time, it could be 200 males who drive pink Cadillacs. The machine learns to adapt. This is a critical point. As fraudsters change tactics, traditional fraud detection tools, such as expert rules, fall down because they only know how to stop yesterday’s fraud. Advanced machine learning is the only way to stay on top of evolving fraud trends.
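To make the idea of a "surge" concrete, here is a minimal sketch that compares how often a demographic profile appears in the current time window against its own history. It is a simple statistical baseline standing in for the machine learning described above, not our actual model. The field names, the z-score threshold, and the `min_count` floor are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical profile fields; a real system would use many more signals.
FIELDS = ("gender", "city", "owns_dog", "income_band")

def profile_key(entrant, fields=FIELDS):
    """Reduce an entrant record (a dict) to a hashable demographic profile."""
    return tuple(entrant.get(f) for f in fields)

def surging_profiles(history, current, z_threshold=3.0, min_count=20):
    """Flag profiles whose current count is far above their historical baseline.

    history: list of Counters, one per past window, mapping profile -> count.
    current: list of entrant dicts for the current window.
    Returns {profile: (count, z_score)}.
    """
    now = Counter(profile_key(e) for e in current)
    flagged = {}
    for profile, count in now.items():
        if count < min_count:  # ignore tiny groups to limit false alarms
            continue
        past = [window.get(profile, 0) for window in history]
        mu = mean(past) if past else 0.0
        sigma = pstdev(past) if past else 0.0
        sigma = max(sigma, 1.0)  # floor avoids division by zero on flat histories
        z = (count - mu) / sigma
        if z > z_threshold:
            flagged[profile] = (count, z)
    return flagged

# Example: a profile that normally appears 1-3 times per window shows up 500 times.
suspicious = ("F", "Alpharetta", True, "250k+")
history = [Counter({suspicious: 2}), Counter({suspicious: 3}), Counter({suspicious: 1})]
current = [dict(zip(FIELDS, suspicious)) for _ in range(500)]
flagged = surging_profiles(history, current)
```

Because the baseline is computed per profile rather than written as a fixed rule, the same check would flag next week's surge of a completely different profile. A fixed threshold like this is still far cruder than a model that learns from labeled fraud.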
The data. In one year, we will collect 250 billion data points (demographics, survey responses, behaviors, device information, you name it) from millions and millions of panelists. All this data goes into the model to help the machine learn. More data means that machine learning models are better at classifying a survey complete as good or fraudulent.
Domain knowledge. This is one of the least appreciated aspects of good AI models. Machine learning is not simply learning how to program a machine learning model and then slapping it on top of data. I would estimate that 90 percent of machine learning is data science (understanding your data and what it represents), and 10 percent is the actual machine learning implementation. Knowing which data to use, as well as which data represents fraudulent behavior, is therefore essential to making the algorithm work as efficiently as possible. Our team has been in the performance marketing business for more than a decade. Without wanting to boast, we have a great deal of expertise in the field and are constantly updating our knowledge.
Is the P2Sample approach to fraud mitigation successful? While it is impossible to fully eliminate all fraud, our solutions have taken fraud detection and elimination to the next level. Our blue-chip clients give us our best indicator: they have repeatedly told us that, while they find high levels of fraudulent completes with most of their suppliers, they have not seen such levels from our panelists. Their testimonials are good indicators of our overall traffic quality.
So how do we lock the doors against fraud? By employing several layers of deterrents. A criminal will break into an open house before trying the one with steel doors, deadbolts and bars on the windows. By marrying traditional and new techniques for fraud mitigation with deep expertise, we aim to have the safest house on the street.