Natural Patterns in Cookie Stuffing, Pt. 3
Recently someone bumped a six-month-old thread I wrote on the BlackHatWorld forums discussing a framework we were developing to avoid detection while cookie stuffing. The same article was also posted on our blog. We created a quick implementation of this framework, but nothing worth releasing. It also became apparent that while controlling click-through rates is an important aspect of successful cookie stuffing, conversion rates and other metrics are probably equally important. People have brought up some good points in the thread that I think we can validate and work from.
Of the importance of conversion rates, Genjutsu writes:
The one unatural thing no one has been able to account for is the conversion rate. This is what they will ultimately look at when deciding to ban your account. You can make your traffic and stuffing look organic as possible, but when they look at the data they recieve and you only convert .01% of the clicks you sent, they will find something fishy.
This is why ebay stuffing works, your stuffing users ON ebay. Your conversion rate will look closer to a whitehat affiliate, even better than 90% of all whitehats. Why do you think so many got dismissed from the EPN network for not having involved traffic.
Well, work on targetting and we wont have to worry about traffic patterns. Real traffic (read clicks) blended with stuffs will always yield the best results and be the hardest to find a pattern.
We agree that this is true and that padding statistics with real traffic absolutely is the best case scenario. The more real human traffic we can mix in with our stuffed traffic, the better the campaign will fare. Perhaps one solution is to develop one or two natural sources of real human traffic and then fork their legitimate clicks to multiple, different EPN accounts. The probability of human click redirection could be modified by the number of stuffs each campaign has sent in the past week or 24 hours.
So, for example, if you’ve stuffed 200 users with Campaign A and 300 users with Campaign B, you’d set t If you weren’t stuffing a lot of people, you wouldn’t need that many natural clicks, so long as they were relatively evenly distributed by need. This might be a little advanced for new marketers, but I think that the theory holds that taking care of referrers and padding are the best forms of protection.
Natural randomness is introduced with great abundance by human interaction with any automated system. While this may seem intuitive, many people without a technical background forget that random calls in programs aren’t that random. Almost every statistical system we can design that could be rendered useful in this kind of solution will have a well-defined footprint. Surprisingly enough, this is at least partially dependent upon the language the script is written in, compounded with the operating system the server runs on.
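To make the point that random calls in programs aren’t that random, here is a minimal sketch. It assumes nothing beyond PHP’s built-in mt_srand() and mt_rand(); the helper name seededSequence() is hypothetical and only for illustration. The same seed always reproduces the same "random" sequence, which is what gives a pseudo-random generator its footprint.

```php
<?php
// Hypothetical helper: seeds the Mersenne Twister with $seed and
// returns the next $count draws in the range 0..99.
function seededSequence(int $seed, int $count): array
{
    mt_srand($seed);
    $values = [];
    for ($i = 0; $i < $count; $i++) {
        $values[] = mt_rand(0, 99);
    }
    return $values;
}

$first  = seededSequence(42, 10);
$second = seededSequence(42, 10);
$other  = seededSequence(43, 10);

// Same seed, same sequence: the generator is fully deterministic.
var_dump($first === $second); // bool(true)

// A different seed gives a different sequence.
var_dump($first === $other);

// Go back to PHP's automatic seeding for anything that runs afterwards.
mt_srand();
```

Nothing here is specific to PHP. Any pseudo-random generator is a deterministic function of its internal state, so whoever knows or guesses the state can reproduce the output.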
We developed a little proof-of-concept to show what we’re talking about. Visualization can be a great tool when doing research. Our eyes and brain pick up on patterns that might not come to us intuitively, and as such may act as an extension of our own logical systems. To show that random isn’t always random, we can use PHP and the GD library to paint each pixel of an image black or white on a random coin flip. In an ideal situation, the image would look like static and have no well-defined pattern or path. However, because PHP’s rand() function is merely a wrapper that calls the system-level random function, this isn’t always the case.
Image generated by Windows w/ PHP and GD Library:

As you can plainly see, this isn’t exactly random. In fact, there is an easily discernible pattern. This is not PHP’s fault, but rather a problem with the system-level random function that rand() wraps on Windows, a simple generator whose maximum value is only 32767. Luckily, most black hatters are hosted on servers running *NIX-based operating systems, where randomness is handled much more cleanly.
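For a feel of how a simple generator can leave a visible pattern, here is a sketch of a textbook linear congruential generator (LCG). This is an assumption-laden illustration, not the generator Windows actually uses: the constants are the well-known Numerical Recipes ones, the function names are hypothetical, and the code assumes a 64-bit PHP build. With a power-of-two modulus and odd constants, the lowest bit of the state simply alternates.

```php
<?php
// Textbook LCG with the Numerical Recipes constants and modulus 2^32.
// NOT the Windows runtime's generator; used here purely as an illustration.
// Assumes 64-bit integers so the multiplication does not overflow.
function lcgNext(int $state): int
{
    return (1664525 * $state + 1013904223) & 0xFFFFFFFF;
}

// Hypothetical helper: collects the lowest bit of the next $count states.
function lcgLowBits(int $seed, int $count): array
{
    $bits  = [];
    $state = $seed;
    for ($i = 0; $i < $count; $i++) {
        $state  = lcgNext($state);
        $bits[] = $state & 1;
    }
    return $bits;
}

$bits = lcgLowBits(12345, 16);
echo implode('', $bits), PHP_EOL; // prints 0101010101010101
```

Plotting those bits as black and white pixels would give stripes rather than static. Real implementations usually discard the low bits for this reason, but weaker patterns can survive in what is left.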
Image generated by CentOS w/ PHP and GD Library:

Much better. As you can see, the static-like pattern that we’d expect is apparent. To up the ante even more, we can introduce the Mersenne Twister generator, which PHP exposes as mt_rand().
Image generated by CentOS w/ PHP and GD Library using mt_rand():

While this image may look a lot like the first CentOS random image, it is, in essence, more random: the Mersenne Twister has a far longer period (2^19937 − 1) than a typical system-level random function. (blackhatzen apologizes to mathematics professors and researchers everywhere for using those two words in sequence.)
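Two images that both look like static are hard to compare by eye, so a numeric check can complement the picture. The sketch below is only an illustration under stated assumptions: the helper name bitStats() is hypothetical, and the two counts it produces are a very small subset of what a real statistical test suite would measure. It counts the 1-bits and the runs (maximal stretches of identical bits) in a sequence of coin flips.

```php
<?php
// Hypothetical helper: counts 1-bits and runs (maximal stretches of equal bits).
function bitStats(array $bits): array
{
    $ones     = 0;
    $runs     = 0;
    $previous = null;
    foreach ($bits as $bit) {
        $ones += $bit;
        if ($bit !== $previous) {
            $runs++;          // a new run starts whenever the bit changes
            $previous = $bit;
        }
    }
    return ['ones' => $ones, 'runs' => $runs];
}

$n = 1000;

// A strictly alternating sequence: perfectly balanced, obviously not random.
$alternating = [];
for ($i = 0; $i < $n; $i++) {
    $alternating[] = $i % 2;
}

// Coin flips from the Mersenne Twister.
$twister = [];
for ($i = 0; $i < $n; $i++) {
    $twister[] = mt_rand(0, 1);
}

$altStats = bitStats($alternating);
$mtStats  = bitStats($twister);

printf("alternating: %d ones, %d runs\n", $altStats['ones'], $altStats['runs']);
printf("mt_rand:     %d ones, %d runs\n", $mtStats['ones'], $mtStats['runs']);
```

The alternating sequence has exactly 500 ones, so it passes a naive balance check, yet its 1000 runs give it away. For fair coin flips, roughly half that many runs would be expected. A sequence has to survive many such checks at once before it deserves to be called random.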
This is an example of the PHP code used to generate these images. The GD library must be installed on the host computer, or this code will fail when it attempts to make the imagecreatetruecolor() call.
```php
<?php
// Fail early if the GD extension is missing; calling an undefined
// function is a fatal error that "or die()" cannot catch.
if (!function_exists('imagecreatetruecolor')) {
    die("Install the GD Library!");
}

// Creates a new 512x512 image with GD (the background defaults to black).
// For more information on GD: http://www.libgd.org/
$newImage = imagecreatetruecolor(512, 512) or die("Could not create the image.");
$colorWhite = imagecolorallocate($newImage, 255, 255, 255);

// This is where the "magic" happens.
// One could replace the rand() call with this:
//     if (mt_rand(0, 1) === 1) {
// The Mersenne Twister random call (mt_rand) is "more random."
// For more information on the Mersenne Twister:
// http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ewhat-is-mt.html
for ($yAxis = 0; $yAxis < 512; $yAxis++) {
    for ($xAxis = 0; $xAxis < 512; $xAxis++) {
        // Paint the pixel white on a coin flip of 1; otherwise leave it black.
        if (rand(0, 1) === 1) {
            imagesetpixel($newImage, $xAxis, $yAxis, $colorWhite);
        }
    }
}

// HTTP header for PNG, then display the image we created.
header("Content-Type: image/png");
imagepng($newImage);

// Destroys the image we created.
imagedestroy($newImage);
?>
```
If you have trouble copying and pasting, you can download an archive of this PHP code here.
Aside from cookie stuffing, another purpose of developing these frameworks is to ascertain what pattern recognition algorithms affiliates are utilizing so that we may more effectively avoid detection on all kinds of campaigns. The purpose of showing that random isn’t necessarily random is to try and help people to think about what they’re doing and hopefully begin to question their assumptions with regards to how simple certain kinds of pattern recognition can be in larger systems.
We’ll try and release some code on the blog in the next few weeks to show users how to develop the click-forking method we described earlier in this post.
Author’s Note:
Buckets of respect go out to Thomas Boutell and all those who contribute to the GD library as well as to Makoto Matsumoto and Takuji Nishimura for developing the Mersenne Twister algorithm.
Trackbacks & Pingbacks
- Pingback by blackhatzen :: Natural Patterns Pt. 2 on December 10, 2008 @ 3:20 pm
Just another little note for the PHP geeks out there who may try to find fault with this snippet: PHP now recommends that you do NOT seed the randomizer yourself, since it does this automatically. Seeding the randomizer yourself will generally provide at best about 1 million variations, and usually a lot fewer. If you happen to be one of the users out there smart enough to combine PHP with Suhosin, it will not only ignore user-supplied seeds automatically, it will also seed the randomizer from a more random source than the default PHP version does. Just because they run PHP on Windows and may have a predictable randomizer doesn’t mean they have the same setup as you do!
Just some more food for thought on this. Granted, most people screw up random numbers in a major way and will end up calling srand() with time() or something stupid like that, but well, I’m ranting and you get the idea.