How To Prevent Spy Bots From Snooping Around Your PPC Campaigns
As PPC affiliates, we spend countless hours researching our market, collecting keywords, building landing pages, and writing ads. After endless split tests and tweaks, we finally find that grand slam campaign that seemingly deposits money into our bank, hand over fist. Month after month, the campaign is running strong, until one day for no apparent reason, the money dries up. What happened?
A number of things could’ve happened, but one likely culprit is worth singling out. There are “intelligence gathering” spiders out there, crawling and collecting all of our hard work into a centralized repository, so that other affiliates can shortcut their way to your hard-earned riches. Can anything be done to stop them?
Stopping these spy bots 100% of the time may or may not be possible, so prevention at the basic level is our best defense. We can construct a basic line of defense using our server’s .htaccess file.
If you aren’t familiar with .htaccess, you need to get educated quickly. Having a look at this article will get you up to speed with the rest of this tutorial: Apache Tutorial: .htaccess files
When I talk about “spy bots”, two that are widely used among PPC affiliates come to mind: KeywordSpy & SpyFu. Without giving you the whole breakdown of what these two services do, the Reader’s Digest version is: They steal your shit!
Luckily though, with a few .htaccess mods, we can block their spiders from visiting your site, and so prevent further analysis of your campaigns.
Here’s what you need to do:
1) Make a backup of your current .htaccess file. Modding existing .htaccess files can be tricky, especially if they already contain data. Save your ass and your site with a backup. Trust me.
2) Open the existing .htaccess or create a new one with Notepad.
3) Add the following text on a new line:
<Limit GET HEAD POST>
order allow,deny
deny from 74.53.36.242
deny from 65.39.72.142
deny from 66.34.204.26
deny from 66.34.0.
deny from 66.34.255.
deny from keywordspy.com
deny from keywordspypro.com
deny from spyfu.com
deny from spyfoo.com
deny from foospy.com
deny from fuspy.com
allow from all
</Limit>
What these lines do is blacklist the specified IPs and domains, so your server refuses to serve them any page they request. Notice that not all of the “deny” lines include full IP addresses; some give just the first 3 octets of the address. A partial address blocks the entire range of IP addresses on that network, in case other servers not listed turn out to be the culprit data mining servers.
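One thing to be aware of: the order/deny/allow lines above are the older Apache 2.2 syntax, and on Apache 2.4 they only work if the mod_access_compat module is loaded. Here is a minimal sketch of the same idea in the 2.4 “Require” syntax, using a few of the entries from above. It assumes your host runs Apache 2.4 with mod_authz_core and mod_authz_host loaded, so check with your host before relying on it.

```apache
# Apache 2.4 sketch (assumption: mod_authz_core and mod_authz_host are loaded)
<RequireAll>
    # Let everyone in by default...
    Require all granted
    # ...except a single address,
    Require not ip 74.53.36.242
    # a whole range given by its leading octets,
    Require not ip 66.34.0
    # or any client whose IP reverse-resolves to this domain
    Require not host keywordspy.com
</RequireAll>
```

Use one style or the other in a given file. Mixing the old Order/Allow/Deny lines with Require is possible but discouraged by the Apache docs.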
4) We’re not done yet. Again, on a new line add the following:
RewriteCond %{HTTP_REFERER} keywordspy\.com [NC,OR]
RewriteCond %{HTTP_REFERER} keywordspypro\.com [NC,OR]
RewriteCond %{HTTP_REFERER} spyfu\.com [NC,OR]
RewriteCond %{HTTP_REFERER} foospy\.com [NC,OR]
RewriteCond %{HTTP_REFERER} fuspy\.com [NC,OR]
RewriteCond %{HTTP_REFERER} spyfoo\.com [NC]
RewriteRule .* - [F]
What we’re doing with these lines is returning a 403 Forbidden error to the visitor whenever the referrer contains one of the domains specified. In this case, I am “forbidding” KeywordSpy.com and SpyFu.com, along with a few other domains owned by these companies, from collecting data.
Note: In order to make use of rewrites, the Apache mod_rewrite module must be enabled on your server. In most cases this is enabled by default, but with some web hosts, it won’t be. Add “RewriteEngine On” to the top of your .htaccess (w/o quotes).
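If you aren’t sure whether your host has mod_rewrite loaded, one option is to wrap the rewrite lines in an IfModule block. This is only a sketch with a single referrer condition to show where “RewriteEngine On” sits; the trade-off is that when the module is missing the rules are silently skipped instead of throwing a server error, so you get no blocking and no warning.

```apache
# Only run the rewrite rules if mod_rewrite is actually loaded
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Forbid any request whose Referer header mentions spyfu.com
    RewriteCond %{HTTP_REFERER} spyfu\.com [NC]
    RewriteRule .* - [F]
</IfModule>
```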
5) Save your file as .htaccess and upload to the root directory of your server. If Windows gives you problems with naming your file .htaccess, simply call it “htaccess.txt”, upload it with FTP, then rename it once it’s on the server.
Note: Depending upon your server security, your .htaccess file could be viewable by others. To prevent this, add the following lines and also CHMOD your .htaccess to 644 permissions:
<Files .htaccess>
order allow,deny
deny from all
</Files>
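On Apache 2.4 the equivalent looks like the sketch below (again assuming mod_authz_core is loaded). Many Apache installs already ship with a rule like this in the main server config, but adding it to your own file doesn’t hurt.

```apache
# Apache 2.4 sketch: refuse to serve any file whose name starts with ".ht"
<Files ".ht*">
    Require all denied
</Files>
```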
If you want to see a complete working example of a .htaccess file you can use, download here (rename to .htaccess) :
Now what we’ve just done is add two lines of defense. Not only are we blocking the named IP addresses and ranges from accessing our servers, we are also denying them based on domain name. The IP addresses may change at some point in time, but most likely the domain names will stay the same. One caveat: “deny from keywordspy.com” does not look up which IP keywordspy.com points to. Apache does a reverse DNS lookup on each visitor’s IP and blocks it if the resulting hostname ends in keywordspy.com, so the rule only catches crawlers whose IPs reverse-resolve to that domain. That is why blocking both domain names and IPs gives better coverage than either one alone.
Of course the above mods aren’t an end-all solution. The IP addresses in the deny section may change over time, so it will be your job as a defensive marketer to continually monitor which online spy bots are out there, get a list of their IP addresses and possibly ranges, and update your .htaccess file accordingly.
Also, check your server logs regularly for suspicious queries and new spiders that may be accessing your site. If your landing pages, keywords, and ads are already included in the databases of SpyFu or KeywordSpy, then I’m afraid you’ll just have to wait for the data to be removed, if at all.
I’m not 100% sure, but once the bots are denied access, I believe that over time these services will consider your campaign “offline”, and purge the collection of data from your site. But then again, KeywordSpy has a “TimeMachine” feature that can go back in time and pull cached data.
Again, the above mentioned mod won’t be a fix all to your spy bot problems, but just one protective measure you can take to better defend yourself from “clone” affiliates copying your campaigns. But if you’re on the Internet, you’re vulnerable. Period.
Luckily though, the data collected by these bots is often either outdated or simply wrong. They can monitor how long your keywords have been running, but not necessarily detect which specific keywords are profitable for you.
Also, because of numerous factors such as CPCs, commission rates, upsells, etc, they really have no idea how accurate your ROI is per keyword. So if you use these tools yourself to collect information, take it with a grain of salt and test before you get excited thinking you’ve just found yourself a profitable campaign.
There are so many factors at play influencing whether or not a campaign is profitable that even a list of keywords you know for a fact someone is making money with isn’t necessarily the magic bullet toward the riches. Testing will win out over “copying” any day.
If you have other methods or insight into how spy bots collect data, be sure to let me know so I can update this post (and of course give credit where credit is due).
Way of the Warrior – Tip of the Day: Learn how to use and master .htaccess. It can be your friend when in need. But remember to make a backup before making any changes. Mistakes in your .htaccess can be disastrous to your website.




I gotta say… most people have junky, overused tutorials on their blogs. This was awesome. I’ll definitely be trying this soon.
Thanks for the great info!
So if I wanted to block http://www.sitename.com I would add:
RewriteCond %{HTTP_REFERER} sitename\.com [NC,OR]
??
You got it! You can use .htaccess like your own little firewall.
Now though, if a site like Keyword Spy is visiting you but first erasing their referrer, then we might have a problem. But it’s better than no defense at all.
@Clint Lenard: thanks for the comment. I’ve been seeing a void in affiliate blogs lately so I’m hoping to put some more advanced stuff that might be useful, especially tutorials.
Wes —
Thanks, this was JUST what I needed as we’ve got a decent PPC campaign in a new niche ready to expand from the test stage.
I appreciate the good ACTIONABLE advice on your blog — please keep it up!
Thanks for this! I searched specifically for a way to block this bot and you were the only site that actually came up with anything at all practical and useful!
I take it though that as and when they change their IP address, we will have to change the .htaccess file?
@welshnoonoo: Glad you found it useful. You are right in that if they change their IPs, or add other datacenters from which they spider sites, the .htaccess will need to be updated.
I’ve updated the post though with more information on blocking Keyword Spy and Spy Fu via domain name as well as IP, and preventing others from viewing your .htaccess file. See the sample .htaccess file for format.
Wes,
Just to be sure I’ve got this right:
If I wanted to block spy bots from other competing services (such as iSpionage, PPC Bully/MyAdWise, Hexatrack, Affiliate Elite) as they crop up, is the process to add to the htaccess file:
1) deny (their URL)
2) deny (their IP)
3) RewriteCond %{HTTP_REFERER} (their URL) [NC,OR]
Look right?
Best and thanks again for an awesome post,
Alan
If you can track down the IP blocks each of the services use, then yes…just continue to add them into your .htaccess file.
If you email me any updates you find, I’ll surely update this post with the new info.
I know this mod works though, as the spy tools once had one of my campaigns, and now it’s nowhere to be found. Chaching!
awesome! i’ve been looking for this for a while. thanks!
Hi Wes,
Thanks for your reply!
I added the following to my file to try to block 3 of the other spy tools:
deny from 65.39.221.16
deny from 66.39.157.106
deny from 208.73.48.154
deny from ppcbully.com
deny from affiliateelite.com
deny from ispionage.com
RewriteCond %{HTTP_REFERER} ppcbully\.com [NC,OR]
RewriteCond %{HTTP_REFERER} affiliateelite\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ispionage\.com [NC,OR]
That’s the idea, though I’m not too sure about Affiliate Elite. Since Affiliate Elite is a desktop tool, it runs from whatever the IP of the user who is using it. The actual affiliateelite.com website is simply the site that sells the software.
Keep sending me any new IPs/Domains you find though. I wanna keep this .htaccess file updated against all the spy tools we can find.
The keywords that I am using for my campaigns come from Google AdWords. In your example you are only blocking the keyword spy tools from accessing my advertising site. But couldn’t they still see my keywords by accessing the Google API?
If the tools have access to some sort of API backend, then yes perhaps they could still get some of your data. But I know a lot of them rely on spidering your site, as this can be seen in the weblogs. The best thing I can suggest in addition to blocking with .htaccess is that you run your campaigns across multiple domains so no visible patterns can be found in your marketing.
Wes,
Great tip. People should be concerned about protecting their cash cows.
wes,
here’s two more spybots to add to your list:
www ispionage com
www adtextgenerator com
can anybody create a script that we can put in our landing pages so we can trap spybots when they visit? or are the logs good enough?
thanks for this very valuable info.
manuel
here’s another one – http://www.myppccompetitor.com which redirects to http://www.keywordcompetitor.com
I will appreciate if you provide more details on this. Thanks.
What more details did you want explained?
Cool tip, Wes. Should an [NC,OR] follow keywordspypro also? From your post:
RewriteCond %{HTTP_REFERER} keywordspy\.com [NC,OR]
RewriteCond %{HTTP_REFERER} keywordspypro\.com
RewriteCond %{HTTP_REFERER} spyfu\.com [NC,OR]
Will this work on Brad Callen’s PPC Web Spy Firefox Plug-in?
Awesome! Thank you for this information. This is helpful in many ways.
@Sean: Not 100% sure about the PPC Spy plugin. I’ve used the plugin, but haven’t explored it enough to be certain.
But from what I’ve noticed…it looks like it is scraping data the exact same way as every other ppc spy tool. So if someone can figure out what its footprint is…it can probably be blocked.
Oh I like this article! Damn those spies.
Perfect info.
Cheers,
N
Me too! I’ve been noticing that these have been all over my site lately, and my sales have been slumping.
Can anything be done to threaten these services? I mean on most websites this would be a violation and prohibited behavior. In my terms and conditions no one is to use the website for commercial purposes.
Any ideas?
Does anyone know if this is how Compete.com or any of the similar keyword purchasing sites work? Does anyone know how to block Compete.com from collecting keyword data?
Any advice is greatly appreciated.
thanks,
Matt
I added compete in here:
RewriteCond %{HTTP_USER_AGENT} ^.*(keywordfinder|wordbutler|ispionage|affiliateelite|ppcbully|compete|keywordcompetitor|myppccompetitor|rapidkeyword|winhttp|HTTrack|clshttp|archiver|keywordtracker|keywordhunter|keywordspy|keywordspyhunter|loader|email|harvest|extract|grab|miner).* [NC,OR]
also:
deny from compete.com
@aj: thanks for that. I think that will also help block a lot of the spy tools.
In my experience with spy tools, Compete is the one that seems to be most accurate and can harm you the most if someone mines your data. I’ve tried some others, like KeywordSpy, but most times the data is so massive, it really doesn’t provide useful information. I got more out of the free version of Compete than the paid version of Keyword Spy.
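For anyone adapting aj’s line, here’s a minimal sketch of a complete User-Agent rule. The word list is trimmed down purely for illustration, and the leading “^.*” and trailing “.*” are dropped because the pattern already matches anywhere in the string. Keep in mind this checks the User-Agent header the bot sends, not its domain or IP, so a bot that fakes a browser user agent will sail right past it.

```apache
RewriteEngine On

# Each "|" separates an alternative: the condition matches if the
# User-Agent header contains ANY of these words (NC = ignore case)
RewriteCond %{HTTP_USER_AGENT} (keywordspy|ispionage|ppcbully|compete|HTTrack) [NC]
RewriteRule .* - [F]
```

If you put this condition in front of other RewriteCond lines in the same chain, add OR back to its flags. As the last condition before the RewriteRule, it should carry NC only.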
Wes,
What an excellent tutorial. This could’ve been a $97 ebook with a bit of padding. Thanks for sharing
You might find this resource helpful:
http://www.yougetsignal.com/tools/web-sites-on-web-server/
It’s a free reverse IP lookup service.
Enter the main domain of the keyword spiders and get all the other domains hosted at that server, as well as the IP address.
This should help build a comprehensive list and stop any IP’s/domains slipping through the net.
Cheers,
George
Does RewriteRule .* - [F] forbid all other http referers? If so, won’t my visitors from ppc get a 403 too?
No. You’re using that rule in conjunction with first checking the HTTP referrers. For example:
RewriteCond %{HTTP_REFERER} spyfoo\.com [NC,OR]
RewriteRule .* - [F]
You’re saying “if referrer = spyfoo, then error 403”. All my sites have this setting and I haven’t had any issues with PPC visitors being blocked.
hi Wes, can you put up an updated sample .htaccess file with the new IPs discovered?
Does anyone know where these IPs are from:
174.37.52.37
64.81.44.188
hi Wes, could you post an updated script with the new IPs that we want to block?
I think i got spammed by this ip 174.37.52.37
Greetings Wes,
Loving your site: There’s some really bang-on-point content on here which covers many of the exact questions I have been generating recently on my mission to conquer CPA marketing!
I find the above post particularly fascinating and also very reassuring to know that there is at least some way that we can try to defend our hard work against the plethora of Spy Tools out there.
I found an interesting long running thread on the subject of .htaccess and bad robots named “A close to perfect .htaccess file” which ran for around 3 years (2001 to 2004 I think) and a nice piece of code derived from this here:
http://www.javascriptkit.com/howto/htaccess13.shtml
(How useful do you think this would still be today?)
It would certainly be very useful, to a lot of people I believe , to get a similar sort of thread going which relates specifically to guarding against PPC spy tools and is kept up to date regularly with as many IP address/site names as possible, listed for blocking purposes.
This looks to me like the start of something great, so thanks for the info, and anything I can do to help or contribute, just let me know..
Warm regards, Matt.
Wes,
This is exactly the information I need. Thanks!
It was discouraging to think there are so many spy tools out there, so at least this gives some protection to blocking out them bots.
By the way, there is also gcdetective.com
(Google Cash Detective) which you should add to your list.
Found this on Warrior Forum
“I did find another way to block them that should work which is using the Google ip exclusion tool in your Adwords account to block their bot ip’s, assuming you can get a hold of all the spy tool ip’s…
So I guess I would try using both .htaccess and the Google ip exclusion tool as an extra security measure…”
Hi Wes.
Fantastic post.
I spend countless HOURS well into the early mornings doing keyword research and building lists for PPC, and crunching numbers on my returns. And that is my business, my life!! Sadly! So I’m thrilled to be able to block every darn data miner from my lists!
)
Thanks
Great Blog!
right
back to my lists….
oops… sorry
wrong way.
(wrong door!)
(that was the womens pre-natal class in there!)
Bye again..
tip-toes out quietly……
Forgive me o Masterless One..
We all know the Legend of the Ultimate .htaccess Block List..
But where to begin on our Quest to discover it..?
Please share with us your most valuable insight.
back again..
(soaking wet with bits of seaweed hanging off, and soggy sweater draping along)
you shoudda said that was the exit to the harbour.!
.anyway.,(coughs and pull a small fish from mouth)
Ive been over to webmaster world (short taxi ride £9.90)
check out this thread
http://www.webmasterworld.com/forum92/205.htm
# this ruleset is for unwanted useragents… possibly email harvesters
RewriteCond %{HTTP_USER_AGENT} ^[A-Z]+$[NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Browse\s[NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Eval[NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Surf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Harvest [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HTTrack [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ^.*libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LWP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*prospector[NC,OR]
RewriteCond %{HTTP_USER_AGENT} AsiaNetBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ASSORT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} attache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ATHENS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} autohttp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bew [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bullseye [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CherryPicker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChinaClaw[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Crescent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} curl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} devsoft’s\ http\ component [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Deweb[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Digimarc [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Digger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} digout4uagent[NC,OR]
RewriteCond %{HTTP_USER_AGENT} DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DISCo[NC,OR]
RewriteCond %{HTTP_USER_AGENT} dloader(NaverRobot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ecollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Educate\ Search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf[NC,OR]
RewriteCond %{HTTP_USER_AGENT} EO\ Browse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Express\ WebPictures[NC,OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} fastlwspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FEZhead[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Fetch[NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Franklin\ Locator[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Full\ Web\ Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Getleft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetURL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetWebPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Gozilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} go-ahead-got-it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTML\ Works [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IBM_Planetwide [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Sucker[NC,OR]
RewriteCond %{HTTP_USER_AGENT} IncyWincy[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Industry\ Program[NC,OR]
RewriteCond %{HTTP_USER_AGENT} InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Explore\ 5\.x [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Irvine [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} KWebGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} leech[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MCspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mirror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missauga\ Locator[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Monster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla.*NEWT[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/3\.0\.\+Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/3.Mozilla\/2\.01 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/4\.0$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozzilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} netattache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetCarta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetSpider[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NICErsPRO[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Explorer[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OpaL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Openfind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OpenTextSiteCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PackRat [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pavuk[NC,OR]
RewriteCond %{HTTP_USER_AGENT} pcBrowser[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Plucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Production\ Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Program\ Shareware [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PushSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ReGet[NC,OR]
RewriteCond %{HTTP_USER_AGENT} RepoMonkey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Rover[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Rsync[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ScoutAbout [NC,OR]
RewriteCond %{HTTP_USER_AGENT} searchterms\.it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} semanticdiscovery[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Shai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sitecheck[NC,OR]
RewriteCond %{HTTP_USER_AGENT} SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SmartDownload[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Spegla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SpiderBot[NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperHTTP[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SurfWalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tarspider[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Teleport\ Pro[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Telesoft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Templeton[NC,OR]
RewriteCond %{HTTP_USER_AGENT} UtilMind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} w3mir[NC,OR]
RewriteCond %{HTTP_USER_AGENT} web.by.mail [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebBandit[NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier[NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebEMailExtrac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Image\ Collector[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier[NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebMiner [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebReaper[NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebSauger[NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebSnake [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webvac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webwalk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WhosTalking [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Widow[NC,OR]
RewriteCond %{HTTP_USER_AGENT} WUMPUS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} www\.pl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Xaldon\ WebSpider[NC,OR]
RewriteCond %{HTTP_USER_AGENT} XGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Zeus.*Webster[NC]
#RewriteCond %{HTTP_USER_AGENT} test[NC]
RewriteCond %{REQUEST_URI}!^/badUA\.html [NC]
RewriteRule .* /badUA.html [L,E=HTTP_USER_AGENT:BAD_USER_AGENT]
# this ruleset is to stop blank user agents with blank referrers
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* /cgi-bin/noagent.cmd [L,T=application/x-httpd-cgi]
lot of info there on blocking unwanted bots
also
can you explain a bit on this command:
aj says:
April 10, 2009 at 9:50 AM
I added compete in here:
RewriteCond %{HTTP_USER_AGENT} ^.*(keywordfinder|wordbutler|ispionage|affiliateelite|ppcbully|compete|keywordcompetitor|myppccompetitor|rapidkeyword|winhttp|HTTrack|clshttp|archiver|keywordtracker|keywordhunter|keywordspy|keywordspyhunter|loader|email|harvest|extract|grab|miner).* [NC,OR]
are the vertical pipes simply separators? and is this just blocking domain ranges with these words in domain ?
also….
myppccompetitor.com 97.74.215.46 (206 domains on this IP) do u have to list all 206 domains, or just block the IP?
simurally (god my spellin)
compete.com 66.151.234.20 (ALL)
(————-
competeinc.com
data.compete.com
datahub.compete.com
grapher.compete.com
home.compete.com
lists.compete.com
my.compete.com
referralanalytics.compete.com
searchanalytics.compete.com
siteanalytics.compete.com
snapshot.compete.com
toolbar.compete.com
tools.compete.com
http://www.compete.com
http://www.consumerinput.com
http://www.everytimeidie.com ———)
all these domains from compete. !
regards
,
Back again.!
I have uploaded my new .htaccess
chmod to 644
now on my website I get a page “Fedora Core Test Page”
“This page is used to test the proper operation of the Apache HTTP server after it has been installed. If you can read this page, it means that the Apache HTTP server installed at this site is working properly.”
thing is my website has gone , and its displaying this page instead.
anyone know if this is due to error in my .htaccess
or
has my .htaccess triggered Apache fedora thing on my server? I run Linux. manage through plesk. and cant find anything in the help files.
Is this in the htaccess something to do with it
RewriteEngine on
??????????
I have narrowed it down to that its to do with MY
deny from 00.00.00.00 <-IP ENTRYs
RewriteCond %{HTTP_REFERER} domains\.com [NC,OR]
works fine with Wes htaccess sample posted here.
but when I add some of my own IP's and domains of a list of known ppc spies – domains and IPs.
the whole thing stops working.
do you have to have a corresponding RewriteCond per each deny domain ? ??
any ideas
many thanks
sorted i reckon.
had to type each line in individually.
I guess whats happened is pasting from notepad did something i could not see.
so tip for everyone is to type each line in, and not go at it guns blazing copy n paste a big list in at once.
Upload the sample .htaccess Wes offers here, then add lines from that. checking now and then it still works and dont mess up ur server config.
Right…im firewalled !!
Back to list building and ppc
hahaha
Thanks
Hi I have tried to get the above sample .htaccess to work.
after adding a few of my own , I just got some default apache page instead of my website.
Its working now, but could you enlighten me on the significance of
RewriteCond %{HTTP_REFERER} keywordspy\.com [NC,OR]
RewriteCond %{HTTP_REFERER} keywordspypro\.com
RewriteCond %{HTTP_REFERER} spyfu\.com [NC,OR]
keywordspypro\.com <— does not have [NC,OR]
When I put the [NC,OR] on this line as all the others , thats when it messed up my website and give me the apache page.
here is another to go on the list: Adgooroo. they look pretty serious, so i want to block them.
M
Figured it. the last RewriteCond. must not have the [NC,OR]
the NC is no capitals – either with/without capitals. and the OR I guess is or command to next line. so Omit the ,OR
M
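To spell that out with a sketch (three of the referrer conditions from the post, spacing added only for readability): conditions are ANDed together by default, NC makes the match case-insensitive, and OR links a condition to the one that follows it. So every condition in the chain gets OR except the last one before the RewriteRule.

```apache
RewriteEngine On

# Conditions are ANDed by default; [OR] links a condition to the NEXT one
RewriteCond %{HTTP_REFERER} keywordspy\.com    [NC,OR]
RewriteCond %{HTTP_REFERER} keywordspypro\.com [NC,OR]
# Last condition in the chain: NC only, no OR
RewriteCond %{HTTP_REFERER} spyfu\.com         [NC]
RewriteRule .* - [F]
```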
Wow, it’s been a year since I wrote this post and I’ve gotten quite the nice feedback. Thank you all.
I’ll put on my todo list that I need to update this post a bit with any new techniques I’ve discovered for blocking bots and provide an updated .htaccess list.
Everyone has different server configurations so the code examples I provided may not work for everyone in all cases. Your best bet if something isn’t working is the good ol’ Google search.
Does this method prevent alexa from showing any stats for our site, and how about sites that trace backlinks, namely backlink watch (dot) com