Hello.
Could you please tell me how to exclude page views generated by search engines indexing the site from the view statistics? For example, addresses starting with 74.X.X.X seem to belong to the Yahoo search engine. Are there databases like this for all search engines, so that their requests are not counted as page views on a site? After all, the hit counters themselves don't count them, do they?
Regards.
To: jaroslav
(07.08.2008 at 10:50)
>...For example, addresses starting with 74.X.X.X seem to belong to the Yahoo search engine...
O_o
Yahoo has more than 16 million IP addresses? ))))
To: jaroslav
(07.08.2008 at 10:50)
I can offer a list of robots.
List of search engine robots (identified by HTTP_USER_AGENT): GoogleBot, Yandex, Rambler Crawler, etc.
|
3.3.vscooter
AITCSRobot/1.1
AO/A-T.IDRG v2.3
ASpider/0.09
ATN_Worldwide
AURESYS/1.0
Ahoy! The Homepage Finder
AlkalineBOT
AnthillV1.1
Arachnophilia
Araneo/0.7 (araneo@esperantisto.net; http://esperantisto.net)
ArchitextSpider
Atomz/1.0
BSpider/1.0 libwww-perl/0.40
BackRub
BaySpider
Big Brother
Bjaaland/0.5
BlackWidow
CACTVS Chemistry Spider
CMC/0.01
Calif/0.6 (kosarev@tnps.net; http://www.tnps.dp.ua)
Checkbot/x.xx LWP/5.x
ComputingSite Robi/1.0 (robi@computingsite.com)
CoolBot
Cusco/3.2
CyberSpyder/2.1
DIIbot
DNAbot/1.0
DWCP/2.0
Deweb/1.01
Die Blinde Kuh
Digger/1.0 JDK/1.3.0
Digimarc CGIReader/1.0
Digimarc WebReader/1.2
DragonBot/1.0 libwww/5.0
Duppies
EIT-Link-Verifier-Robot/0.2
EMC Spider
ESIRover v1.0
ESISmartSpider/2.0
EbiNess/0.01a
Emacs-w3/v[0-9\.]+
Evliya Celebi v0.151 - http://ilker.ulak.net.tr
FAST-WebCrawler
FastCrawler 3.0.X (crawler@1klik.dk) - http://www.1klik.dk
FelixIDE/1.0
Fish-Search-Robot
Freecrawl
FunnelWeb-1.0
GetURL.rexx v1.05
Gigabot/1.0
Golem/1.1
Googlebot/2.1
Googlebot/2.X (+http://www.googlebot.com/bot.html)
Gromit/1.0
Gulliver/1.1
Gulliver/1.2
Gulliver/1.3
Gulper Web Bot 0.2.4 (www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot)
HKU WWW Robot,
HTMLgobble v2.2
HTTP Fromyes
Hometown Spider Pro
Hämähäkki/0.2
I Robot 0.4 (irobot@chaos.dk)
IAGENT/1.0
IBM_Planetwide,
INGRID/0.1
IncyWincy/1.0b1
InfoSeek
InfoSeek Robot 1.0
InfoSpiders/0.1
Informant
Infoseek Sidewinder
Internet Cruiser Robot/2.1
Iron33/0.0
IsraeliSearch/1.0
JBot (but can be changed by the user)
JCrawler/0.2
JavaBee
Jeeves v0.05alpha (PERL, LWP, lglb@doc.ic.ac.uk)
JoBo (can be modified by the user)
Jobot/0.1alpha libwww-perl/4.0
JoeBot/x.x,
JubiiRobot/version#
KDD-Explorer/0.1
KIT-Fireball/2.0 libwww/5.0a
KO_Yappo_Robot/1.0.4(http://yappo.com/info/robot.html)
Katipo/1.0
LWP
LWP::
LabelGrab/1.1
LinkScan Server/5.5 | LinkScan Workstation/5.5
LinkWalker
Linkidator/0.93
Lockon
Lycos
Lycos_Spider_(T-Rex)
MOMspider/1.00 libwww-perl/0.40
Magpie/1.0
MediaFox/x.y
MerzScope
MindCrawler
Monster/vX.X.X - ()
Motor/0.2
MwdSearch/0.1
NEC-MeshExplorer
NHSEWalker/3.0
Nederland.zoek
NetCarta CyberPilot Pro
NetMechanic
NetScoop/1.0 libwww/5.0a
Nomad-V2.x
NorthStar
Nutscrape/1.0 (CP/M; 8-bit)
Occam/1.0
Open Text Site Crawler V1.0
Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)
Orbsearch/1.0
PGP-KA/1.2
PackRat/1.0
PageBoy/1.0
ParaSite/0.21 (http://www.ianett.com/parasite/)
Patric/0.01a
Peregrinator-Mathematics/0.7
PerlCrawler/1.0 Xavatoria/2.0
PiltdownMan/1.0 profitnet@myezmail.com
Pioneer
PlumtreeWebAccessor/0.9
Poppi/1.0
PortalBSpider/1.0 (spider@portalb.com)
PortalJuice.com/4.0
RHCS/1.0a
Resume Robot
Road Runner: ImageScape Robot (lim@cs.leidenuniv.nl)
Robbie/0.1
RoboCrawl (http://www.canadiancontent.net)
Robofox v2.0
Robot du CRIM 1.0a
Robozilla/1.0
Roverbot
RuLeS/1.0 libwww/4.0
SG-Scout
SLCrawler
SafetyNet Robot 0.1,
Scooter/1.0
Scooter/2.0
Scooter/2.0 G.R.A.B. V1.1.0
Senrigan
Shagseeker at http://www.shagseek.com /1.0
Site Valet
SiteTech-Rover
Slurp/2.0
Snooper/b97_01
Snoopy v1.01
Solbot/1.0 LWP/5.07
Spanner/1.0 (Linux 2.0.27 i586)
Speedy Spider ( http://www.entireweb.com/speedy.html )
SpiderBot/1.0
SpiderMan 1.0
StackRambler/2.0
TITAN/0.1
TLSpider/1.1
Tarantula/1.0
TechBOT
Templeton/{version} for {platform}
TitIn/0.2
TurnitinBot/1.5 (http://www.turnitin.com/robot/crawlerinfo.html)
UCSD-Crawler
URL Spider Pro
UdmSearch
UdmSearch/2.1.1
Ultraseek
VWbot_K/4.2
Valkyrie/1.0 libwww-perl/0.40
Verticrawl
Victoria/1.0
Voyager/0.0
W3M2/x.xxx
WOLP/1.0 mda/1.0
WWWWanderer v3.0
WallPaper/[current version number] (Win[???];
WebBandit/1.0
WebCatcher/1.0
WebCopy/(version)
WebFetcher/0.8,
WebLinker/0.0 libwww-perl/0.1
WebMoose/0.0.0000
WebQuest/1.0
WebReaper [webreaper@otway.com]
WebWalker/1.10
WebWatch
Wget/1.4.0
XGET/0.7
Yandex/1.01.001 (compatible; Win16; I)
Yandex/1.01.001 (compatible; Win16; M)
Yandex/1.03.000 (compatible; Win16; M)
YandexSomehing/1.0
aWapClient
appie/1.1
arks/1.0
borg-bot/0.9
cIeNcIaFiCcIoN.nEt Spider (http://www.cienciaficcion.net)
combine/0.0
conceptbot/0.3
cosmos/0.3
dienstspider/1.0
dlw3robot/x.y (in TclX by http://hplyot.obspm.fr/~dl/)
elfinbot
esther
explorersearch
fido/0.9 Harvest/1.4.pl2
gammaSpider xxxxxxx ()/
gazz/1.0
gcreep/1.0
gestaltIconoclast/1.0 libwww-FM/2.17
grabber
griffon/1.0
havIndex/X.xx[bxx]
htdig/3.1.0b2
iajaBot/0.1
image.kapsi.net/1.0
inspectorwww/1.0 http://www.greenpac.com/inspectorwww.html
jumpstation
larbin (+mail)
legs
libwww-perl-5.41
logo.gif crawler
marvin/infoseek (marvin-team@webseek.de)
moget/1.0
mouse.house/7.1
newscan-online/1.1
phpdig
psbot/0.X (+http://www.picsearch.com/bot.html)
root/0.1
searchprocess/0.9
spiderline/3.1.3
ssearcher100
straight FLASH!! GetterroboPlus 1.5
suke/*.*
suntek/1.0
tarspider
teoma_agent1 [teoma_admin@hawkholdings.com]
urlck/1.2.3
vision-search/3.0'
vscooter/2.0
w3index
w3mir
w@pSpider/xxx (unix) by wap4.com
web robot PEGASUS
weblayers/0.0
webs@recruit.co.jp
webvac/1.0
webwalk
whatUseek_winona/3.0
wired-digital-newsbot/1.5
wlm-1.1
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
StackRambler/2.0 (MSIE incompatible)
Gigabot/2.0
YaDirectBot/1.0
TurtleScanner/1.4 (compatible; Win16; S)
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) METASpider
Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)
Yandex/1.01.001 (compatible; Win16; I)
Yandex/1.01.001 (compatible; Win16; H)
Yandex/1.01.001 (compatible; Win16; P)
Yandex/1.02.000 (compatible; Win16; F)
Yandex/1.03.003 (compatible; Win16; D)
Yandex/1.03.000 (compatible; Win16; M)
YaDirectBot/1.0 (compatible; Win16; I)
Yandex/2.01.000 (compatible; Win16; Dyatel; C)
Yandex/2.01.000 (compatible; Win16; Dyatel; Z)
Yandex/2.01.000 (compatible; Win16; Dyatel; D)
Yandex/2.01.000 (compatible; Win16; Dyatel; N)
ia_archiver
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)
msnbot/1.0 (+http://search.msn.com/msnbot.htm)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
uaportalbot
|
Turtle Scanner (http://www.turtle.ru/) % TurtleScanner
Google Bot (http://www.google.com/) % Googlebot
Yandex Crawler (http://www.yandex.ru/) % Yandex/
Rambler Crawler (http://www.rambler.ru/) % StackRambler
FAST-WebCrawler (http://fast.no/) % FAST-WebCrawler
ASPseek % ASPseek
UdmSearch % UdmSearch
XWareCrawler % XWareCrawler
WatzNew Agent (http://www.watznew.com) % WatzNew
AlkalineBOT % AlkalineBOT
NetAnts % NetAnts
URL_Spider_Pro (http://www.innerprise.net/usp-spider.asp) %URL_Spider_Pro
CGIexpo.com Verifier % CGIexpo.com Verifier
Rumours-Agent % Rumours-Agent
asterias % asterias
nabot % nabot
Pockey-GetHTML/4.11.5 (Win32; GUI; ix86) % Pockey-GetHTML
InetURL % InetURL
WatzNew Agent % WatzNew Agent (www.watznew.com)
Altavista % Mercator-
Altavista % Scooter
AskJeeves % ask jeeves
Direct Hit % (Direct Hit Grabber)
Excite % ArchitextSpider
Excite % libwww-perl/5.33
FAST % fastlwspider
FAST % FAST-WebCrawler
Google % Googlebot/
IBM/Almaden % http://www.almaden.ibm.com/cs/crawler
IncyWincy % http://www.loopimprovements.com/robot.html
Infoseek % Infoseek Sidewinder/
Inktomi % Slurp.
Inktomi % Slurp/
Lycos % Lycos_Spider_
NorthernLight % Gulliver/1.3
NationalDirectory % nationaldirectory-webspider/
PicSearch % http://www.picsearch.com/bot.html
Moget (Japan) % moget@goo.ne.jp
Szukacz (Poland) % www.szukacz.pl/jakdzialarobot.html
OpenFind (TAIWAN) % Openfind data gatherer, Openbot
NaverRobot (www.naver.com) % NaverRobot
appie 1.1 (www.walhello.com) % appie 1.1
MSN Bot % http://search.msn.com/msnbot.htm
KM.RU Crawler (eStyleSearch) % eStyleSearch
Yahoo Slurp (http://help.yahoo.com/help/us/ysearch/slurp)% Yahoo! Slurp
|
Use these to identify the robots...
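A minimal sketch of how the second list above could be used, assuming each line has the form `Name % signature` and that a case-insensitive substring match against HTTP_USER_AGENT is good enough. The helper names `parseRobotList` and `detectRobot` are made up for illustration, and only a few lines of the list are inlined here.

```php
<?php
// Hypothetical helper: turn "Name % signature" lines into [signature => name].
function parseRobotList(string $text): array
{
    $robots = [];
    foreach (preg_split('/\R/', $text) as $line) {
        // Split on the first '%' only; the name part may contain a URL.
        $parts = explode('%', $line, 2);
        if (count($parts) !== 2) {
            continue; // not a "Name % signature" line
        }
        $name = trim($parts[0]);
        $signature = trim($parts[1]);
        if ($signature !== '') {
            $robots[$signature] = $name;
        }
    }
    return $robots;
}

// Hypothetical helper: return the robot name for a user agent, or null.
function detectRobot(string $userAgent, array $robots): ?string
{
    foreach ($robots as $signature => $name) {
        // stripos() does a case-insensitive substring search.
        if (stripos($userAgent, (string) $signature) !== false) {
            return $name;
        }
    }
    return null;
}

// A few lines copied from the list above.
$list = <<<'TXT'
Google Bot (http://www.google.com/) % Googlebot
Yandex Crawler (http://www.yandex.ru/) % Yandex/
Rambler Crawler (http://www.rambler.ru/) % StackRambler
Yahoo Slurp (http://help.yahoo.com/help/us/ysearch/slurp)% Yahoo! Slurp
TXT;

$robots = parseRobotList($list);
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$robotName = detectRobot($userAgent, $robots); // null means "not a known robot"
```

Matching on short substrings rather than full strings means a new robot version (Googlebot/2.2, say) is still recognized without editing the list.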
To: serjinio
(07.08.2008 at 12:32)
Thank you very much.
To: jaroslav
(07.08.2008 at 10:50)
$_SERVER['HTTP_USER_AGENT'] holds a string like Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1). For search bots (Google, for example) the USER_AGENT looks something like this: Googlebot/2.1. Find out what the USER_AGENT looks like for the other search engines and don't count them...
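Roughly, this advice could look like the sketch below. It assumes a hand-picked set of signatures taken from this thread and a hypothetical `shouldCountView()` helper; how the view is actually stored (file, database) is left out. Keep in mind that the User-Agent header is sent by the client, so this only filters out robots that identify themselves.

```php
<?php
// Substrings that mark a search engine robot; taken from this thread,
// extend as needed.
$botSignatures = ['Googlebot', 'Yandex/', 'StackRambler', 'Slurp', 'msnbot'];

// Hypothetical helper: true if the request should be counted as a page view.
function shouldCountView(?string $userAgent, array $botSignatures): bool
{
    // Assumption: requests without a User-Agent are not counted either.
    if ($userAgent === null || $userAgent === '') {
        return false;
    }
    foreach ($botSignatures as $signature) {
        // Case-insensitive substring match against the signature.
        if (stripos($userAgent, $signature) !== false) {
            return false; // looks like a robot
        }
    }
    return true;
}

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? null;
if (shouldCountView($userAgent, $botSignatures)) {
    // Increment the page view counter here (storage is up to you).
}
```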
To: jaroslav
(07.08.2008 at 10:50)
Oh! And there's no need to look them up =)))