Jos Kirps's Popular Science and Technology Blog
July 31, 2008
On Monday a new search engine was launched. It's named Cuil but pronounced "Cool" (huh?), and it got a lot of really bad reviews, which made me really happy.
So why do I hate Cuil?
Well, a few months ago we started noticing more and more traffic on our company websites, caused by a crawler called Twiceler. Twiceler was run by a company called "Cuil" and claimed to be some kind of experimental search engine robot. A few days later the same crawler also started affecting my personal websites.
The Twiceler bot is probably the most stupid crawler I've ever seen: it just downloads everything it can find, and it seems that it just won't ever stop. If there's a page using dynamic input in a URL (a calendar, for example), it will download the same page 100,000 times or more, simply by following every dynamic link it can find without applying any kind of intelligent limit.
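To make "intelligent limit" a bit more concrete, here is a minimal sketch of the kind of safeguard a crawler could apply. It is purely illustrative and not how any real crawler is known to work: the class name `PathBudget` and the cap of 20 are invented for this example.

```python
from collections import Counter
from urllib.parse import urlsplit


class PathBudget:
    """Hypothetical helper: caps how often a crawler requests one path,
    no matter how many different query strings point at it."""

    def __init__(self, max_fetches_per_path=20):
        self.max_fetches = max_fetches_per_path
        self.fetched = Counter()

    def should_fetch(self, url):
        parts = urlsplit(url)
        # Group URLs by host and path only, so calendar.php?day=1 and
        # calendar.php?day=2 draw from the same budget.
        key = (parts.netloc, parts.path)
        if self.fetched[key] >= self.max_fetches:
            return False
        self.fetched[key] += 1
        return True


budget = PathBudget(max_fetches_per_path=3)
decisions = [
    budget.should_fetch(f"http://example.com/calendar.php?day={day}")
    for day in range(5)
]
print(decisions)
```

With a budget like this, an endless calendar costs the server a handful of requests instead of 100,000.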
By downloading thousands of pages per hour on each website it can cause an incredible amount of traffic on a server, and dynamic scripts (written in Perl, Python or PHP, for example) start causing an immense CPU load that may even take your entire server down (as reported by several webmasters). Twiceler is really harmful and can cost you both money and downtime. A well-written crawler such as Googlebot or Slurp (Yahoo) would never affect a website in such a malicious way.
After googling for Twiceler we found out that many webmasters had experienced the same problems with Cuil. Of course we thought that such a crappy crawler, which doesn't seem to care about duplicate content, website performance, bandwidth or traffic costs, had to be some kind of malicious spam bot.
Since the stupid Cuil/Twiceler bot just won't stop, the first thing you'll do as a webmaster or system administrator is set up a robots.txt file that tells Twiceler not to index any more pages (or at least blocks some of the directories that should not be indexed, such as those containing dynamic scripts).
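For illustration, a rule set of that kind can be checked with Python's standard `urllib.robotparser` before you rely on it. This is only a sketch: the `Twiceler` user-agent token, the `/cgi-bin/` directory and the example.com URLs are assumptions, so adjust them to whatever shows up in your own logs.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: shut Twiceler out completely, and keep every other
# robot away from the dynamic scripts only.
ROBOTS_TXT = """\
User-agent: Twiceler
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

twiceler_allowed = parser.can_fetch("Twiceler", "http://example.com/calendar.php?day=1")
googlebot_page = parser.can_fetch("Googlebot", "http://example.com/index.html")
googlebot_script = parser.can_fetch("Googlebot", "http://example.com/cgi-bin/calendar.pl")
print(twiceler_allowed, googlebot_page, googlebot_script)
```

This only tells you what a well-behaved robot would do with the file. It can't force a crawler to actually read it.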
Cuil claims that their Twiceler crawler respects the robots.txt file, but even days after we set it up nothing had changed: the damn bot continued indexing anything it could get and completely ignored all robots.txt rules (google for Twiceler and you'll see that this is what other webmasters are experiencing too).
So finally we blocked the Cuil bot entirely on our servers, just as many other people recommend in webmaster forums. On our company servers we blocked all incoming connections that could be identified as coming from the Cuil/Twiceler bot; on my personal websites I blocked all of Cuil's IP addresses using .htaccess files.
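As a rough sketch of the .htaccess approach, the following Python snippet renders the matching Apache 2.2-style `Order`/`Allow`/`Deny` directives from a list of networks. The helper name `htaccess_deny_rules` is made up, and the addresses are documentation placeholders, not Cuil's real ranges; take the real ones from your own access log.

```python
import ipaddress


def htaccess_deny_rules(networks):
    """Render Apache 2.2-style access rules that deny the given networks."""
    # With "Order Allow,Deny" a request matching both an Allow and a Deny
    # rule is rejected, so everyone except the listed networks gets in.
    lines = ["Order Allow,Deny", "Allow from all"]
    for net in networks:
        # ip_network() validates the input and normalizes it to CIDR notation.
        lines.append(f"Deny from {ipaddress.ip_network(net)}")
    return "\n".join(lines)


# Placeholder ranges reserved for documentation, NOT Cuil's addresses.
rules = htaccess_deny_rules(["192.0.2.0/24", "198.51.100.17"])
print(rules)
```

On Apache 2.4 and later the equivalent is written with `Require not ip` inside a `RequireAll` block instead.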
It was a funny moment when the Cuil search engine went live on Monday and they claimed to have the world's biggest index. Of course they have! Their damn bot seems to be indexing each dynamic web page a million times, no matter if it's always the same content or if you're clearly saying that the page should not be indexed at all (via robots.txt).
Maybe this also explains the poor quality of their search results - their index may be the largest on this planet, but it's probably full of crap and duplicates.
If you're a webmaster or website owner and you're currently experiencing high bandwidth usage or traffic problems, you should check your access_log, because there's a good chance that your problems are caused by Cuil. If this is the case, I can only recommend blocking all of Cuil's IP addresses on your server, because that seems to be the only thing that really works.
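Checking the log doesn't have to be done by hand. Here's a minimal sketch that counts requests per client IP for every user agent containing "twiceler". It assumes Apache's "combined" log format, where the user agent is the last quoted field; the sample lines and the function name `hits_by_ip` are made up for the example.

```python
import re
from collections import Counter

# Client IP at the start of the line, user agent as the last quoted field.
LINE_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def hits_by_ip(log_lines, agent_substring="twiceler"):
    """Count requests per IP whose user agent contains agent_substring."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.match(line)
        if match and agent_substring in match.group(2).lower():
            counts[match.group(1)] += 1
    return counts


# Invented sample lines with placeholder addresses and agent strings.
sample = [
    '192.0.2.10 - - [28/Jul/2008:10:00:00 +0200] "GET /calendar.php?day=1 HTTP/1.1" 200 512 "-" "ExampleAgent (Twiceler placeholder)"',
    '192.0.2.10 - - [28/Jul/2008:10:00:01 +0200] "GET /calendar.php?day=2 HTTP/1.1" 200 512 "-" "ExampleAgent (Twiceler placeholder)"',
    '203.0.113.5 - - [28/Jul/2008:10:00:02 +0200] "GET /index.html HTTP/1.1" 200 1024 "-" "SomeBrowser/1.0"',
]
counts = hits_by_ip(sample)
print(counts.most_common())
```

Against a real log you would pass the open file instead of the sample list, for example `hits_by_ip(open("access_log"))`, and block the addresses at the top of the list.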
To finish, I'd like to say that I think Cuil should start focusing on the quality of their algorithms and their content instead of relying completely on the marketing of doubtful numbers.
July 30, 2008
Ever since I announced that I'll be leaving my current company, people have kept asking me about my future projects, and I'm glad to provide some first details now.
First of all, I'll still have a regular job, as I'll be working as a teacher from mid-September on. One very positive aspect of my new job is that I'll have more free time than before, which will allow me to pursue some private projects too. So in future I'd like to focus on two hobbies during my free time: free software and multimedia.
When it comes to software, I'm talking about Open Source projects - like the CorneliOS webOS and application framework, for example, which has already been downloaded over 14,000 times from SourceForge. I think CorneliOS is an amazing technology and it will definitely serve as a basis for all of my future Open Source software projects. CorneliOS is still quite an experimental framework, so one of the first goals will be to make it an attractive tool for end users too. There's still a lot of work ahead, but I'm very confident that this goal can be achieved very soon.
What about multimedia? Years ago I owned a recording studio and I was running several music projects. During the past few years my job didn't allow me to pursue such interests, but I always planned to reactivate the studio one day. So one of the things I'm currently doing is rebuilding the recording studio. It will become a project-oriented audio/video studio that I'll use for my own projects, but I definitely won't record any bands in my basement. There are tons of unsuccessful studios of that kind and I don't think it would make any sense to add another one.
So this is basically what I'm up to. More details will become available soon.
Jos
July 10, 2008
A few days ago I wrote about changes, so today I'd like to announce that I will be leaving the EducDesign company in August 2008 to pursue other interests, which also means that I will no longer be involved in the development of the OLEFA software. In mid-September I will start working as a teacher in the community of Sanem (Luxembourg, Europe).
These changes do not affect any of the Open Source software projects I'm involved in - there may have been fewer updates during the past few weeks as I had lots of things to do, but everything should be back to normal within a short time. In fact I may even have more time to focus on these projects in the future.
Further information will become available soon, so make sure to check back from time to time...
Jos
July 01, 2008
Exactly one year ago a very special community website was officially launched.
On July 1st 2007, Galaxiki started as something really new: it's a wiki-based science fiction galaxy consisting of millions of stars, planets and moons that can be edited by its community.
Each star, each planet and each moon is represented by an editable wiki page, and the solar systems are all part of an online galaxy that can be explored using a galactic map.
Galaxiki instantly got a lot of positive reviews. It was "website of the day" on Yahoo, About.com, Pocketlint and RedOrbit. It was also community website of the week in Linux Journal.
Today the Galaxiki community consists of more than 2,700 users, and thousands of stars, planets and moons have been edited, creating an entirely new online world. Community members write sci-fi stories and publish them as part of stellar histories or post them in the community blog.
In April 2008 Galaxiki also published an exclusive interview with British actor David Prowse, who is best known for his role as Darth Vader in the original Star Wars trilogy.
A lot of improvements and new features are planned for the next few months, and the Galaxiki community is looking forward to continuing to expand this amazing fictional universe.
Visit the Galaxiki community site:
www.galaxiki.org
June 30, 2008
There have to be changes from time to time, and I think this year it's time for some *major* changes. I can't reveal any details today, but I think I'll have to announce some important news within the next few weeks.
During the past few weeks I didn't have much time to post news in my blog or to keep my website up to date, but things should be back to normal very soon.
Stay tuned...
Jos
©1996-2007 Jos Kirps | All rights reserved