Why query strings in URLs drive Googlebot and other search engine crawlers insane

“We have to reinvent the wheel every once in a while, not because we need a lot of wheels; but because we need a lot of inventors.”
- Bruce Joyce

I wrote about my experience writing a site crawler in PHP in an earlier post, and I’m going to use some of that background to make my point here, so it might help to read it first if you haven’t already.

[Google’s crawler [Googlebot] isn’t that sophisticated/writing a crawler in php]

From casually watching how Googlebot crawls some of the sites I work on, I’ve concluded that it works in much the same way as a crawler I wrote a year ago.

Googlebot goes from page to page, gathering the links on each page and tacking them onto the URL it is currently at. So why do query strings give it such a problem? Continue reading
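Here is a minimal sketch of what "tacking links onto the current URL" looks like, using Python’s standard `urllib.parse.urljoin`, which resolves relative links the way RFC 3986 describes. It shows how a simple crawler might behave. It is not a description of Googlebot’s internals. The example.com URLs are placeholders, and `normalize()` is a hypothetical helper that sorts query parameters so the same page, linked with parameters in a different order, isn’t counted twice.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit, parse_qsl, urlencode

def resolve_links(current_url, hrefs):
    """Resolve each href found on a page against the URL of that page."""
    return [urljoin(current_url, href) for href in hrefs]

def normalize(url):
    """Hypothetical helper: canonicalize a URL so parameter order and
    #fragments don't make one page look like several."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment is dropped because browsers never send it to the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/", query, ""))

if __name__ == "__main__":
    current = "http://www.example.com/catalog/list.php?cat=3&page=1"
    # A query-only href ("?...") keeps the current path and replaces the query.
    hrefs = ["?cat=3&page=2", "?page=2&cat=3", "item.php?id=7", "../about.html"]
    resolved = resolve_links(current, hrefs)
    for url in resolved:
        print(url)
    # A naive crawler sees 4 distinct URLs; after normalizing,
    # the two page-2 links collapse into one, leaving 3.
    print(len(set(resolved)), len({normalize(u) for u in resolved}))
```

Sorting parameters is only safe when the application ignores parameter order, which most do but not all.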

Google’s crawler [Googlebot] isn’t that sophisticated/writing a crawler in php

I spent a lot of time early last year trying to write a crawler in PHP (I know, I know).

It was supposed to sit on the server so that when you visited its URL, it would generate a Google Sitemap for your entire site.
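The output of a tool like that is an XML file in the Sitemaps protocol format: a `urlset` containing one `url`/`loc` entry per page. The following is a minimal Python sketch of that final step, not the original PHP tool. It assumes the crawl has already produced a list of absolute URLs, and the URLs shown are placeholders. Query-string URLs need their `&` escaped as `&amp;`, and ElementTree does that automatically.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing each URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # drop duplicates, keep crawl order
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes '&' in text as '&amp;', as the protocol requires
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

if __name__ == "__main__":
    print(build_sitemap([
        "http://www.example.com/",
        "http://www.example.com/list.php?cat=3&page=2",
    ]))
```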

What I found out was that writing a good crawler is hard work, not because of the recursion involved, but because of the endless variety of ways link tags appear in real-world HTML.
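To illustrate that variety, here is a small link extractor built on Python’s standard `html.parser`. The parser stands in for whatever the original PHP crawler did, so treat this as a sketch rather than the author’s code. It copes with uppercase tags, single, double, or missing quotes, stray whitespace, and HTML entities in `href` values. It deliberately ignores harder cases such as `<base href>` or links generated by JavaScript.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, however the markup is written."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes
        # entities such as &amp; inside attribute values.
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for a bare attribute like <a href>
            if name == "href" and value and value.strip():
                self.links.append(value.strip())

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links

if __name__ == "__main__":
    print(extract_links('<A HREF=\'two.php\'>2</A> <a href=four.php>4</a>'))
```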

Now Google has validated my experience (more on this in a second).

Continue reading

The right way to update software for your customers, as shown by Firefox

I consider myself a power user of Windows XP, so why haven’t I upgraded from Winamp 5.35 to Winamp 5.52?

After all, every single time I start Winamp it bugs me to.

[Screenshot: Winamp “update available!” prompt]

The answer is simple … It’s because I’m lazy.

I’m not going to go to winamp.com, figure out which version to download, and then reinstall it, just so Winamp runs exactly the same as it did before! No way.

But if the program went out, got the update, and installed it for me … I wouldn’t object.

Firefox does this right.

[Screenshot: “An update for Firefox is available” dialog]

When an update is available, it goes out and finds it for me. If I okay it, it installs the update for me and restarts my browser, putting me back viewing the page I was looking at before … like nothing happened. All I have to do is hit “Download & Install Now”. How easy is that?

[Screenshot: downloading and updating Firefox]

Nag screens/prompts/dialogs are very annoying. My natural instinct is to close them and get on with my life.

In that scenario, everybody loses.

So if you write software, make it update automatically if you possibly can. That would definitely be a selling point for me as your customer. (Hear that, Blumenthals software?)

PS: Most software (including WordPress) still requires you to download and install new versions yourself. Since automatic updating is so rare, it could be a killer feature if you built it into your software.
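The first thing an auto-updater like Firefox’s has to decide is whether a newer version exists. Here is a minimal Python sketch of that check. It assumes versions are dotted integers in the Firefox 2.0.0.11 style, which is an assumption: not every product numbers its versions this way, so the parsing rule has to match the product. The version numbers in the demo are hypothetical. The download, install, and restart steps are left out.

```python
from typing import Tuple

def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string such as '2.0.0.11' into integers."""
    return tuple(int(part) for part in version.strip().split("."))

def update_available(installed: str, latest: str) -> bool:
    """True if `latest` is newer than `installed` under dotted-integer rules."""
    a, b = parse_version(installed), parse_version(latest)
    # Pad the shorter version with zeros so '2.0' and '2.0.0' compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    # Tuples compare element by element, so (2, 0, 0, 9) < (2, 0, 0, 11),
    # whereas comparing the raw strings would put '2.0.0.9' after '2.0.0.11'.
    return b > a

if __name__ == "__main__":
    installed, latest = "2.0.0.9", "2.0.0.11"  # hypothetical version numbers
    if update_available(installed, latest):
        # A real updater would now download, verify and install the new build,
        # then restart the program; this sketch only reports the decision.
        print(f"Update available: {installed} -> {latest}")
```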

Apparently the creator of Ruby on Rails doesn’t comment his code … kinda

Here are some excerpts from DHH’s post yesterday on 37signals, and from his comments on it:

  • The short answer is that we don’t document our projects. At least not in the traditional sense of writing a tome that exists outside of the code base that somebody new to a project would go read …
  • Further more, I don’t really find it necessary for the kind of work that we do. Our biggest product, Basecamp, is about 10,000 lines of code. That really isn’t a whole lot in the grand scheme of things. Everything we do is build is also using Ruby on Rails, which means that most Rails programmers would know their way around our applications straight away. It’s the same conventions and patterns used throughout.
  •  Finally, we write our application in a wonderfully expressive and succinct programming language like Ruby that leads itself very well to a programming style like the one Kent Beck preaches in Smalltalk Best Practice Patterns. Keep your methods short and expressive. On average, our models have methods just four lines long. Adding documentation to a method should usually only be done when you’re doing something non-obvious that can’t be rewritten in an obvious way.
  • [comment] Wim, yes there’s RDoc. I just generally don’t use it for projects. When methods are only an average of 4 lines long written in a language like Ruby, it’s often faster and better to merely browse the code base rather than rely on explicit commenting.

Keep in mind that I’m no Ruby on Rails genius, and from the little I’ve done I can see where DHH is going with this. But I’ve always thought the argument that a language can be so succinct and clear that you don’t need to write comments is a bit silly, for a few reasons:

  • I believe that you don’t write code for machines; you write code for people (other developers). So any help you can give them in navigating your code is usually worth having. It saves them time and saves their employers money … and that is what being a great consultant is about: thinking about how to help your clients’ business, and saving them money falls squarely in that category.
  • People who use this line of argument are either too lazy to comment and are trying to justify it …
  • … or don’t understand that the industry has developers of all skill levels. Someone at your level might navigate your code quickly, but someone less experienced might take much longer … so why not avoid that?

Note that I’m not of the school of commenting just for the sake of it, as I’ve heard some “blub programmers” do. However, I do think you should always keep other developers in mind when you code, and if a comment can get them to the point where they can modify your code in one minute instead of a minute and a half … then you should write it.

In the end, I guess it’s a bit unfair to criticize DHH, because it’s not clear that he doesn’t comment his code much … though it’s easy to infer that. I just know from experience that people who say things like this tend to have three lines of comments in a 500-line piece of code.

But if you’re a “rockstar” developer, I guess everyone has to dance to your tune wherever you are, right?

Bad UI design no more … Facebook re-designs their search box …

Boy, do I feel special! Regulars will remember that I blogged a few weeks back about a bad UI choice in Facebook’s omnipresent search box.

Well, I went into Facebook today and was prattling on about one thing or another when my eyes fell on this.

[Screenshot: Facebook’s new search box, as it looked on January 9th, 2008]


As you can see, they seem to have taken my recommendation to add an actual clickable button to the search box, so that mobile phone users can actually use it.

Of course I kid (kinda) about being responsible for the change. After all, why would a company valued at $15bn listen to a lowly Austin web developer with a blog and an opinion? I don’t know for sure, but I’ll claim it … if only because no one else is :) Stephen Colbert did it with Mike Huckabee’s success in Iowa, and I’m doing it here.

Good job Facebook!!!
[Now, if they would only allow embeddable videos in Facebook notes ....]

Why php_value directives set in .htaccess fail when PHP runs as CGI or FastCGI

I was trying to take advantage of PHP’s auto_prepend_file directive today by setting it with the php_value directive in a .htaccess file. But as soon as I did that, my cheerfully running application puked and died with the familiar message:

“Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request”

I had seen this behavior before, when I was writing an app for a client a few months ago, but I hadn’t had time to investigate it. Today I decided to go a-googling and promptly found my answer:

php_value and php_flag are Apache directives provided by mod_php, but in CGI mode Apache calls the PHP binary, which in turn reads php.ini. Since the binary doesn’t read httpd.conf or .htaccess, those directives have no effect on PHP. And because PHP isn’t loaded into Apache, Apache doesn’t know what to do with the directives and borks.
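There are two common ways around this, sketched below as config fragments. The prepend path is a placeholder. The first wraps the directive in an `<IfModule>` check, so Apache skips it when PHP isn’t loaded as mod_php. The catch is that under CGI the prepend file simply isn’t applied, rather than causing a 500 error.

```apache
# .htaccess: only apply php_value when PHP runs as the Apache module.
# Under CGI/FastCGI mod_php is absent, so this block is skipped.
<IfModule mod_php5.c>
    php_value auto_prepend_file "/path/to/prepend.php"
</IfModule>
```

The second option needs PHP 5.3 or later running as CGI/FastCGI. You put the setting in a per-directory `.user.ini` file, which the PHP binary reads itself. auto_prepend_file is a per-directory setting, so it is allowed there.

```ini
; .user.ini in the document root (read by the CGI/FastCGI SAPI, PHP 5.3+)
auto_prepend_file = /path/to/prepend.php
```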

How to run both MySQL 5 and MySQL 4 on the same Windows machine

I finally made the jump from MySQL 4 the other day to take advantage of MySQL’s new “INSERT … ON DUPLICATE KEY UPDATE” syntax (which I think is spectacular, by the way).

I didn’t want to actually upgrade from MySQL 4 to MySQL 5, just run the two side by side … so I looked around for a bit and figured out that all you need to do is get the two servers running on different ports. The trick is to remember to use localhost:port (e.g. “localhost:3307”) instead of just “localhost” whenever you connect to the second server, from PHP for example. Below are screenshots of the things you need to watch out for when doing this on a Windows machine. Continue reading
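As a rough sketch of the port trick, the second server’s option file might look like the fragment below. The install paths are placeholders, and MySQL 4 keeps the default port 3306. On Windows, MySQL accepts forward slashes in these paths.

```ini
; my.ini for the second (MySQL 5) server; paths are placeholders
[mysqld]
port=3307
basedir=C:/mysql5
datadir=C:/mysql5/data
```

Each server also needs its own Windows service name. For example, you could register the second one with something like `mysqld --install MySQL5 --defaults-file=C:\mysql5\my.ini`. PHP then reaches it with `mysql_connect("localhost:3307", ...)`.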

Ruby Reddit and a top ten list of Ruby gems!

Quick note to say that I just discovered the brand-new Ruby Reddit, which is really fortuitous now that I’m spending a lot of time with Ruby on Rails. From there I found the “Ten Essential Ruby Gems” … which I am installing and screwing with as I type :]

Many thanks to Reginald Braithwaite, who runs the immensely popular software development blog Raganwald, for his post about the Ruby Reddit.