Archive for February, 2008

Coding Horror On Beautiful code

I got around to reading this very insightful piece about truly beautiful code on Coding Horror today.

In it, Jeff Atwood talks about the problem with the book Beautiful Code, which is that it actually talks about code and not the ideas behind the code.

To put it succinctly in Jeff’s words …

“Ideas are beautiful. Algorithms are beautiful. Well executed ideas and algorithms are even more beautiful. But the code itself is not beautiful. The beauty of code lies in the architecture, the ideas, the grander algorithms and strategies that code represents”

Beautiful Code … the book

I just remember thumbing through the book at my local Barnes & Noble and not being enamored of it … I went on to spend 2 hours reading Obie Fernandez’s The Rails Way instead.

Add comment February 26th, 2008

Mysql5 runs with twice the memory of Mysql4 on Windows XP

Just an interesting side note … I stumbled upon this “discovery” by accident earlier today.


Add comment February 13th, 2008

Firefox 3 Beta 3 is here!

It’s here!

A couple of nice interface touches here and there. A bunch of bugs in Beta 2 still remain (trouble with Ajax posts, blinking cursor in weird places … like next to images or in h1 tags) but I’m still excited about it!

[Download it now!]

Download Firefox 3 Beta 3

Firefox 3 Beta 3 update!

Add comment February 12th, 2008

Why query strings in urls drive Googlebot and other search engine crawlers insane

“We have to reinvent the wheel every once in a while, not because we need a lot of wheels; but because we need a lot of inventors.”
- Bruce Joyce

I wrote about my experience writing a site crawler in php in an earlier post, and I’m going to use some of the background there to make my point here. So it might help to go read it if you haven’t already.

[Google’s crawler [Googlebot] isn’t that sophisticated/writing a crawler in php]

From my casual observation of the way Googlebot crawls some of the sites I work on, I have reached the conclusion that it works in much the same way that a crawler I wrote a year ago worked.

Googlebot goes from page to page, gathering the links on each page and tacking them onto the url it is currently at. So why do query strings give it such a problem?

The answer is simple. Imagine this url for an item that doesn’t exist anymore.

www.example.com/store.php?buyid=29&catid=12

When a crawler encounters this url and tests it to see if it returns a 404 … it doesn’t.

Why?

Because www.example.com/store.php is usually still a valid page. It won’t give the crawler an error unless you explicitly code it to.

So the crawler now tosses www.example.com/store.php?buyid=29&catid=12 onto its list of pages to be crawled. Can you see the disaster waiting to happen?

www.example.com/store.php?buyid=29&catid=12 and any other non-existent urls like it basically just map to the still valid www.example.com/store.php, but in the crawler’s mind they are all different urls.

Now, if there are other urls on that page (store.php), for related products for example, Google just takes each url and tacks it onto the url (it thinks) it’s at right now. So it winds up with

www.example.com/store.php?buyid=29&catid=12store.php?buyid=39&catid=11

It does that for every invalid query string url that has store.php in its base. It then goes back and crawls them again, and now it has

www.example.com/store.php?buyid=29&catid=12store.php?buyid=39&catid=11store.php?buyid=39&catid=11

The crawler is now in a tailspin … going around in circles trying to crawl your site. Chewing up your cpu cycles and generally being a nuisance.
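Here is a minimal Python sketch of the difference between tacking a link onto the current url and resolving it properly. It is only an illustration: it is not the crawler from the earlier post, and it makes no claim about how Googlebot actually works. The function names are made up for the example, and an http:// scheme is assumed on the example urls.

```python
from urllib.parse import urljoin


def naive_join(current_url, href):
    """Tack the link straight onto the current url (the runaway behavior)."""
    return current_url + href


def proper_join(current_url, href):
    """Resolve the link against the current url the way a browser would."""
    return urljoin(current_url, href)


current = "http://www.example.com/store.php?buyid=29&catid=12"
href = "store.php?buyid=39&catid=11"

print(naive_join(current, href))
print(proper_join(current, href))
```

The first call reproduces the runaway url from above. The second drops the old query string and gives a plain store.php?buyid=39&catid=11 url on the same host.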

I hope this helps you understand why Googlebot hates query strings so much.

I haven’t tried this yet, but I think making the base url of a query string resolve to a 404 error should help a lot.

So as an example

www.example.com/store.php?buyid=29&catid=12

should return a code 200/ok

and

www.example.com/store.php

should give a 404 error.

This is just my theory; I don’t know that it’d be practical.
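The idea could be sketched in Python roughly like this. It is a hypothetical stand-in for store.php, not something tested against a real crawler: CATALOG and status_for are invented names, and the catalog contents are made up. The sketch also returns a 404 for items that no longer exist, which is the "explicitly code it to" part from earlier.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical catalog: (buyid, catid) pairs for items that still exist.
CATALOG = {("29", "12"), ("39", "11")}


def status_for(url, catalog=CATALOG):
    """Return the HTTP status the store page should answer with."""
    query = parse_qs(urlsplit(url).query)
    buyid = query.get("buyid", [None])[0]
    catid = query.get("catid", [None])[0]
    if (buyid, catid) in catalog:
        return 200  # a real item: serve the page
    return 404      # bare store.php, or an item that is gone


print(status_for("http://www.example.com/store.php?buyid=29&catid=12"))
print(status_for("http://www.example.com/store.php"))
print(status_for("http://www.example.com/store.php?buyid=999&catid=12"))
```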

PS: I hope this further helps you understand why search engine crawlers also hate PHP session ids on your content.

Add comment February 8th, 2008

Google’s crawler [Googlebot] isn’t that sophisticated/writing a crawler in php

I spent a lot of time early last year, trying to write a crawler in php (I know, I know).

It was supposed to sit on the server so that when you went to its url, it’d generate a Google sitemap for your entire site.

What I found out was that writing a good crawler is very hard work. Not because of the recursion involved, but because of the endless ways link tags get written.

Now Google has validated my experience (more on this in a second).

Just a couple of things I had to consider with my crawler were

  • I had to program it to look for a base tag so that I’d know which url to resolve relative links against.
  • I had to check each link to figure out if it was an internal or external link so I’d know whether to crawl it.
  • Then I had to keep a running list of links crawled, so that I’d know if I had crawled a link before or not.
  • I had to teach it that a leading “/” in a relative link means the link starts from the root of the domain.
  • Knowing how to deal with “../” … this was a pain and a half.
  • I had to let it know how to deal with mailtos, javascript, and improperly written urls like <a href="www.concept47.com"> (more on this later) … etc.

My crawler worked by keeping a list of urls to be crawled, which it continually added to. As it crawled each url, it put it in a second array, and every newly found url had to be crosschecked against that array before being added to the list to be crawled.
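That two-list approach can be sketched in Python as below. This is not the original php crawler, just a minimal illustration of the same bookkeeping. The get_links argument is a hypothetical callable that returns the raw href values found on a page, so the sketch runs without touching the network.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit


def crawl(start_url, get_links):
    """Crawl one host breadth-first and return the urls in visit order."""
    host = urlsplit(start_url).netloc
    to_crawl = deque([start_url])  # urls waiting to be crawled
    seen = {start_url}             # every url ever queued
    crawled = []                   # urls already visited, in order
    while to_crawl:
        url = to_crawl.popleft()
        crawled.append(url)
        for href in get_links(url):
            # Resolve relative links ("/", "../", etc.) and drop "#fragments".
            absolute, _fragment = urldefrag(urljoin(url, href))
            parts = urlsplit(absolute)
            if parts.scheme not in ("http", "https"):
                continue  # skips mailto:, javascript: and friends
            if parts.netloc != host:
                continue  # external link, do not crawl
            if absolute not in seen:
                seen.add(absolute)
                to_crawl.append(absolute)
    return crawled
```

Using a set for the crosscheck keeps each "have I seen this url?" lookup cheap even as the list grows.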

The problem, though, is the way people write html markup. As many of you know, there are some nastily written pages out there … so if someone wrote <a href="www.concept47.com"> or <a href="concept47.com"> or even <a href="screwyougoogle"> my crawler had to know not to add it to the list to be crawled.

This is very difficult to do correctly, and for all the time I spent on it, I never found a real way to deal with it. You could write special cases for <a href="www.concept47.com"> but what about <a href="ww.concept47.com"> or <a href="w.concept47.com"> … see the problem?

Even though the urls give you a 404, they’ll make it onto your list to be crawled and waste the crawler’s time. I felt like such a loser for not being able to figure out this issue, but it seems that Googlebot has the same problem.

Look at this. [click the image to make it bigger]

Googlebot has problems with bad link tags

This is from my webmaster tools console.

The problem here was that I had a link tag on one of my blog posts that went like this

<a href="www.unfuddle.com">

As you can see, even the mighty Googlebot didn’t pick up on the error. It just tacked the url onto the url it was currently at and went on about its business.
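For what it’s worth, standard url resolution treats a link with no scheme as a relative path, so a crawler that follows the rules ends up in the same place. A small Python sketch shows this; the page address below is a hypothetical stand-in for the blog post that held the bad link.

```python
from urllib.parse import urljoin

# Hypothetical address of the blog post that contains the bad link.
page = "http://www.example.com/blog/some-post/"

bad = urljoin(page, "www.unfuddle.com")          # scheme left off
good = urljoin(page, "http://www.unfuddle.com")  # what was intended

print(bad)
print(good)
```

The first result lands under the blog post’s own path, and the second points at the outside site as intended.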

Validation!

Read the next in this series [Why query strings in urls drive Googlebot and other search engine crawlers insane]

Add comment February 8th, 2008

The right way to update software for your customers by Firefox

I consider myself a power user of windows xp, so why haven’t I upgraded from winamp 5.35 to winamp 5.52?

After all, every single time I start winamp it bugs me to.

winamp update available!

The answer is simple … It’s because I’m lazy.

I’m not going to go to winamp.com, try to figure out which version to download and then actually install it over again, just so winamp runs exactly the same as it did before! No way.

But if the program went out there, got the update and installed it for me … I wouldn’t object.

Firefox does this right.

An update for firefox is available

When an update is available, it goes out and finds it for me. If I okay it, it installs the update for me and restarts my browser, putting me back viewing the page I was looking at before … like nothing happened. All I have to do is hit “Download & Install Now”. How easy is that?

downloading and updating Firefox

Nag screens/prompts/dialogs are very annoying. My natural instinct is to close them and get on with my life.

In that scenario, everybody loses.

So if you write software, you should strive to have it update automatically, if you possibly can. That would definitely be a selling point for me as your customer. (Hear that Blumenthals software?)

PS: Most software (including wordpress) does require you to go download and install the newest versions. Since automatically updating software is so rare it could be a killer feature if you incorporated it into your software.

Add comment February 7th, 2008

Apparently the creator of Ruby on Rails doesn’t comment his code … kinda

Here are some excerpts from DHH’s post and comments yesterday on 37signals

  • The short answer is that we don’t document our projects. At least not in the traditional sense of writing a tome that exists outside of the code base that somebody new to a project would go read …
  • Further more, I don’t really find it necessary for the kind of work that we do. Our biggest product, Basecamp, is about 10,000 lines of code. That really isn’t a whole lot in the grand scheme of things. Everything we do is build is also using Ruby on Rails, which means that most Rails programmers would know their way around our applications straight away. It’s the same conventions and patterns used throughout.
  •  Finally, we write our application in a wonderfully expressive and succinct programming language like Ruby that leads itself very well to a programming style like the one Kent Beck preaches in Smalltalk Best Practice Patterns. Keep your methods short and expressive. On average, our models have methods just four lines long. Adding documentation to a method should usually only be done when you’re doing something non-obvious that can’t be rewritten in an obvious way.
  • [comment] Wim, yes there’s RDoc. I just generally don’t use it for projects. When methods are only an average of 4 lines long written in a language like Ruby, it’s often faster and better to merely browse the code base rather than rely on explicit commenting.

Keep in mind that I’m no Ruby on Rails genius, and from the little I’ve done I can see where DHH is going with this. But I’ve always thought that this argument of a language being so succinct and clear that you don’t have to write comments is just a bit silly for a couple of reasons.

  • I believe that you don’t write code for machines, you write code for people (other developers). So any help you can give them in navigating your code is typically good to have. It saves them time and their employers money … that is what being a great consultant is about, you have to be thinking in terms of how to help your clients’ business and saving them money falls in that category.
  • People who use this line of argument are either too lazy to comment and are trying to justify it …
  • … or don’t understand that there are developers of all skill levels in the industry. So whereas someone with your skill level would be able to navigate your code quickly, someone who wasn’t as good might take longer … why not avoid that?

Note that I’m not of the school of thought of commenting just for the sake of it, like I’ve heard some “blub programmers” do. However, I do think that you should always be thinking of other developers when you code, and if commenting can get them to a point where they can modify your code in 1 minute instead of a minute and a half … then you should comment.

In the end, I guess it’s a bit unfair to criticize DHH, because it’s not clear that he doesn’t comment his code much … though it’s easy to infer that. I just know from my experience that people who say things like he says have a tendency to have 3 lines of comments in some piece of code 500 lines long.

But if you’re a “rockstar developer”, I guess everyone has to dance to your tune, wherever you are, right?

Add comment February 6th, 2008

Bad UI design no more … Facebook re-designs their search box …

Boy do I feel special! Regulars will remember that I blogged about the bad ui choice that was made with Facebook’s omnipresent search box a few weeks back.

Well, I went into Facebook today and was prattling off about one thing or the other when my eyes fell on this.

Facebooks new search box

This is what it looked like as of January 9th, 2008.


As you can see, it would seem that they took my recommendation to add an actual clickable button to the search box, so that mobile phone users could actually use it.

Of course I kid (kinda) about being responsible for the change. After all, why would a company valued at $200bn listen to a lowly Austin web developer with a blog and an opinion? I don’t know for sure, but I’ll claim it … if only for the fact that no one else is :) Stephen Colbert did it with Mike Huckabee’s success in Iowa, and I’m doing it here.

Good job Facebook!!!
[Now, if they would only allow embeddable videos in Facebook notes ....]

Add comment February 6th, 2008

How to get delete key to actually delete on a Mac running windows xp with bootcamp

On windows xp on bootcamp, the delete key actually does what the backspace key should do.

To get it to act like it should, just hold down [fn] and then hit delete to your heart’s desire.

Add comment February 6th, 2008

Why php_value directives for php.ini set in .htaccess fail when php is running as cgi or fcgi

I was trying to take advantage of PHP’s auto_prepend_file directive today, by using the php_value directive to set it in a .htaccess file. But as soon as I did that, my cheerfully running application puked and died with the familiar message:

“Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request”

I had seen this behavior before, when I was writing an app for a client a few months ago, but I hadn’t had time to investigate it. Today I decided to go a-googling and I promptly found my answer:

Those are Apache directives, but in CGI mode Apache calls the php binary, which in turn reads php.ini. Since the binary doesn’t read httpd.conf it has no effect on PHP. As PHP isn’t loaded into Apache, Apache doesn’t know what to do with the directives and borks.

Add comment February 5th, 2008

Humor on a microsoft website?

I’m being unfair here, but I saw this today on “Microsoft’s Channel 9 website” and I thought it was pretty funny.

But then, I’m easily amused :]


PS: I actually first saw it years ago, but now I have a blog :)

Add comment February 4th, 2008

Who needs a blogger/webadmin/gardener/apartment manager?

This guy, apparently.

PS: I saw this a few weeks back on Craigslist, did a screen capture and forgot to post it.

Craigslist job ad … need a blogger/webadmin, gardener and apartment manager

Add comment February 4th, 2008

How to run both mysql5 and mysql4 on the same windows Machine.

I finally made the jump from mysql 4 the other day, to take advantage of mysql’s new “INSERT … ON DUPLICATE KEY UPDATE” command (which I think is spectacular, by the way).

I didn’t want to actually upgrade from mysql4 to mysql5, just run the two mysqls side by side … so I looked around for a bit and figured out that all you need to do is get both servers running on different ports. The trick is to remember to reference localhost:port (eg: “localhost:3307”) instead of just “localhost” whenever you connect to it, in php, for example. Below are screenshots of the things you need to watch out for when doing this on a windows machine.

The first part of the install is pretty straightforward; just make sure to install it into a different folder than your current mysql install.

At this screen be sure to tick the “Configure the MySQL Server now” box

Mysql5 configure the Mysql server now

At this screen pick “detailed configuration” and continue on …

Mysql5 detailed or standard configuration screen

mysql5 configure the mysql server 5.0 server instance

Mysql5 InnoDB Tablespace Settings

At this screen be sure to change the port number.

mysql5 please set the networking options

From the dropdown pick a service name that won’t conflict with the name of the service for the current MySQL install.

mysql5 set the windows options

mysql5 configure the mysql server 5.0 server

After that is all done, we want to try and connect to our brand new server. So fire up MySQL Administrator or MySQL query browser and create a new connection, like so … (Yes. I know running as root is bad for you … thanks).

mysql query browser new connection

Once you’re done … connect with MySQL Query Browser and you should land on this screen! The nice thing about this is that you can probably even run mysql 4, 4.1 and 5 all together.

voila … connected

Add comment February 4th, 2008

Ruby reddit and top ten list of Ruby gems!

Quick note to say that I just discovered the brand new Ruby Reddit, which is really fortuitous now that I’m spending a lot of time with Ruby on Rails. From there I found the “Ten Essential Ruby Gems” … which I am installing and screwing with as I type :]

Many thanks to Reginald Braithwaite, who runs the immensely popular software development blog Raganwald, for his post about the Ruby Reddit.

Add comment February 1st, 2008

