Posts filed under 'developers'
I was going through my Google Analytics logs today and noticed that a lot of folks were coming to my site from Google searches for stuff like ‘print_r + ruby on rails’.
So I figured I’d write a blog post about it, because I’ve had the same problem.
What you’re looking for is ‘inspect’.
If you have an array, hash or object that you want to take a quick-and-dirty look at, just type in
objectname.inspect
and you’ll get an output of the contents of said array, hash or object.
Here is a screen capture of a quick irb session to show you how it works.
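The same idea as a few lines you can paste into irb. The variable names and the throwaway Point struct are just examples:

```ruby
# Peek inside an array, a hash and a simple object, the way print_r does in PHP.
list  = [1, 'two', :three]
prefs = { color: 'red', size: 10 }
Point = Struct.new(:x, :y)   # a throwaway class just for the demo
pt    = Point.new(3, 4)

puts list.inspect   # prints [1, "two", :three]
puts pt.inspect     # prints #<struct Point x=3, y=4>
p prefs             # p(obj) prints obj.inspect, so this shows the hash's contents
```

p is handy because it also returns its argument, so you can drop it into the middle of an expression while debugging. For big nested structures, pp prints the same information wrapped across several lines.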

I hope this helps.
May 1st, 2008
The more I work with Ruby and Ruby on Rails, the more I begin to understand (though not necessarily agree with) a lot of the vitriol that has been aimed at PHP over the years by developers using other more rigorous languages.
A few weeks back I ran into this little speed bump while working with Ruby on Rails, where I was trying to do a multiple assignment like this:
x = y = z = []
Most seasoned Rubyists will be waving their arms around and yelling “NOOOOOO!!!”
But coming from a PHP background this seemed perfectly okay to me.
Let me go off on a (relevant) tangent here and show you how the PHP code for this assignment would work:
$x = $y = $z = array();
$x = 'me';
print_r($x);
print_r($y);
print_r($z);
The output from this is:
me Array() Array()
Notice how the variables $y and $z remain arrays?
Now let's look at the same Ruby code.

You can see that when we do an assignment of x = y = z = [], ALL the variables point to the same array, so changing it through one variable changes it for all the others!
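This is because a chained assignment copies the reference to the array, not the array itself: all three variables now point at one and the same object, so changing that object in place through any of them shows up in all of them.
Strings look like an exception, because the string assignment above doesn't work the same way (as you can see), even though the quoted string “you” is an object in Ruby too. The real distinction is between changing an object in place (x << 'me'), which every variable pointing at it will see, and pointing one variable at a brand-new object (x = 'me'), which leaves the others alone.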
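Here is the difference in a few lines you can run in irb (the variable names are just for illustration). The sketch also shows two ways to get three separate arrays, which is what the PHP version gives you:

```ruby
# Chained assignment copies the reference, so all three names share ONE array.
x = y = z = []
x << 'me'                  # change the array in place...
p y                        # ...and y sees it: prints ["me"]
p x.equal?(z)              # prints true: x and z are the very same object

# Reassigning a variable just points that one name at a new object.
a = b = 'you'
a = 'me'
p b                        # prints "you": b still refers to the original string

# Three independent arrays, like PHP's $x = $y = $z = array() gives you.
i, j, k = [], [], []
i << 'me'
p j                        # prints []

# Same thing, built with a block so every element is a fresh array.
m, n, o = Array.new(3) { [] }
```

Watch out for Array.new(3, []): it fills all three slots with the same array object, which is the original problem all over again. The block form builds a new array for each slot.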
So be careful PHPsters … this cost me a couple of hours in my project.
Hopefully you can skate around this one if you come across it.
April 29th, 2008
This one took a little bit but I finally figured it out …
@xml = render_component_as_string :controller => "quote", :action => "xml", :params => {:request_id => 100}
This runs the “xml” action of the “quote” controller and passes it the parameter request_id = 100.
Whatever would have been displayed at /quote/xml/100 is now stored in @xml.
This lets you get the output from any action, anywhere, and pass parameters to it in the process.
Even better, this actually runs the action and its view (unlike render :action, which just renders the action view).
For more details, go to the Ruby On Rails Manual > Using components
April 17th, 2008
After weeks of anticipation, the Apache module that lets you upload your Ruby on Rails application to the server and have it “just work” has just been released.
I’ve just downloaded the source code from their Git repository (Git rocks!!) and am trying to see if it’ll install on Windows.
Update: It won’t install on Windows and there are no plans to ever allow it to (damned Linux elitists!!! :P).

April 11th, 2008
I’ve been interested in moving to Git, the new version control system championed by none other than Linus Torvalds … creator of Linux.
But I’m on a Windows box (and I like it here) and didn’t want to deal with using Cygwin to manage my Git repositories.
Cue this succinct blog post on how to run Git on windows.
Enjoy.
April 4th, 2008
Thanks to Justin Cook for doing all the legwork on this one.
All you have to do is create a new table from the old one that filters out the duplicate entries:
CREATE TABLE new_table AS
SELECT * FROM old_table WHERE 1
GROUP BY column_to_remove_duplicates_from;
So, as an example:
CREATE TABLE news_new AS
SELECT * FROM news WHERE 1 GROUP BY title;
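To see what that GROUP BY actually keeps, here is a tiny Ruby model of it using made-up rows (the ids and titles are hypothetical): one row survives per distinct title and the rest are dropped.

```ruby
# Made-up rows from the news table.
rows = [
  { id: 1, title: 'Site launched' },
  { id: 2, title: 'Site launched' },     # duplicate title
  { id: 3, title: 'New features' },
  { id: 4, title: 'New features' },      # duplicate title
  { id: 5, title: 'Downtime tonight' },
]

# Keep one row per distinct title, like GROUP BY title in the query above.
deduped = rows.uniq { |row| row[:title] }
p deduped.map { |row| row[:id] }   # prints [1, 3, 5]
```

One caveat: MySQL doesn't promise which row of each group you get when you SELECT * with GROUP BY (Ruby's uniq keeps the first one; MySQL may pick any of them). Also, from MySQL 5.7.5 on, the ONLY_FULL_GROUP_BY SQL mode is on by default and rejects this query unless you turn that mode off for your session.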
Now you just have to go in and rename the news table to news_old (for the newbies to this … don’t drop the old table immediately, at least not until you’re sure the new one works fine).
Then rename news_new to news.
phpMyAdmin should make all this easy (click on the “Operations” tab), but in case you are working from the console, here is the syntax you need to know:
RENAME TABLE current_name TO new_name;
For the example above, MySQL can do both renames in one statement: RENAME TABLE news TO news_old, news_new TO news;
That isn’t all that you have to do, however. The new table will be missing its primary key and auto increment settings as well as any other indexes that you may have had on the old table.
Go in and set the primary key (first) back on the column that had it, then (secondly … the order matters) clear out the default value on it and set it to auto increment.
You can re-add your indexes later … be sure to put a unique index on the column you were having trouble with.

March 27th, 2008
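I have just blown 4-5 hours on this “feature” of PHP and I thought someone else would care to know.
You can’t use numeric keys for sessions in PHP!
So stuff like
$_SESSION['1234'] = 'boo';
… won’t work, because PHP’s session handling mechanism simply refuses to store that particular session variable. (PHP turns a numeric string key like '1234' into the integer key 1234, and the default session serializer skips integer keys at the top level of $_SESSION.)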
Even worse, it fails silently, leading you to think your brain has fried itself.
Personally, I just thought something funky was up with PHP5 (google searches seemed to indicate problems with PHP5 sessions, so I focused on that).
But after trying versions 5.2.4 and 5.1.6 and still having the same problem, I finally tried it out in PHP4 and … still had the problem. Then I seriously started trying to track down the bug.
After finally finding it and writing the right Google query (thanks for nothing Google :|) … I found a couple of articles that point out this problem.
So I’m writing this with an SEO’ed title that should hopefully grab the right folks, before they blow hours trying to figure this out.
Please go and vote for this feature to be fixed (made to fail loudly … so you know exactly what is wrong).
Add a comment if you do.
March 27th, 2008
Life is too short not to have garbage collection …
- Awesome quote from developer Jon Cooper, whom I was talking with at SXSWi.
He was talking about writing code in .NET and writing the speed-critical modules in C++ or C.
March 11th, 2008

Panelists
Kevin Rose | Digg
Cal Henderson | Flickr
Joe Stump | Digg
Chris Lea | Media Temple
Garrett Camp | StumbleUpon
Matt Mullenweg | WordPress
{Discussion: Kevin seems to be moderating}
- Consensus is that you think about scaling when you get there
- Joe Stump says that it (not worrying about scaling initially) helped them concentrate on building cool features
- Software load balancers suck … Squid is highly recommended
- Pound is good for http load balancing
- Joe Stump says that at 15 million pageviews you should start thinking of specializing servers (images, db, static files … that sort of thing)
- WordPress uses mainly rented server boxes … 1,000 of them (didn’t know that)
- Cal from Flickr says that engineering time is expensive and that if you can just solve the problem by throwing money at it … then you should.
- They recommend that when your development staff grows past 2 people that someone be appointed to standardize code (underscores vs Camel Case)
- Trac is brought up by Stump … use it if you
- Lea says to document your code. It’s actually a monetary issue: time spent figuring out code is time not spent coding, and if you can reduce that, you save money.
- Cal cracks a really great joke “What is this documentation thing you speak of?” … “seriously … just hire people that agree with you”
- Question comes up about remote workers … Kevin says that at one point they didn’t care … they hired a guy from the East to help digg scale.
- Matt says his people don’t see each other for months and they get together usually for social purposes more than anything
- Says they are trying to get to a stage where they have users clustered in certain cities.
{Floor opened for questions}
Q: What bottlenecks do you guys usually encounter?
- If it’s not your DB then it’s your file storage (NFS etc.) … Cal concurs … says it’s almost always IO … “With a teeeny bit of foresight … you can avoid that” Love this guy!
- Talking about digg architecture … Joe Stump says the digg DB has 200 tables but only about 2 are problem children
- Says one of them has about 200 million records … and those two get the most reads and writes
- Stump says that your language never matters … Cal chimes in “Unless it’s Ruby” … the crowd roars in appreciation!
- PS: not sure where this fits, but there is a discussion about how their admin tools are usually not very well done. Joe says that whenever something goes wrong with digg, they usually check with the admins first to see if they’ve screwed with anything.
Q: Recommended software?
- Cal says use Ganglia for graphing, Puppet for admin. Lea suggests Munin for graphing. Cal says Ganglia and Munin are almost the same. LVS comes up.
Q: How do you keep the community from becoming obnoxious?
- Kevin talks about giving the community the tools to moderate themselves, talks about the success of the “bury” option.

Q: About source control
- They do pushes at digg from once a day to 45 times a day, but Rose says officially they’ll be pushing twice a month.
- Joe suggests that when you’re ready to push live, you should freeze your code and create a new branch
- This way when something breaks, you can fix the code in the branch and push it live from there.
Q: What do they do when they can’t have a local development environment?
- Cal says they can’t support local dev environments because it’s just too complex, too many moving parts. They just use dev servers. Everyone on the panel concurs.
- Cal: Talks about how Flickr lives on memcached and Squid, suggests taking a look at Varnish if all your stuff is in memory. Says that they have 32,000 requests per second for images.
- Joe: Says they cache things like user objects containing user data (because users barely alter their info after they first get set up)
- Matt: Talks about the use of output caching … says that if you’re getting 20 million page views then caching pages for, say one second, can take you down to about 8 million or was it 800,000 …
- Joe: Talks about queuing. Says that when a user diggs something, they just cache it for that user so it shows up dugg, but they queue it. So a digg might not get into the databases for a few minutes.
- Digg uses gearman for queuing. Cal talks about a “ghetto queue” of cron and mysql to implement queuing. Joe says that’s “housing projects” bad.
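Matt's point about caching whole pages for even one second is easy to try out. Below is a minimal in-process sketch, assuming a hypothetical MicroCache class and a fake clock so the numbers are reproducible; on a real site this job would sit in front of the app (Squid, Varnish or memcached, as the panel mentioned), not inside it. It counts how many times the page really gets rendered when 1,000 requests for the same page arrive over two seconds.

```ruby
# Hypothetical page-level micro cache: each rendered page is reused for `ttl` seconds.
class MicroCache
  attr_reader :renders

  def initialize(ttl: 1.0, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @store = {}     # key => [expires_at, page]
    @renders = 0    # how many times the expensive block actually ran
  end

  # Returns the cached page for key, or renders it with the block and caches it.
  def fetch(key)
    now = @clock.call
    expires_at, page = @store[key]
    return page if expires_at && now < expires_at

    @renders += 1
    page = yield
    @store[key] = [now + @ttl, page]
    page
  end
end

# 1,000 requests for the front page, one every 2 ms (two seconds in total), on a fake clock.
now = 0.0
cache = MicroCache.new(ttl: 1.0, clock: -> { now })
1000.times do |i|
  now = i * 0.002
  cache.fetch('/') { "<html>front page rendered at #{now}s</html>" }
end
puts cache.renders  # prints 2
```

Without the cache the block would have run 1,000 times; with a one-second lifetime it runs twice. The busier a page is, the bigger the saving.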
Q: About APIs
- I love Cal! He says that there’s something about APIs that brings out the stupidity in people. They just try to suck down all your data as fast as possible … which they wouldn’t try to do with a regular web page. He says that implementing a throttling system is a good idea, to avoid getting creamed by idiots. Did I mention that I love this guy!
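A throttle like the one Cal describes can be as simple as a token bucket per API key. This is a hypothetical sketch (the class name, capacity and rate are made up, and a real one would keep its counters somewhere shared, like memcached, rather than in a Ruby hash): each key can burst up to capacity requests and earns back rate requests per second.

```ruby
# Hypothetical token-bucket throttle: each API key may burst up to `capacity`
# requests and earns back `rate` requests per second.
class Throttle
  def initialize(capacity:, rate:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @capacity = capacity.to_f
    @rate = rate.to_f
    @clock = clock
    @buckets = {}   # api_key => [tokens_left, last_checked_at]
  end

  # true if the request may proceed; false if the caller should be turned away
  # (for example with an HTTP 503 or 429 response).
  def allow?(api_key)
    now = @clock.call
    tokens, last = @buckets.fetch(api_key) { [@capacity, now] }
    tokens = [@capacity, tokens + (now - last) * @rate].min   # refill for the time that passed
    allowed = tokens >= 1
    tokens -= 1 if allowed
    @buckets[api_key] = [tokens, now]
    allowed
  end
end

t = 0.0
throttle = Throttle.new(capacity: 5, rate: 1, clock: -> { t })
burst = Array.new(7) { throttle.allow?('greedy-client') }
p burst            # prints [true, true, true, true, true, false, false]
t = 2.0            # two seconds later, two requests' worth has refilled
later = throttle.allow?('greedy-client')
p later            # prints true
```

Each key gets its own bucket, so one greedy client running dry doesn't slow anyone else down.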
Q: Any good documentation for using the tools they recommended?
- Answer: A resounding NO. Joe says 90% of what he’s learned with open source is by trial and error.
Definitely worth the price of admission!
March 11th, 2008
I got around to reading this very insightful piece about truly beautiful code on Coding Horror today.
In it, Jeff Atwood talks about the problem with the book Beautiful Code, which is that it actually talks about code and not the ideas behind the code.
To put it succinctly in Jeff’s words …
“Ideas are beautiful. Algorithms are beautiful. Well executed ideas and algorithms are even more beautiful. But the code itself is not beautiful. The beauty of code lies in the architecture, the ideas, the grander algorithms and strategies that code represents”

I just remember thumbing through the book at my local Barnes & Noble and not being enamored of it … I went on to spend 2 hours reading Obie Fernandez’s The Rails Way instead.
February 26th, 2008
“We have to reinvent the wheel every once in a while, not because we need a lot of wheels; but because we need a lot of inventors.”
- Bruce Joyce
I wrote about my experience writing a site crawler in php in an earlier post, and I’m going to use some of the background there to make my point here. So it might help to go read it if you haven’t already.
[Google’s crawler [Googlebot] isn’t that sophisticated/writing a crawler in php]
From my casual observation of the way Googlebot crawls some of the sites I work on, I have reached the conclusion that it works in much the same way that a crawler I wrote a year ago worked.
Googlebot goes from page to page, gathering the links on each page and tacking them onto whatever URL it happens to be at right then. So why do query strings give it such a problem?
The answer is simple. Imagine this url for an item that doesn’t exist anymore.
www.example.com/store.php?buyid=29&catid=12
When a crawler encounters this url and tests it to see if it returns a 404 … it doesn’t.
Why?
Because www.example.com/store.php is usually still a valid page. It won’t give the crawler an error unless you explicitly code it to.
So the crawler now tosses www.example.com/store.php?buyid=29&catid=12 onto its list of pages to be crawled. Can you see the disaster waiting to happen?
www.example.com/store.php?buyid=29&catid=12 and any other non-existent URLs like it are basically just mapping to the still-valid www.example.com/store.php, but in the crawler’s mind they are all different URLs.
Now, if there are other URLs on that page (store.php), for related products for example, Google just takes each URL and tacks it onto the URL it thinks it’s at right now. So it winds up with
www.example.com/store.php?buyid=29&catid=12store.php?buyid=39&catid=11
It does that for every invalid query string URL that has store.php as its base. It then goes back and crawls those again, and now it has
www.example.com/store.php?buyid=29&catid=12store.php?buyid=39&catid=11store.php?buyid=39&catid=11
The crawler is now in a tailspin … going around in circles trying to crawl your site, chewing up your CPU cycles and generally being a nuisance.
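Here is that loop as a few lines of Ruby, assuming a crawler that builds new URLs by simply gluing the link onto the current URL (naive_join is a made-up helper, not anything Google has published). For comparison, the last lines resolve the same link with Ruby's URI.join, which follows the relative-URL rules in RFC 3986 and doesn't grow the URL.

```ruby
require 'uri'

# Hypothetical naive crawler step: glue the link onto the end of the current URL.
def naive_join(current_url, href)
  current_url + href
end

start = 'http://www.example.com/store.php?buyid=29&catid=12'  # dead item, but store.php still returns 200
href  = 'store.php?buyid=39&catid=11'                          # a "related product" link on store.php

# Every pass yields a new, longer URL that still maps to store.php, so it never ends.
url = start
3.times { url = naive_join(url, href) }
puts url

# Resolving the link per RFC 3986 replaces the last path segment and the query instead.
resolved = URI.join(start, href).to_s
puts resolved   # prints http://www.example.com/store.php?buyid=39&catid=11
```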
I hope this helps you understand why Googlebot hates query strings so much.
I haven’t tried this yet, but I think it should be clear that making the base url of a query string resolve to a 404 error will help it out a lot.
So as an example
www.example.com/store.php?buyid=29&catid=12
should return a 200 OK
and
www.example.com/store.php
should give a 404 error.
This is just my theory; I don’t know that it’d be practical.
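A sketch of what that could look like, written as a plain Ruby method that decides the status code for store.php. PRODUCTS and store_status are hypothetical stand-ins for your own database and page logic. Besides the bare-URL 404 proposed above, it also returns a 404 when the requested item no longer exists, which is the case this post started with.

```ruby
require 'uri'

# Hypothetical stand-in for the products table: buyid => product name.
PRODUCTS = { 29 => 'Blue widget', 39 => 'Red widget' }

# Decide the HTTP status for a request to store.php, given its query string.
def store_status(query_string)
  params = URI.decode_www_form(query_string.to_s).to_h
  return 404 unless params.key?('buyid')              # bare store.php: the proposal above
  buyid = Integer(params['buyid'], exception: false)  # nil if it's not a number
  return 404 unless PRODUCTS.key?(buyid)              # item no longer exists
  200
end

p store_status('buyid=29&catid=12')  # prints 200
p store_status('buyid=12&catid=3')   # prints 404
p store_status(nil)                  # bare /store.php: prints 404
```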
PS: I hope this further helps you understand why search engine crawlers also hate PHP session ids on your content.
February 8th, 2008
I spent a lot of time early last year trying to write a crawler in PHP (I know, I know).
It was supposed to sit on the server so that when you went to its URL, it’d generate a Google Sitemap for your entire site.
What I found out was that writing a good crawler is very hard work. Not because of the recursion involved, but because of the endless ways link tags get written.
Now Google has validated my experience (more on this in a second).
Just a couple of the things I had to consider with my crawler were:
- I had to program it to look for a base tag, so that I’d know what URL to resolve relative links against.
- I had to check each link to figure out if it was an internal or external link so I’d know whether to crawl it.
- Then I had to keep a running list of links crawled, so that I’d know if I had crawled a link before or not
- I had to teach it that a relative link starting with “/” is relative to the domain root, so the domain had to be put in front of it.
- knowing how to deal with “../” … this was a pain and a half
- I had to let it know how to deal with mailtos, javascript, and improperly written URLs like <a href="www.concept47.com"> (more on this later) … etc.
My crawler worked by keeping a list of URLs to be crawled and continually adding new ones onto it. As it crawled each URL, it put it into a second array of crawled URLs, and every new URL had to be cross-checked against that array before being added to the list to be crawled.
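The checks in that list map fairly directly onto Ruby's standard URI library. The sketch below is hypothetical (resolve_link, crawl and the fake SITE pages are all made up, and downloading and parsing real HTML is left out). It resolves each href against the page URL or its base tag, skips mailto:, javascript: and external links, and keeps a to-crawl queue plus a set of already-seen URLs. Note that www.unfuddle.com written without http:// still comes out as a page on your own site, which is exactly the problem described further down.

```ruby
require 'uri'
require 'set'

# Turns one href into an absolute URL worth crawling, or nil to skip it.
# base_href is the page's <base href="..."> value, if it has one.
def resolve_link(page_url, href, site_host, base_href: nil)
  href = href.strip
  return nil if href.empty? || href.start_with?('#')     # empty or same-page anchor
  return nil if href.match?(/\A(mailto|javascript):/i)   # not a page at all
  url = URI.join(base_href || page_url, href)            # handles "/", "../" and plain relative paths
  return nil unless url.is_a?(URI::HTTP)                 # URI::HTTPS is a subclass, so it passes too
  return nil unless url.host == site_host                # external link: don't crawl it
  url.fragment = nil                                     # page.html#top is the same page
  url.to_s
rescue URI::Error
  nil                                                    # malformed href
end

# Breadth-first crawl. to_crawl is the queue of pages still to visit; seen is the
# list every new URL is checked against. fetch_links stands in for "download the
# page and pull out its hrefs".
def crawl(start_url, fetch_links, limit: 1000)
  host = URI(start_url).host
  to_crawl = [start_url]
  seen = Set[start_url]
  crawled = []
  while (url = to_crawl.shift) && crawled.size < limit
    crawled << url
    fetch_links.call(url).each do |href|
      link = resolve_link(url, href, host)
      next if link.nil? || seen.include?(link)
      seen << link
      to_crawl << link
    end
  end
  crawled
end

# A fake site: page URL => hrefs found on that page.
SITE = {
  'http://www.example.com/' => ['/about.html', 'blog/post1.html', 'mailto:me@example.com', 'http://other.com/'],
  'http://www.example.com/about.html' => ['/', 'javascript:void(0)'],
  'http://www.example.com/blog/post1.html' => ['../about.html', 'www.unfuddle.com'],
}

pages = crawl('http://www.example.com/', ->(url) { SITE.fetch(url, []) })
puts pages
```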
The problem, though, is the way people write HTML markup. As many of you know, there are some nastily written pages out there … so if someone wrote <a href="www.concept47.com"> or <a href="concept47.com"> or even <a href="screwyougoogle">, my crawler had to know not to add it to the list to be crawled.
This is very difficult to do correctly, and for all the time I spent on it, I found no real way to deal with it. You could write special cases for <a href="www.concept47.com">, but what about <a href="ww.concept47.com"> or <a href="w.concept47.com"> … see the problem?
Even though those URLs give you a 404, they’ll make it onto your list to be crawled and waste the crawler’s time. I felt like such a loser for not being able to figure out this issue, but it seems Googlebot has the same problem.
Look at this.

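This is from my Webmaster Tools console.
The problem here was that I had a link tag in one of my blog posts that went like this:
<a href="www.unfuddle.com">
As you can see, even the mighty Googlebot didn’t pick up on the error. It just tacked the URL onto the URL it was currently at and went on about its business. (In fairness, without the http:// that href is a perfectly valid relative URL, and a browser resolves it exactly the same way, so there’s no sure way for a crawler to tell it was meant to be a domain.)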
Validation!
Read the next in this series [Why query strings in urls drive Googlebot and other search engine crawlers insane]
February 8th, 2008
I consider myself a power user of Windows XP, so why haven’t I upgraded from Winamp 5.35 to Winamp 5.52?
After all, every single time I start Winamp it bugs me to.

The answer is simple … it’s because I’m lazy.
I’m not going to go to winamp.com, try to figure out which version to download and then actually install it all over again, just so Winamp runs exactly the same as it did before! No way.
But if the program went out there, got the update and installed it for me … I wouldn’t object.
Firefox does this right.

When an update is available, it goes out and finds it for me. If I okay it, it installs the update for me and restarts my browser, putting me back viewing the page I was looking at before … like nothing happened. All I have to do is hit “Download & Install Now”. How easy is that?

Nag screens/prompts/dialogs are very annoying. My natural instinct is to close them and get on with my life.
In that scenario, everybody loses.
So if you write software, you should strive to have it update automatically if you possibly can. That would definitely be a selling point for me as your customer. (Hear that, Blumenthals software?)
PS: Most software (including WordPress) still requires you to go download and install the newest version yourself. Since automatically updating software is so rare, it could be a killer feature if you incorporated it into your software.
February 7th, 2008
Here are some excerpts from DHH’s post and comments yesterday on 37signals:
- The short answer is that we don’t document our projects. At least not in the traditional sense of writing a tome that exists outside of the code base that somebody new to a project would go read …
- Further more, I don’t really find it necessary for the kind of work that we do. Our biggest product, Basecamp, is about 10,000 lines of code. That really isn’t a whole lot in the grand scheme of things. Everything we do is build is also using Ruby on Rails, which means that most Rails programmers would know their way around our applications straight away. It’s the same conventions and patterns used throughout.
- Finally, we write our application in a wonderfully expressive and succinct programming language like Ruby that leads itself very well to a programming style like the one Kent Beck preaches in Smalltalk Best Practice Patterns. Keep your methods short and expressive. On average, our models have methods just four lines long. Adding documentation to a method should usually only be done when you’re doing something non-obvious that can’t be rewritten in an obvious way.
- [comment] Wim, yes there’s RDoc. I just generally don’t use it for projects. When methods are only an average of 4 lines long written in a language like Ruby, it’s often faster and better to merely browse the code base rather than rely on explicit commenting.
Keep in mind that I’m no Ruby on Rails genius, and from the little I’ve done I can see where DHH is going with this. But I’ve always thought that the argument that a language is so succinct and clear that you don’t have to write comments is a bit silly, for a few reasons.
- I believe that you don’t write code for machines, you write code for people (other developers). So any help you can give them in navigating your code is typically good to have. It saves them time and their employers money. That is what being a great consultant is about: you have to be thinking in terms of how to help your clients’ business, and saving them money falls in that category.
- People who use this line of argument are either too lazy to comment and are trying to justify it …
- … or don’t understand that there are developers of all skill levels in the industry. So whereas someone at your skill level would be able to navigate your code quickly, someone who wasn’t as good might take longer … why not avoid that?
Note that I’m not of the school of thought of commenting just for the sake of it, like I’ve heard some “blub programmers” do. However, I do think that you should always be thinking of other developers when you code, and if commenting can get them to a point where they can modify your code in 1 minute instead of a minute and a half … then you should comment.
In the end, I guess it’s a bit unfair to criticize DHH, because it’s not clear that he doesn’t comment his code much … though it’s easy to infer that. I just know from my experience that people who say things like he says tend to have 3 lines of comments in a piece of code 500 lines long.
But if you’re a “rockstar” developer, I guess everyone has to dance to your tune wherever you are, right?
February 6th, 2008
I was trying to take advantage of PHP’s auto_prepend_file directive today, by using the php_value directive to set it in a .htaccess file. But as soon as I did that, my cheerfully running application puked and died with the familiar message:
“Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request”
I had seen this behavior before, when I was writing an app for a client a few months ago, but I hadn’t had time to investigate it. Today I decided to go a-googling and I promptly found my answer …
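Those are Apache directives, but in CGI mode Apache calls the php binary, which in turn reads php.ini. Since the binary doesn’t read httpd.conf, the directives have no effect on PHP. And as PHP isn’t loaded into Apache, Apache doesn’t know what to do with the directives and borks.
So when PHP runs as CGI, set auto_prepend_file in php.ini instead, or wrap the .htaccess lines in an <IfModule mod_php5.c> block so that CGI setups simply skip them.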
February 5th, 2008