Today's Fix-it: Fixing Craigslist's bad page formatting

Date/Time Permalink: 12/23/06 02:15:50 pm
Category: HOWTOs and Guides

Here come the holidays, and since my usual news-browsing will be spoiled by 101 redundant end-of-year list-o-ramas for the next week, I'll be happily destroying my mind reading "Best-of-Craigslist", which is an archive going back to the Stone Age and is funny as flying monkeys. Except sometimes the post shows up on the page as one huge blob of text with no line breaks. Isn't that annoying? Apparently some kind of rendering bug on Craig's end. If you view source, you will see that the line breaks are there in the text, but there are no HTML p or br tags to render them correctly in a browser. Here's what I do:

Save the page as plain HTML. It will be [8-digit $RANDOM].html.

Pop open a console and go:

sed 's/^/<br>/g' 34529205.html > temp.html && rm 34529205.html

Then open temp.html in a new Firefox tab and read. Delete it when done. Note that we did add some superfluous breaks, since every line in the file gets a br tag whether it needs one or not, but what the heck, the paragraphs break again.
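If you end up doing this to more than one page, the same sed line can be wrapped in a little shell function. This is only a sketch: the function name add_breaks and the file sample.html are made up for illustration, and the sample file stands in for a page you saved. The g flag is left off here, since ^ can only match once per line anyway.

```shell
# Hypothetical helper: print a saved page with <br> prefixed to every line.
# The original file is left untouched; redirect the output wherever you like.
add_breaks() {
    sed 's/^/<br>/' "$1"
}

# A made-up two-line file standing in for a saved post.
printf 'first paragraph\nsecond paragraph\n' > sample.html

# Write the fixed copy to temp.html, then show it.
add_breaks sample.html > temp.html
cat temp.html
```

Each line of temp.html now starts with a br tag, so the browser breaks the text where the original line breaks were.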

Wow, wasn't that a fun hack? Truly, this is why geeks are geeks. Not because we like doing everything the hard way, but because life is broken 50% of the time and we like to fix it and go on.

Note to Craig: Netcraft says your server is "Apache/1.3.34 Unix mod_gzip/ mod_perl/1.29", so could you paste that sed line somewhere in your archive script? You run a great site.

UPDATE 11/06/07: Craigslist seems to have fixed this bug by now. I can plow hundreds of posts back in the archives and not find a text blob anywhere. Leaving this blog post as a missed connection with logic. But if you Googled this up, you can still use this hack to fix other problems, which is the least I can do now that I've wasted your time leaving this post here, since you searched this post out while looking for something else entirely.

