Getting git to work on OS X Tiger

If you haven’t heard of git yet, it’s quickly becoming the preferred version-control system for many open-source projects, including the twin suns of Ruby on Rails and Prototype.

In fact, if you keep your eye on the GitHub blog you’ll see a steady stream of well-known projects moving over to git, as diverse as the Blueprint CSS framework and the Haskell compiler.

Basically, if git were a stock-market commodity, analysts would be issuing strong buy recommendations left, right and centre. Git’s tipping-point has arrived.

How to play

If you’ve arrived here via a search engine, it’s probably because you’re trying to work around errors like Can’t locate Error.pm or Can’t locate SVN/Core.pm. Read on…

I already had MacPorts installed, but if you haven’t, follow the MacPorts install instructions. We’ll be using MacPorts to download and install git, as it’s supposed to be simpler than building from source.

If you’ve had MacPorts installed a while, make sure it’s up to date:


$ sudo port selfupdate

We want to use git to connect to subversion repositories as well, so we’ll just check that’s possible:


$ port list variant:svn
git-core        @1.6.0  devel/git-core
subversion      @1.5.1  devel/subversion

I already had Subversion installed but, through trial and error, found I needed to reinstall it with the Perl bindings (git svn is a Perl script, so it needs them to talk to Subversion). Note: I’m using the -f flag to force the reinstall; you might want to try without it first, just to see what conflicts come up:


$ sudo port uninstall -f subversion-perlbindings
$ sudo port install -f subversion-perlbindings

Next, we install git:


# This may take a while to install with all its dependencies:
$ sudo port install git-core +svn

And finally, we check it works:


$ mkdir myproject; cd myproject;

# Check your PATH's set properly, this should output:
# fatal: Not a git repository
$ git svn

# If that's OK... clone a repository:
$ git svn clone http://example.com/svn/project/trunk

Can’t locate Error.pm

If you’re getting Can’t locate Error.pm or Can’t locate SVN/Core.pm you should immediately try:


$ PATH=/opt/local/bin:$PATH git svn

If that works, you know it’s just a PATH problem. It’s something to do with Apple’s perl install having slightly kooky ideas about where to store perl libraries.

If you’re still getting complaints about Error.pm, you need to install the Error module from CPAN. We’re going to use the /opt/local/bin instance of cpan, to make sure things go in the right place for us:


$ sudo /opt/local/bin/cpan -i Error

Cross your fingers, and try again:


$ PATH=/opt/local/bin:$PATH
$ git svn clone http://example.com/svn/project/trunk

If things are working, git will spend a while cloning the subversion repository by pulling out every single revision so you can have a complete set of revisions (including deltas), ready for you to refer to with lightning-speed regardless of internet connectivity. Which is nice.

Posted in Git, Mac, Subversion, Tip

PHP Session Management (grievance 2)

Sometimes PHP surprises you with an easy-to-use feature, like sessions.

Sessions are quite easy to use in PHP. One call to @session_start(), and you have a magic global called $_SESSION to store data in, associated with the user via a cookie called PHPSESSID. PHP takes care of reading and writing the session data for you, and you think no more about it.

Simple.

Time passes, and you haven’t given sessions another thought. Your site’s evolving, using more and more AJAX, and seems to be performing ‘OK’. But, there’s a niggling doubt that something’s not quite right.

We realized something was wrong when we opened multiple search results in separate tabs. We could see the tabs loading one by one, slowly.

I guess we should have paid more attention to start with. Our previous web development background revolved around enterprise-class application servers. Sessions just worked, no concurrency worries. If you happened to run into a race-condition, you worked around it using threading and locking facilities provided by the implementation language. It never occurred to us that PHP would be so different.

PHP, the way we’re running it (via mod_php), couldn’t be further from the application-server model if it tried. By default, sessions are implemented using file-based storage, not held in shared memory ready for use by multiple threads.

Storing sessions in files means PHP has to take heavy-handed precautions against concurrent read/write access to the session – it locks the session file for the duration of a request.

It never occurred to us that session management would block user requests, preventing concurrent requests from completing (think AJAX). Fortunately, the quick fix is simple: call session_write_close() as soon as you’ve finished writing to the session. Depending on how you use sessions, you may find a number of actions only need read access to the session, in which case you can open and close the session together: @session_start(); session_write_close()

That’s the quick fix, but there are plenty of other options to explore too. A quick code audit could identify a ton of actions, controllers and pages that simply don’t need session access at all. Now you know PHP locks the session file, you probably want to avoid calling session_start() unless absolutely necessary.

Secondly, PHP allows you to choose what type of session-management you use. You can use memcached either on its own, or with a database backing-store. You could use a MySQL back-end, or roll your own session management registered using session_set_save_handler. It’s really up to you.

Perhaps that’s the problem right there. All the session-management hooks are there because the default session management sucks. The simplicity of using sessions lulls you into a false sense of security, but make no mistake – sessions need to be handled with care if you’ve any hope of running a high-volume website.

Are your sessions managed properly?

Posted in PHP

SVN log message encoding problem

It’s good practice to put useful commentary in the log message whenever you commit code to a repository.

Today, I wrote a log message about centigrade and Fahrenheit conversions, using the proper degree symbol °, but this triggered an encoding problem, resulting in an error message:

macbook:~/projects/smarty ash$ svn ci plugins/function.temperature.php 
svn: Commit failed (details follow):
svn: Can't convert string from native encoding to 'UTF-8':
svn: Tweak: altered temperature title attribute so it contains both farenheight AND
centigrade.  e.g. "88?\194?\176F or 17?\194?\176C".  The order is switched depending on user
preference.
--This line, and those below, will be ignored--

M    function.temperature.php

svn: Your commit message was left in a temporary file:
svn:    '/Users/ash/projects/propagandr/smarty/plugins/svn-commit.tmp'

It didn’t take long to realize that although my editor (vim) was configured to use UTF-8, the Subversion command-line client had no way of knowing that.

One way to stop this happening again would be to set my locale permanently so the character type is UTF-8 (e.g. export LC_CTYPE=en_GB.UTF-8). But as a one-off fix that avoided retyping the log message, the simple answer was to set the locale for a single command and pass the saved message back in with -F (a little off-topic: remember that Subversion ignores the filenames listed in the log message, so you have to re-enter them on the command line):

ash$ LC_CTYPE=en_GB.UTF-8 svn ci -F plugins/svn-commit.tmp plugins/function.temperature.php

Worked like a charm.

Posted in Subversion, Tip

Tweaking PNG transparency with ImageMagick

This took me way too long to find out, so I thought I’d blog here and hopefully save someone else some time.

ImageMagick is a great swiss-army-knife type tool, with a shed-load of options for converting and combining images. Unfortunately, the sheer number of options can make it a bit time-consuming and frustrating trying to find the one you want.

My aim was simple: given a PNG, make the whole thing semi-transparent.

Searching Google using “transparent” and “opacity” drew a blank – all I got was instructions on how to set transparency for certain colours – not what I wanted to do.

The word I was missing was “alpha”, and the magic incantation for changing the opacity of the whole image is:


convert input.png -channel Alpha -evaluate Divide 2 output.png

In my case, I wanted to set the PNG to be 50% transparent (hence “Divide 2”). Of course, you can change that number to whatever works for you.

Posted in Tip

Microformats, dark data and CSS – part 2

The first part of this article considered over 100 HTML 4 attributes and came to the conclusion class was the only one suitable for storing machine data (i.e. data specifically inserted and intended for machine parsing.)

In this second part, I’ll review several ways to store data in the class attribute, determine the ‘best’ method, and suggest a CSS implementation change that is (IMO) both trivial and immensely beneficial.

We start by considering the definition of the class attribute, how its value is interpreted, and what restrictions this places on us for storing data.

Isn’t class object-oriented?

Some people say class has an object-oriented use as though (X)HTML and CSS are object-oriented languages, with inheritance based on class values. But that’s not how things work: inheritance is based on parent/child relationships, with everything else determined by “the cascade”.

Let me illustrate with a contact directory example I hope isn’t too contrived.

Contact phone numbers are styled using common fonts and padding, but with different background-images based on the type of phone number (home, work, fax etc.) Using a top-level concept class of tel, we “subclass” using home, work and fax.

Phone numbers can be output and formatted using multiple classes working together:

<span class="tel home">+1 212 123 1234</span>

Because home is such a generic term, we’d write CSS using a 2-class selector like this:


.tel { font: ...; padding-left: 16px; background: transparent no-repeat left center; }
.tel.home { background-image: url(icons/tel-home.gif); }
.tel.work { background-image: url(icons/tel-work.gif); }
.tel.fax { background-image: url(icons/tel-fax.gif); }

Dropping tel from the mark-up would cause all styling to be lost – the value home on its own does not encapsulate enough information to determine its position in a class hierarchy. Later on, I’ll come back to this and suggest hyphenation as an option that may embody a class relationship more explicitly.

Unordered class data

By definition, class is an unordered set of white-space separated values. The values “tel home” and “home tel” should be treated the same, with the CSS selector “.tel.home” applying with equal specificity to both numbers below.


<span class="tel home">+1 212 12341 12112</span>
<span class="home tel">+1 212 12341 12112</span>

We must bear this ordering-independence in mind when storing data in class. Trying to store multiple bits of data in sequential order cannot work – e.g. a conference schedule:


...
<li><span class="dtstart 9:00 dtend 9:15" title="9am">09:00</span> - Registration</li>
<li><span class="dtstart 9:15 dtend 10:30" title="9:15am">09:15</span> - Keynote</li>
<li><span class="dtstart 10:30 dtend 10:45" title="10:30am">10:30</span> - Coffee</li>
<li><span class="dtstart 10:45 dtend 12:00" title="10:45am">10:45</span> - Session 1</li>
<li><span class="dtstart 13:00 dtend 14:00" title="1pm">13:00</span> - Session 2</li>
...

Note: in this example, humans are supposed to infer end-times by looking at the start-time of the following event. We include machine-data for end-times because “inference” is not easy for programmers to implement.

Although the order is clear and correct in the mark-up, browsers, parsers and libraries have no obligation to maintain the order when accessed. e.g. a “classes” method could return an arbitrarily ordered array of classes:


// fetch the classes for the first item in the schedule
// (classes() is a hypothetical plugin method, not part of core jQuery):
var classes = $('.dtstart').first().classes();
// may return: ["9:00", "9:15", "dtend", "dtstart"]

Without further labouring, the take-home point is: data in class-values cannot rely on ordering.

The necessity of prefixes

You may not be 100% certain how your content will be processed or transformed, or what corruption it may suffer; but you can at least attempt to mitigate disaster.

For example: times embedded in machine-data can be arbitrarily precise, from specifying years on their own (“2008”), to fully specifying a time-zone and exact second of an event (“20080721T124032+0100”). The longer format is unlikely to cause confusion (to machines), but the shorter variants could easily be mistaken for other numeric data, such as model numbers or the ISSN of a periodical for sale:


<li><a href="..." class="issn 02624079 dtstart 20080719" title="New Scientist dated 19th July 2008">New Scientist no. 2665</a></li>

As we can’t rely on ordering, we need to join the data-type and the data-value together. A few approaches have been suggested, including wrapping the value, or concatenating the pieces with an arbitrary separator – I suggest using a hyphen, which I’ll justify in a minute:


<a href="..." class="issn{02624079} dtstart{20080719}">
<a href="..." class="issn#02624079 dtstart#20080719">
<a href="..." class="issn-02624079 dtstart-20080719">
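
To make the hyphenated form concrete, here is a minimal sketch of how a parser might read it back. The function name parseClassData is hypothetical, and the sketch assumes each data token uses the first hyphen to separate the data-type from the data-value; tokens without a hyphen are treated as ordinary classes and skipped.

```javascript
// Hypothetical helper: turn a class attribute value into a { type: value } map.
// Assumes the first hyphen in each token separates data-type from data-value.
function parseClassData(classValue) {
  const data = {};
  for (const token of classValue.split(/\s+/)) {
    const hyphen = token.indexOf("-");
    if (hyphen <= 0) continue; // empty token, or an ordinary class with no prefix
    data[token.slice(0, hyphen)] = token.slice(hyphen + 1);
  }
  return data;
}

// Token order makes no difference to the result.
const forwards = parseClassData("issn-02624079 dtstart-20080719");
const backwards = parseClassData("dtstart-20080719 issn-02624079");
console.log(forwards.issn, forwards.dtstart);   // 02624079 20080719
console.log(backwards.issn, backwards.dtstart); // 02624079 20080719
```

Because each token carries its own type, the parser never has to rely on the order of values in the attribute.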

The hyphenated-prefix selector [attribute|=prefix]

CSS 2 introduced several attribute selectors, including one I’m calling the hyphenated-prefix selector.

The specification admits the primary purpose of this selector is for matching language subcodes; i.e. where CSS rules need only apply to content written in some subset of natural languages:


[lang|=en] blockquote, [lang|=en] q, blockquote[lang|=en], q[lang|=en] { quotes: '“' '”'; }
[lang|=de] blockquote, [lang|=de] q, blockquote[lang|=de], q[lang|=de] { quotes: '«' '»'; }

The rules above specify different quote-marks for German and English. Using the prefix selector means the appropriate rule applies to all English variants, including “en-GB” and “en-US”, as well as content marked no more specifically than lang="en". Similarly, the ‘de’ rule applies to all German variants.

However, this selector can just as easily be applied to classes. We can rewrite the telephone-number example as:


<span class="tel-work">+1 212 800 1234</span>
<span class="tel-home">+1 212 123 1234</span>

[class|=tel] { font: ...; padding-left: 16px; background: transparent no-repeat left center; }
.tel-home { background-image: url(icons/tel-home.gif); }
.tel-work { background-image: url(icons/tel-work.gif); }
.tel-fax { background-image: url(icons/tel-fax.gif); }

Relaxing the hyphenated-prefix rules

Sadly, the hyphenated-prefix selector is overly restrictive. In the following example, only the first rule applies:


[class|=issn] { font-weight: bold }
[class|=dtstart] { background-image: url(bg/microformat.gif); }

<li><a href="..." class="issn-02624079 dtstart-20080719" title="New Scientist dated 19th July 2008">New Scientist no. 2665</a></li>

The problem is due to the way [attribute|=prefix] is defined:

Match when the element’s “att” attribute value is a hyphen-separated list of “words”, beginning with “val”. The match always starts at the beginning of the attribute value. This is primarily intended to allow language subcode matches (e.g., the “lang” attribute in HTML) as described in RFC 1766 ([RFC1766]).

(Emphasis added.)

If the definition had instead been made to cater for a white space separated set of hyphenated tokens, we’d be in a much better position for styling and parsing machine-data microformats today.
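
The difference between the two definitions can be sketched in a few lines of JavaScript. Both function names are hypothetical: matchesCss2Prefix follows the CSS 2 wording quoted above (the whole attribute value is either exactly the prefix, or begins with the prefix followed by a hyphen), while matchesRelaxedPrefix is the white space separated variant I’m wishing for.

```javascript
// CSS 2 behaviour of [att|=val]: the match is anchored to the start of the
// whole attribute value.
function matchesCss2Prefix(attributeValue, val) {
  return attributeValue === val || attributeValue.startsWith(val + "-");
}

// Hypothetical relaxed behaviour: apply the same test to each
// whitespace-separated token instead of to the whole value.
function matchesRelaxedPrefix(attributeValue, val) {
  return attributeValue
    .split(/\s+/)
    .some((token) => matchesCss2Prefix(token, val));
}

const classValue = "issn-02624079 dtstart-20080719";
console.log(matchesCss2Prefix(classValue, "issn"));       // true
console.log(matchesCss2Prefix(classValue, "dtstart"));    // false
console.log(matchesRelaxedPrefix(classValue, "dtstart")); // true
```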

[attribute|=prefix] implementations

(Surprisingly) the big four browsers (including IE7) all support the hyphenated-prefix selector. JavaScript library support is lacking, though: jQuery, the library I use daily, doesn’t handle the hyphenated-prefix selector, although it’s a simple patch.

Assuming JavaScript libraries (or microformat parsers) already implement attribute-selectors, it’s a simple matter to support white space separated hyphenated-prefixes. The key regular-expression is:

/(^|\s)prefix(-|\s|$)/

Assuming your users know what they’re doing and are willing to fix their own issues after throwing something stupid at your library, the regular-expression is easily built and executed on the fly:


new RegExp("(^|\\s)" + prefix + "(-|\\s|$)").test(attribute)

Or (I think), in XPath 2.0:


//*[matches(@attribute, "(^|\s)prefix(-|\s|$)")]
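
If you’d rather not trust your users quite that much, a slightly more defensive sketch escapes any regular-expression metacharacters in the prefix before building the pattern. The names escapeRegExp and hasHyphenatedPrefix are hypothetical helpers, not part of jQuery or any other library.

```javascript
// Escape regular-expression metacharacters so the prefix is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when any whitespace-separated token equals the prefix,
// or starts with the prefix followed by a hyphen.
function hasHyphenatedPrefix(attribute, prefix) {
  const pattern = new RegExp("(^|\\s)" + escapeRegExp(prefix) + "(-|\\s|$)");
  return pattern.test(attribute);
}

console.log(hasHyphenatedPrefix("issn-02624079 dtstart-20080719", "dtstart")); // true
console.log(hasHyphenatedPrefix("hotel-home", "tel"));                          // false
console.log(hasHyphenatedPrefix("axb-1", "a.b"));                               // false
```

Without the escaping step, a prefix such as "a.b" would also match "axb-1", because the dot would be treated as a wildcard.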

Encoding data

Quotation marks and ampersands aside (naturally taken care of by normal (X)HTML encoding rules), there’s an obvious problem when data-values contain white space. Fortunately, there’s also an obvious solution, as several methods exist to encode arbitrary data into continuous strings without any white space. In JavaScript, the methods available include escape, encodeURI and encodeURIComponent, and I’d suggest encodeURI as the best option: it provides a good balance between safely encoding data and not being so aggressive that the result is unreadable to humans.
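
As a rough sketch of the round trip, the two helpers below (toClassToken and fromClassToken, both hypothetical names) build a hyphenated class token from a type and a value using encodeURI, and read it back with decodeURI. The "summary" data-type is only an illustration.

```javascript
// Hypothetical helper: join a data-type and an encoded data-value with a hyphen.
function toClassToken(type, value) {
  return type + "-" + encodeURI(value);
}

// Hypothetical helper: split on the first hyphen and decode the value.
function fromClassToken(token) {
  const hyphen = token.indexOf("-");
  return {
    type: token.slice(0, hyphen),
    value: decodeURI(token.slice(hyphen + 1)),
  };
}

const token = toClassToken("summary", "Coffee break");
console.log(token);                       // summary-Coffee%20break
console.log(fromClassToken(token).value); // Coffee break
```

The encoded token contains no white space, so it survives as a single class value, and most of the text stays readable.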

Simplicity is the key

The success of microformats depends on their simplicity: using a few attributes and a handful of patterns to invisibly add extra layers of information to existing content.

Hopefully, I haven’t suggested anything in conflict with existing microformats. Hyphenated-prefixes should be viewed as an additional tool in your arsenal, not as a competing or successor solution.

With a more flexible definition of that damned attribute selector, I’m sure the unAPI folks would have produced an even simpler specification, and the arguments around the microformats datetime design pattern would have been resolved years ago.

Though I’m sure it doesn’t show, I’ve written and rewritten this article many times, but the conclusion never gets any more complex than this:

If you want to piggy-back machine-data on existing content, use the class attribute. Separate data-types from data-values using a hyphen, and encode the data using something equivalent to JavaScript’s encodeURI.

That’s all folks

Posted in CSS, HTML