There was a bit of a kerfuffle when the BBC dropped support for microformats in their program listings.

You can’t argue with their reasons: data for microformats was being read aloud by screen-readers, popping up as tool-tips, and confusing the hell out of their users.

The microformats community rallied to solve the issue, but Auntie Beeb rejected all their proposals and, for lack of community support, also back-tracked on their own proposal (inserting data- prefixed values into the class attribute).

You can’t blame them for rejecting the microformats community’s proposals. This proposal feels particularly torturous:

  To be held on
  <span class="dtstart dtend">
    <abbr class="value" title="1998-03-12">12 March 1998</abbr>
  </span>
  from
  <span class="dtstart">
    <abbr class="value" title="08:30">8:30am</abbr>
    <abbr class="value" title="-0500">EST</abbr>
  </span>
  until
  <span class="dtend">
    <abbr class="value" title="09:30">9:30am</abbr>
    <abbr class="value" title="-0500">EST</abbr>
  </span>

That suggestion seems to be the result of a 30-minute brain-fart by the microformats community’s spiritual leader and, like the BBC, I find it “complicated” (though I doubt that was the first word that sprang to mind).
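To get a feel for the extra work this pushes onto consumers, here’s a rough Python sketch of what a parser might have to do to glue the fragments back together. The FRAGMENTS list and the reassemble function are my own illustrative names, and the joining rule (date first, then time and offset, in document order) is an assumption rather than anything taken from the proposal’s text.

```python
from collections import defaultdict

# (class attribute of the outer <span>, title values of its nested
# <abbr class="value"> elements), copied from the proposal quoted above.
FRAGMENTS = [
    ("dtstart dtend", ["1998-03-12"]),
    ("dtstart", ["08:30", "-0500"]),
    ("dtend", ["09:30", "-0500"]),
]


def reassemble(fragments):
    """Collect the fragments for each property and join them into one value.

    Assumption: fragments arrive in document order, the first one is the
    date, and whatever follows is the time plus the UTC offset.
    """
    parts = defaultdict(list)
    for class_attr, values in fragments:
        # One <span> can feed several properties, e.g. "dtstart dtend".
        for prop in class_attr.split():
            parts[prop].extend(values)
    return {prop: vals[0] + "T" + "".join(vals[1:]) for prop, vals in parts.items()}


result = reassemble(FRAGMENTS)
print(result)
```

Six attribute values across three wrappers, just to recover two timestamps.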

Let’s go back to the specs and see what HTML gives us to work with. Of the hundred-plus attributes in HTML 4, only a handful apply to all the elements we’d want to tag (not just <abbr>, but also <span>, <a> and <li>, amongst several others).

The only attributes available to insert microformat data and still pass validation are: class, dir, id, lang, style and title.

We’ve dismissed all event-handling attributes (onwhatever) as they’re supposed to contain script, and we can quickly dismiss a few more attributes now:

dir can only contain two values (ltr and rtl).

id must be unique per page; that’s a restriction we can’t work with for microformats.

The style attribute is supposed to hold CSS declarations. It could be subverted using a vendor-style prefix, e.g. style="-mf-dtstart: ...". However, you’d buy yourself a place in hell, and you’d never get support from the microformats community.

That leaves us with lang, title and class.

Why lang?

Good question.

The lang attribute indicates what language the content of an element is written in. Language codes were defined in RFC 1766, since replaced by RFC 3066, which has itself been replaced by a duo of RFCs: 4646 and 4647.

Amongst the long lists of language codes, you can find a few options that should let you embed machine-data without it being read aloud. For example, the language code zxx indicates there’s “no linguistic content”, so you might think a screen-reader would simply skip the content:

  To be held on 12 March 1998
  <span class="dtstart dtend" lang="zxx">1998-03-12</span>
  from 8:30am EST
  <span class="dtstart" lang="zxx">08:30-0500</span>
  until 9:30am EST
  <span class="dtend" lang="zxx">09:30-0500</span>
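On the consuming side, pulling those values back out would be simple enough. Here’s a minimal sketch using Python’s standard html.parser module; the ZxxExtractor class and extract function are names I’ve made up for illustration, and the sketch assumes the marked-up spans are never nested.

```python
from html.parser import HTMLParser

MARKUP = """
To be held on 12 March 1998
<span class="dtstart dtend" lang="zxx">1998-03-12</span>
from 8:30am EST
<span class="dtstart" lang="zxx">08:30-0500</span>
until 9:30am EST
<span class="dtend" lang="zxx">09:30-0500</span>
"""


class ZxxExtractor(HTMLParser):
    """Collect the text of every <span lang="zxx"> with its class tokens."""

    def __init__(self):
        super().__init__()
        self.found = []        # list of (class tokens, text) pairs
        self._classes = None   # class tokens of the span we are inside, if any
        self._text = []        # text chunks seen inside that span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("lang") == "zxx":
            self._classes = (attrs.get("class") or "").split()
            self._text = []

    def handle_data(self, data):
        # Text outside a lang="zxx" span is meant for humans, so skip it.
        if self._classes is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: zxx spans are not nested, so any </span> closes ours.
        if tag == "span" and self._classes is not None:
            self.found.append((self._classes, "".join(self._text).strip()))
            self._classes = None


def extract(markup):
    parser = ZxxExtractor()
    parser.feed(markup)
    parser.close()
    return parser.found


for classes, value in extract(MARKUP):
    print(classes, value)
```

So the machines would cope; it’s the humans we still have to worry about.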

Sadly, a quick trial using the accessibility features on a Mac reveals the content is still read aloud. I suspect the other possible language codes – art for “artificial languages”, and the x- prefix for “private use” – would suffer the same fate.

Also, the content would need hiding from sighted users with a simple CSS rule: [lang|=zxx] { display: none } – but this would fail under various viewing conditions (e.g. syndicating data via RSS without styles).
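The |= operator in that rule is worth a second look: as CSS 2.1 defines it, it matches a value that is exactly zxx, or one that starts with zxx followed by a hyphen. Here’s a small Python sketch of that test; dash_match is a name I’ve invented, and the comparison is case-sensitive for simplicity, which is stricter than a browser would be for lang in HTML.

```python
def dash_match(attr_value, target):
    """Mimic the CSS [attr|=target] test on a single attribute value.

    True when the value is exactly `target`, or begins with `target`
    immediately followed by a hyphen. Simplification: case-sensitive.
    """
    return attr_value == target or attr_value.startswith(target + "-")


for value in ["zxx", "zxx-x-hcal", "zxxx", "en-GB"]:
    print(value, dash_match(value, "zxx"))
```

The selector was designed for language sub-tags (en matching en-GB, for instance), so the zxx-x-hcal value above is only a made-up example of a sub-tagged code the rule would also hide.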

Finally, the whole idea of marking machine-data using the lang attribute may be frowned upon. RFC 4646 includes this:

Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of communication. This includes constructed and artificial languages, but excludes languages not intended primarily for human communication, such as programming languages.

Looks like the lang attribute’s a bit of a no-no then. That leaves us with class and title.

title vs. class

We’ve already mentioned the accessibility issues with title. The BBC have done the research and user-testing other people only think about. Their results show it’s not just screen-readers that get confused by machine-data in title attributes – sighted users are baffled too.

Fragmenting one machine-readable value into three doesn’t solve the problem – it exacerbates it. The number of elements with mystical tool-tips increases; while human-friendliness increases for some (dates), it decreases for others (timezones).

The title attribute is meant for humans, no matter how you spin it.

All hail the mighty class

I must apologize: did I just have a “title vs. class” debate without mentioning class?

I guess I did. title got disqualified, leaving class to win by default.

(Small side-note: out of all the attributes on the short-list, class is the only one the HTML 4 DTD declares directly as CDATA – i.e. general-purpose Character DATA. Just thought that might be worth a mention.)

Having whittled down our options to one winner, we now need to decide how we’re going to organize our data, and cram it into the class attribute. We need to take into account the definition of class, and that the attribute-value is treated as an unordered list of white-space separated tokens.
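That last constraint is easy to underestimate, so here’s a quick Python sketch of how a consumer might read a class attribute. The class_tokens function is purely illustrative, and str.split() is slightly more generous about what counts as white-space than HTML is.

```python
def class_tokens(class_attr):
    """Split a class attribute value into its unordered set of tokens."""
    # split() with no arguments splits on runs of white-space and
    # ignores leading and trailing white-space.
    return frozenset(class_attr.split())


a = class_tokens("vevent dtstart dtend")
b = class_tokens("  dtend\tvevent \n dtstart dtstart ")

print(a == b)          # order, spacing and repetition make no difference
print("dtstart" in a)  # membership is the only question we can ask
```

In other words, whatever scheme we come up with can’t rely on the order of the tokens, on a token appearing twice, or on a value containing a space.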

That’s for part 2, where I also propose tweaking a CSS3 selector to turn it from a single niche application into a general-purpose tool in a web designer’s arsenal.

4 thoughts on “Microformats, dark data and CSS – part 1”

  1. Just to clear one thing up… we (the BBC) haven’t back-tracked on our class data- proposal, we’re still happy with it. In true BBC fashion I tried to keep the article unbiased and point out the pros and cons of each proposal including our own, which is by no means perfect.

    However, that’s not to say we’ve made our mind up and we won’t accept anything but our proposal. We’re certainly open to ideas. Looking forward to your follow-up post.

  2. You hit the nail on the head. Using the title attribute is a terrible, terrible idea, as it exposes machine readable information to the user. I completely agree that using the class attribute is a good way to go.

    While you’re talking about repurposing the class attribute, it’s worth thinking about the extra text that the hformat producer is forced to put into human readable HTML. For hReviews, the rating has to go into plain text. This is an absolute pain in the ass. It means any site displaying the hreview has to play silly CSS tricks to hide the “best” and “worst” attributes (and even the score itself, if they want to display a graphic instead). Including machine-readable information in human-readable HTML also falls into the “terrible, terrible idea” category.

    I’m not sure what you’re going to suggest in your part two, but it would be nice to see a general mechanism for storing all uformat attributes in a single class attribute.
