Syntax Coloring for Blog Posts

If you were reading Objective-C 2.0 Tutorial: Part II yesterday, you might have noticed the first signs of syntax coloring starting to break through the surface. It looked a bit weird for a few hours, but I think the final result was worth it. It turns out this wasn't so easy to do.

I had punted on syntax coloring for a while, but it really felt like something was missing when the Objective-C 2.0 posts came up. With so much code, it really helps to have visual cues. So I sat down to take a look at this. I knew of three possible strategies:

1. JavaScript-based parsing/coloring engine
2. Server-side parsing/coloring engine
3. TextMate HTML generation

The JavaScript solution does exist as a Google Code project called google-code-prettify. It looks nice and simple, but I couldn't get anything to happen with Objective-C source. There's also CodeHighlighter, which I found on Joe Maller's post on the subject, but there's no Objective-C support here, either.

Server-side parsing. Do I want to write an Objective-C parser? Nope. Moving on.

The TextMate solution, it turns out, produces stunningly beautiful results. The TextMate bundle (yes, there's a bundle for TextMate itself) has a few interesting commands in this area:
Create CSS from Theme
Create HTML from Document / Selection
Create HTML from Document / Selection with Lines


The "Create CSS from Theme" command converts your current TextMate color theme into a CSS file. And it works well. Really well. I actually thought the first output test was the original TextMate window.

Figure: Initial TextMate HTML output. Original TextMate window on the left, HTML version on the right.


The "HTML from Document" command converts TextMate's document structure into HTML, adds span tags with appropriate CSS class names for styling, then slaps the matching CSS at the top, producing a free-standing web page with the properly-styled document contents.

This all works because TextMate exposes the document contents to bundles in the form of scopes. Scopes are cascading selectors (much like CSS, in fact), which is why you can write a command that applies to all types of source, or just to, say, Objective-C. It's also why the "objacc" tab trigger generates different results in an @interface block than an @implementation block.

You can see what the scope is for any given block of text by selecting the text and choosing "Show Scope" from the "Bundle Development" bundle (Control-Shift-P, by default):

Figure: TextMate Show Scope


An NSString type in the context of a property declaration, for example, has these scopes:
source.objc
meta.interface-or-protocol.objc
meta.scope.interface.objc
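
To make the cascading idea concrete, here is a rough sketch in Python of how a scope selector could be matched against a stack of scopes like the one above. It is not TextMate's actual matcher, and the function names are made up for illustration. It only assumes that a selector component matches a scope by dotted prefix, and that space-separated components must match in order.

```python
def scope_matches(selector_part, scope):
    """True if one selector component matches one scope name by dotted prefix."""
    return scope == selector_part or scope.startswith(selector_part + ".")


def selector_matches(selector, scope_stack):
    """True if every space-separated component matches some scope, in order."""
    remaining = iter(scope_stack)
    # `any` consumes the shared iterator, so each component has to match a
    # scope that comes after the one matched by the previous component.
    return all(
        any(scope_matches(part, scope) for scope in remaining)
        for part in selector.split()
    )


# Scope stack for the NSString property example.
stack = [
    "source.objc",
    "meta.interface-or-protocol.objc",
    "meta.scope.interface.objc",
]

print(selector_matches("source", stack))                            # True: any source file
print(selector_matches("source.objc", stack))                       # True: Objective-C only
print(selector_matches("source.objc meta.scope.interface", stack))  # True: narrower context
print(selector_matches("source.ruby", stack))                       # False
```

A command scoped to "source" applies everywhere, while one scoped to "source.objc meta.scope.interface" only applies inside an interface block, which is the kind of distinction the tab trigger example relies on.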


The syntax coloring themes use scopes for styling text. So the "Create HTML from Document" command really does three things:

1. Converts scope names in the theme to CSS class selectors
2. Converts the theme attributes to CSS properties
3. Converts the document structure to HTML with scope names converted to CSS class names
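
The first two steps, plus the class naming used in the third, can be sketched roughly like this. This is Python for illustration, not the Ruby in the real generator. The function names and the color value are assumptions; the `foreground`, `background`, and `fontStyle` keys are the usual TextMate theme settings, and the class name format mirrors the generated HTML shown later in this post.

```python
def scope_to_selector(scope):
    """'support.class' -> '.support_class' (dots become underscores)."""
    return "." + scope.replace(".", "_")


def scope_to_classes(scope):
    """'support.class.cocoa' -> 'support support_class support_class_cocoa'."""
    parts = scope.split(".")
    return " ".join("_".join(parts[: i + 1]) for i in range(len(parts)))


def theme_to_css(theme):
    """Render each themed scope as one CSS rule."""
    rules = []
    for scope, settings in theme.items():
        declarations = []
        if "foreground" in settings:
            declarations.append("color: " + settings["foreground"] + ";")
        if "background" in settings:
            declarations.append("background-color: " + settings["background"] + ";")
        font_style = settings.get("fontStyle", "")
        if "bold" in font_style:
            declarations.append("font-weight: bold;")
        if "italic" in font_style:
            declarations.append("font-style: italic;")
        body = " ".join(declarations)
        rules.append(scope_to_selector(scope) + " { " + body + " }")
    return "\n".join(rules)


# Hypothetical one-entry theme; the color is made up.
theme = {"support.class": {"foreground": "#6D79DE", "fontStyle": "bold"}}

print(theme_to_css(theme))
# .support_class { color: #6D79DE; font-weight: bold; }
print(scope_to_classes("support.class.cocoa"))
# support support_class support_class_cocoa
```

Emitting every prefix of the scope as its own class is what lets a theme rule for `support.class` style an element whose full scope is `support.class.cocoa`.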

This works great for one-off documents, but there's a catch: the HTML generator wraps every element in span tags for all of its scopes, even those the current theme doesn't style. The result is a lot of extra span tags and class names that are never used.

There is immense scripting potential with all of that metadata inline, but for my purposes, I just wanted something a bit more streamlined. The great thing about TextMate bundles is that you can go in and muck around with them yourself. So I did.

The HTML generator was created by Brad Choate. The script is written in Ruby and resides in the TextMate app package:
/Applications/TextMate.app/Contents/SharedSupport/
    Bundles/TextMate.tmbundle/Support/lib/doctohtml.rb


The script is in the Support folder, so you can't edit it directly from the Bundle Editor. You can still open it directly, of course. Make a backup of the original before doing so.

I'm not a Ruby expert, but I knew just enough to hack something together. I changed the script to build up a list of scopes that the current theme has styling information for, and only output CSS class names for those selectors. In an extreme case, the result is going from this:
<span class="support support_class support_class_cocoa">NSString</span>* director;director = <span class="meta meta_bracketed meta_bracketed_objc"><span class="punctuation punctuation_section punctuation_section_scope punctuation_section_scope_objc">[</span><span class="meta meta_bracketed meta_bracketed_objc"><span class="punctuation punctuation_section punctuation_section_scope punctuation_section_scope_objc">[</span><span class="meta meta_bracketed meta_bracketed_objc"><span class="punctuation punctuation_section punctuation_section_scope punctuation_section_scope_objc">[</span>movie <span class="meta meta_function-call meta_function-call_objc"><span class="support support_function support_function_any-method support_function_any-method_objc">director</span></span><span class="punctuation punctuation_section punctuation_section_scope punctuation_section_scope_objc">]</span></span> <span class="meta meta_function-call meta_function-call_objc"><span class="support support_function support_function_any-method support_function_any-method_objc">fullName</span></span><span class="punctuation punctuation_section punctuation_section_scope punctuation_section_scope_objc">] (snipped for length)

To this:
<span><span class="support_class">NSString</span>* director;director = <span><span>[</span><span><span>[</span><span><span>[</span>movie <span><span class="support_function">director</span></span><span>]</span></span> <span><span class="support_function">fullName</span></span><span>]</span></span> <span><span class="support_function">capitalizedString</span></span><span>]</span></span>;<span class="support_type">NSUInteger</span> movieTitleLength;movieTitleLength = <span><span>[</span><span><span>[</span>movie <span><span class="support_function">title</span></span><span>]</span></span> <span><span class="support_function">length</span></span><span>]</span></span>;</span>

And the final result looks identical.
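
The filtering idea looks roughly like this. It is a sketch in Python, not my actual Ruby changes to doctohtml.rb. The function names are invented, and the set of themed scopes is an assumption inferred from the class names that survive in the trimmed output above.

```python
def themed_classes(scope, themed_scopes):
    """Return class names only for the scope prefixes the theme styles."""
    parts = scope.split(".")
    prefixes = [".".join(parts[: i + 1]) for i in range(len(parts))]
    return [p.replace(".", "_") for p in prefixes if p in themed_scopes]


def open_span(scope, themed_scopes):
    """Build the opening tag for one scope."""
    classes = themed_classes(scope, themed_scopes)
    if classes:
        return '<span class="{}">'.format(" ".join(classes))
    # Still emitted, so that the closing </span> written later has a partner.
    return "<span>"


# Assumed: the scopes this particular theme has styling for.
themed = {"support.class", "support.function", "support.type"}

print(open_span("support.class.cocoa", themed))
# <span class="support_class">
print(open_span("punctuation.section.scope.objc", themed))
# <span>
```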

I couldn't eliminate all of the unnecessary span tags because the converter doesn't build up a structure — it just runs straight through and outputs HTML as it encounters each element. If I left out the opening tags, there'd be a lot of closing tags with no counterpart. This really isn't a criticism, though. I understand the command was not designed to be the ultimate HTML generator. I sent Brad an email about this, so we'll see what happens.
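
One way around this, which I have not tried against the real generator, would be a second pass over the finished HTML instead of a change to the converter itself. The sketch below uses Python's standard `html.parser` and a stack that remembers, for each open span, whether its opening tag was written, so a dropped opening tag always takes its closing tag with it. The class and function names are made up.

```python
from html.parser import HTMLParser


class BareSpanStripper(HTMLParser):
    """Drop <span> tags that carry no attributes, along with their closing tags."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.kept = []  # one bool per open span: was its opening tag written?

    def handle_starttag(self, tag, attrs):
        keep = tag != "span" or bool(attrs)
        if tag == "span":
            self.kept.append(keep)
        if keep:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> pass through untouched.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "span" and self.kept and not self.kept.pop():
            return  # the matching opening tag was dropped, so drop this too
        self.out.append("</" + tag + ">")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&" + name + ";")

    def handle_charref(self, name):
        self.out.append("&#" + name + ";")


def strip_bare_spans(html):
    parser = BareSpanStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = ('<span><span>[</span>movie <span><span class="support_function">'
          'director</span></span><span>]</span></span>')
print(strip_bare_spans(sample))
# [movie <span class="support_function">director</span>]
```

The stack is the structure the streaming converter lacks: it is what makes it safe to leave out an opening tag without leaving an orphaned closing tag behind.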

The great thing about this generator, though, is that it generates output specific to the currently-selected theme. So you can generate multiple CSS files, load them into the same document, and have multiple blocks of code displayed using different themes for contrast. For example, client-side code can be displayed with a lighter-colored theme and server-side code with a darker one.

You can edit themes using the built-in theme editor:

Figure: TextMate Theme Editor


You can also add coloring for scopes which are not already styled in the current theme. I've added custom colors for project-specific symbols in the past by editing the language bundle and adding coloring for them in the theme editor. I consider this a huge productivity gain.

I'm happy to release my modifications, but they're really hacky and I'm not sure what the license requirements are. I'll post an update if I figure it out with Brad.
Posted Nov 5, 2007 — 15 comments below




 

Matthew Flanagan — Nov 05, 07 4998

Hi,

I use Pygments to do syntax highlighting in my blog. It is a python module and script that works on any platform that supports python. It has support for an impressive list of languages and other markup including Objective-C (not sure about 2.0).

Scott Stevenson — Nov 06, 07 5000

@Matthew Flanagan: I use Pygments to do syntax highlighting in my blog. It is a python module and script that works on any platform that supports python

Pretty slick. I could probably use this instead. Everything else here is PHP, but I guess it would be easy enough to call out into Python.

Michael Sheets — Nov 06, 07 5001

The license for the bundle items. There are exceptions to the license that are noted in the specific items but for the most part the bundles are open source.

Ciarán Walsh — Nov 06, 07 5002

If you wanted to automate the highlighting process you could use Ultraviolet, a Ruby highlighting package which uses an implementation of TextMate’s parser (and uses TextMate’s themes).

If you modified the file in the application package directly don’t forget to move the bundle to /Library/Application Support/TextMate/Bundles/ so that your changes aren’t overwritten when TextMate is updated.

stubblechin — Nov 06, 07 5004

Wow, cool.

Magnus Nordlander — Nov 06, 07 5006

You could also use GeSHi, which is written in PHP and has support for Objective-C, although it doesn't produce as nice results... Oh, and you definitely want to implement some caching strategy if you do that, because GeSHi can take a couple of hundred milliseconds or more if it's a lot of code.

Dominik Wagner — Nov 06, 07 5008

Just wanted to say that SubEthaEdit can copy as xhtml from any colored document as well. Its use in latex can be seen in the latex tutorials here: seeing latex

Scott Stevenson — Nov 06, 07 5009

@Magnus Nordlander: You could also use GeSHi, which is written in PHP and has support for Objective-C, although it doesn't produce as nice results

I haven't messed around with the color much, but one nice thing it does is make Cocoa class names clickable with links to the official documentation. Wow. Very nice touch.

pavel — Dec 11, 07 5190

would you be willing to share the code for that modified html generator?

i have no ruby skills :(

Alex — Jun 09, 08 6057

This is a great list of tips. Thank you.

Donavan — Jun 30, 08 6126

Very thorough instructions. Thank you.

Glenn — Sep 07, 08 6351

And the final result looks identical.

Not to put too fine a point on it, but...I don't think so.

The first example formats:
NSString* director;director = [[[movie director] fullName]...

while the second formats:
NSString* director;director = movie.director.fullName...

I don't know if all of those punctuation... classes would still be there if the first example used dot notation.

Scott Stevenson — Sep 07, 08 6356

@Glenn: I don't know if all of those punctuation... classes would still be there if the first example used dot notation.

Yeah, it looks like you're right. I pasted the wrong text into the second example. The default output is still far more verbose, though, so the main point still applies.

Pete K — Feb 04, 09 6606

Heya, I know this is an older post, but I was having this problem, too.

I use Snipt (http://snipt.net) now for hosted code coloring, and it's pretty damn spiffy.

http://pkarl.com/articles/introduction-git-version-control-you-and-me-os-x-1/

I just had to paste my code into a box & pick a syntax. They give you an embeddable script tag.

Oli — Jun 29, 09 6820

Is there any chance you could share your Ruby hacking on this? I’d also like to do this, but unfortunately I don’t know anywhere near enough to do this ;-(

Thanks in advance!




 






Copyright © Scott Stevenson 2004-2015