Wednesday, December 20, 2006

More on Syntax

My last post appeared on reddit--thanks dons! This induced some comments I'd like to respond to. First of all, there already is a macro system for Haskell, called (somewhat misleadingly) Template Haskell. It already provides the capabilities to generate arbitrary Haskell code. (More correctly, Haskell 98 code, since extensions like Generalized ADTs are not supported.) It also provides the given quasi-quotation mechanism I used in my last post's mock-ups: [| ... |]. However it has two problems. Firstly, macros are marked specially using the $(macro ...) syntax, which is not as seamless as it could be, although there might be good reasons to keep it, namely to make it easily recognizable, when macros are involved. Secondly, its quasi-quotation syntax is very limited, i.e., you cannot introduce new bindings and it's hard to modularize code--but I might be wrong with here since I might not have pushed it as far as possible. The problem is: when you cannot use the quasi-quotation syntax then you're left building up the quite complex Haskell parse tree yourself. Due to limited documentation and Haskell's syntax rules you usually write your macros by first getting the AST of some sample code, e.g. using:
-- | print a human-readable representation of a given AST
printAST :: ExpQ -> IO ()
printAST  ast = runQ ast >>= putStrLn . show

pp = printAST [| let x = $([|(4+)|]) in x 5 |]
which then (reformatted) looks like this:
$ pp
LetE [ValD (VarP x_0)
           (NormalB (InfixE (Just (LitE (IntegerL 4)))
                            (VarE GHC.Num.+) Nothing))
                     []]
     (AppE (VarE x_0) (LitE (IntegerL 5)))
Then you try to customize this for your purposes. Not pretty. My actual attempt was to take a type name as a parameter, inspect it, and then generate some boilerplate code. Well, I tried but gave up after being unable to construct some type. Maybe I didn't try hard enough. Anyways, macro-writing should be that hard! My proposed solution certainly is just a sketch of an idea, essentially pointing to prior art. I don't claim that this will in fact work nicely or even that it will work at all. I am pretty confident that it might, though, and I am planning to give it a shot later on. Maybe extending Template Haskell with features similar t o Scheme's syntax-case might be enough, for a start. And yet, I don't consider this a high-priority project, since a lot of uses for macros in Lisp can be solved differently in Haskell, as has also been mentioned in the comments to my previous post:
  • Controlling the order of evaluation is not necessary in Haskell since, due to lazyness. And if we have to control it somehow, we mostly use monads.
  • The whole category of (with-something (locally bound vars) ...) can be implemented almost as conveniently using withFoo \locally bound vars -> do ...
  • A lot of cases for special syntax can be achieved using clever operator and constructor naming. E.g., in wxHaskell: t <- timer f [interval := 20, on command := nextBalls vballs p], or, for an in-progress project of mine I simulate a convenient assembler syntax by allowing a notation like: res <-- a `imul` c. However, I was not able to use the <- notation, since I have different scoping rules than Haskell and I'm not in a monad.
  • Many cases of boilerplate code generation can be covered using generic programming, e.g. using Scrap Your Boilerplate.
So where would (more usable) macros still make sense?
  • Allow more flexible syntax for domain-specific embedded languages (DSELs), e.g. an XML library or a parser library might profit from this right now. (Yes, I think Parsec could be more readable). Also, DSLs like Happy would be even nicer if embedded directly into Haskell. Arrows and Monads were considered general enough concepts to introduce new syntax for them, but I think there's more out there that deserves it. I also think that an upcoming project of mine might hit the limits of what's currently possible in Haskell. Some people seem to agree.
  • Speaking of ParseC there's still one common use for Lisp-macros: optimizing at compile-time. You can get quite far by carefully designing your combinators for your DSELs. However, combining nice syntax and performance is very hard. In Lisp, the loop embedded language, for example, does quite heavy transformations on the given code. Partial evaluation is probably the more general solution here, but it seems to be not quite ready for primetime, yet.
  • The point, that a powerful enough system would essentially make syntactic sugar a library can be seen as a positive side effect, too. But I think this doesn't have much practical significance.
Bottom line: There certainly are less uselful applications of macros in Haskell than, e.g. in Lisp, but there are serious enough arguments to at least consider them.

Thursday, December 14, 2006

Syntax

Coming to Haskell from Common Lisp, I soon started to miss macros, and tried out Template Haskell .. and gave up. Haskell's syntax parse tree is way too complex to be usefully manipulated by Haskell functions. You can get used to Lisp syntax, but you always have to justify it to outsiders and there certainly lies a lot of value in resembling mathematical syntax. I certainly agree with dons' post on this issue. If only it weren't for the disadvantages! Sure, syntactic extension have to be done carefully, and powerful abstraction mechanisms in the language always have to come first. But having macros as an integral part of the language definition would result in desugaring being a library instead of a part of the compiler. This is a very powerful way of syntactic abstraction. I know that ML, being designed as a meta-language, has some ways of syntactic abstraction, though I haven't took a closer look at these, yet. Let me however outline a way of achieving macros almost as powerful as general Lisp macros, which works for languages with less simple syntax. This system is (partly) implemented in the Dylan programming language and is called D-Expressions. It is essentially an extension of the Lisp syntax. Lisp in its most primitive form it looks like this: Expr ::= Atom | ( Expr* ) This represents a simple tree. However, why should we restrict ourselves to represent nested structure with parens? We might as well use brackets or braces or keywords. So: Expr ::= Atom | ( Expr* ) | [ Expr* ] | { Expr* } | def Expr* end This way we still retain the unambiguous nesting structure, but have a more varied syntax. Reader macros in Common Lisp (and probably Scheme, too) do this and already perform some sort of desugaring, by translating { ... } into (some-macro ...). This however has one big problem: You can only have one macro for { ... }. Common Lisp "fixes" this by using some prefix to the parens, e.g. #c(3 4) denotes a complex number. This isn't exactly beautiful and doesn't scale well, though. In fact we'd rather like to have the chance to use this more variable syntax inside our macros and we'd need a way to make it scalable. Dylan solves this by allowing three kinds of macros:
  • Definition-style macros have the form: define modifiers* macro-name Expr* end
  • Function-style macros have the form: macro-name(Expr*)
  • Statement-style macros have the form: macro-name Expr* end
Macros are defined using pattern matching rewrite rules, e.g.:
define macro when
  { when ?cond:expr ?body:body end }
    =>  { if ?cond ?body else #f end }
end
Here ?cond:expr states that the pattern variable cond matches a form of the syntactic category of an expression. Similarly, for ?body:body. Multiple patterns are possible and extending this to Lisp-style abstract syntax tree transformations is possible, too, as shown in this paper on D-Expressions . Adding this feature to Haskell would probably require some small modifications to the syntax of Haskell, but I think we don't have to drop whitespace sensitivity. This way embedding DSLs in Haskell should be even more seemlessly and rants about the do-notation would not be needed. Here's how the implementation of the syntactic sugar to list expressions could look like:
macro do {
  [| do { ?e:expr } |] => [| ?expr |]
  [| do { ?e:pat <- ?e:expr; ??rest:* } |] 
    => [| ?expr >>= (\?pat -> do { ??rest ... }) |]
  [| do { let ?pat = ?expr; ??rest:* } |] 
    => [| let ?pat = ?expr in do { ??rest ... } |]
 ...
}
where ??rest:* matches anything up to the closing "}" (resulting the pattern variable rest to be bound to a sequence of tokens) and ??rest ... expands all tokens in rest. Sure, there are a lot of open issues to be solved--e.g. type specifications representing a seperate sub-language of Haskell, and how to actually achieve whitespace sensitivie macros--but I think it would be very useful extension. Comments welcome! :) Edit: The last example actually is just a mockup of some possible syntax and does not even make much sense in Dylan either. But you get the idea (I hope). I've been pointed to a similar idea, called Syntax Macros. Seems to be along the same lines.