Templates and the C++ syntax tree

In my last few posts I’ve been writing about C++0x templates. Specifically I’ve been talking about how to make template classes easier to read by allowing more-imperative/less-declarative and more-iterative/less-recursive class definitions. I’ve talked about extending the packing operator and adding code-generating scripting to the compiler.

The C++ compiler already has “scripting” of sorts, in the form of the preprocessor. But this is very limited, a declarative language without recursion (or iteration), at least not without very clever hackery (see the Boost Preprocessor BOOST_PP_.. macros and my earlier post about using BOOST_PP_.. macros).

The usual way of generating C++ code is with templates, which are declarative and allow full recursion. Templates should be all you need, but they are surprisingly difficult to use. This is partly due to syntax, but mainly due to the fact that templates only deal with large syntactic sub-trees (class and function definitions). You cannot use a template if you just want to generate an expression or an argument list. Also templates only have integers and classes (structs) available as data structures (although MPL helps here a little), and that template selection (pattern matching) is primitive (you cannot easily match patterns buried inside structures or lists).

In a previous post I talk about how convenient it would be if we could use the packing operator to generate member variables for the tuple class.

/* fantasy code -- will not work */

  template< typename ... TYPEs >
  class
tuple
  {
      // Member variables - not a legal use of ...
      private:
      TYPEs ...
    values;

    /* .. etc .. */
  };

It’d be nice if you could write a meta-function using templates to generate the sequence of member-declarations necessary (the BNF non-terminal for the sequence is called member-specification).

/* fantasy code -- will not work */

  template< typename ... TYPEs >
  struct
generate_member_variables
  {
      // Generate member declarations like this:
      //   TYPE0 member_0 ;
      //   TYPE1 member_1 ;
      //   TYPE2 member_2 ;
      //    .. etc ..
      typedef
      std::grammar::
      generate_member_specification_from_types
       <  0
        , "member_"
        , std::grammar::list< TYPEs ... >
       >::eval
    eval;
  };

  template< typename ... TYPEs >
  class
tuple
  {
      private:
      // The compiler will change the following struct
      // into a sequence of member variables.
    generate_member_variables< TYPEs ... >::eval;

    /* .. etc .. */
  };

To make this work you’d need a library of meta-functions (templated structs) and classes to represent parts of the C++ syntax tree. The compiler would also have to recognize these snippets of syntax tree and compile them into the program like they were a normal part of the token stream. (You’d also need to be able to manipulate lists, tokens, literals, identifiers, and strings at compile-time.)

In the above example we assume a meta-function library that lives in namespace std::grammar and can build a struct that the compiler will recognize as a sequence of member variable declarations.

With these changes you could specify member variables something like this.

/* fantasy code -- will not work */

using std::grammar::member_specification;
using std::grammar::member_declarator;
using std::grammar::type_name;
using std::grammar::identifier;
using std::grammar::list;

  struct
my_struct
  {
    // example 1
    // generates these member vars:
    //   int m_id;
    //   double m_price;
    //   char* p_name;
    codegen typename
    member_specification
     <  list< int, double, char* >
      , list< "m_id", "m_price", "p_name" >
     >::eval;

    // example 2
    // generates these member vars:
    //   my_class1 member_var1;
    //   my_class2 member_var2;
    codegen typename
    member_specification
     < list
        <  member_declarator
            <  type_name< "my_class1" >
             , identifier< "member_var1" >
            >
         , member_declarator
            <  type_name< "my_class2" >
             , identifier< "member_var2" >
            >
        >
     >::eval;

    /* .. etc .. */
  };

This example introduces the keyword codegen which tells the compiler to expect a special type. A little like typedef, ‘cept not really. When the compiler sees codegen it knows treat the type as a C++ syntax tree and expand it into inline code.

The C++ compiler could be much more code-generation friendly. Defining meta-classes and meta-functions to represent and manipulate parsed C++ code is a good first step, along with adding a way to inline these syntax-tree objects into the code. Boost MPL provides a good of example of how those meta-objects can be defined.

This would let you build small parts of the syntax tree, like parameter lists, expressions, and initialization lists that you could reuse. These in turn would make it easier to generate larger class and function templates.

Variadic tuple<..> — imperative meta programming

Since my last two posts (here and here) are about how might unpack and generate code if it were very smart, I thought I’d bring up a more difficult case. How do you build templated constructors inside the variadic template tuple class?

Remember, this is all fantasy code. The operator will not actually be able to do any of this. Also remember this problem can be solved by defining tuple using a recursive first/rest typelist scheme (see the current Boost implementation).

At first glance it might appear that it’s easy to define a constructor. Just do this.

/* fantasy code -- will not work */

  template< typename ... TYPEs >
  class
tuple
  {
      // Member variables - not a legal use of ...
      private:
      TYPEs ...
    values;

      // Constructor
      public:
    tuple( TYPEs && ... args)
      : values( std::forward( args)) ...
      { }

    /* .. etc .. */
  };

This is pretty good, but we’d like to make all the arguments optional. So we could do this.

/* fantasy code -- will not work */

  template< typename ... TYPEs >
  class
tuple
  {
      // Member variables
      private:
      TYPEs ...
    values;

      // Constructor
      public:
    tuple( TYPEs && ... args = TYPEs( ))
      : values( std::forward( args)) ...
      { }

    /* .. etc .. */
  };

I think the double expansion in the parameter list works. I’m pretty sure you only specify immediately before the variable to be packed, and that all the already-packed variables in the expression will be unpacked at the same time. So you don’t specify again after TYPEs( ).

In any case, this still needs improvement. This requires that all TYPEs have a copy constructor even if they are going to be default constructed. We really want a bunch of constructors, taking zero, one, two, etc arguments.

/* fantasy code -- will not work */

  template< typename ... TYPEs >
  class
tuple
  {
      // Member variables
      private:
      TYPEs ...
    values;

      // Generate a family of constructors (not legal C++).
      // If TYPEs has three types <A, B, C> this
      // generates four constructors:
      //   constructor( )
      //   constructor( A &&)
      //   constructor( A &&, B &&)
      //   constructor( A &&, B &&, C &&)
      for... ( size_t
          N = 0
        ; N <= sizeof...( TYPEs )
        ; ++ N )
      {
          public:
        tuple( TYPEs && unpack...( N ) args)
          : values( std::forward( args)) ...
          { }
      }

    /* .. etc .. */
  };

Yikes, that’s not C++, and we’re not in Kansas anymore. In this fantasy code for… is a meta keyword, interpreted by the compiler and producing a series of constructors. And unpack…(N) only unpacks the first N items in the packed variable TYPEs, and not the entire list.

Instead of for… in the above example we could use plain old for and say the compiler should run code that is not in a code-defining context. But we’d still need to quote or emit the tokens inside the loop above. Something like this.

/* fantasy code -- will not work */

  template< typename ... TYPEs >
  class
tuple
  {
    /* .. etc .. */

      for ( size_t
          N = 0
        ; N <= sizeof...( TYPEs )
        ; ++ N )
      {
        emit {
            public:
          tuple( TYPEs && unpack...( N ) args)
            : values( std::forward( args)) ...
            { }
         }
      }

    /* .. etc .. */
  };

This implementation is still incomplete however. The constructor methods themselves should be templated so you can construct tuples from values with convertible types. With that in mind, here is the last implementation.

/* fantasy code -- will not work */

  template< typename ... TYPEs >
  class
tuple
  {
      // Member variables
      private:
      TYPEs ...
    values;

      // Generate a family of constructors.
      // If TYPEs has three types <A, B, C> this
      // generates four constructors with compatible
      // types:
      //     template< >
      //   constructor( )
      //     template< typename AC >
      //   constructor( AC &&)
      //     template< typename AC, typename BC >
      //   constructor( AC &&, BC &&)
      //     template< typename AC, typename BC, typename CC >
      //   constructor( AC &&, BC &&, CC &&)
      for... ( size_t
          N = 0
        ; N <= sizeof...( TYPEs )
        ; ++ N )
      {
          public:
          template< typename generate...( N ) ARG_TYPEs >
        tuple( ARG_TYPEs && ... args)
          : values( std::forward( args)) ...
          { }
      }

    /* .. etc .. */
  };

This introduces the fantasy generate...(N) meta function to generate a template parameter list of N typenames. This also unpacks values and args at the same time (and in parallel) even though values is packed from TYPEs while args is packed from ARG_TYPES. And args usually packs fewer names than values.

This kind of imperative code generation is sometimes easier to create and understand than the declarative templates C++ currently provides. But it also adds more syntax and keywords to C++, which is plenty complicated already. And a declarative solution that isn't as restrictive as C++ templates may be a better than this imperative suggestion. So I'm not advocating any of this stuff. I'm just thinking out loud, writing it down to see what it looks like.

Variadic tuple<..> — the get<N>() method

In my last post I talked about how a naive implementation of tuple might be easier to understand, and I showed how you might define the member values with the operator, if only were very smart about unpacking (which it isn’t).

The get<N>() method presents additional challenges.

/* fantasy code -- will not work */

  template< typename ... TYPEs >
  class
tuple
  {
      // Member variables
      private:
      TYPEs ...
    values;

      // Generalized get<N>( ) template method
      private:
      template< size_t >
      void
    get( ) const
      { BOOST_STATIC_ASSERT( false); }

      // Specialized get<N>( ) methods:
      //   get<0>( ), get<1>( ), get<2>( ), etc
      // Introduces indexof...(), which is like sizeof...()
      public:
      template< >
      decltype( values ) &
    get< indexof...( values ) >( )
      { return values ; }
      ...

      // Specialized get<N>( ) const methods
      public:
      template< >
      decltype( values ) const &
    get< indexof...( values ) >( ) const
      { return values ; }
      ...

    /* .. etc .. */
  };

You can see how this would work if only could unpack to create a bunch of method specializations. This also requires indexof…(values) which expands into 0, 1, 2, 3 and so on as it unpacks the member variables packed into values.

This would also have to work when a type was const or a reference. But that’s a topic for another day.

Next Page →