Erlang as C++ code generator

In my last post I talked about how templates could be extended to manipulate and transform bits of the C++ syntax tree. Parsed C++ syntax objects, like block statements and parameter lists, are just the kind of thing you need when generating code.

But there are other ways to approach the problem. You can generate code as text, spitting out source code files to be compiled. Or you can generate code with the preprocessor.

With the preprocessor you don’t deal with syntax trees of course, since the preprocessor doesn’t know how to parse C++. It only knows how to tokenize the text stream, and so it deals with tokens. Tokens are syntax primitive, and aren’t as abstract as statements or expressions, but you can do a lot with just tokens. Sometimes you want to work on that level anyway, like when you want to generate a piece of code with unmatched curly brackets, maybe to open a namespace.

Of course the preprocessor is weak without recursion, looping, or data structures. But you can imagine a sophisticated language, embedded in C++ source like the preprocessor, that inserted generated C++ token streams. You’d want a language with recursion, data structures, and data types that could be manipulated and transformed.

The Erlang language gives us a hint of what this might look like. (We could also look to Prolog as a model, although it may be overkill, at least for what I have in mind.)

Erlang appears well suited to code generation. It’s declarative and functional like the C++ template language. Functional means a function’s return value depends only on its arguments and will always be the same if you pass in the same parameters. In other words, my_struct<43,char*> will evaluate to the same type as long as the arguments 43 and char* don’t change.

Erlang’s pattern matching and native list structure are good for simple parsing and list processing, which are also useful for code generation. Erlang-style pattern matching and guards are particularly useful if you’re working without strong typing.

Here’s what an Erlang-like preprocessing language might look like embedded in C++. Here we define a simple function, chop_off_head/1, which returns a list sans the first element. (The /1 part of the name means it take one argument.)

// Warning - this is all fantasy/speculative C++
// It will NOT compile.

// How would we embed a language like Erlang in
// C++ source code?

// 1. Surround the meta code in a special wrapper,
// like you do with asm { ... }
erlang {
  -module( cpp).
  -export( [chop_off_head_a/1]).
  chop_off_head_a( [] ) -> [];
  chop_off_head_a( [A | Rest] ) -> Rest.
}

// 2. Make it clear that we're defining parts of a module
// in the special wrapper, and we're not in "shell" mode.
erlang:module( cpp ) {
  -export( [chop_off_head_b/1]).
  chop_off_head_b( [] ) -> [];
  chop_off_head_b( [A | Rest] ) -> Rest.
}

// 3. Keep the Erlang code in a separate file. This way
// we can also load/test the Erlang code in a native
// environment. Allow a "-module(cpp)." line in the
// included file.
erlang:module( cpp ) {
# include "cpp.erl"
}

We have to distinguish module definitions from commands. Modules only define Erlang functions and do not generate any C++ code until the functions are called.

How do we feed C++ tokens to the Erlang functions, and how are the return tokens injected back into the compiled program? I’m thinking the code might look something like this.

// Warning - this is all fantasy/speculative C++
// It will NOT compile.

// What would a call to an embedded Erlang function
// look like?

  template< typename ... TYPEs >
  struct
mystruct_a
  {
    // We could set an erlang variable to build a list
    // and then pass that to the function. The last function
    // returns a list of cpp tokens to be taken up by the
    // compiler.
    erlang {
      X =
      -cpp_tokens_start
        TYPEs ...
      -cpp_tokens_end.
      cpp:generate_member_vars_a( X ).
    }
  };

  template< typename ... TYPEs >
  struct
mystruct_b
  {
    // Or we could unpack TYPEs into a comma-separated list.
    erlang:cpp:generate_member_vars_b( [ TYPEs ... ] ).
  };

In these examples I assume TYPEs… expands to a comma-separated list of types which will have to have some standard representation for the embedded Erlang, maybe as atoms or Erlang tuples.

In the mystruct_a example, X is set to a list of all the tokens of the expanded TYPEs… including the comma separators. Since a type can consist of many tokens (think int const * const &) the Erlang function will need the comma tokens. Unless we get the compiler to parse the tokens for us and set X to an Erlang list of type object.

I use -cpp_tokens_start and -cpp_tokens_end (which is NOT standard Erlang) to escape out of Erlang and tell the compiler to tokenize everything between as C++, and to bundle all those tokens up into a list and assign them to X. If we use delimiters like { and } (curly brackets) we’ll have a harder time specifying unmatched } in our escaped C++ token stream.

Maybe the C++ parser could compose the C++ tokens between -cpp_tokens_start and -cpp_tokens_end into more abstract objects, like s.

The last function in the wrapper, cpp:generate_member_vars_a(X), presumably returns a list of C++ tokens that the compiler takes and and treats as C++ code.

In mystruct_b I was thinking [TYPEs...] would expand into something like [int,float,char] which is an Erlang list. In this case the commas would not be tokenized since they are part of the Erlang syntax.

Finally, here’s what the Erlang code used above might look like.

-module( cpp).

% File contains functions to parse, manipulate,
% and generate C++ token streams.


% == exports ==========================

-export( [make_type/1]).
-export( [make_variable/2]).
-export( [make_numbered_name/2]).
-export( [parse_tokens_type_list/1]).
-export( [transform_types_to_variables/3]).
-export( [generate_tokens_for_variable_decl/1]).
-export( [generate_tokens_for_variable_decls/1]).
-export( [generate_member_vars/1]).


% == record declarations ====================

-record( type, {name}).
-record( variable, {name, type}).


% == make_type/1 =======================

make_type( Name ) ->
  #type{ name=Name }.


% == make_variable/2 =======================

make_variable( Name, Type = #type{} ) ->
  #variable{ name=Name, type=Type }.


% == make_numbered_name/2 =======================
%
% Generates a string from a prefix and a number:
%   make_numbered_name( "my_name_prefix_", 3245) ->
%   "my_name_prefix_3245"

make_numbered_name( Prefix, Counter ) ->
  list_to_atom(
    string:concat(
      Prefix,
      integer_to_list( Counter ))).


% == parse_tokens_type_list/1 =======================
%
% This only parses simple type lists, where each type is
% a single token. Works with type lists like this:
%   type1 , type2 , type3
% Does not work for a type consisting of more than one
% token, like this:
%   type1 * , type2 const & , int ( * ) ( int , char )

parse_tokens_type_list( [] ) ->
  [];
parse_tokens_type_list( [Arg | Rest] ) ->
  [ make_type( Arg)
  | lists:reverse(
      parse_tokens_type_list_tail( Rest, [] ))].

parse_tokens_type_list_tail( [], Collect ) ->
  Collect;
parse_tokens_type_list_tail( [',', Arg | Rest], Collect ) ->
  parse_tokens_type_list_tail(
    Rest,
    [ make_type( Arg) | Collect ]).


% == transform_types_to_variables/1 ======================
%
% From a list of types, generates variable declarations
% separated by semicolons. The variables will have generated
% names. The following:
%   transform_types_to_variables( [int, char], "var_", 34)
% yields a list of 6 tokens like this:
%   int var_34 ; char var_35 ;
%
% Can be used to generate class-member variables.

transform_types_to_variables( Types, Prefix, Counter ) ->
  lists:reverse(
    transform_types_to_variables_tail(
      Types, Prefix, Counter, [] )).

transform_types_to_variables_tail( [], _, _, Collect ) ->
  Collect;
transform_types_to_variables_tail(
    [Type = #type{} | Rest], Prefix, Counter, Collect ) ->
  transform_types_to_variables_tail(
    Rest,
    Prefix,
    Counter + 1,
    [ make_variable(
	    make_numbered_name( Prefix, Counter ),
        Type)
    | Collect
    ]).


% == generate_tokens_for_variable/1 ======================

generate_tokens_for_variable_decl( Var = #variable{} ) ->
  [ (Var#variable.type)#type.name,
    Var#variable.name,
    ';'
  ].


% == generate_tokens_for_variable_decls/1 ======================

generate_tokens_for_variable_decls( Vars ) ->
  lists:append(
    lists:reverse(
      generate_tokens_for_variable_decls_tail( Vars, [] ))).

generate_tokens_for_variable_decls_tail( [], Collect ) ->
  Collect ;
generate_tokens_for_variable_decls_tail(
    [Var = #variable{} | Rest], Collect ) ->
  generate_tokens_for_variable_decls_tail(
    Rest,
    [generate_tokens_for_variable_decl( Var ) | Collect]).


% == generate_member_vars/1 ======================
%
% Given a stream of tokens from a type list, generates
% a bunch of variable declarations.
%
% The following:
%   generate_member_vars( ['my_type_x', ',', 'my_type_y'])
% will generate the following C++ tokens:
%   my_type_x member_var_0 ;
%   my_type_y member_var_1 ;
%
% Another name for this could be:
%   transform_tokens_type_list_to_variable_decls

generate_member_vars( Tokens ) ->
  generate_tokens_for_variable_decls(
    transform_types_to_variables(
      parse_tokens_type_list( Tokens), "member_var_", 0)).

As you can see, expressing ideas in Erlang is a lot easier than expressing them as C++ template meta-functions, which is the reason I started thinking about using it for code generation. Erlang is a clean and easy-to-understand language that could work well at the token level or higher. But you can also see how the Erlang and C++ syntax clash, particularly when feeding C++ tokens to Erlang functions, although my proposal above could probably be improved.

I firmly believe that code-generating languages and development environments will become much more important in the future, and the C++ language is clearly moving in that direction. One tool that could help is a re-vamped C++ preprocessor, maybe based on a language like Erlang or Prolog. Or maybe it could even look like Javascript, which is imparative, and would be good at treating the code as a character stream. Or maybe C++ templates really are the best solution, especially if they could be extended to work with smaller fragments of parse tree.

It’s something to think about.

Arrays, shared_ptr< T >s, and deleters

In my last few posts about shared_ptr<T>s I’ve been using a struct called private_deleter. When you first attach a target object to a shared_ptr<T> you can also specify a deleter, which is a functor with an operator() that takes a single argument, a pointer to the target object, and deletes it.

  // Instances of some_type will be managed
  // with shared_ptr< some_type >s.
  struct
some_type
  { };

  // Deleter that works when you create
  // some_type with operator new.
  struct
some_type_deleter
  {   void
    operator ()( some_type * p)
      { delete p; }
  };

  // This shared_ptr has no deleter, so the
  // target is deleted with the function
  // boost::checked_delete< some_type >(..).
  shared_ptr< some_type >
sp_1(
  new some_type);

  // Boost provides a standard deleter struct
  // that calls boost::checked_delete<T>(..).
  shared_ptr< some_type >
sp_2(
  new some_type,
  boost::checked_deleter< some_type >( ));

  // This target object is bound to a new instance
  // of our deleter struct, defined above.
  shared_ptr< some_type >
sp_3(
  new some_type,
  some_type_deleter( ));

You use the deleter, of course, to control target object deletion. Which is symmetric since you also control target creation. Without the deleter you could only create target objects with operator new to match the operator delete assumed by shared_ptr.

Since the deleter becomes part of the intermediate object (aka the “control block”), you can also use it as a way to add variables and functions without intruding on the target class. For example, you can use it to store a list of notifier functors to be triggered when the target is deleted. Or you can use it as a chunk of memory in which you construct the target object. But in this post we’re just going to use the deleter to delete.

The Boost smart pointers provide a shared pointer just for arrays, called shared_array<T>. It is similar to shared_ptr<T> except it uses operator delete[] instead of operator delete to delete the target. But shared_array<T> is not part of TR1, and it is not necessary since shared_ptr<T> supports custom deleters.

The following shows how to use a deleter so shared_ptr<T> correctly deletes arrays.

# include <boost/detail/lightweight_test.hpp>
# include <boost/shared_ptr.hpp>
using boost::shared_ptr;

  // Deleter to correctly delete arrays
  // (remember it's OK to delete a const)
  template< typename T >
  struct
array_deleter
  {
      void
    operator ()( T const * p)
      { delete[] p; }
  };

  // Example target object
  struct
my_type
  { my_type( )  { balance += 1; }
    ~my_type( ) { balance -= 1; }
    static int balance;
  };
  int my_type::balance = 0;

  int
main( )
{
  // Create an array and bind it to an array_deleter
  // with shared_ptr. The my_type constructor is
  // called 7 times, followed by 7 calls to the
  // destructor.
  {    shared_ptr< my_type >
    sp_array(
      new my_type[ 7 ],
      array_deleter< my_type >( ));
  }
  BOOST_TEST( 0 == my_type::balance);

  return boost::report_errors( );
}

Boost even provides an array-deleter class (checked_array_deleter<T>) so you don’t have to define one yourself.

# include <boost/shared_ptr.hpp>
using boost::shared_ptr;
using boost::checked_array_deleter;

  shared_ptr< my_type >
sp_array_inst(
  new my_type[ 53 ],
  checked_array_deleter< my_type >( ));

Since shared_array<T> is no longer necessary to support array deletion, I suspect it will never become part of the standard library. operator [] is not enough justification for its existence. It will languish in the Boost library, a barely supported dead-end experiment, and will never be integrated with shared_ptr and weak_ptr.

As a final note, another way to work with arrays and shared_ptrs is to use an array wrapper like the TR1 array class. Since you can allocate these objects with operator new you can rely on the default shared_ptr deleter.

# include <boost/array.hpp>
using boost::array;

  shared_ptr< array< my_type, 4 > >
sp_array2(
  new array< my_type, 4 >);

But shared_ptr< array< my_type, 4 > > is a lot more awkward than simple shared_ptr< my_type >, and I doubt it will become a common idiom.

Specializing make_shared< T > and allocate_shared< T >

In my last post I talked about factory_type, a class that supplies a factory function make_new( ) as a static method.

(In retrospect, I suppose factory_type isn’t such a good name since it sounds like the instances are factories. Better names might be factory_made_object_type or gizmo_type.)

Anyway, the code looked like this:

# include <boost/shared_ptr.hpp>
using boost::shared_ptr;

  class
factory_type
{
    /* this_type alias */
    private:
    typedef
    factory_type
  this_type;

  // ======== Disable Copy ============

    /* disabled copy ctor */
    private:
  factory_type( this_type const &)
    ; /* no implementation */

    /* disabled copy assignment */
    private:
    void
  operator =( this_type const &)
    ; /* no implementation */

  // ======== Factory =================

    /* private default ctor */
    private:
  factory_type( )
    { }

    /* private dtor */
    private:
  ~factory_type( )
    { }

    /* the only place where the dtor is used */
    private:
    struct
  private_deleter
    {   void
      operator ()( this_type * p)
        { delete p; }
    };

    /* factory, only way to make these objects */
    public:
    static
    shared_ptr< this_type >
  make_new( )
    { return
        shared_ptr< this_type >(
          new this_type,
          private_deleter( ));
    }
};

Now there’s been some discussion (see here and here) about defining standard factory template functions called make_shared<T>(..) and allocate_shared<T,A>( A const &, ..) with signatures like this:

namespace std {

  template< typename T >
  shared_ptr< T >
make_shared( )
  ;

  template< typename T, typename ALLOC_T >
  shared_ptr< T >
allocate_shared( ALLOC_T const & allocator_inst)
  ;

  template< typename T, typename ... ARG_Ts >
  shared_ptr< T >
make_shared( ARG_Ts && ... args)
  ;

  template< typename T, typename ALLOC_T, typename ... ARG_Ts >
  shared_ptr< T >
allocate_shared( ALLOC_T const & alloc_inst, ARG_Ts && ... args)
  ;

} /* end namespace std */

It is easy to specialize make_shared< factory_type >( ) to use our class’s private factory.

namespace std {

  // specialize make_shared<T>( )
  template< >
  shared_ptr< factory_type >
make_shared< factory_type >( )
  { return factory_type::make_new( ); }

} /* end namespace std */

Let’s try some more complicated factories. Assume factory_type has some additional constructors and factories.

  class
factory_type
{
  ... as declared above ...

  // ======== Constructors =================

    /* private default ctor */
    private:
  factory_type( )
    ;

    /* private ctor */
    private:
  factory_type( float, void *, char = 'A')
    ;

    /* private ctor */
    private:
  factory_type( double, long, std::string const &)
    ;

  // ======== Factories =================

    public:
    static
    shared_ptr< this_type >
  make_new( )
    { return
        shared_ptr< this_type >(
          new this_type,
          private_deleter( ));
    }

    public:
	template< typename ... ARG_Ts >
    static
    shared_ptr< this_type >
  make_new( ARG_Ts && ... args)
    { return
        shared_ptr< this_type >(
          new this_type( std::forward< ARG_Ts >( args) ...),
          private_deleter( ));
    }

  // ======== Factories with allocators =======

    public:
	template< typename ALLOC_T >
    static
    shared_ptr< this_type >
  allocate_new( ALLOC_T const & alloc_inst)
    { return
        shared_ptr< this_type >(
          new this_type,
          private_deleter( ),
		  alloc_inst);
    }

    public:
	template< typename ALLOC_T, typename ... ARG_Ts >
    static
    shared_ptr< this_type >
  allocate_new( ALLOC_T const & alloc_inst, ARG_Ts && ... args)
    { return
        shared_ptr< this_type >(
          new this_type( std::forward< ARG_Ts >( args) ...),
          private_deleter( ),
		  alloc_inst);
    }
};

We’ll find it much harder to specialize make_shared<T>(..) and allocate_shared<T,A>( A const &, ..) to use these class-supplied factories that take parameters because you cannot partially specialize a templated function like you can a templated struct. Forgetting that, you might try:

namespace std {

  // Illegal partial specialization of template function.
  // Does not compile!!
  template< typename A_T >
  shared_ptr< factory_type >
allocate_shared< factory_type, A_T >( A_T const & alloc_i)
  { return factory_type::allocate_new( alloc_i); }

} /* end namespace std */

And the compiler will choke and complain. You could try to fully specialize the template function since partial specialization is illegal, but that means you have to know the type of the allocator beforehand. And allocators tend to come in many types. Another approach would be to define a separate overloaded (not specialized) template function called allocate_new, but typename T would have to be the first template parameter, and the first thing you want to do is specialize that as factory_type. So that doesn’t improve things in this case.

We could make this work if the first template parameter was used in the function’s argument list. If instead of allocate_shared< factory_type >( alloc_inst) the idiom was allocate_shared_2( factory_type::tag( ), alloc_inst) then we could define another template function around ALLOC_T and we would not have to specialize. But in this case that’s not a very attractive option.

Another probably better option is to define allocate_shared<T,A>(..) as a simple call to something like shared_ptr_maker<T>::allocate(..). Then we could specialize the template class shared_ptr_maker<T> instead of the function.

And finally, I’m not sure, but these examples may be an abuse of make_shared and allocate_shared. These were originally proposed as a way to hide use of the new operator since shared_ptr also hides delete. But looking at Peter Dimov’s code suggests the purpose of these functions may now be to implement an allocation strategy, where the intermediate object (aka the “control block”) and the target object are allocated as one chunk of memory instead of as two. If that’s so, it’s unlikely you’d provide specializations or overloads for these functions.

Next Page →