Prexonite Script: Shadow ids in Prexonite Script

Shadow ids in Prexonite Script

Posted on 2009-03-02

Downloads¹

New this time around is the very small but useful addition of shadow ids, a feature not entirely unlike the concept of shadowing in C#, thus the name.

Imagine you are working with two comparatively complex libraries, say a convenient wrapper around the Prexonite compiler and some sort of templating engine. If the respective authors of those two libraries both decide to use the identifier `load` for one of their functions, you end up in trouble.

Even though it is absolutely possible to define different symbols (i.e., names) to differentiate between those functions, Prexonite's architecture does not allow two functions with the same physical name inside one application. And since the physical id happens to be the id the function has been defined with, a collision is inevitable.

Enter shadow ids

Have a look at the following example:

function as load, compiler_load (path, loader)
{
  if(loader is Null)
  {
    var engine = asm(ldr.eng);
    var app = asm(ldr.app);
    loader = new Prexonite::Compiler::Loader
      (engine, app);
  }
  loader.LoadFromFile(path);
}

This function is a wrapper around the Prexonite.Compiler.Loader.LoadFromFile method in some fictional compiler library file. Notice the keyword as in front of the list of names for that function. It means that the function itself is anonymous, i.e. it's actual name is generated by the compiler. To access the function, you may use any of the alternative names specified.

In addition to that, a generalization for shadow ids is provided for easy function alias definition:

function compiler_load as load (path, loader)
{
  …
}

…which defines an ordinary function compiler_load and immediately adds an alias load for convenient access. This is really just syntactic sugar for

function compiler_load (path, loader)
{
  …
}
declare compiler_load as load;

Limitations

Shadow ids do not provide obfuscation, they are formed by appending a unique string to the first id in the alias list. So function as f, g(){} will be named something like f\gene05c25a4869140c599e4982665b48417.

Also, f will be persisted in meta data as the functions LogicalId. This, however is an implementation detail and must not be relied upon.
There is currently no equivalent for global variable definitions, although such a feature is planned.

Debugging support

The Prexonite Script architecture was never built with debugging in mind. Although line/column/file information is (somewhat) propagated to the AST, it is not used thereafter. There is no mechanism to map instructions to code lines (or vice-versa). Nontheless I am currently prototyping a simple, byte code based, command line debugger.

It will probably come in the form of a library with a special breakpoint function, that invokes an environment not unlike the interactive mode for Prx.exe. It supports inspection of the stack, local variables, watch expressions, stepping (through byte code, not statements) as well as step into functionality.

Have a look at this example session:

Local variables of main
 var y = 2;
 var x = 1;

Stack of context of function main
 1: 
 2: 
code = function main
[
LogicalId main;import False;
\symbol_mapping {0 -> 0,
1 -> 1};
]
 does asm {
var y,x
   00: @func.0 debug_break
        ldc.int 1
        stloc x
        ldloc x
        ldc.int 1
        add
        stloc y
        ldloc y
        ldloc x
-> 09:  cgt
        jump.f 14
        ldloc x
       @cmd.1 println
        jump 16
   14:  ldloc y
       @cmd.1 println
   16:  ldc.string "Bye!"
        cmd.1 println
   18:  ret.value
   19: }

main@9>

Notice how references to local variables are shown by-name even though the underlying bytecode actually uses the faster by-index opcodes. At the top you see the list of local variables, followed by a display of the current stack. Above the prompt is an enhanced view of the bytecode with only important addresses marked (beginning, end, current and jumpt targets).

What is missing right now, is step-into. Because the debugger is invoked on a per stack context basis (aka function activation). Enabling the debugger for calls made from the current context requires intercepting all corresponding calls. The task is non-trivial because closure, coroutine and structure method calls are statically indistinguishable from calls to aribrary implementations of IIndirectCall and friends.

Macro system

Another area of development is the addition of a more user friendly macro system into Prexonite Script. Currently, one can rewrite undefined ids, a mechanism that is not supposed to be a replacement for macros but rather a means of changing the default handling of unknown ids.

Alternatively, compiler hooks can be used to transform entire functions. Not only is it very tedious to find the specific calls to replace, it's also not guaranteed that you find every instance of a certain expression by traversing the tree. All uses of compiler hooks share a similar pattern of searching a certain dummy function reference and replacing it with custom nodes, using the transformed function itself and any of its parameters as arguments.

A possible simplification of this scheme would be the explicit support for macro functions:

macro square(x)
{
  //Define a
  var tmp = define("tmp");
  var assignment = ast("GetSetSymbol",SetCall,tmp,[x]);
  var computation = multiply(tmp,tmp);
  return block([assignment, computation]);
}

The macro square transforms calls to it like ( square(x+2) ) to an expression like ( let tmp = x+2 in (tmp)*(tmp) ). The details are not fleshed out yet. A more concrete discussion follows.

Footnotes

The linked downloads are from January 2010 instead of March 2009. When converting the blog entry, I felt that this was the safer approximation than taking a random commit from around March 2009 (I no longer have that exact download on file).