Scope problems

From dis-Emi-A



A recurring problem in many languages is the clash of names between a local scope and a more global scope. This is exacerbated by adding classes and namespaces into the mix, further brought to turmoil as inheritance, mixins, static imports and other special language features are introduced.


Base Case

The base case is shown below (in C syntax)

int g = 42;  //clearly a global g

int func( int a, int b )
{
  int g = a /2;  //clearly a local g
  g += b;  //which g?
  return g/2; //which g?
}

By convention most people will see that the unqualified g in the function refers to the locally scoped g, and would know that (at least in C++) you could refer to the global g as ::g. That in turn introduces this problem:

int g = 42;

namespace inner
{

int g = 84;

int func( int a )
{
  int g = 1;

  return 
    g //the local one
    + ::g //the global one with value 42
    ;
}

} //eon inner

So how does one access the 84 value? Indeed, why does :: always start at the outermost scope? If the namespaces were truly hierarchical, ::g would be the 84 value, and we would then have no way of referring to the 42 value.
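For reference, C++ itself answers the first question with a qualified name rather than a relative one. The sketch below reuses the example above and only adds the inner::g term; it names the 84 value by spelling out the namespace, which is not the same as a notation for "one scope up".

```cpp
int g = 42;              // the global g

namespace inner
{

int g = 84;              // the namespace-level g

int func( int a )
{
  int g = a;             // the local g, which hides both outer ones

  return
    g                    // the local one
    + inner::g           // the namespace one (84), reached by naming its namespace
    + ::g                // the global one (42)
    ;
}

} // end of namespace inner
```

Calling inner::func( 1 ) adds 1, 84 and 42. The qualified name works, but it has to be rewritten whenever the namespace is renamed or the code is moved, which is the gap a relative notation would close.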

NewLang Problem

If we truly wish to have a fully auto-typing and auto-declaring language we run into yet another problem:

g = 13;

myfunc :> a, b -> z
{
  q = a^2;  //clearly a new variable is declared (As q is not known)
  g = a + q; //does this introduce a new variable, or does it use global g
  z = g / 2; //z is the return variable, as declared in the signature
}

The case for a, b, z is clear, as they are declared explicitly in the function declaration. For the line about g we have a problem: with automatic typing, how do we know if we are introducing a new variable or referencing an existing one?
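The two possible readings of that line can be modelled as two assignment policies over a chain of scopes. This is only a sketch in C++, not the new language, and Scope, find_owner, assign_local and assign_nearest are hypothetical names invented for the illustration.

```cpp
#include <map>
#include <string>

// Hypothetical model of a scope: its own variables plus a link to the enclosing scope.
struct Scope
{
  std::map<std::string, int> vars;
  Scope* parent = nullptr;
};

// Returns the nearest scope, starting at s, that already defines name, or nullptr.
Scope* find_owner( Scope& s, const std::string& name )
{
  for ( Scope* cur = &s; cur != nullptr; cur = cur->parent )
    if ( cur->vars.count( name ) )
      return cur;
  return nullptr;
}

// Reading A: an assignment always binds in the current scope, shadowing any outer name.
void assign_local( Scope& s, const std::string& name, int value )
{
  s.vars[name] = value;
}

// Reading B: an assignment reuses the nearest existing binding and declares only if none exists.
void assign_nearest( Scope& s, const std::string& name, int value )
{
  Scope* owner = find_owner( s, name );
  ( owner ? *owner : s ).vars[name] = value;
}
```

With a global scope holding g = 13 and a function scope whose parent is that global scope, assigning 6 to g leaves the global at 13 under reading A and changes it to 6 under reading B. Both are defensible, which is why the language has to pick one and give the programmer a way to ask for the other.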

Let

One solution would be to explicitly require that all declarations of new variables are appropriately marked:

let g = 13;

That says we are introducing a new g here. This however leads to all the traditional problems such as:

  1. In which unit do I need to declare that?
  2. What if it is declared more than once?
  3. Needing to declare all variables before use (we wanted to avoid that)!

We would still nonetheless need some kind of syntax to refer to variables defined in another scope...

...one might wish to argue that if you need to access a variable in a higher scope you should just not declare a local variable with the same name. This is generally proper practice for most new code, but it isn't always possible. The basic "stuck" scenario is when you are writing a unit for a program which has its own globals defined, and are hooking in an external base class which happens to use the same names for its class members. You will need some way to distinguish which variables you are using.
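A minimal C++ sketch of that "stuck" scenario follows. The names level, Base and Hook are hypothetical; Base stands in for the external class, which the author of Hook cannot rename.

```cpp
int level = 100;         // the program's own global (hypothetical name)

// Stand-in for the external base class, which happens to use the same name.
struct Base
{
  int level = 7;
};

struct Hook : Base
{
  int combined() const
  {
    return
      level              // unqualified: member lookup finds Base::level first
      + Base::level      // the same member, named explicitly
      + ::level          // the program's global
      ;
  }
};
```

Here combined() reads the member twice and the global once. C++ can tell the two apart only because it has separate notations for "member of Base" and "global"; a language without such notations leaves the author of Hook with no way to reach the hidden name.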

Explicit Global

One option, used by PHP, is to explicitly mark the use of global variables within a function. This keeps the namespace for variables exclusively local. PHP actually goes further and provides a global array, $GLOBALS, which allows the binding of global variables to distinct local variable names.

$q = 42;
$r = 80;

function example() //the function name is arbitrary
{
  global $q; //introduce q locally
  $global_r =& $GLOBALS['r']; //introduce r with a new name
  
  $r = 10; //this is now a local r
  $q = 15; //this modifies the global q
  $global_r = 20; //this modifies the global r
}

This approach actually works reasonably well when you have a rather flat namespace (global/local). PHP uses yet another technique to resolve class members. There is, however, a significant problem with arbitrary scopes (anonymous scopes in functions, for-loop/if scopes): how do you go about declaring which scope accesses what? Doing so introduces a lot more work.

PHP avoids this by doing something many people are very unfamiliar with (many are even unaware of it when using PHP): there are no further scopes within a function. All blocks, loops, and conditions use the same namespace for their variables. Since the user usually has complete control over their function this is workable, though it adds unusual problems with loops and the iteration variable.
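The loop problem can be sketched in C++ by declaring a temporary at function level, which imitates PHP's single function-wide namespace, and comparing it with a block-scoped temporary. Both function names are hypothetical and the task (counting negative values) is chosen only for illustration.

```cpp
#include <vector>

// Imitates function-wide scope: flag is one variable shared by every iteration.
int count_negatives_function_scope( const std::vector<int>& xs )
{
  bool flag = false;
  int n = 0;
  for ( int x : xs )
  {
    if ( x < 0 ) flag = true;  // never reset, so the value leaks into later iterations
    if ( flag ) ++n;
  }
  return n;
}

// Block scope: each iteration gets a fresh flag.
int count_negatives_block_scope( const std::vector<int>& xs )
{
  int n = 0;
  for ( int x : xs )
  {
    bool flag = false;
    if ( x < 0 ) flag = true;
    if ( flag ) ++n;
  }
  return n;
}
```

For the input 1, -2, 3, 4 the first function counts three values, because the flag stays set after -2, while the second counts one. With only function-wide scope the programmer has to remember to reset every such temporary by hand.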

Goal

The goal is of course to get the cleanest code which is not ambiguous and is not hard to program in. Most people are willing to agree that named variables refer to items in the local scope first, so we can work from there and assume non-locals require some kind of extra notation. So first we need to reiterate the problems we know from other languages:

  1. C++ missing relative scope (there is the local g and the global ::g, but what about the scopes in between?)
  2. PHP iteration variable (temporaries from one iteration retain their value in the next iteration unless specifically reset)
  3. Java/C unnoted ambiguity (two equally accessible g's, and the compiler doesn't know which one you want)
  4. (Various) single/flat namespace (with all variables in one namespace there is no doubt as to which is used, but the naming convention becomes a pain)

. . .
