Type System

From dis-Emi-A

Revision as of 20:36, 27 November 2006 by Mortoray (Talk | contribs)
(diff) ←Older revision | Current revision (diff) | Newer revision→ (diff)
Jump to: navigation, search


This document explains the typing we intend to use in the language and gives the reasoning behind it. Refer also to Automatic types and Types.

Contents

Goals

The basic goals of the typing system are:

allow compile-time type checking 
In theory the complete set of constraints must be checkable at compile-time. This provides a level of certainty that the code is correct without needing to run it. Additionally it allows a great degree of flexibility in the compiler as to how it compiles the code.
guarantee type safety 
As it is recognized that a perfect compiler may never exist, and a very good compiler will take years to create, type safety has to be guaranteed at run-time in addition to compile-time.
eliminate code duplication 
No functional code sequence should ever need to be duplicated anywhere in the code -- a full and rich set of patterns must be available. Additionally, redundant, though non-trivial, code fragments must also be expressable in shorter form (to avoid syntax bloat).
maintain all known information about a type 
Anything known about a type is a hint to the compiler about what is desired and how to check or compile that item. Anything the programmer may know about the type must be expressable in some fashion.
function within an auto-typed system 
The resulting system must lend itself to be automatically typed.
allow performant implementations 
The system must allow for a reasonable static analysis to determine what is the best manner in which to compile the typing information. Constraining the compiler to do only one thing hurts performance, either in size, speed, or both.
exist on a real operating system 
Access to/from real binary resources and interfaces must be done from within a langauge native syntax. Exotic systems will be exempted from the basic langauge definition.

Automatic Typing

The system should for the most part allow all types to be determinable by the compiler where the programmer need only specify particular constrains or ambiguity resolutions as they arise.

As we intend to incorporate an exhaustive set of typing options this becomes vital to preventing the system from being overwhelming. The programmer need only specify what they know for certain, the compiler will determine everything else.

Automatic typing immediately allows for parametric functions without additional syntax. It follows that the whole typing system should allow for such parametrics as a whole in a similar fashion.

Refer to automatic types.

Compile Check

As automatic typing yields most typing information about data it must still make some decisions about how to compile, and it must ensure the code repesents what the user intended.

Any explicit typing the compiler finds must therefore match at least one available option revealed by auto-typing. Should there be a mismatch the compiler will emit an error.

If the compiler is too limited to determine the exact typing situation it will make its best attempt and allow mismatches to be resolved at runtime via an exception mechanism. Note that these situations will still be considered programming mistakes with the intent that a more mature compiler could find them.

Fundamental Immutables

In order to allow a good assignment definition there needs to be a complete set of basic types which exist in an immutable form. This will allow basic assignment in all typical cases of fundamental use.

A manner in which to mark custom types as immutable is required and the compiler should be able to determine this statically.

 ? Tuple Operations For Units

Units could be expressable as another type combined with a unit:

speed = ( 4, m/s )

In this fashion the fundamental type operators could be overloaded to handle these units:

OP(+) :> ( ( a, au ), ( b, bu ) ) -> ( c, cu ) 
{
  c = a + b;
  cu = au + bu;
}

And then the operators can also be overloaded to deal with the type information itself (logical combination of the types).

Pros:

  • any function built from fundamental operators automatically supports units
  • requires no additional syntax to support (other than tuple pattern matching)
  • works with any type of unit information (not just a predefined set)

Cons:

  • requires additional form of each function which supports units

(Not Desired) Functional Evaluation

To provide one consistent manner in which to defined functions and or values, such that they can be used easy and interchangeably there will be a notion of automatic function calling, that is

name

is equivalent to

name()

in cases where name is a function taking no parameters and returning a single value. This allows closures to easily used in any case where a value is expected.

Pros:

  • very flexible use of closures as parameters to functions
  • allows easy exchange of data with procedures

Cons:

  • creates a distinction between single and multiple parametered closures
  • creates a two distinct RValue evaluation paths for any value (which creates an infinite loop on the one path)
  • how does one get a reference to the closure then rather than the value

Although the pros can be fairly attractive rectifying the cons creates far too complicated of a system. The decision is then to not do any form of automatic evaluation. Should we want to override the access to certain member variables (that is, create custom getters/setters) perhaps then a notational equivalence could be introduced.

Closures

The closure syntax is already the basis for the typing system, in that we don't consider any difference between what is a function, closure, or type. We need to extend this to indicate what happens at creation and destruction, and how we indicate the definition can't be called directly.

TypeName :> ( -> obj )
{
  obj = @this;

  data1 = vale;
  data2 = value;

  MemFunc :> ( a -> c ) { ... }
}

With shorthand syntax for functions, which is yet to be defined, the first line would actually look like this:

TypeName :> ( -> @this )

Pros:

  • uses a universal syntax (unification of functions, classes, closures, etc.)

Cons:

  • doesn't give one the feel of defining a type
  • requires some altering of what :> means inside an existing definition (though maybe not)
  • constructors are always publically accessible (could this be limited somehow, maybe an auto-only keyword)

Binds

Once we have some sort of structures defined we need to have a way in which we can refer to members of that structure. This will be called the bind syntax, and it must be available temporarily and assignable.

By example:

a.b

Means b within the context of a. Whether b is simple data or a closure is irrelevant since those are handled by a unified syntax already.

In this sense . is actually a binding operator between the context and some name.

 ? Explicit syntax for containers

The auto-typing itself automatically allows for container types. For example, here is some simple list code:

ListNode :>
{
  data;
  next;

  Create :> @new, value -> node
  {
    data => value;
    next => end; //where end is some reserved value
    node = @this;
  }

  Append :> value
  {
    next => Create( value );
  }
}

ListNode from up-typing will be applicable to virtually all types in the system. When ListNode is used in another function eventually the code will attempt to insert a defined type into the list, at this point the compiler know what type of list is being used.

To capture programmer knowledge a syntax to express this constraint is required. This also requires some fashion to pass this information to the ListNode itself (note that passing an explicitly typed value to Append would also fully type the list.)

Not to confuse this with function calls, we'll introduce a new parameter tuple syntax:

:ListNode<Integer>: intList;

And the part of the definition which is modified is:

ListNode<NodeType> :>
{
  :NodeType: data;
  :ListNode<NodeType>: next;
  ...

The rest will be filled in correctly be autotyping. Note that the constraint on "next" is not strictly necessary, though it gives good hint to the compiler to maintain list consistency (without this constraint it is possible that two distinct list nodes point to each other since there is no constraint which would prevent that).

Pros:

  • keeps typing at a compile-time level

Cons:

  • requires naming the parametric holder and not just the type (though if the holder variable itself is untype and simply a typed value is added this can be avoided / or we could just allow the parameter notation itself without associated type?)
  • constraint is associated with name and not the definition