Types
From dis-Emi-A
This page deals with what a type is: its structure, function, and behaviour. It will, however, largely ignore auto-typing.
What is a type
Traditionally a type refers to a specific kind of structure which maintains values; that is, it refers to the structure and meaning of the value of a variable. A type may also include information on how one accesses that value, for example the difference between a pointer and a stack object in C. It may further include information about the nature of the value, such as the cv-qualifiers in C++. It is also not uncommon for the storage mechanism of the value to be considered part of the type: consider the register keyword in C, or the fact that OpenGL variables exist in their own "space".
OOP adds a layer to this mix by attempting to map relationships between types: in C++ one can inherit from a base class, in Java one can implement an interface, and in Ruby one can mix in a module. This layer is less clear in meaning than the basic aspects of type, and it is this layer that adds the greatest deal of ambiguity. There are strict technical meanings for each operation, but at the logical level it isn't always clear what is desired.
Since there are so many aspects to a typing system it is good to look at all of these aspects and try to come up with a common strategy. Rather than work from the language towards a human model though, we will start from all the human notions of typing and map them towards the language.
Refer to Type Implementations for the techniques used in achieving typing. Refer to Type System for the system in this language.
Structure
These are the concepts dealing with the structure of the data. This is the most logical group to call the "data" itself.
Composition
Composition is the high-level concept of a data structure which is made up of other data structures. The basic C struct type is the standard example. With such a structure we are indicating that the value is made up of all of the other values.
Such structures need not be simple, nor fixed in size, for graphs and lists are also included in the notion of composition. Including such items as composites is technically troublesome, as most languages distinguish between POD types and linked types. There is some difference at an abstract level as well: POD types use a direct reference to their parts, while linked types require some kind of indexing.
Direct
A composition in which all members are directly and randomly accessible, usually by name.
The fields of a tax form are all named and numbered, and they do not exist independently of the form itself.
Indexed
A composition in which members require the use of an index or iterator to gain access to the items. The manner to obtain an index may be through numeric operations, axis traversal, searching, or other specialized operation.
The books in a library's collection are accessible via search operations. This is not an aggregate, insofar as the books themselves comprise the collection. One could, however, say that the book collection is aggregated into a library.
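The two forms of composition can be sketched roughly in Python as follows. The TaxForm type, its fields, and the book list are hypothetical names invented for this illustration; they are not taken from any particular system.

```python
from dataclasses import dataclass

# Direct composition: every member is reachable by name.
@dataclass
class TaxForm:  # hypothetical example type
    name: str
    income: int
    deductions: int

form = TaxForm(name="A. Person", income=50_000, deductions=4_000)
taxable = form.income - form.deductions  # fields accessed directly by name

# Indexed composition: members are reached through an index or a search.
books = ["Dune", "Emma", "Ulysses"]  # hypothetical collection
second = books[1]  # a numeric index selects the item
found = next(b for b in books if b.startswith("U"))  # a search selects the item
```

In the first case the parts have no identity apart from the form; in the second some operation is always needed to obtain the part.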
Aggregation
A distinct form of composition in which the members are not actually owned by the aggregation; rather, they are referenced by it. Often not all the members are present on creation, and the members are not destroyed on destruction.
A library and all its members are an aggregate. The people who are members are independent objects, distinct from the library's existence, but have a clear relationship with the library.
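A minimal sketch of this relationship, assuming hypothetical Person and Library classes, might look like the following. The point it tries to show is that members can be added after creation and that they outlive the aggregate.

```python
class Person:  # hypothetical member type
    def __init__(self, name):
        self.name = name

class Library:  # hypothetical aggregate
    """Holds references to members it does not own."""
    def __init__(self):
        self.members = []  # empty on creation; members arrive later

    def join(self, person):
        self.members.append(person)

alice = Person("Alice")
library = Library()
library.join(alice)
member_count = len(library.members)

del library  # discarding the aggregate does not destroy its members
still_here = alice.name
```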
Units
Another special attribute of data is that of a unit. While most compound data types tend to have only one possible unit, the most fundamental data is often, or always, associated with a unit. By units we are referring to SI-like units such as centimeters or seconds.
This is considered part of the structure as it affects the interpretation of data.
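One possible way to make the unit part of the structure is sketched below. The Quantity class is a hypothetical illustration, not an existing library type, and it only checks that units match rather than converting between them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:  # hypothetical type carrying its unit with the value
    value: float
    unit: str  # for example "cm" or "s"

    def __add__(self, other):
        # Adding values with different units would misinterpret the data.
        if self.unit != other.unit:
            raise TypeError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.value + other.value, self.unit)

length = Quantity(3.0, "cm") + Quantity(4.0, "cm")

try:
    Quantity(3.0, "cm") + Quantity(4.0, "s")
    mixed_allowed = True
except TypeError:
    mixed_allowed = False
```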
Encoding
Many pieces of data may share the same abstract value yet be represented in very distinct manners. The representation has to be maintained as part of the structure of the data, since it is very important that such information is not lost and that the data is not misused.
For example, the String object in Java contains a string in UTF-16 encoding, whereas a std::string in C++ stores a sequence of char values without fixing an encoding, so the encoding in use depends on the platform and the program.
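The difference between an abstract value and its representation can be illustrated with a short Python sketch. The sample string is an arbitrary choice made for this example.

```python
text = "n\u00e9"  # one abstract value: the two characters "n" and "e with acute"

utf8 = text.encode("utf-8")       # three bytes
utf16 = text.encode("utf-16-le")  # four bytes

# Different byte sequences, same abstract string once decoded correctly.
same_value = utf8.decode("utf-8") == utf16.decode("utf-16-le")

# Decoding with the wrong encoding silently misuses the data.
misread = utf8.decode("latin-1")
```

If the encoding is not carried along with the bytes, nothing distinguishes the correct decoding from the misread one.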
Access
These are the concepts dealing with how the data is accessed and/or operated on.
Direct
Data structures are normally comprised of parts, so there needs to be some mechanism to access those individual parts. The most basic technique is by name, in which each field in a structure has a particular name.
Indexed
Often data cannot be retrieved by name alone and requires an index, or iterator, to gain access to some part of the structure.
Bound / Hidden
In some cases the data itself is not accessible at all; rather, there are functions which reveal information about the data. In such cases it is almost better to think of the data as a function that returns a result than as data in its own right.
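A closure gives a small sketch of bound data in Python. The make_counter function and the names it returns are hypothetical; the count is not reachable by name from outside, only through the two functions.

```python
def make_counter():  # hypothetical factory for a hidden value
    count = 0  # reachable only through the functions returned below

    def increment():
        nonlocal count
        count += 1

    def is_even():
        # Reveals information about the data without exposing the data.
        return count % 2 == 0

    return increment, is_even

increment, is_even = make_counter()
increment()
after_one = is_even()
increment()
after_two = is_even()
```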
Engine
Though often considered an aspect of aggregation, an engine reference is rather distinct. In this form a value holds a reference to an API which contains operations for the elements of the value. This reference is often a pointer, though sometimes it is made through direct linking, global singletons, or other mechanisms.
Encapsulation / Visibility
This concept concerns the use of the data. It implies groupings of data, and in particular sets of functions which are usable by the world and other sets which are usable only by the "encapsulation" itself.
Security
Related to encapsulation, security usually implies that some kind of security token is required in order to access the data. This usually appears with RPC, where a token is often required to call certain functions.
HTTP authorization requires that data requests contain an authorization signature.
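Token-gated access can be sketched as below. The token value, the read_record function, and the returned record are all hypothetical; a real system would obtain and verify tokens quite differently.

```python
import hmac

EXPECTED_TOKEN = "example-token"  # hypothetical shared secret

def read_record(token):
    # compare_digest compares in a way that resists timing analysis.
    if not hmac.compare_digest(token, EXPECTED_TOKEN):
        raise PermissionError("missing or invalid token")
    return {"balance": 100}  # hypothetical protected data

granted = read_record("example-token")

try:
    read_record("wrong")
    denied = False
except PermissionError:
    denied = True
```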
Random access
Random access means that the parts of the data can be accessed in any desired order.
The elements of a C array can be accessed by integer index at any time.
Directional access
Directional access usually implies that access via an iterator has restrictions on how that iterator can be manipulated: perhaps it can only go forward, or each item can only be accessed once.
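A C++ forward iterator has directional access: it can be advanced but never moved backward. The iterators of std::forward_list are of this kind, whereas a std::deque offers random access.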
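The contrast between random and directional access can be sketched in Python, where a list supports indexing and a plain iterator only moves forward. The sample values are arbitrary.

```python
items = [10, 20, 30]

# Random access: any element, in any order, as often as desired.
picked = [items[2], items[0], items[2]]

# Directional access: a plain iterator only moves forward,
# and each item is handed out once.
it = iter(items)
first = next(it)
rest = list(it)       # consumes whatever remains
exhausted = list(it)  # nothing is left, and there is no way to rewind
```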
Interface
The access to the data can be done with a well defined set of functions. This usually implies some commonality to the structure of the data, though not always.
A clear example of this is the Java "interface" and "implements" concept, though one must consider any function signature to define an interface to certain data.
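A rough Python counterpart to the Java concept uses an abstract base class. Shape, Square, Rect, and total_area are hypothetical names chosen for this sketch; note that the two implementations share the function set but not the structure of their data.

```python
from abc import ABC, abstractmethod

class Shape(ABC):  # hypothetical interface
    @abstractmethod
    def area(self):
        """Return the area of the shape."""

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side

class Rect(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

def total_area(shapes):
    # Relies only on the interface, not on how each shape stores its data.
    return sum(shape.area() for shape in shapes)

total = total_area([Square(2), Rect(2, 3)])
```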
Nature
These are the concepts dealing with the changes to the data over its lifetime.
Immutable
Immutable data is data that never changes after its initial creation. This is distinct from the C++ notion of "const", which indicates more of a read-only mode: the object itself can change, just not via the caller. Immutable values *never* change after initialization.
The Java String is immutable, as are several of the fundamental types in Python (though mutable versions are offered).
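The difference between an immutable value and a read-only view can be sketched in Python. A tuple plays the role of the immutable value, and types.MappingProxyType plays a role loosely similar to a "const" view; the variable names are arbitrary.

```python
from types import MappingProxyType

# Immutable: the tuple can never change after creation.
point = (1, 2)
try:
    point[0] = 5
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Read-only view: the holder of the view cannot write through it ...
settings = {"debug": False}
view = MappingProxyType(settings)
try:
    view["debug"] = True
    view_writable = True
except TypeError:
    view_writable = False

# ... but the underlying object can still change behind the view.
settings["debug"] = True
seen_through_view = view["debug"]
```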
Local Mutable
This is the largest class of data and indicates that the data will change over time, though it gives no indication of the frequency of that change. The "local" modifier indicates that the data changes only as a result of the calling context; it will not magically change.
Global Mutable
A "global" version of mutable implies the data is changed predictably but may be influenced by multiple processes operating on the same data. The implication is that there is some manner in which the data can be safely and correctly accessed.
There is an overlap with "local mutable" and "volatile".
Volatile
Volatile values are those which may change outside of any well planned or organized context. There is no way to predict the value of this data; it may change at any time.
There is a significant overlap with "global mutable".
Permanent
Unlike "immutable", "permanent" implies that the data will always be there, and it is usually used in conjunction with an index. Permanent data is data which does not under normal circumstances disappear. For example, an insert-only table in a database creates permanent data.
This is strongly linked to the access to the data.
Ephemeral
Some data may only be available for a short period of time, after which the data is lost forever. Users of such data need to know about this limitation.
Data inside a web-cache is ephemeral.
This is strongly linked to the access to the data.
Instantaneous
Some data only has meaning at the exact moment it is accessed, and every access thereafter changes that meaning. Such data is considered instantaneous.
For example, the load average on a machine for the past 1 minute is instantaneous, since the particular 1 minute is unique to the exact time the request on the data was made.
Patterned
Some data changes in a predictable fashion and it is often useful to know such information.
For example, the current time is always greater than any previous time.
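As a small sketch of a patterned value, Python's time.monotonic() returns a clock reading that never goes backward, so later readings are never smaller than earlier ones. The wall clock from time.time() carries no such guarantee, since the system clock may be adjusted.

```python
import time

t1 = time.monotonic()
t2 = time.monotonic()

# The pattern that callers may rely on: readings never decrease.
non_decreasing = t2 >= t1
elapsed = t2 - t1  # therefore never negative
```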
Limits
Though hinted at in the other groups, the allowable values of data make up a set. This set is usually expressed through constraints, which either are set by the programmer or are fundamental limits of the underlying system.
Question: Is this actually part of the structure of the data?
Range / Domain
In typical math notation the domain and range refer to the inputs and outputs of a function, respectively.
In typical computer use the range and domain often refer, respectively, to the constraints on a value and to the type compatibility of a value.
In both cases they refer to limits on the data beyond the abstract type.
Subset / Superset
A subset is a limit on the data stating that the value set is fully contained in some other well known value set. The superset then refers to that containing set, and the term is usually used for the set that contains several other subsets of data.
For example, the positive integers are a subset of all integers.
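A subset constraint of this kind can be sketched in Python by checking the value on construction. PositiveInt is a hypothetical type invented for this example; note that arithmetic on it yields plain integers again, so the constraint holds only at construction.

```python
class PositiveInt(int):  # hypothetical constrained type
    def __new__(cls, value):
        # Every PositiveInt is an int, but not every int is a PositiveInt.
        if value <= 0:
            raise ValueError("value must be positive")
        return super().__new__(cls, value)

five = PositiveInt(5)
is_int = isinstance(five, int)  # member of the superset as well

try:
    PositiveInt(-3)
    accepted = True
except ValueError:
    accepted = False
```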
Subtype
If it were fully clear what a subtype is then most of this discussion could probably be avoided. It depends on what we are trying to express in the subtype as to what it actually means. Therefore we will avoid this term and stick with all the other more specific indications of meaning.
