Equivalence Classes

From dis-Emi-A

Jump to: navigation, search


Equivalence classes, also called equivalence partitions, are a manner of reducing an infinite range of input into a smaller set of input (or output as the case may be). Each of these classes is thus a subset of the whole, yet has some property which makes that class distinct from the others.

For example, if we consider telephone numbers, we could partition all the telephone numbers in the world into independent sets based on their country code. Each of these smaller sets is an equivalence class (the country is the same).

Contents

The 100% Tree

For any given input, or output, there is a set of data, or states, which comprise all possibilities for that input, or output. If we make the complete set the root of a tree, we can see that equivalence classes for a hierarchy off of that tree -- such that the union of all nodes at a given level is again the complete set.

This is the complete tree of a data set, and it is vital in prioritizing testing. The exact mathematic properties are of less interest than the actual use of the tree, so there is no need to worry about minor errors or slight inconsistencies.


Levels

A level is the set of all nodes on that tree at a given depth from the root of the tree. These levels are what are used explicitly when prioritizing testing.

For best coverage tests should be conducted on a breadth-first order, such that an entire level is tested before moving on to a deeper level. On each test a single item of that partition is used to perform the test.

Recording which sample was tested is vital, since if you do get around to testing a level deeper you know for sure one of the tests has already been conducted.


Reduction

In order to minimize the amount of effort and provide for best control over coverage, it is desirable to have each level contain the least number of nodes possible which still represent useful equivalence classes.

For input testing the first level should be the same for all inputs, virtually without exception, and those two classes are "valid" and "invalid". "Valid" inputs are all those which should be accepted by the software. "Invalid" inputs are all those which should not be accepted by the software. This immediately forces a good balance between testing of valid and invalid conditions.

On the "Invalid" branch the next two logical options tend to be "syntactic" and "semantic", indicating whether the error is due to syntax of the input, or the semantic value of the input.

Refer to Equivalence Class Listing for a larger list of many standard partitioning criteria.

Example

The table below shows a tree for an "email address" input field.

Tree for Email

In this quick example we assume the program can accept any RFC compliant email address, and it also supports native entry of Unicode domains. Furthermore, the semantic invalid case assumes the program rejects invalid domain names.

Repeated Branches

In some cases there is no way to do a clean partition of the data without producing repetition in the branches. This occurs in instances when there is more than one distinct criteria on which to separate the data, such that the selection of one does not eliminate the others.

For Example, if the program allows the user to create custom attributes for records. These attributes have a basic type (string, number, or date) and an enumeration flag (are enumerated, are not enumerated). If you split on type, you will have the duplicated enumeration branches under each type, and if you split on enumeration you will have duplicated type branches.

In the cases where there are two criteria there are two options:

  1. Create one level with a combination of both attributes
  2. Create a matrix which shows the combinations

Combined

If every combination requires a further expansion of the tree then listing each combination explicitly as a node is the preferred manner. This requires of course that the number of combinations is very small, otherwise too many branches will be produced. If you find that you are producing more than 10 branches, you should consider not branching on this criteria at this time -- or use a matrix instead.

A company's payroll system needs to handle employees differently whether they are full-time or part-time, and whether they are nationals or foreigners. As each combination of types is followed by distinct processing, it makes sense to split the tree accordingly:
  • full-time / national
  • part-time / national
  • full-time / foreigner
  • part-time / foreigner

The need to do combined branches is a good indication that although the criteria look distinct, at this point in the software they are treated as a single field.

In the payroll example, the full-time and national status are treated as a single field and separation makes no sense. Though perhaps elsewhere in the sofware entries are sorted on these two flags indepedently.

Matrix

If this level consists of only leaf items on the tree then a matrix is preferred. If for each case you need to further expand the tree then a matrix may not be the best solution, though if not all nodes go further it may still be usable.

An image needs to be loaded into the program, and the program supports several image types of various color depths. In each case the image is loaded, so the effect is the same, and as there are many combinations.
Format \ Color 1-Bit 8-Bit 24-Bit RGB 32-Bit RGBA
GIF X X
JPEG X X X
PNG

The X's denote a combination that is not possible and need not be tested.

Refer to Test Matrix for more information.

References

  • Wikipedia equivalence partitioning [[1]]
Personal tools