Rationale for developing another type system.

This forum is for discussing the development of Rel examples and sample applications.
Chris Walton
Posts: 76
Joined: Sat Aug 18, 2012 2:13 pm

Rationale for developing another type system.

Post by Chris Walton »

***NOTE***
This posting has been superseded by that dated 2013-11-13 below. This version has been retained, to keep the thread coherent. The new version replaces this one completely.
***NOTE***

This is the note I included in Ex 10, with my rationale for using a type system as the example, when Rel already has one.

Why yet another type system? The Third Manifesto (TTM) specifies the requirements for a type system, Tutorial D describes a language that meets those requirements, while Rel implements that language. So why not rely on those requirements, that language, and that implementation?

This type system (SDT) is designed to support problem and requirements gathering and analysis. As such it needs to be able to represent types that may not have a computable representation - see the definition for unbounded_integer for an example.

The base data types are derived from dimensional analysis rather than the lowest common denominator of constructs available to a/some/all computing languages. The chosen set should be capable of defining all but the most exotic of data types, though to define these it may be necessary to combine relational operators with non-operational operators; and/or extend the type system to multiple inheritance or multiple levels of inheritance. They also provide a more natural representation of different types.
The definitions of the various data types used are based very heavily on prior work by Shlaer and Mellor, and the Kennedy-Carter consultancy.

The main motivation for the type system is that it is an essential building block for a project to demonstrate proof of concept of a system development method. Such a type system has to tie in with other aspects of the method. In addition, research in the method has led to some very specific ideas about how a type system should behave. Although it is believed that this type system is:
(a) compatible with TTM's requirements;
(b) internally consistent;
(c) complete;
no attempt has been made to formally prove any of these beliefs. Details of the expectations for a type system are spelt out in the note "Expected behaviour of type systems". It is not clear how far this type system conforms to, is compatible with, or contradicts that described in TTM and implemented in Rel.

It is very difficult to understand what the type system in TTM et al. actually is. The implications of the requirements are less than clear; the approach of the implementation is accessible only via the grammar; and the documentation for the type system is virtually non-existent. Add to this the perpetual shadow that the approach to type systems described here differs from that described in TTM et al. in ways that are not obvious, and utilising Rel's type system becomes more problematic.

The differences between Rel and the example type system that have been identified so far are:
1. Rel has chosen to make identifiers case sensitive, and keywords insensitive. While there is no reason to believe that case sensitivity or insensitivity is preferable, it is my personal belief that the mix of choices within Rel is the least useful set possible. It leads to constraints on the choice of identifiers that are unnecessary; and leads to anomalies in definition. In Rel it is not possible to define an identifier called ordinal, Ordinal, or ORDINAL. It is possible to have identifiers named boolean, Boolean, but not BOOLEAN. It is also one of the expectations that type systems should be able to define types with different values of case sensitivity, collate sequences and the like. The lack of case sensitivity in identifiers does make implementing this expectation more difficult.
2. It is not possible to specify default values in Rel - whether in type definitions or variable definitions. The INIT construct is concerned with initialising multiple POSSREPs when a value is supplied to one of the POSSREPs.
3. The set of basic types in Rel is based on computer implementation types. SDT has a set of basic data types, based on dimensional analysis.
4. There is one known difference between SDT and TTM, though it is hoped that this is only a terminology difference. TTM permits variables to be defined ORDINAL and ORDERED. By contrast SDT uses enumerated and ordinal. The documentation for TTM does not really make it clear what is meant by ORDINAL or ORDERED. Ordinal has the connotation of an ordering system, and it seems that TTM uses it to describe a value from a list: not one that has an inherent order. Enumeration has no such implication. Equally, when considering sequencing, the implications of ORDERED are normally only complete ones, rather than both partial and complete orderings.
5. Rel generates a different set of operators than those described as legitimate in the expected behaviour of type systems.
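
To make point 1 concrete, here is a rough Python sketch of the naming behaviour described there. It is only a model of the rule as stated in this post, not Rel's lexer: the sets KEYWORDS and BUILTIN_TYPES and the function can_define are hypothetical names invented for the illustration, and the sets hold only the names mentioned above.

```python
# Illustrative model only; this is NOT Rel's actual lexer.
# Assumption: keywords are matched case-insensitively, while built-in
# type names are ordinary case-sensitive identifiers that are already taken.
KEYWORDS = {"ORDINAL", "ORDERED"}   # hypothetical subset of keywords
BUILTIN_TYPES = {"BOOLEAN"}         # hypothetical subset of built-in type names


def can_define(identifier: str) -> bool:
    """Return True if the name is free for a user-defined identifier."""
    if identifier.upper() in KEYWORDS:
        return False  # every casing of a keyword is reserved
    if identifier in BUILTIN_TYPES:
        return False  # exact-case clash with an existing type name
    return True


for name in ["ordinal", "Ordinal", "ORDINAL", "boolean", "Boolean", "BOOLEAN"]:
    print(name, can_define(name))
```

Under these assumptions every spelling of ordinal is rejected, while boolean and Boolean are accepted and only BOOLEAN is refused, which is the anomaly complained about in point 1.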

The requirement is for a type system that will support multiple aspects and phases of system development. The main construct needed to describe and implement processing and architecture in a computer system, sequence, is precisely that which is most difficult to represent within a relational system without artificial additional structures.
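
As a small illustration of what is meant by "artificial additional structures", the Python sketch below models a relation body as a set of tuples. The attribute name seq_no and the step names are hypothetical, chosen only for the example; nothing here is Rel or Tutorial D syntax.

```python
# A relation body modelled as a set of tuples: a set has no inherent order,
# so the sequence of processing steps cannot be recovered from it.
steps = {("validate",), ("load",), ("report",)}

# To represent a sequence, an extra attribute has to carry the position.
# 'seq_no' (first element of each tuple) is a hypothetical attribute name.
ordered_steps = {(1, "load"), (2, "validate"), (3, "report")}


def in_sequence(relation):
    """Return the step names ordered by the explicit position attribute."""
    return [name for _seq_no, name in sorted(relation)]


print(in_sequence(ordered_steps))
```

The position attribute is not part of the problem domain; it exists only so that the sequence survives being stored in a relation.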

One approach adopted throughout this development is that all interfaces (or bridges) between parts of the system are as narrow as possible. Defining a type system can be done with about a dozen references to the base types and helper functions of Rel. As use of the built-in operators in conjunction with inherited data types (whether because of lack of understanding of the TTM requirements; lack of understanding of the syntax of Rel; or the current state of development of Rel) seems to have the potential to generate substantial numbers of errors within Rel, minimising the use of such built-ins is deemed helpful.
Last edited by Chris Walton on Sun Nov 17, 2013 11:24 pm, edited 1 time in total.
Dave
Site Admin
Posts: 372
Joined: Sun Nov 27, 2005 7:19 pm

Re: Rationale for developing another type system.

Post by Dave »

Good post. A few comments...
Chris Walton wrote: It is very difficult to understand what the type system in TTM et al actually is.
Have you read Date & Darwen's "Databases, Types and the Relational Model" and "Database Explorations" books? These extensively discuss their type system. However, they emphasise that it is a model rather than a specification, thus leaving various details up to the implementer. Admittedly, the Rel-specific bits are barely documented, but for examples here. On the other hand, Rel is an Open Source project so perhaps someone would like to contribute documentation. :)
Chris Walton wrote: 1. Rel has chosen to make identifiers case sensitive, and keywords insensitive.
There's a history behind this. In the earliest Rel versions, identifiers were case insensitive and keywords were case sensitive. This most closely matched the published Tutorial D examples. Multiple users complained that the upper case keywords forced AN UNREADABLY ALL-CAPS STYLE OF CODING or risked carpal tunnel syndrome from constantly stretching for the shift key, so I removed case sensitivity on keywords. Then multiple users complained about the case insensitive identifiers, because they wanted to make variables named myThing of type MyThing inherited from type MYTHING, so I made identifiers case sensitive. I haven't had many complaints since. Since you're creating a type system, it's almost inevitable that your terms will collide with Tutorial D keywords, and BOOLEAN, as a built-in type name, is a bit "special". I suspect this isn't a problem for typical database-driven application development, where the chances of the problem domain's names colliding with Tutorial D keywords are fairly small. I could conceivably alter the parser to skip identifiers when tokenising -- this would allow you to create identifiers that match keywords -- but I suspect code readability might suffer.
Chris Walton wrote: 2. It is not possible to specify default values in Rel - whether in type definitions or variable definitions. The INIT construct is concerned with initialising multiple POSSREPs when a value is supplied to one of the POSSREPs.
That's a characteristic of Tutorial D. Some proposals have been made for supporting default values in various contexts, but I believe none have made it into the "official" Tutorial D grammar.
Chris Walton wrote: 5. Rel generates a different set of operators than those described as legitimate in the expected behaviour of type systems.
I'm not quite sure what you mean here.
Chris Walton wrote: As use of the built-in operators in conjunction with inherited data types (whether because of lack of understanding of the TTM requirements; lack of understanding of the syntax of Rel; or the current state of development of Rel) seems to have the potential to generate substantial numbers of errors within Rel, minimising the use of such built-ins is deemed helpful.
It's mainly the current state of development of the type system in particular, and even more particularly, what appears to be my lack of error-checking to avoid winding up in an unstable state due to inadvertent recursive references. You're almost certainly the first user to stress-test the type system in the particular direction you've taken it. Indeed, each new "explorer" of the type system tends to initially uncover an entirely unanticipated seam of failure. After a few updates to fix the failures, it stabilises quickly and seems complete and workable -- until the next "explorer" takes it in a new direction.
Chris Walton
Posts: 76
Joined: Sat Aug 18, 2012 2:13 pm

Re: Rationale for developing another type system.

Post by Chris Walton »

I have not read Databases, Types and the Relational Model. It is apparently out of print, and not likely to be reprinted. At least this is the information I received when I ordered it. I have read several of Date's and Darwen's books, and some notes and other papers that have been published. These are largely links I found on the TTM site. Nowhere do these give a coherent picture of the type system, though many give substantial fragments.

I was not aware of the history of the case insensitivity issue. It is difficult, and so personally dependent that there is probably no one good solution. As a personal note, I made substantial headway in understanding what was going on when I started thinking of THE_value rather than THE_VALUE. Idiosyncratic or what?

As to the set of operators generated (or not generated) I had made a list of those items that did not correspond to my expectations in "Expectations of a type system". I will come back to this but at present, and for a couple of weeks, that list is not available to me, nor is the capability to regenerate it.

I hope your previous experience of the stability of the type system continues to manifest itself.
Chris Walton
Posts: 76
Joined: Sat Aug 18, 2012 2:13 pm

Rationale for new base types

Post by Chris Walton »

This system describes a set of data types, expressed in Rel. This type hierarchy is designed to support problem and requirements gathering and analysis. As such it needs to be able to represent types that may not have a computable representation, or which may be vague or ill-defined.

The data types are derived from dimensional analysis. The set should be capable of defining all but the most exotic of data types, and does allow the extension of the set of data types to deal with such cases. They also provide a representation of types that I find more natural than the built-in types within Rel. The definitions of the various data types used are based heavily on prior work by Shlaer and Mellor, and the Kennedy-Carter consultancy.

A motivation for the types is as a building block for a project to demonstrate proof of concept of a system development method (TSD). Such types have to be usable for other aspects of the method, and to be used by other elements of the method.

There are some differences between Rel and the desired representation of the types. These are:

1. Rel has chosen to make IDs case sensitive, and keywords insensitive. While there is no reason to believe that case sensitivity or insensitivity is preferable, the choices within Rel lead to constraints on the choice of IDs that are unnecessary; and lead to anomalies in definition. In Rel it is not possible to define an ID called ordinal, Ordinal, or ORDINAL. It is possible to have IDs named boolean, Boolean, but not BOOLEAN. The types available should be capable of defining other types with attributes for case sensitivity, collate sequences and the like. The lack of case sensitivity in IDs does make implementing this expectation more difficult.

2. There is no place to specify default values in Rel. While the use of defaults is discouraged in TSD, some standards require defaults.

3. The set of basic types in Rel is based on computer implementation types. TSD has a set of basic data types, based on dimensional analysis.

4. TTM permits variables to be defined ORDINAL and ORDERED. By contrast TSD uses enumerated and ordinal. Enumerated types are simply values from a list. Ordered in TTM implies that the instances of the type can be sequenced. Ordinal in TTM implies sequencing plus navigation operators (first, last, next, previous) defined on the type, whereas ordinal in TSD is a lightweight way of describing both complete and partial orderings. The defined set of types has to support the TSD specification rather than the TTM one. The sets described by the various definitions fulfil the conditions: Enumerated ⊂ Ordered ⊂ Ordinal (TTM) ⊂ Ordinal (TSD).
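
The distinctions in point 4 can be sketched in Python as follows. This reflects only my reading of the four notions as described above, not definitions taken from TTM; all names (Colour, SIZES, PRECEDES and the functions) are hypothetical and chosen for the illustration.

```python
from enum import Enum


class Colour(Enum):
    """Enumerated: values from a list, supporting equality only."""
    RED = "red"
    GREEN = "green"


# Ordered: instances can be sequenced (here by position in a list).
SIZES = ["small", "medium", "large"]


def less_than(a, b):
    return SIZES.index(a) < SIZES.index(b)


# Ordinal as described for TTM: sequencing plus navigation operators.
def first():
    return SIZES[0]


def last():
    return SIZES[-1]


def next_value(v):
    i = SIZES.index(v)
    return SIZES[i + 1] if i + 1 < len(SIZES) else None


def previous_value(v):
    i = SIZES.index(v)
    return SIZES[i - 1] if i > 0 else None


# Ordinal as described for TSD: partial orderings are allowed as well.
# The order is given as (lower, higher) pairs; some values are not comparable.
PRECEDES = {("draft", "review"), ("draft", "legal"),
            ("review", "published"), ("legal", "published")}


def precedes(a, b):
    """True if a comes before b, following the pairs transitively."""
    frontier, seen = {a}, set()
    while frontier:
        current = frontier.pop()
        for lower, higher in PRECEDES:
            if lower == current and higher not in seen:
                if higher == b:
                    return True
                seen.add(higher)
                frontier.add(higher)
    return False


def comparable(a, b):
    return a == b or precedes(a, b) or precedes(b, a)
```

In this sketch "review" and "legal" both follow "draft" and precede "published", but neither precedes the other, which is the kind of partial ordering that a complete sequencing cannot express.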

The main constructs needed to describe and implement processing and architecture in a computer system (sequence, selection, and iteration) are precisely those which are most difficult to represent within a relational system.