Data Typing Is A Relic
Written by Ian Elliot
Friday, 08 February 2013
Most modern languages that are thought to be "respectable" are examples of the same approach - strongly typed, class-based languages. This could be the single biggest mistake in the history of programming.

Strong data typing is generally thought to be not just a good thing but probably the best way to program. Data typing and object-oriented programming seem to go together naturally and they reinforce each other in both practical and theoretical ways. These ideas have been at the core of modern programming since objects were introduced to C to create C++, and on to languages like Java and C#. It is such a given that a good language will be strongly typed and implement some sort of class type hierarchy that we judge new languages by it and try to retrofit the missing "features" to languages that don't fit the mold. It is the reason why the latest new languages - Go, Dart, Kotlin and so on - just look like Java clones. It is also the reason why so many attempts are being made to convert JavaScript into Java.

However, we may just have grown too accustomed to a single way of doing things. The class-based, strongly typed monoculture of Java, C++, C# and so on might just be an aberration.

The whole issue of data typing started very early in the history of computing. Primitive data types were forced on us by the hardware. Most languages implemented a range of primitive data types, and we extended this idea to class-based hierarchies without really evaluating the alternatives. As a result, something primitive has been elevated to the status of high theory, and this makes it difficult to challenge. Even if you don't agree with my argument in the end, you should at least think about it carefully. First we look at the reason for primitive data typing.

Primitive types

Let's consider for a moment the historical roots of data typing and see how it evolved into the more sophisticated idea of the strongly typed class hierarchy. Back in the dark ages of assembler and Fortran, programmers lived nearer to the bits. If you wanted to store something in memory you needed to care how it was stored. You needed to know about fixed point, floating point, signed integers, characters and perhaps even strings - although strings were a little sophisticated for the time. This naturally resulted in programmers needing to know about data types and using operators that were specific to particular types.

A little later, programming languages such as Basic appeared that attempted to make programming seem easy and natural. Have you ever been confronted by a beginner and the task of explaining what is wrong with something like:
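  textbox.text = 123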
where the text property is a string, say. The beginner has a lot of trouble trying to see the difference between "123" and 123. They both "look" like numbers, but one is a string and one is an integer. The difference between 123 and 123.0 causes problems in the same way, but it doesn't raise its head in practical situations quite as often.

If you try to learn a language like C, which is still close to the bits, then the situation is a lot worse, with byte, short, int and so on, and even high-level languages like Java, which are supposed to have grown up and moved away from the hardware, still have similar data types.

We need to think about this from an "ideal world" point of view for a moment. The reason we have primitive data types is that it makes life simpler for the language implementer. As programmers we don't really want to get involved in the detailed representation of data. Any such considerations are about efficiency and not about design - when efficiency is a consideration, use a lower level language like C. For general programming what we want is the highest-level language possible, one that abstracts away from the primitive hardware. Such a language should just work with the data as appropriate. A truly advanced language would just let the programmer write something like:
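  a = 1234.5678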
and it would automatically perform whatever tricks were needed to store the data. Variables would not be typed, and type-specific operators would simply force conversions as required - all under the covers. If you think such things are impossible - well, yes, they are difficult, but not impossible.

There is also a lot of scope for getting things wrong. For example, JavaScript is well known for performing type juggling to make things easier and also for doing things that are unexpected. For example, 1+"1" gives the string "11" but 1-"1" gives the numeric value zero. This isn't good, but the reason is that JavaScript confuses the issue by using + to mean both string concatenation and numeric addition. The design error isn't that JavaScript shouldn't do conversions; it is that it shouldn't use the same operator for two different operations. The fact that the results are unexpected indicates that it is the detail that is wrong, not the general principle.

JavaScript has only one sort of numeric data and converts between text, Boolean and numeric as required. It's far from perfect, but it proves that we don't actually need primitive data types. When you next see JavaScript being ridiculed for strange type conversion rules, get the critic to explain the difference between "123" and 123 to a beginner. In a modern language the way that data is handled should depend on the operations applied to it and not on the assumption of a particular representation.

Class Type Hierarchies

This is where things get complicated, and it is where it is most difficult to evaluate what we do with an unbiased view. If you start out with the idea that variables are typed, then when you move to object-oriented programming it becomes clear that each different sort of object is a data type. This is made even clearer by the way classes are used to create objects in the same way that primitive type names are used to create primitive variables. Yes, "everything is an object", but we know it isn't so. When we write something like:
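  class myClass {
      ...
  }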
and then
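  myClass myObject = new myClass();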
it is made clear that myClass is a type, just like int. The class is the raw material of the type system of most modern languages.

The next logical step is to look at the way classes relate to one another. If we use inheritance to create new derived classes, then clearly there is a simple relationship between the base and the derived class. The derived class has, by inheritance, everything that the base class has, so in terms of the type system the derived class is a sub-type. The rule, formalized as the Liskov substitution principle, is that any derived class can be used in place of its base class - because a derived class is also an example of the base class, possibly plus some new properties.

What all this leads to is the class-based type hierarchy that we have all grown to accept as the right, and perhaps only, way to do things. The class hierarchy allows us to use strong typing to make sure that we aren't trying to store a string in an int or a base class in a derived class variable. Great! We can catch type errors at compile time.

Of course, without strong typing there would be no pure type errors. The issue isn't type but applying the appropriate operations to the data. When you write "123" * "2" you are clearly asking for a multiplication, and the data should be treated as numeric. As the data can be converted to numeric, this should be done automatically. However, a strongly typed language will throw a type error at this point, even though all that is missing is a conversion. If the data doesn't have the form of a number, e.g. "ABC" * "D", then you have a real "type" error that can't be fixed by a conversion or a cast. Strong typing catches this sort of error as well as the previous one - and this one really is an error.
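To make the distinction concrete, here is a minimal Java sketch - the class name TypeErrorDemo is invented for the illustration, and Integer.parseInt is just one way of supplying the missing conversion:

  public class TypeErrorDemo {
      public static void main(String[] args) {
          // A strongly typed language rejects this at compile time,
          // even though all that is missing is a conversion:
          // int product = "123" * "2";   // error: bad operand types for '*'

          // Supply the conversion explicitly and the multiplication is fine:
          int product = Integer.parseInt("123") * Integer.parseInt("2");
          System.out.println(product);   // prints 246

          // "ABC" does not have the form of a number, so no conversion or
          // cast can rescue it - this is the real error:
          int bad = Integer.parseInt("ABC");   // throws NumberFormatException at run time
      }
  }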