1.4. CURRENT APPROACHES TO THE PROBLEM
The problem outlined is well documented, and many authors are engaged in research which addresses it. As always, researchers bring their own specialism to
Chapter 1Information from Data
bear on any problem and the different approaches derive from existing schools of research in computer science. Two main themes can be identified under which research into the wider area of this problem domain is currently being carried out.
The engineering approach, which focuses on the system design process, is concerned with successful elicitation of requirements and construction of robust systems to meet these requirements. The tools being developed by this approach are those of Computer Aided Systems Engineering (CASE), formal methods, and more recently Object Oriented Analysis and Design. These methods acquire and build models of the organisational structure, and these knowledge bases inform the design process and may be significant in the maintenance and evolution of the applications. Nevertheless, the knowledge embedded in the design is implicit rather than explicit, and is not generally available to the user. Thus this approach, whilst valuable for the construction of many applications, is not directly applicable to the construction of decision support systems where the user wishes to address and model ad-hoc problems.
Another approach is the application of artificial intelligence. Given an "unintelligent" information system, the aim is to facilitate the user's access to it. An expert system front end can incorporate the knowledge of an expert user and can deduce the user's intentions. Alternatively, a natural language interface can achieve the same results. Both techniques require encoding of the organisational knowledge in an appropriate form. This knowledge is then accessed by an inference engine which draws conclusions about the request being submitted.
Both approaches suffer from the same shortcoming: they separate data structure from organisational knowledge. Thus they treat the fact that "customer X has an overdrawn credit account" in a completely different way to the fact that "a customer providing evidence of credit worthiness may be granted a credit account facility". Applications based on these approaches will inevitably require the user to navigate two different systems. The seamlessness of this navigation from the user's point of view will depend on the quality of the user interface and will therefore be application dependent.
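The separation described above can be illustrated with a minimal sketch. It is not drawn from any particular system: the record layout, the names `CreditAccount`, `overdrawn_customers` and `may_grant_credit`, and the sample values are all hypothetical, and an overdrawn account is assumed to be one with a negative balance. The point is only that the fact lives in a data store and is reached by a query, whereas the rule lives in a separate rule base and is reached by a different call.

```python
from dataclasses import dataclass


# Hypothetical record type standing in for a row in an operational database.
@dataclass
class CreditAccount:
    customer: str
    balance: float


# System 1: the data store. The fact "customer X has an overdrawn credit
# account" is held here as data (assumed to mean a negative balance).
ACCOUNTS = [
    CreditAccount("X", balance=-250.0),
    CreditAccount("Y", balance=120.0),
]


def overdrawn_customers(accounts):
    """Query the data store for customers whose account is overdrawn."""
    return [a.customer for a in accounts if a.balance < 0]


# System 2: the rule base. The organisational knowledge "a customer providing
# evidence of credit worthiness may be granted a credit account facility" is
# held here as a rule, quite separately from the data above.
def may_grant_credit(has_evidence_of_credit_worthiness: bool) -> bool:
    """Apply the organisational rule to a single applicant."""
    return has_evidence_of_credit_worthiness


# The user has to consult both systems, each through its own interface.
print(overdrawn_customers(ACCOUNTS))
print(may_grant_credit(True))
```

Nothing in the data store records that the rule exists, and nothing in the rule base refers to the stored accounts, so any link between the two has to be made by the user or by the application's interface.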
1.5. THE RESEARCH DOMAIN
From these experiences, and many similar examples encountered in teaching and consulting across the spectrum of statistical analysis, decision support systems, management information and information systems design, a solution domain emerged.
To tackle the problem it will be necessary to exploit and integrate several current research directions. Current approaches to organisational modelling will be assessed. Object Oriented Analysis (OOA) is an obvious candidate, together with the work on data modelling inherent in Object Oriented database development. The gap between operational data and information for decision making resembles the gap addressed by work on statistical databases and statistical data modelling, so this literature will be relevant, as will current work in the design and construction of Decision Support Systems.
1.5.1 Conceptual modelling
First, it became clear that the problem area was not at the implementation level. That is, it was not concerned with how statistical packages operate nor with how data is stored in databases, but with how knowledge about these two fields is managed and exploited. To characterise current practice: a model is constructed when an information system is designed and built, and this is then lost or forgotten once the system becomes operational. But the operational system cannot tell the user much about the world it serves, and to understand the operational system, particularly why things happen the way they do, requires reference to external documentation or, more usually, discussion with members of the original project team. Thus when users wish to produce management information, summaries and aggregates, let alone more sophisticated forecasts, they have to provide a lot of extra information to make this possible, in particular knowledge about the types of variables and about their domains. This information is typically referred to as metadata within the statistical community, but is identical in nature to the conceptual models of the information systems community.
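As a minimal sketch of why this extra information is needed, consider a summarising routine that cannot choose a sensible summary without being told the type and domain of each variable. The metadata layout, the variable names, the type labels "nominal" and "numeric", and the sample records are all assumptions made for illustration; they are not taken from any particular statistical package.

```python
from collections import Counter
from statistics import mean

# Hypothetical metadata: the type and domain of each variable. This is the
# knowledge the operational system does not hold and the user must supply.
METADATA = {
    "department": {"type": "nominal", "domain": {"Sales", "Finance"}},
    "salary": {"type": "numeric", "domain": (0, 200_000)},
}

# Hypothetical operational data.
RECORDS = [
    {"department": "Sales", "salary": 30_000},
    {"department": "Sales", "salary": 34_000},
    {"department": "Finance", "salary": 41_000},
]


def in_domain(value, spec):
    """Check a value against the domain declared in the metadata."""
    if spec["type"] == "nominal":
        return value in spec["domain"]
    low, high = spec["domain"]
    return low <= value <= high


def summarise(variable, records, metadata=METADATA):
    """Choose a summary appropriate to the declared type of the variable."""
    spec = metadata[variable]
    values = [r[variable] for r in records]
    if not all(in_domain(v, spec) for v in values):
        raise ValueError(f"value outside declared domain for {variable!r}")
    if spec["type"] == "nominal":
        # Categories can only be counted; an average would be meaningless.
        return dict(Counter(values))
    # Numeric variables can be averaged.
    return mean(values)


print(summarise("department", RECORDS))
print(summarise("salary", RECORDS))
```

Without the metadata the routine has no way of knowing that a department should be counted rather than averaged, which is exactly the knowledge that is lost once the original model is forgotten.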
A major theme of this research is the construction of appropriate models to represent the problems being addressed by the development of an information system. The accepted term for the high level abstract models which the developer uses to communicate with the problem owner is conceptual models, and the process of deriving such a model is called conceptual modelling. Considering different approaches to conceptual modelling means discussing different frameworks for arriving at a conceptual model. Logically one should refer to such a framework as a conceptual modelling framework, toolkit or method. However, this is clumsy, and we will follow convention and use the term conceptual model to refer to the method as well as to a particular instantiation of a modelling approach.
1.5.2 Domains of discourse
It also became clear early in the research that there are several different areas to this study which many researchers have inadequately differentiated. A conceptual model is traditionally presented as a model of the real world, or at least the part of the real world of interest to the organisation, commonly referred to as the Domain of Discourse or Universe of Discourse (UoD). However, there are several other domains which need to be crossed in creating a path between decision makers and the information they require. These are illustrated in figure 1.1. There is the information system itself, an analogue, one hopes, of the real world, but often we do not distinguish between the two and will refer to a customer when we mean a customer's record. The author was once told by an enquiry clerk at Paris Charles de Gaulle that the plane he had arrived on had not yet landed. It took some persuasion that landed it had, but the information system had not been apprised of the fact. Within the information system there are other domains, all of which contain representations of the real world: there are logical data models and flow charts and binary information stored on disc. These do not directly interest us, but confusion can often arise when, for example, we are told that the department to which an employee is assigned is a character string with a maximum of 20 characters. Somewhere else in the world of interest there will be a manager with a
problem. The term problem domain is used for any representation of the complex interactions, variables and objectives that make up the problem. There will also be a domain in which information is presented, structured in a manner pertinent to the problem domain. The language of this domain will be one of statistical summaries and management reporting: breakdowns, averages, totals etc.
Figure 1.1: The various domains implicit in the research
1.5.3 Object orientation
It also became clear that any new approach to this area would have to draw on recent work in object oriented analysis and design. It was going to be necessary to create richer conceptual models which were capable of expressing not only the structure of the real world but also the content of the information systems and the structure of the statistical summaries describing them. It was expected that merely extending conventional modelling tools such as the E-R model would not be a satisfactory way of describing the different domains to be spanned. The richness of an object model was likely to be needed and the discipline of the object oriented approach would be essential for controlling the complexity in the overall model.