Joost G. Kircz--Elsevier Science, 1055 KV Amsterdam
Hans E. Roosendaal-- Communication in Physics Project, WINS Faculty, University of Amsterdam.
In this presentation, we report a two-fold approach to the issues and opportunities that modern electronic media pose for scientific information.
The first part of this paper addresses a number of elements in the process of information: needs, transfer, and disclosure in academic environments and discusses results of in-depth interviews with a number of scientists from various fields.
In the second part, we discuss the changes electronic publishing will induce in scientific information handling. We try to analyse the different cognitive components leading to a variety of ways in which information is presented, and we briefly discuss recent research towards a better understanding of the fundamental changes electronic publishing will introduce.
2. Process and needs
2.1- The science process
The main issue to be addressed in the context of electronic publishing is:
"How can it support and enhance the science process"?
Communication is the essence of science, and more particularly, it is the engine of the whole science process (Reference). The scientific communication process is an object of investigation and it provides data for research programmes in a variety of science studies (Reference).
It would go well beyond the scope of this contribution to describe the science process even in some detail. We will assume here that the science process consists of a system of related, mostly competing research programmes (Reference).
On this basis, a number of different stages in the research process can be distinguished: from conceptualisation of problems, to theory, to hypotheses, to predictions and testing, and finally to interpretation of research outcomes (Reference).
While we realise that there is no consensus on the above, these different stages lead to a number of main communication needs as experienced by researchers in different fields (see below).
This structure of the science process has a number of social consequences, which are discipline dependent. Most important are common standards, resulting in specific rules and ethics. Furthermore, each scientist has to establish his own position, and this is mainly done through recognition of his contributions to science in the research process. These contributions can be informal and formal and are to a large extent manifested in publications (Reference).
2.2- Communication needs
Generally, the communication needs result from research needs in the different stages of the science process. Analysis (Reference) indicates the following needs:
awareness of knowledge, both in the researcher's own research domain, as well as in other (mostly related) research domains. Though of particular importance in the earlier stages of the science process, it is a conditio sine qua non throughout all stages.
awareness of new research outcomes: new developments have to be followed closely and need to be accounted for in the research process at the earliest possible stage.
specific information: this means relevant theories, and detailed information on research design, instrumentation, and methodologies.
platform for communication: the researcher should have at his disposal a fully-fledged communication platform satisfying his needs, from the very informal, private discussions to convenient, formal interactions with colleagues.
ownership protection: throughout the entire research process the researcher wants to claim priority for his contribution to the research field, and needs protection of this ownership, to a degree that varies with the stage; this ownership extends to how the information is communicated and disseminated.
It is inherent to science, and to the science process, that both are in constant flux or growth. In this contribution, two aspects of this constant growth are worth mentioning:
despite the fact that science has been growing at a rather constant pace for more than 300 years (Reference), there is the general feeling that this accumulated growth has led to an unmanageable pile of information and that the growth of information leads to less effective and efficient communication, threatening in turn the effectiveness and the efficiency of the science process itself.
Recent publications (Reference) address this issue, but in 1979, William D. Garvey (1) already stated:
" ... in some disciplines, it is easier to repeat an experiment than it is to determine that the experiment has already been done".
This is pure destruction of invested capital, and as research funding becomes more and more an issue on the political agenda in many countries, the effectiveness and efficiency of scientific communication are becoming crucial. The inefficiency is partly due to the fragmentation of information over the many different information sources we have.
at the same time there exists an increasing competition in science: not only competition arising from the dynamics of the competing research programmes, but also for economic and funding reasons; this competition leads in turn to upward pressure on the communication and information system.
In the previous section, we have formulated a number of theses on the communication process as an important engine for the science process. Communication needs are seen to be related to, and have different impact on, the different stages in every research process. The main question then is: how can we increase the effectiveness and efficiency of the communication process for the individual researcher? What are the main elements, what are the main expectations and desires a researcher has?
For our research we identify as key issues:
The information needs: whereas the system is now more fragmented, and segmented, what expectations do researchers have with respect to a more integrated system?
The infrastructure of the system: are we moving from the present more closed system to an open, distributed and fully transparent system, where transparency is defined from the user's end?
3.1- Research design
The above-mentioned key issues are being addressed in field research comprising a number of in-depth interviews with individual researchers.
In our heuristic model there is a tendency to an open infrastructure and an integrated system. This model is being investigated on a stratified sample of individual researchers in the following scientific disciplines:
The objective of the research is to identify the expectations and desires researchers have with respect to the above themes. A number of pertinent themes are probed in a structured way using so-called provocative statements. Opinions of researchers are then further probed, using expert interviewers from the publishing departments of Elsevier Science. In that way, we allow hypotheses and other issues to be put to the test, and to be criticised or falsified. Motives for certain opinions, expectations and desires can then be identified. A full description of the research method is given in (Reference). An example of a provocative statement, and some results for the mentioned disciplines, are given in Table 1.
4.1- Main functions
It is useful to distinguish four main functions in scientific communication.
Technological dynamics will clearly influence all these functions: not conceptually, however, but much more in the way these functions can be performed in the future. Recent technological developments allow novel ways of access to stored information, and this in turn affects the way information needs to be structured (see below). Technological dynamics can then lead to a new architecture of scientific communication, provided this architecture is accepted by the scientific community. This scientific community has in the past proven to be rather conservative in its acceptance of new technology, as is illustrated in the following quote (Reference):
"resistance to new media stems from scientists' concern that the goals of the scientific system would not be fulfilled by these media".
4.2- Acquisition needs
The results of the survey show that researchers have rather well-defined expectations and desires with respect to acquisition needs. We can separate acquisition needs into two parts: demands with respect to the information proper and demands with respect to the process of acquiring information.
4.2.1- Information needs
Reliability - is a conditio sine qua non for information. Whereas some researchers may want to rely on their own judgement, and then only when they are highly familiar with the research reported, the overwhelming majority of researchers wish to rely on an independent quality check that meets external, known and accepted standards. The main reason is overall expediency and efficiency in the process, as well as convenience. At the same time, the present refereeing system is sometimes questioned. Smaller, highly formalised research areas with a well-defined social structure, such as high-energy physics, tend to move more towards self-evaluation.
Relevance - related to subject, scope, and level of research. Relevance can only be judged by the individual researcher. Structuring of the information and linking of mutually relevant sources of information facilitates this process.
Timeliness - the desired time to access information depends very much on the dynamics of the research. Demands on timeliness therefore vary per research field. Dynamic, closed and formalised research fields with a high demand for priority over certification lean towards self-publishing, either in an informal or formal way. Early access coupled to a proper refereeing system is however preferred.
Presentation - is related to efficiency of communication and convenience. Presentation on paper is still considered superior to screen presentation; however this is not an impediment to acceptance of the latter, as improvements are taken for granted.
Storage - this is probably the most important issue to be addressed for all agents in the publishing chain. The scientific community at large has strong expectations with respect to the following issues:
4.2.2- Process of acquisition
There are a number of different strategies to select, retrieve, and process information. The following main elements come to the fore:
4.3 Dissemination needs
Dissemination of information is seen to serve two main goals (9):
The research indicates that the following familiar issues are considered to remain important or to become even more important:
time to reader
interaction - this is particularly important for feedback.
In general, researchers have high expectations that more direct interaction using electronic facilities for informal and formal communication will increase feedback, and therefore effectiveness and efficiency of the research process.
4.4 Summary of our first results
In summary, the research allows us to conclude the following:
researchers expect and desire a communication system allowing for the integration of needs, as defined from the reader's viewpoint. Integration is not restricted to text, but includes also data, pictures, film, sound, etc.
this requires an open infrastructure. This is not always appreciated by individual researchers.
The agents in the publishing chain may well focus on the following main aspects:
content - there is, as before, a clearly defined, growing need for reliable information, that is easily accessible. Improved standards of certification and preparation of information are being requested.
structure - dissemination of information requires structuring, taking advantage of the modularity of information.
infrastructure - an open, sophisticated infrastructure is in demand.
information management - personal information management tools need to be developed. These could be based on the internal structure of the information.
5. Design for the future
5.1- Introductory remarks
From the studies discussed in the first part of this paper, it is clear that scientific information is contextual in a double sense. Firstly, the type of information is different in different fields. A geological chart is a totally different object from a histogram of radioactive decay rates, though both can be displayed as large colour posters. Secondly, the usage of different types of information (including the cutting and clipping) is different. The emerging electronic tools already heavily influence the way scientists think and represent their thinking and research results. These two contextual levels will be expressed differently in different media.
Present day digital information acquisition, storage, and handling techniques represent the apogee of the development which started with the possibility of using electrical devices for information handling. Given the flexibility of these techniques, we see that reporting of scientific research and its technical expressions will be further entangled.
All this is not new; in the early sixties, Marshall McLuhan's famous book "Understanding Media" (Reference) already heralded discussions on the deep influences new technologies have in shaping culture. Most of these discussions, however, were developed in departments of Mass Communication and Media Studies. Within the sciences, we have spent a lot of time and energy on developing these new tools, but we have hardly analysed the decisive role new technologies have in reporting our own results. In order to be able to understand, shape and use the new media properly, without losing the essential objectives of scientific communication discussed in section 4 of this paper, we have to dissect the various interacting levels and their components.
5.2- Preparing a research programme
Within the context of our research programme which aims at defining and developing the employment of the new electronic media, we would like to discuss here two different but intertwined components:
- The research and development of different ways of presenting, manipulating, and storing information (see section 5.2.1).
- The development of methods and tools to enhance the disclosure of information (see section 5.2.2).
In the following, we take the burgeoning development of sheer storage and transport (bandwidth) capacity as given. These exploding technologies provide the technological infrastructure for novel methods. Interesting as they are as objects of scientific research per se, they are, however, not critical to the conceptual developments needed to address issues in scientific information handling as outlined in section 4.
5.2.1- Presenting and storing information.
Over the last years, we have already seen a most promising development towards a better structuring of information. The Standard Generalized Markup Language (SGML) and the Hypertext Markup Language (HTML) are well-known and accepted working standards today (Reference). An approach quite different from just loading classical documents onto electronic storage media is research that reveals and structures the inherent modularity of information. Text, pictures, films, animations, and sound are all separate and independent ways of presenting information. Until now, technology has confined the bulk of information presentation to text with illustrations. At the moment we see an explosion of technical possibilities which make available, in addition to texts, all non-textual forms of information. The point is, however, that we do not need additions to texts, but that we need integrated information systems (as already discussed in section 4).
Every kind of presentation of information has its own character and is a different expression of the reported object, phenomenon, or theory. If we really want to value the possibilities of including sound, colour, movies, etc., into regular scientific reporting, we have to analyse their specific rôles in the communication process (see section 4.3). Historically, communication has been confined to the printed journal, with the result that text is now the most important ingredient. Pictures started as illustrations of the text: as extensions. In the course of time, visual display of quantitative information became a craft in itself: the picture expresses more than a thousand words can do (Reference). In an electronic environment, the picture might become a similar prime source of information, whilst the text then becomes the explanation of the figure, in complete symmetry with the figure as an illustration of the text. In the same way, films, sounds, animations, etc., will become full expressions of scientific results in their own right. We will deal with this point further in the next section.
Within the Library Sciences, information retrieval (IR) research is already a well-established field. In this contribution, we will not spend much time on these aspects. At the moment, it is sufficient to list the following fundamental problems IR research is facing (Reference):
1- In systems where we use the full text of articles, so-called free text searching systems, the search possibilities are confined to the words provided by the author. The manipulable information is restricted to the work as provided by the author. As already emphasised above, research, and hence the author's language, is very contextual, full of jargon and very much the expression of more or less closed social environments. For that reason free text searching systems are very difficult to handle for readers who are not conversant with the jargon of the particular field. These might be readers from other (adjacent) fields, but also readers within the field who read from another perspective, be it geographical (an American scientist reading Russian science) or temporal (today's scientists reading old work in their own field). From another point of view, one can say that free text searching approaches the problem from the author's point of view.
2- In systems with controlled keyword lists and thesauri (externally added keys), we are confronted with the near impossibility of mapping content onto a fixed list of concepts. Whilst in the case of free text systems we are able to maximally manipulate the texts as given, in the case of controlled keywords we reduce (or coalesce) language into fixed notions. However, to be useful, these notions need to be stable, at least for some time. Thus controlled keywords and thesauri always lag behind the research language used. It is important to note that, in contrast to free text terms, controlled terms express in a way the reader's point of view. Unfortunately, articles are now only indexed once, and retrospective indexing of collections of articles in order to relate old work to new concepts, and vice versa, never happens.
3- In cases where we use references to disclose works that we need, we take the list of references as transmittal indicators. Not the works we have accessed, but the cited works are wanted. The problem is that the reason a reference is given by the citing author is not always clear. Is it just to show that the author knows his field, is it to flatter a possible referee, is the reference to the competition deliberately left out, etc.? What is needed is a better link between the cited work and the context in which the citing author deems this reference useful. Fortunately, due to the speed-up of the publication process by electronic means, the time-lag inherent in the use of references as disclosure tools will be reduced. The use of references as disclosure tools emphasises the context, or embedding, of the wanted information.
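The contrast between the first two problems can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the two abstracts, the tiny controlled vocabulary, and the function names are hypothetical and do not describe any existing retrieval system. Free text search only finds the author's own wording, whereas a controlled keyword groups different wordings under one fixed notion, at the price of maintaining the vocabulary.

```python
# Hypothetical abstracts written by two authors with different wording.
ABSTRACTS = {
    "doc1": "We study the heat capacity of thin films.",
    "doc2": "The specific heat of thin layers is measured.",
}

# Hypothetical controlled vocabulary: phrasing found in texts -> fixed keyword.
# An indexer, not the author, decides that both phrasings belong together.
CONTROLLED = {
    "heat capacity": "HEAT_CAPACITY",
    "specific heat": "HEAT_CAPACITY",
}


def free_text_search(docs, phrase):
    """Return ids of documents whose text literally contains the phrase."""
    return sorted(d for d, text in docs.items() if phrase.lower() in text.lower())


def index_documents(docs, vocabulary):
    """Assign controlled keywords to documents (done once, at indexing time)."""
    index = {}
    for doc_id, text in docs.items():
        for variant, keyword in vocabulary.items():
            if variant in text.lower():
                index.setdefault(keyword, set()).add(doc_id)
    return index


def keyword_search(index, keyword):
    """Return ids of documents indexed under a controlled keyword."""
    return sorted(index.get(keyword, ()))
```

In this toy setting, `free_text_search(ABSTRACTS, "heat capacity")` misses the second document, while `keyword_search(index_documents(ABSTRACTS, CONTROLLED), "HEAT_CAPACITY")` returns both. The index is only as current as the vocabulary, which is the lag described under point 2.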
Thus the research programme that we propose entails the development of domain-specific information representation structures which link scientific or related information concepts to the specific context in which they are used. One way to do this is to create a collection of flexible domain-specific thesauri. Even if terms in different thesauri within a collection are literally the same, they do not necessarily represent the same concept. Every term which is put into context in a specific domain is therefore a much more powerful tool. If we now allow the domains to overlap slightly, we will be able to generate a collection of thesauri which, like an atlas of road maps of different scale and lay-out, guides the searching researcher from one domain to another. A programme on overlapping thesauri in mathematics and physics starts soon. Here we try to develop a mathematical theory (Reference) to match overlapping terms (and their synonyms) extracted from a large and coherent set of articles within well-defined fields in mathematics and physics. The ultimate goal of this research programme is to develop techniques for the generation of an Atlas of contextual scientific index terms.
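As a minimal sketch of the idea of overlapping thesauri, the fragment below stores one small thesaurus per domain and lists the terms shared between domains. The entries and the function names are hypothetical, and the plain set intersection used here is only a stand-in; it is not the mathematical matching theory referred to above.

```python
from itertools import combinations

# Hypothetical toy thesauri: each domain maps a term to the concept it
# denotes in that domain. The entries are illustrative only.
THESAURI = {
    "mathematics": {
        "field": "algebraic structure with addition and multiplication",
        "group": "set with an associative, invertible operation",
        "lattice": "partially ordered set with meets and joins",
    },
    "physics": {
        "field": "physical quantity defined at every point of space-time",
        "lattice": "periodic arrangement of points in a crystal",
        "spin": "intrinsic angular momentum",
    },
}


def shared_terms(thesauri):
    """Return {(domain_a, domain_b): sorted terms that occur in both}."""
    overlaps = {}
    for a, b in combinations(sorted(thesauri), 2):
        common = sorted(set(thesauri[a]) & set(thesauri[b]))
        if common:
            overlaps[(a, b)] = common
    return overlaps


def lookup(thesauri, term):
    """Return the meaning of a term in every domain that contains it."""
    return {domain: terms[term] for domain, terms in thesauri.items() if term in terms}
```

Here `shared_terms(THESAURI)` gives the crossing points between the two maps ("field" and "lattice"), and `lookup(THESAURI, "field")` shows that the same literal term denotes a different concept in each domain, which is why a term only becomes useful once its domain is known.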
6. First steps to a new architecture
Following the requirements and expectations on storage, retrieval, etc., that resulted from our investigations, reported at the beginning of this paper, and in order to appreciate the new possibilities and fit them into the framework of conscientious scientific discourse, we have to clarify and define the various characteristics of the different kinds of information.
The essay form of scientific documents is a typical result of the use of print on paper sheets. The portability, browsability and comprehensiveness of the paper product are the end of a centuries-long historical development process. In an electronic environment the characteristics might well change. All components of the paper product which are repetitive can be deleted as recurring objects, as they are always retrievable from the archive when needed for the integration of information by the reader. For example, it is customary (or even obligatory) to have an introduction which explains the authors' goals and serves to embed the reported work into a wider context. In an electronic environment, say a kind of hypertext structure, introductions might be reduced to pointers which link reported work to a review article in which the whole context is fully explained. Furthermore, repetitive reviews of one's own and other researchers' work can be reduced if the structure of the reporting has a more modular build-up instead of the present linear story-telling structure. The aim then is to structure texts in different types of modules, in such a way that each kind of module has its own information value. It is important to note that scientific articles are already well structured according to well-established rules and have familiar headings such as: Introduction, Methods, Data, Results, Discussion, Conclusion. However, this does not mean that all sentences dealing with, say, methods, can be found under that heading. Analysis shows that linear texts are generally much less structured than section headings suggest.
In our research programme we analyse a coherent collection of scientific papers in two different ways. Firstly, we analyse the different types of information contained in the documents (e.g., Goal, Embedding, Tools & Methods, Results, Data-handling, Apparatus, Discussions) as a first break-up of the linear structure. We take this set of types as basic modules and try to fit the original text therein. Of course such a simple linear set of modules is not sufficient. Within every module we make a further subdivision which relates this module to others. So, within the module "Apparatus" we can, e.g., distinguish the description of the apparatus used, the apparatus in context to other machines (the embedding of the experimental set-up), the apparatus in contrast to apparatus used by others (apparatus as part of the discussion). The main goal here is to reveal a possible modularity of information by analysing existing articles, in order to come to a heuristic model for a non-linear modular way of writing articles.
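The two-level subdivision described above can be sketched as a small data structure. This is an illustration under stated assumptions, not the heuristic model itself: the class names `Module` and `Article`, the field `relates_to`, and the example article are hypothetical; only the list of module types is taken from the text.

```python
from dataclasses import dataclass, field

# Module types as listed in the text.
MODULE_KINDS = ("Goal", "Embedding", "Tools & Methods", "Results",
                "Data-handling", "Apparatus", "Discussions")


@dataclass
class Module:
    kind: str               # basic module type
    text: str               # the fragment of the original article
    relates_to: tuple = ()  # other module types this fragment is linked to

    def __post_init__(self):
        # Reject module types that are not part of the agreed set.
        for name in (self.kind, *self.relates_to):
            if name not in MODULE_KINDS:
                raise ValueError(f"unknown module kind: {name!r}")


@dataclass
class Article:
    title: str
    modules: list = field(default_factory=list)

    def of_kind(self, kind):
        """All modules of one basic type."""
        return [m for m in self.modules if m.kind == kind]

    def linked_to(self, kind):
        """All modules, of any type, that relate to the given type."""
        return [m for m in self.modules if kind in m.relates_to]


# Hypothetical article with the three "Apparatus" sub-modules from the text.
article = Article("Hypothetical detector paper", [
    Module("Goal", "Statement of the aim of the work."),
    Module("Apparatus", "Description of the apparatus used."),
    Module("Apparatus", "The apparatus in the context of other machines.", ("Embedding",)),
    Module("Apparatus", "The apparatus contrasted with that of others.", ("Discussions",)),
])
```

With this structure, `article.of_kind("Apparatus")` returns all three apparatus fragments, while `article.linked_to("Discussions")` returns only the fragment that plays a rôle in the discussion, so the relations between modules make the article non-linear.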
This part of the analysis is augmented by a linguistic study where the same set of articles is analysed as argumentative texts. According to well-established models of the Pragma-Dialectical approach in argumentation theory (Reference), we try to reveal the line of reasoning in a scientific article with the aim to use it as a tool for better structuring. The goal here is to develop a model for the relationship between the above-mentioned modules. This way, we can assign to each module not only a scientific tag, but also a rhetorical one, e.g., a module "Goal" has a completely different character than a module "Data-Handling". While in the "Goal" module the author can express all kinds of speculations freely, the value of the module "Data-Handling" demands very strict adherence to well-established standards and procedures. Integrating both approaches will result in a model for a modular presentation of scientific texts, where each module has a well-defined scientific as well as contextual character. The advantage of such a structuring is clear for the following modes of use:
- Modularly structured information fits the characteristics of electronic media, which are intrinsically more than linear. Modules fit nicely in the hypertext philosophy and transcend the present use of hypertext as a structuring on top of intrinsically linear essays.
- By putting the various components of scientific discourse in context, the refereeing standards can be improved as they can be defined as a function of the module (refereeing the module "Data-Handling" demands more rigorous standards in contrast to the module "Goal").
- In case of a modular build-up, the searching reader can confine the search to particular modules and does not need to retrieve the entire communication as is the case with document retrieval; e.g., if a researcher wants to know about the design of a particular detector, only those parts of the work are of interest which deal with the detector, independent of how interesting and important the rest of the communication is for the original author.
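The last point, searching within modules instead of retrieving whole documents, can be sketched as follows. The corpus, its field names and the `search` function are hypothetical and serve only to illustrate the idea under the assumption that each stored fragment carries its module type.

```python
# Hypothetical corpus: each entry is one module of one communication.
CORPUS = [
    {"doc": "paper-A", "kind": "Apparatus",
     "text": "The drift chamber detector is read out by fast electronics."},
    {"doc": "paper-A", "kind": "Goal",
     "text": "We aim to test the theory with a new detector."},
    {"doc": "paper-B", "kind": "Results",
     "text": "The detector efficiency was stable during the run."},
    {"doc": "paper-B", "kind": "Apparatus",
     "text": "A silicon strip detector was placed near the target."},
]


def search(corpus, query, kinds=None):
    """Return modules containing the query, optionally limited to module kinds."""
    needle = query.lower()
    return [
        module for module in corpus
        if needle in module["text"].lower()
        and (kinds is None or module["kind"] in kinds)
    ]
```

A search for "detector" over the whole corpus returns all four modules, whereas `search(CORPUS, "detector", kinds={"Apparatus"})` returns only the two fragments that actually describe a detector design.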
6.2.- Active mathematics and simulations
Although text-based, mathematics represents a totally independent way of representing results. The research in this field is now aimed mainly at defining an (SGML) grammar for mathematics which will enable manipulation of formulae and their use in calculation or symbolic manipulation packages.
Simulations are again an independent way of communicating scientific ideas. Here readers must have the possibility to change the model and/or the parameters in order to develop their own further research based on published research. The publication of computer programs, be it simulations or calculation packages, demands the development of their own standards and rules. Some experience has already been gained in the management of program libraries, such as the Computer Program Library from the Queen's University of Belfast, which is integrated in the paper journal Computer Physics Communications.
6.3.- Still Pictures.
The analysis of potential applications of non-textual material still has to start. Pictures will be more than just "illuminations" of the text. Pictures have their own intrinsic value. At first sight, we can already appreciate the great difference between a graph (in any dimension) and a colour picture of an aberration of an optical device. Interestingly, in the peer review process, no standards or rules are established to review pictures as independent objects. In the analyses of pictures and their rôles, the results of textual studies will be helpful. Important items are:
- There are differences between data, data-handling and data-presentation. One can imagine a hierarchy of modules: first raw data (a module a reader cannot change, and whose integrity is pertinent); then a module data-reduction and handling (a module a reader can change and replace); and finally a module data-representation (a module a reader certainly can change, use, and manipulate).
- Similarly, there are differences between pictures of immutable objects and pictures resulting from, e.g., calculations or recorded data. In the first case the whole picture has to be preserved as well as possible (e.g., a photograph of a phenomenon, or the design of a chip). In cases where we deal with a digital picture (e.g. a CCD camera picture), the data instead of the picture can be stored. In the second case, e.g., a non-linear map or other structure which results from a calculation, it might be advantageous to have the algorithm (and parameters) stored as well. With the rising speed of data-handling, a reader might want to redo instead of view the picture.
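The hierarchy of data modules can be sketched in a few lines. The numbers, the cut value and all function names are hypothetical; the sketch only illustrates the assumption that raw data stay fixed while the reduction and representation steps can be replaced by the reader.

```python
from statistics import mean, median

# Raw data module: a tuple, so a reader cannot modify it in place.
# The values are hypothetical counts, not measurements from any experiment.
RAW_DATA = (41, 39, 43, 40, 125, 42)


def author_reduction(raw):
    """Author's data-reduction module: drop values of 100 or more, then average."""
    kept = [x for x in raw if x < 100]
    return mean(kept)


def reader_reduction(raw):
    """A reader's replacement: keep every value and take the median instead."""
    return median(raw)


def represent(value, digits=2):
    """Data-representation module: here simply a formatted number."""
    return f"{value:.{digits}f}"


def publish(raw, reduce=author_reduction, show=represent):
    """Chain the three modules; the last two can be swapped by the reader."""
    return show(reduce(raw))
```

Calling `publish(RAW_DATA)` reproduces the author's result, and `publish(RAW_DATA, reduce=reader_reduction)` redoes the analysis on the same untouched raw data. The same pattern would apply to a picture that is regenerated from a stored algorithm and parameters instead of being stored as an image.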
6.4.- Motion Pictures
Apart from the items mentioned for still pictures the following extra features have to be tackled.
Film or video (a sequence of still pictures) differs from animation. In the case of film and video we still have the difference between immutable and re-creational pictures. In the case of animations, however, we can also think of including a tool for the reader's adaptations and modelling.
The case of sound is special because digital sound is a very well developed field with an almost total manipulation capacity. Nevertheless, the use of sound as an independent way of presenting scientific results is hardly considered at present, except in speech research or general sound recording. The cognitive value of sound objects is so different from visible objects that a completely new field can be opened up.
7. General Conclusions
In this paper, we first try to define the rôle of information in the science process and describe investigations where we try to explicate the communication needs of researchers in different fields. This information provides us with a backbone and yardstick for the development of new ways of organising the scientific communication process. It clearly points to a greater integration of various types of information as well as the capacity of the reader to manipulate this freely. This way, social, cognitive and intellectual demands can be met by the emerging technologies in a cross-fertilising way.
This "user" research is a starting point for our collaboration in various university projects under the umbrella programme "Communication in Physics". In this programme, we investigate the opportunities modularity of scientific information offers, to make optimum use of electronic media. We also research sophisticated combinatorial techniques to develop an Atlas of overlapping controlled index term systems.
Although the programme "Communication in Physics" is focused on physics as main corpus of investigation, the results are expected to be applicable to other research domains as well. However, in line with our conclusions, specific cultural differences should then be taken into account.
Our main message in all this is that in order to go beyond the "electronification" of the classical publishing process, we need to have an in-depth knowledge of the use, needs and presentation requirements and possibilities of scientific information.
The work described in this paper is a collaboration of the Faculties of Arts, and Mathematics, Informatics, Physics, and Astronomy (WINS) of the University of Amsterdam, the National Research Institute for Mathematics and Computer Science (CWI), and Elsevier Science. The work is partly financially supported by: Stichting Physica, Royal Academy of Science and Arts (KNAW), Royal Library (KB), Shell Research Amsterdam (KSLA), Elsevier Science.
1) W.D. Garvey. Communication: The essence of science. Pergamon Press, Oxford 1979.
2) H.E. Roosendaal and A.P. de Ruiter. The Journal at the cross-roads of developments in scientific information and information technology. Paper presented at Conference in Helsinki 1990.
3) T.S. Kuhn. The structure of scientific revolutions, 2nd enlarged edition. Chicago Univ. Press. 1970.
4) I. Lakatos. Falsification and the methodology of scientific research programmes. In: I.Lakatos and A. Musgrave. Criticism and the growth of knowledge. Cambridge Univ. Press. 1970. p.135.
5) B. Gholson, W.R.Shadish Jr., R.A. Niemeyer, and A.C. Houts (eds.). The psychology of science. Cambridge Univ. Press. 1989.
6) S. Jasanoff, G.E. Markle, J.C. Petersen, and T. Pinch (eds.). Handbook of science and technology studies. Sage Publ. London 1995.
7) I. Lakatos. The methodology of scientific research programmes. In: J. Worrall and G. Currie (eds.) Philosophical papers, vol. 1, Cambridge Univ. Press. 1978.
8) G. Panhuijsen and R. van Hezewijk. To be published Univ. of Utrecht.
9) A.G. Gross. The rhetoric of science. Harvard Univ. Press, Cambridge 1990.
10) R. Merton. The sociology of science: theoretical and empirical investigations. Univ. of Chicago Press. 1973.
11) F. van Rooy. The rôle of electronic media in scientific communication. Thesis University of Utrecht 1995.
12) D. Schauder. Electronic publishing of professional articles: Attitudes of academics and the implications for the scholarly communication industry. JASIS vol.45(2), 73-100.
See for example: J. Maddox. Nature. Vol.376. p.11, p.113, and p.385; C. Bell. Nature. Vol.376. p.375.
13) P.A.Th.M. Geurts and H.E. Roosendaal. Mixed market research for strategic management. To be published.
14) Marshall McLuhan. Understanding Media: The Extensions of Man. Routledge & Kegan Paul Ltd. London, 1964.
15) For a good overview of the developments towards the actual situation see:
J. André, R. Furuta, V. Quint (eds.). Structured Documents. Cambridge University Press, 1989.
Edward R. Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Conn. 1983.
Edward R. Tufte. Envisioning Information. Graphics Press, Cheshire, Conn. 1990.
For a critique see:
Joost G. Kircz. Rhetorical structure of scientific articles: The case for argumentational analysis in information retrieval. Jnl. of Documentation, 47(4), 1991, pp.354-372.
D.C. Blair. Language and representation in information retrieval. Elsevier, 1990.
For a recent collection of overviews see the "Special Topic Issue: Evaluation of Information Retrieval Systems". Edited by Jean M. Tague-Sutcliffe. JASIS, vol.47(1), January 1996.
16) M. Hazewinkel. Tree-tree matrices and other combinatorial problems from taxonomy. CWI report AM-R9507, April 1995.
17) Frans H. van Eemeren, Rob Grootendorst, and Tjark Kruiger. Handbook of Argumentation Theory: A critical survey of classical backgrounds and modern studies. Foris Publications, Dordrecht, 1987.
18) Frans H. van Eemeren and Rob Grootendorst (eds.). Studies in Pragma-Dialectics. Sic Sat, Amsterdam, 1994.