De Mulder R., van Noortwijk K., & Combrink-Kuiters, “Jurimetrics Please”, in European Journal of Law and Technology, Vol 1, Issue 1, 2010


Richard De Mulder, [1] Kees van Noortwijk [2] and Lia Combrink-Kuiters [3]

Cite as: De Mulder R., van Noortwijk K., & Combrink-Kuiters, “Jurimetrics Please”, in European Journal of Law and Technology, Vol 1, Issue 1, 2010.


Jurimetrics, the empirical study of the law, has never really come into existence. Although, given the way in which society has developed during the information age, it could have been expected that jurimetrics would become an important discipline, until now it has not conquered much ground in the universities or outside. In this article, some elements of the history of jurimetrics are presented as well as the academic and practical potential of this discipline. Finally, an attempt is made to explain the slow development of jurimetrics and a possible future perspective is given.

1 Introduction

'Jurimetrics' or 'jurimetrical' are not terms that are often used as qualifications for research activities. Even in the American journal called Jurimetrics, The Journal of Law, Science, and Technology, [4] which refers to itself as 'a forum for the publication and exchange of ideas and information about the relationships between law, science, and technology in all areas' and 'the oldest journal of law and science in the United States', very few of the articles mention the term jurimetrics. Furthermore, although the journal covers a variety of subjects such as:

  • Physical, life, and social sciences

  • Engineering, aerospace, communications, and computers

  • Logic, mathematics, and quantitative methods

  • The uses of science and technology in law practice, adjudication, and court and agency administration

  • Policy implications and legislative and administrative control of science and technology.

The approaches adopted can hardly be said to belong to the realm of jurimetrics.

It was the American Lee Loevinger who launched the term 'jurimetrics'. [5] He stressed the importance of scientific, and therefore statistical, methods for lawyers. He saw a number of possibilities for using these applications in the law. Loevinger contended that knowledge about the law could be obtained by observation, rather than through speculation. 'Jurimetrics promises to cut windows in the house of law, so that those inside can see out, and to cut doors, so that those outside can get in.' [6] Loevinger's willingness to apply scientific methods to the law did not receive undivided support. In particular, his interpretation of 'scientific' was criticised because it made no distinction between the activities of practising lawyers and those of academic researchers, as long as their work was of a quantitative nature. The quantitative approach made some headway in the United States in the 1960s and 1970s, but later appeared to go out of fashion.

One of the problems that arose in those early days was how jurimetrics should be defined. For example, Franken [7] defined jurimetrics as 'the application of quantitative methods to legal problems'. Franken did not agree with the behavioural, positivist approach as proposed by Loevinger, but would only accept quantitative methods if applied on a theoretical basis with explicit ethical and political principles. He proposed cybernetic systems theory as a candidate for this. [8] It has been argued that this definition is, on the one hand, rather broad, while, on the other, rather restrictive. The 'application' is not just meant as something done by researchers, but also by practising lawyers (e.g. the documentation of data), while jurimetrics is restricted to quantitative activities. It would seem, however, that jurimetrics should also involve at least some non-quantitative (but nevertheless mathematical) approaches. De Mulder [9] in 1984 equated 'jurimetrics' to 'the empirical legal science', which should be concerned with the world of experience. He agreed with Loevinger that there are strong similarities between jurimetrics and econometrics, the scientific approach to economic phenomena. Lawyers, unfortunately, are not familiar with quantitative approaches and cannot build upon a tradition of mathematical models. This approach has had to be developed from the start. [10]

2 Our definition

In this article, we would like to use the following definition:

Jurimetrics is the empirical study of the form, the meaning and the pragmatics (and the relationships between those) of demands and authorisations issuing from state organisations with the aid of mathematical models and using methodological individualism as the basic paradigm for the explanation and prediction of human behaviour.

Or put more simply, it is a definition containing the following elements:

  • The empirical study of legal phenomena

  • With the aid of mathematical models

  • On the basis of methodological individualism (= rationality).

With respect to the first element, the object of investigation for the legal scientist, the legal phenomena, consists of legal texts. Legal texts are analysed in a variety of ways - their form, their meaning, their effect and how and why these texts came into existence. The second element explains that jurimetrics research uses a model building approach. This means that an attempt is made to express the theory in mathematical, for example statistical, models. This usually entails quantification, which is often unavoidable because of the necessity of calculating probability. The third element requires a theory to describe, explain and predict human behaviour. Over the years, a number of models of man have appeared and have been used in various disciplines. Many social scientists and most lawyers base their approach on a sociological image of man. In this theory, man's behaviour will conform to the norms of the group to which he belongs. However, the success of the market economies has coincided with a wider acceptance of the basic paradigm (in the sense of Kuhn [11]) which is used to study human behaviour and hence to explain, predict and direct it in modern economic theory: it is the 'homo economicus' (the resourceful, evaluating, maximising person, REMP, or the resourceful, evaluating, maximising model, REMM) which now provides the image of man. [12] Processes are studied from the point of view of methodological individualism. In other words, processes are described, explained and predicted on the basis of the behaviour of individuals. A REMP is a person who wants all his decisions to be of maximum use to himself. Ideologically, this may sound rather denigrating. Modern-day newspaper articles accuse the REMP of being a 'calculating citizen'. Yet in practice this selfishness is not necessarily anti-social because REMPs realise that their own interests are better served by taking other people into account. Negotiation is the lifeblood of the model.
The REMP model of man is similar to the economic model. However, the REMP model is based upon utility maximising and does not restrict itself to the maximising of money, which is the basis for the economic model.

The criticism of the REMP model is usually based upon the fact that the REMP is a rational decision model whereas people do not always act rationally; emotions could affect decision-making. However, if the REMP model is understood properly, it is clear that emotions form a part of the utility function. The rationality of the model lies in the presumption that an individual will decide what will produce the most utility for that individual, and that includes emotional factors.

The rational model would never have become as important as it is now, if it had just been a model to explain and predict individual human behaviour. On the basis of this model, Jensen and Meckling developed their principal agent theory that is now the dominant approach in organisation science. In Jensen and Meckling, general and specific knowledge, the degree to which decisions in organisations are made at a central level rather than locally, is explained from two kinds of costs: agency costs and information costs. [13] Fama and Jensen in "Separation of ownership and control", [14] explain how the individual interests of shareholders and managers lead to typical organisational structures such as public companies. Textbooks such as Brickley, Smith and Zimmerman [15] show that this approach, using the REMP model of man, has become the foundation of modern organisational theory.

Those who would reject the REMP model as being ideologically coloured would, presumably, also reject the selection of this model as the basis for predicting human behaviour in jurimetrics research. However, there are good grounds for choosing the REMP, as the ability of this model to explain and predict is considerable. The model can be used as a basis for jurimetrical research, for example to analyse, explain and predict judicial decisions. It is the empirical, quantitative and economic approach to law that will enable lawyers to come up with advice that will be relevant, reliable and comprehensible to their clients. It makes finding results easier and more accessible for users in modern organisations, with their focus on costs and profits, while it also facilitates the use of knowledge from economics and business studies. Choosing the REMP model does not mean, however, that the achievements of sociology and psychology should be ignored in jurimetrics research. The REMP model gives direction to the research; it does not exclude elements from other disciplines. The REMP model, however, is essential in the empirical study of the law that concentrates on the pragmatics. Pragmatics is the core subject of law and economics.

Below, several jurimetrical studies will be presented. These illustrate the variety of areas to which the approach can be applied. They also provide insight into the practical application of jurimetrical tools to legal problems.

3 The form of legal language

With respect to the form of legal texts, comparatively little research has been carried out. However, the form of legal texts should be studied scientifically. The foundation of legal empirical knowledge is, after all, the study of the properties of the form of legal texts. It is difficult to see how scientific knowledge can be gained about the meaning and pragmatics of legal decisions and legislation without systematic knowledge of the form of those texts, for example by comparing them to non-legal corpora.

3.1 Quantitative linguistics

For legal language, several aspects of the form could be studied. One of these is the structure of the word use in legal texts. An interesting question in this respect is whether this word use differs measurably from word use in other text types. To examine this, methods from the field of quantitative linguistics (sometimes also referred to as statistical linguistics) [16] can be used. This is a branch of linguistic science in which the measurability of linguistic phenomena plays a central role. Quantitative linguistics provides a number of methods to analyse the word use in a set of documents, or 'corpus'. Characteristics which play an important role in these methods are, for instance, word frequency (how often does a certain word appear), frequency distribution (what is the pattern of the word frequencies of all the different words in a corpus) and distribution of word types (is a certain word used in every document in a corpus, or only in a subset of documents). These characteristics can be analysed by compiling a frequency list of the corpus. This is a list of all the different words (or 'word types') in the corpus, plus the number of times the word appears in the corpus and the number of documents of which it is a part. [17] This list is sorted according to word frequency, the most common word being at the top. Based on this frequency list a number of linguistic measurements can be made, such as the 'characteristic K' of Yule/Herdan. [18] The value of these measurements provides a typology of what could be called the 'structure of word use'. Apart from these data, which characterise the way in which words are used, the words themselves have also been studied in this research project. Points which have been taken into account in this respect are, for instance, word lengths and the specific words which appear at the 'head' of the frequency list (the most common words in a corpus).
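The frequency list described above can be illustrated with a minimal Python sketch. The function name `frequency_list`, the crude lower-case tokeniser and the two sample sentences are assumptions made for illustration only; they are not taken from the research project.

```python
import re
from collections import Counter

def frequency_list(documents):
    """Return (word type, corpus frequency, document count) rows,
    sorted with the most common word at the top."""
    token_counts = Counter()   # how often each word type occurs in the corpus
    doc_counts = Counter()     # in how many documents each word type occurs
    for text in documents:
        # Crude tokeniser (assumption): runs of letters, lower-cased.
        tokens = re.findall(r"[a-z]+", text.lower())
        token_counts.update(tokens)
        doc_counts.update(set(tokens))
    rows = [(word, freq, doc_counts[word]) for word, freq in token_counts.items()]
    # Sort by descending frequency; ties are broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

# Hypothetical two-document corpus.
corpus = [
    "The court held that the contract was void.",
    "The contract was signed by the tenant.",
]
for word, freq, docs in frequency_list(corpus)[:3]:
    print(word, freq, docs)
```

A real corpus would need a tokeniser suited to the language and text type, but the three columns (word type, frequency, number of documents) are the ones the text describes.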

3.2 Legal language versus 'general Dutch'

A study on the characteristics of legal word use, carried out with respect to the Dutch language a decade ago by Van Noortwijk, [19] showed interesting results. For instance, word frequency distributions (the 'pattern' of word use) proved to be quite different in legal texts than in 'general Dutch'. This is illustrated by the following conclusions that were drawn:

  • Word types (different words) are repeated more often in the legal texts than in general texts. General texts usually contain a higher number of different words.

  • For legal texts, the (most common) words at the top of the frequency list have higher frequencies than words at the same position on the list of general texts. Together with the former point, this means that in legal texts there is a smaller 'core' vocabulary, of which every word type is used more often.

  • The same conclusion can be drawn from a comparison of the frequency distributions. Another fact which emerges from these distributions is that although the general Dutch texts contain a far greater number of different words, many of these words (a higher percentage than in the legal corpora) have very low frequencies (they appear fewer than ten times).

  • Word use in each of the corpora can be effectively characterised by several 'linguistic constants'. The value of the characteristic K which was mentioned earlier, for instance, ranged from 0.0128 for the statute law corpus, via 0.0111 for the case law corpus, to 0.0106 for the general Dutch corpus.

Given these differences that were found for Dutch legal texts, it would of course be interesting to investigate whether the same holds true for other languages and whether there would be differences between the characteristics of Dutch legal texts and texts in other languages. Therefore, a similar research project for British English texts was recently initiated at the Erasmus University. Two corpora containing thousands of legal documents (one containing legislation and one containing case reports) and a corpus containing general texts, all in British English, have since been compiled. The corpora are of roughly equal sizes (around 16 million words each). The 'general' corpus consists of a random sample from the British National Corpus, [20] the two legal corpora consist of legal texts available on the Internet. Cases for the case reports corpus have been selected in such a way that the percentage of cases heard by various courts in the British hierarchy of courts (House of Lords, Court of Appeal, High Court, County Court cases etc.) more or less corresponds to that in the ten year old Dutch case law corpus, which will facilitate inter-language comparison. Using these corpora, it is possible to map precisely the differences between word use in the respective language types.

3.3 Characteristics of the English corpora

In the field of quantitative linguistics, several characteristics (measurements) to compare corpora have been proposed. One of these is the so-called 'Characteristic K', as defined by Yule and Herdan. This is an indication of the average frequency of the repetition of word types. According to Yule, at least, this characteristic is therefore also an indication of the size of the vocabulary in a corpus (i.e. it could be used to predict the number of word types from any given number of word tokens). In the pilot project it was found that it appears to be sufficiently stable in samples of different size, taken from a corpus. [21] K can be calculated as follows:

where r stands for the rank number of a frequency class, equal to the frequency in the corpus of the word types in that class, and nr stands for the number of word types in the class. These values for K are calculated for the three corpora:

Table 1. Characteristic K for the different corpora

The two legal corpora yield values that are almost identical, whereas the general language corpus yields a K that is considerably lower. In the Dutch pilot project similar results were found, but with an in-between score for the case law corpus. An explanation could be that the word use in British case law reports is rather formal - much like that in legislation texts - whereas Dutch case reports might contain a mixture of formal (legal) language and a more general discourse.
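The formula for K is not reproduced in this text. As a hedged illustration, the sketch below uses the form in which Yule's characteristic is commonly stated, K = scale · (Σ r² · n_r − N) / N², where N is the number of word tokens and r and n_r are as defined above. Whether the article uses exactly this form, and with which scaling, is an assumption; Yule's original scaling factor is 10^4, and `scale` is therefore left as a parameter. The function name `yule_k` is hypothetical.

```python
from collections import Counter

def yule_k(tokens, scale=1.0):
    """Characteristic K in its commonly stated form (assumed, see lead-in):
    K = scale * (sum(r^2 * n_r) - N) / N^2."""
    type_freqs = Counter(tokens)             # word type -> frequency r
    spectrum = Counter(type_freqs.values())  # r -> n_r (number of types with frequency r)
    n_tokens = sum(r * n_r for r, n_r in spectrum.items())
    sum_sq = sum(r * r * n_r for r, n_r in spectrum.items())
    return scale * (sum_sq - n_tokens) / (n_tokens * n_tokens)

# Toy sample: more repetition of word types gives a higher K.
repetitive = "a a a b b c".split()
varied = "a b c d e f".split()
print(yule_k(repetitive), yule_k(varied))
```

In this form a text without any repeated word types yields K = 0, and K rises as the same word types are repeated more often, which matches the interpretation of K as an indication of the average repetition of word types.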

Another relationship between the number of word tokens and word types, as defined by Erikstad, [22] has also proven to be relatively stable, no matter what the size of a corpus. In this relationship, the number of word types in a sample is considered equal to a power C of the number of word tokens in the sample, multiplied by a constant R.

By using known values for the numbers of tokens and types in different samples, the values of C and R can be calculated by means of regression analysis. The results, for the British corpora as well as for the Dutch pilot, are given in Table 2.
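The relationship described above, types = R · tokens^C, becomes linear after taking logarithms: log(types) = log(R) + C · log(tokens). A minimal sketch of estimating R and C by ordinary least squares on the logarithms follows. The log-log fit is an assumption about how the regression might be set up, `fit_type_token` is a hypothetical name, and the sample data are synthetic, not the values from the corpora.

```python
import math

def fit_type_token(samples):
    """Fit types = R * tokens**C by least squares on the logarithms.
    samples: iterable of (tokens, types) pairs. Returns (R, C, r_squared)."""
    xs = [math.log(tokens) for tokens, _ in samples]
    ys = [math.log(types) for _, types in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c = sxy / sxx                        # slope = exponent C
    r = math.exp(mean_y - c * mean_x)    # intercept = log R
    r_squared = sxy * sxy / (sxx * syy)  # squared Pearson correlation
    return r, c, r_squared

# Synthetic samples generated from R = 5, C = 0.7 (made-up values).
samples = [(n, 5 * n ** 0.7) for n in (1_000, 10_000, 100_000, 1_000_000)]
R, C, r2 = fit_type_token(samples)
print(round(R, 3), round(C, 3), round(r2, 3))
```

With real samples the points will not lie exactly on a line, and the squared correlation then indicates how well the power relationship describes the data.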

Table 2. Token/type ratio constants

Explaining the different values is beyond the scope of this brief introduction to the subject. Interestingly, the British corpora show different values for R and C than perhaps could be expected from the pilot project. Because the square of the Pearson product-moment correlation for the regression analysis underlying the values of these constants is very high (above 0.989 for all three corpora), the reliability of formula (2) for the estimation of the number of word types in a sample of any given size is also high. In the next section, we will describe a possibility to create a practical computer application based on word use statistics. The system which is being introduced has been developed in the past few years at Erasmus University. It makes use of the characteristics of the word use in documents in order to implement more intelligent [23] (legal) document retrieval systems.

4 The analysis of case law

The analysis of case law with the aid of mathematical models is probably the most promising possibility for jurimetrics. The general idea of this research is to examine judicial decision-making or, more precisely, the relationship between the input - the identifiable case factors - and the output - the decision. A number of ways have been found to represent this relationship in simple mathematical formulae or just by using a conceptual model. As computers became more powerful, the interest in this kind of research initially increased strongly. However, interest in the computer-assisted analysis of case law quickly waned, probably because this method of research still remained very time consuming, despite the use of computers. A significant part of the work had to be done manually, as a legally trained individual must perform the coding operation. It is for this reason that most of the research has been limited to a pilot study. These studies showed that, with the use of a certain mathematical model, satisfactory results could be achieved.

In making a case analysis, a list of factors must be drawn up. One of the case variables that have been examined is the role of the judge him/herself in the decision-making process. Some results indicate that the judge who makes the decision can be a determining factor in the result of a case. In one of the approaches, the so-called 'behavioural approach', a relationship is supposed between the personal characteristics of judges and the content of their decisions. In this line of research two kinds of characteristics are distinguished: on the one hand, the opinions, preferences and attitudes of judges and, on the other, their personal attributes. Sometimes a third personal characteristic is mentioned: the group behaviour of judges. This kind of research studies the way judges communicate with their peers during their deliberations.

Glendon Schubert's [24] collection gave some examples of case analysis focusing on the person of the judge as a factor. Stuart Nagel [25] reported a project in which the relationship between the attitude of judges with respect to 24 separate items and decision-making had been investigated. A correlation with progressivism was found. In an earlier study [26] Nagel had found that background factors such as family origin, religion, education and work experience influence judicial decision-making. Ulmer [27] investigated the influence on decision-making of the personal characteristic 'leadership in small groups' and obtained some positive results.

Underlying this kind of jurimetrical research is the work of the North American 'legal realists'. The US judge Oliver Wendell Holmes argued that judicial decision-making was not simply a logical exercise in which an established rule of law was applied to the facts of a particular case:

... the life of the law has not been logic, it has been experience. The felt necessities of the time, the prevalent moral and political theories, intuitions of public policy, avowed or unconscious, even the prejudices which judges share with their fellow men, have a good deal more to do than the syllogism in determining the rules by which men should be governed. [28]

In the 1920s, Herman Oliphant, a late adept of this approach, was an outspoken critic of the usefulness of the ratio decidendi as a guide to the real grounds of a decision. [29] The legal realists contend that the arguments judges formulate in their decisions do not necessarily express their real considerations. Their decision-making is not determined by rules (law, treaties, doctrine, customs etc.) but depends upon specific factors in cases. 'Fact-pattern' research tries to investigate all the possible case factors and not just those mentioned in the judicial argumentation. Kort [30] attempted to create mathematical models that could predict judicial decisions based on such case factors. In later research he started to apply linear regression models as well as other models to a number of legal areas. Kort's research revealed some relationships between case factors and decisions. [31] Reed Lawlor's [32] approach was to identify a linear relationship between two 'constant' factors, the person of the judge and the law, and a number of variable case factors. [33] He supposed that judges' opinions would not change over time. He developed an extensive manual for case analysis. [34] Lawlor's work showed some very promising results, although in the grey area between clear 'pro' and clear 'con' decisions the prediction was often not clear. Since the last legal realist (Reed Lawlor [35]) retired from research, the jurimetrics front has become rather quiet. Although Franken, Snijders, Tyree, De Mulder, Malsch, Combrink-Kuiters, Ashley, and Suyudi [36] can be mentioned as researchers involved in case analysis, at the moment there are only a few researchers in the whole world who are working on quantitative empirical research into judicial decision-making. The deep-structure research carried out by Smith and Deedman is, in essence, jurimetrical in nature and the same can be said as regards at least part of the research by Zeleznikow and Hunter into case-based reasoning. 
These researchers are also searching for structures and regularities within the process of judicial decision-making in order to implement these patterns into legal knowledge based systems. [37]

4.1 Steps in a jurimetrical research plan

Below is a description of how a jurimetrical research project is carried out. [38] The examples used here are from Combrink's study. [39]

  • A legal domain is chosen, based on the availability of (preferably) lower court cases and the suitability of the subject matter. Within the domain, at least one specific legal topic has to be chosen and 'operationalised' in the form of a 'legal item'. For instance, in the field of family law the legal item that has been used is whether or not the father was granted custody. The item has to be dichotomous: the answer is 'yes' (the father was granted custody) or 'no' (he was not).

  • A selection of cases within the chosen domains has to be made. All cases that are possible candidates for selection should be read and assessed. Only when the set of cases is homogeneous can a search be made to determine the influence of certain facts and circumstances. The decisions have to be on the same item. Insufficiently relevant cases must, therefore, be removed from the selection.

  • All cases have to be thoroughly read and a list of factors has to be drawn up.

Figure 1. List of factors

An example of such a list is provided in figure 1.

  • All cases must then be manually coded by, preferably, at least two different assessors.

  • On the basis of the coded data, a statistical prediction model is calculated. Correlation between factors can be analysed in order to improve the prediction methods. Techniques for validation have to be applied. Factors that turned out to be important in the custody cases could be, for example, the expert's advice and which parent was already taking care of the child at the commencement of the court procedure. For each case a score is calculated on the basis of the weights of the factors. Without such scores, cases can only be ordered according to whether the judicial decision was 'for' or 'against'. See figure 2. [40]

Figure 2. Cases ordered according to actual decision (+1/-1)

Ordered according to the weight of the factors, the picture as shown in figure 3 emerges.

Figure 3. Cases ordered according to weight of factors

This graph shows that there are strong cases e.g. in favour of the father (at the left, particularly case 5), strong cases in favour of the mother (on the right, particularly case 21) and some cases in the intermediate zone (for example case 18 and case 22). With the data depicted in this graph, the probability that a case would be decided for or against the father or the mother could be estimated.

The outcome that would result from combining the two graphs is shown in figure 4.

Particularly interesting here is case 19. This case was ranked between the 'for mother' cases, but was decided in favour of the father. Such a statistical 'outlier' is often particularly interesting from the legal point of view. Which were the factors that determined its position in the ranking?

Figure 4. Data from figures 2 and 3 combined
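The ordering of cases by factor weights, and the detection of an outlier such as case 19, can be sketched as follows. This is only an illustration under stated assumptions: the factor names, the weights and the four cases are invented, and a simple weighted sum stands in for whatever statistical prediction model is actually fitted to the coded data.

```python
def case_score(factors, weights):
    """Weighted sum of the coded factors of one case (positive favours the father)."""
    return sum(weights[name] * value for name, value in factors.items())

# Hypothetical weights; in a real project these come from the fitted model.
weights = {
    "expert_advises_father": 2.0,
    "child_lives_with_father": 1.5,
    "child_lives_with_mother": -1.5,
}

# Hypothetical coded cases: (factors, actual decision), +1 = custody to the father.
cases = {
    "A": ({"expert_advises_father": 1, "child_lives_with_father": 1, "child_lives_with_mother": 0}, +1),
    "B": ({"expert_advises_father": 0, "child_lives_with_father": 0, "child_lives_with_mother": 1}, -1),
    "C": ({"expert_advises_father": 1, "child_lives_with_father": 0, "child_lives_with_mother": 1}, +1),
    "D": ({"expert_advises_father": 0, "child_lives_with_father": 0, "child_lives_with_mother": 1}, +1),
}

scores = {cid: case_score(factors, weights) for cid, (factors, _) in cases.items()}

# Order cases from strongest 'for father' to strongest 'for mother'.
ranked = sorted(cases, key=lambda cid: scores[cid], reverse=True)

# Outliers: the actual decision contradicts the sign of the score.
outliers = [cid for cid in ranked if scores[cid] * cases[cid][1] < 0]

print(ranked, outliers)
```

In this toy data set, case D plays the role of case 19: its factors place it among the 'for mother' cases, yet it was decided in favour of the father.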

Further analyses can and often should be carried out at this point. Particularly important is the validation of the results: to what extent do the results provide information about the probabilities of positive or negative decisions? For example, if the 'a-priori' probability of a positive decision is far more or far less than 50%, then the a-priori outcome of a decision would often be the best predictor, and the prediction method has to be relatively strong to improve on that. [41]
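The point about the a-priori probability can be made concrete with a small sketch that compares a model's hit rate with the hit rate obtained by always predicting the more frequent outcome. The decisions and predictions below are invented for illustration, and both function names are hypothetical.

```python
def baseline_accuracy(decisions):
    """Hit rate of always predicting the most frequent outcome."""
    positives = sum(1 for d in decisions if d == 1)
    return max(positives, len(decisions) - positives) / len(decisions)

def model_accuracy(decisions, predictions):
    """Share of cases in which the model's prediction equals the actual decision."""
    hits = sum(1 for d, p in zip(decisions, predictions) if d == p)
    return hits / len(decisions)

# Hypothetical data: 8 of 10 decisions are positive (+1).
decisions   = [1, 1, 1, 1, 1, 1, 1, 1, -1, -1]
predictions = [1, 1, 1, 1, 1, 1, 1, -1, -1, 1]

base = baseline_accuracy(decisions)
model = model_accuracy(decisions, predictions)
print(base, model, model > base)
```

Here a model that is right in 80% of the cases adds nothing, because simply predicting 'positive' every time is also right in 80% of the cases.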

4.2 Conceptual Retrieval Systems

Selecting a series of cases with suitable characteristics is an essential step in a jurimetrics research project. Unfortunately, case law databases often fall short in this respect. The only retrieval mechanism these databases use consists of 'Boolean searching' (searching for documents that contain certain words or combinations of words). This method, already in use for more than 50 years, focuses exclusively on the 'form': the words present in the documents. A more 'intelligent' retrieval system should have the ability to select documents based on their subject matter, in other words, based on their meaning. In Wildemast and De Mulder 1992 [42] an overview was given of attempts that have been made to build such 'conceptual' retrieval systems.

The methods proposed in literature for legal conceptual retrieval are aimed at three different aspects of the retrieval process:

  • the interface with the users

  • the representation of documents

  • the search operation itself.

It is the interface which makes communication between the user and the computer possible. It assists in the translation of the user's question into an actual search instruction for the computer. [43] When the search operation has been carried out, the interface is responsible for the presentation of the results. On the basis of these results, the user can decide to reformulate the request. Conceptual retrieval can be realised here by assisting the user in finding the right words to describe the concept and by providing the legal context in which concepts are described.

As for the representation of documents, it would be impractical to let the computer search directly in the digital documents. They have to be 'indexed' in order that the computer can search in a list of words rather than in a list of documents. This indexing has to be done in a way that no information is lost and full text search remains possible.

By search operation is understood the function which ensures that the concrete search instruction (whether or not already re-worked in the interface) is carried out on the documents represented in the system. Most search operations (for instance the 'Boolean' search mentioned earlier) make use of the occurrence of a term rather than, for example, the term frequency in a document. The result of the Boolean search operation is the answering of a yes/no question for each document as to whether the document satisfies the search instruction. Other search operations look for a standard which indicates the extent to which the document satisfies the search instruction. This may possibly be expressed in the form of an estimation of probability. [44] A similar result is achieved by search techniques which make use of 'neural networks'. Conceptual retrieval with the help of neural networks was proposed in Belew 1987 and in Rose and Belew 1989.
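The difference between a yes/no Boolean operation and a graded measure can be sketched in a few lines. The three documents, the query and the relative-frequency score are assumptions chosen for illustration; the score is not the probability estimate referred to above, only the simplest graded alternative.

```python
def boolean_match(doc_tokens, required_terms):
    """Boolean AND search: does the document contain every search term?"""
    return all(term in doc_tokens for term in required_terms)

def frequency_score(doc_tokens, query_terms):
    """Graded measure (assumption): share of the document's tokens that are query terms."""
    return sum(doc_tokens.count(term) for term in query_terms) / len(doc_tokens)

# Hypothetical mini-collection, already tokenised.
docs = {
    "d1": "custody custody father court".split(),
    "d2": "custody of goods in transit".split(),
    "d3": "father and mother".split(),
}
query = ["custody", "father"]

boolean_hits = [d for d, tokens in docs.items() if boolean_match(tokens, query)]
ranking = sorted(docs, key=lambda d: frequency_score(docs[d], query), reverse=True)

print(boolean_hits)  # documents answering 'yes'
print(ranking)       # all documents, ordered by degree of match
```

The Boolean operation splits the collection into hits and non-hits, whereas the graded measure orders every document, so that partially matching documents such as d3 remain visible further down the list.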

The analysis of the advantages and disadvantages of the techniques presented in current literature leads to the conclusion that, in most cases, the method of text representation, or the interface, does not allow the users to define their own concepts. It would, however, be desirable to allow users to search according to their own understanding of a concept. These concepts could then be more precisely re-defined on the basis of the results of search operations or interpretations by the interface. Such a system could store the user's concepts: it would become a 'learning' system.

As regards the interface, it is especially important that the user can bring his or her knowledge into the system and modify his own concepts. We would argue that the quality of the interface is, therefore, the constraining factor in conceptual legal information retrieval at present. Research efforts should be concentrated on this area as a lot more can be done. For example, in the available literature hardly any attention is paid to the obvious method of allowing the user to make his own ideas explicit: the user can give the system examples of clearly relevant documents with which he is familiar. [45] The choice of search technique is not a crucial design decision as, given the design choices for interface and document representation, various search techniques can be used as alternatives or supplements to each other.

A prototype of a system, which contains a very large collection of legal cases and legislation, operating with these techniques, has been constructed at the Erasmus University. It could be called a learning 'concept processor'. Using it for document retrieval purposes usually requires an initial training session. During this session, the user indicates example documents ('exemplars') that he considers to be relevant for his legal concept. Subsequently, the searching facility of the system will search for documents that are similar to the exemplars. In order to fulfil this task, the programme compares the properties or attributes of potentially relevant documents with those of the exemplars. These attributes consist of the words used in the documents, their frequency, possibly the order in which the words appear and other properties of the text. The extent to which documents are similar can be calculated using statistical measures.

By calculating the similarity between each document and all the exemplars, the system is capable of ordering documents according to their relevance. The documents that are ranked at the top of the list are the ones that the user will be interested in the most. If the system comes up with a document that the user identifies as relevant, he/she can decide to add it to the list of exemplars. The next search operation will then be based on more information than the initial one. Usually, several 'rounds' of adding (and possibly also removing) exemplars are necessary before the ranking becomes 'stable'.
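The ranking step described above can be illustrated with a minimal sketch. It assumes that each document is represented only by its word frequencies and that similarity is measured with the cosine measure; the prototype's actual statistical measures are not specified here, so both choices, as well as the function names and the toy documents, are illustrative assumptions.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Represent a document by the frequency of each lower-cased word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-frequency vectors (0.0 to 1.0)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_exemplars(documents, exemplar_ids):
    """Order the non-exemplar documents by mean similarity to the exemplars."""
    vectors = {doc_id: word_frequencies(text) for doc_id, text in documents.items()}
    scores = {}
    for doc_id, vector in vectors.items():
        if doc_id in exemplar_ids:
            continue  # exemplars are already known to be relevant
        sims = [cosine_similarity(vector, vectors[e]) for e in exemplar_ids]
        scores[doc_id] = sum(sims) / len(sims)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical miniature database; real case texts would be far longer.
documents = {
    "case_a": "the tenant failed to pay rent and the landlord ended the lease",
    "case_b": "the landlord claimed unpaid rent from the tenant under the lease",
    "case_c": "the driver was fined for speeding on the motorway",
    "case_d": "the court ordered the tenant to pay the rent due under the lease",
}
ranking = rank_by_exemplars(documents, exemplar_ids=["case_a"])
for doc_id, similarity in ranking:
    print(doc_id, round(similarity, 3))
```

With "case_a" as the only exemplar, the two tenancy cases are ranked above the traffic case. Adding one of them to the exemplar list and calling rank_by_exemplars again corresponds to a further 'round' of training.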

Documents that the system initially ranks highly, but that the user identifies as not being relevant, also have a very important function. These documents can serve as 'counter exemplars' for that particular retrieval concept. Typically, the user would inform the system that documents that are put forward as candidates for relevant documents are in fact counter exemplars. These non-relevant documents will 'teach' the system the finer points of the concept the user has in mind. A concept, as used in such a conceptual retrieval system, could therefore be defined as follows:

A concept is an ordered pair of sets of documents. The first set of the pair is the set of exemplars (of relevant documents). The second set of the pair is the set of counter exemplars (a set of non-relevant documents that are as similar as possible to the relevant documents).

A concept can be referred to by a term that indicates membership of the first set of the pair, and stored accordingly. For example: '(documents which contain) civil law (cases)'.
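The definition of a concept as an ordered pair of document sets translates directly into a small data structure. The sketch below is an assumption about how such a concept might be stored and used: the class name, the word-overlap (Jaccard) similarity and the scoring rule (closeness to the exemplars minus closeness to the counter exemplars) are hypothetical and are not taken from the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """An ordered pair of document sets: (exemplars, counter exemplars)."""
    name: str
    exemplars: set = field(default_factory=set)
    counter_exemplars: set = field(default_factory=set)

    def add_exemplar(self, doc_id):
        # A document cannot be in both sets at once.
        self.counter_exemplars.discard(doc_id)
        self.exemplars.add(doc_id)

    def add_counter_exemplar(self, doc_id):
        self.exemplars.discard(doc_id)
        self.counter_exemplars.add(doc_id)

    def as_pair(self):
        return (frozenset(self.exemplars), frozenset(self.counter_exemplars))


def jaccard(text_a, text_b):
    """Share of distinct words that two texts have in common."""
    words_a, words_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def score(concept, doc_id, documents):
    """Hypothetical rule: similarity to exemplars minus similarity to counter exemplars."""
    def mean_similarity(ids):
        ids = [i for i in ids if i != doc_id]
        if not ids:
            return 0.0
        return sum(jaccard(documents[doc_id], documents[i]) for i in ids) / len(ids)

    return mean_similarity(concept.exemplars) - mean_similarity(concept.counter_exemplars)


# Hypothetical keyword summaries standing in for full case texts.
documents = {
    "d1": "tenant landlord lease rent dispute",
    "d2": "landlord lease rent arrears claim",
    "d3": "lease of aircraft tax ruling",
    "d4": "aircraft tax ruling appeal",
}
concept = Concept("civil law (cases)")
concept.add_exemplar("d1")
before = score(concept, "d4", documents)
concept.add_counter_exemplar("d3")  # the user rejects a candidate
after = score(concept, "d4", documents)
```

Marking "d3" as a counter exemplar lowers the score of the similar document "d4", while the score of "d2", which resembles the exemplar, stays well above it. This is one simple way in which rejected candidates can 'teach' a system where the boundary of a concept lies.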

During the initial training session, users have to evaluate the results continuously. There is, for instance, always the possibility that a whole category of documents, which in fact belong to the concept and should therefore be selected from the database, are missing in the top part of the ranking. It is then necessary to look for at least one or two exemplars of this category, as it is probably not 'covered' by the concept yet. At a certain moment, the user will find that adding new (counter) exemplars hardly affects the ranking anymore. All relevant documents should now be positioned in the top part of the ranking. At this stage, the user's only remaining task is to find the exact point in the listing at which relevant documents stop and irrelevant documents begin. In some cases, a graphical representation of the final probability measures can be helpful in locating this point.
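Locating the point at which relevant documents stop can be supported by a simple heuristic. The sketch below places the boundary at the largest drop between consecutive scores in the ranking. This is an illustrative assumption, not the method used in the prototype, and the scores are invented; in practice the user would still inspect the documents around the suggested boundary.

```python
def largest_gap_cutoff(scores):
    """Return how many top-ranked documents lie above the largest score drop."""
    ordered = sorted(scores, reverse=True)
    if len(ordered) < 2:
        return len(ordered)
    gaps = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    return gaps.index(max(gaps)) + 1


# Hypothetical relevance scores for a ranked list of seven documents.
scores = [0.91, 0.88, 0.85, 0.80, 0.32, 0.30, 0.12]
cutoff = largest_gap_cutoff(scores)
print(cutoff)  # number of documents above the largest drop
```

Here the largest drop lies between the fourth and fifth document, so the first four documents would be proposed as the relevant set.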

The application of conceptual retrieval, for instance using the method described here, usually improves the recall (proportion of the available relevant documents that is actually retrieved) and the precision (proportion of the retrieved documents that are indeed relevant) of a search operation considerably. It can, therefore, be an extremely valuable tool in jurimetrics research projects, as described in sections 3 and 4, as the selection of a set of cases with the right characteristics is essential for the success of these projects.
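The two measures defined above can be computed as follows. The document identifiers are hypothetical; the formulas follow the definitions given in the text.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one search operation."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search: four relevant cases exist, three documents are retrieved.
relevant = {"c1", "c2", "c3", "c4"}
retrieved = ["c1", "c2", "c5"]
recall, precision = recall_and_precision(retrieved, relevant)
print(recall, precision)
```

In this example two of the four relevant cases are found, so recall is 2/4, and two of the three retrieved documents are relevant, so precision is 2/3.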

4.3 Advanced jurimetrical analysis of case law

In spite of the new technological possibilities, as mentioned above, the main obstacle in carrying out jurimetrical research into judicial decision-making is the labour intensive and time-consuming manual preparation phase. This covers not only the process of selecting the most suitable cases, but in particular the coding, which is the actual data generation for all cases. These cases are coded with respect to the decisions ('for' or 'against'), as well as with respect to all the factors that have been determined to be relevant. The coding process demands that a decision is made as to whether a factor appears and if so, depending on the kind of factor involved, the extent of appearance can also be coded. If this time-consuming preparation could be carried out in a more efficient way with the help of computer algorithms, this would lead to a number of positive effects. Firstly, it would increase the knowledge about juridical decision making because more cases, in a variety of domains, could be analysed. Secondly, because the research would become less tedious, more researchers would be attracted to this field.

Given the enormous amount of potential data to be found in case law, this prospect would be both scientifically and socially significant. One such technique that could be applied is the conceptual technique for working with large databases described above. After all, in this technique, the probability is calculated for each document that it is relevant to a certain concept. Therefore the technique could be an aid both in the selection of suitable cases and in the factor coding procedure. This application is innovative, but on the basis of pilot studies it seems promising. [46] The kind of knowledge created in this way can also be used as input in legal decision support systems.

The application of conceptual retrieval systems to facilitate the coding process is not the only way to improve the jurimetrical analysis of case law. By applying new mathematical models and techniques it is possible to improve the predictive value of the analysis. In most of the mathematical models that have been applied so far, the factors have not been ordered and are supposed to be independent of each other. Although these models are highly valid and robust, it is a theoretical challenge to develop mathematical models that would express more complexity. In particular, the nature of the relationships between the factors and the decision, the strength and the nature of the correlation between factors and the interaction effects should be further refined. It seems possible that, for example, organising the factors in a tree structure, or clustering correlating factors into a new factor, could improve the models. Here lies a possible link between legal theory, particularly argumentation theory, and jurimetrical research.

5 Why is jurimetrical research important?

There are good reasons why the jurimetrical analysis of case law, and the study of the form of legal language as a supporting discipline, should be a lot more popular than it is. Empirical and quantitative methods were accepted in the traditional scientific disciplines long before the advent of modern computer technology. Other disciplines have gone on to embrace the empirical approach during this century. Law is one of the very few disciplines which has not followed suit.

The world has changed: the demands made by lawyers' clients will change. Unless lawyers are prepared to deal with data in a more technologically advanced and scientific way their profession will become obsolete. Lawyers will have to come up with more reliable and valid estimations of legal risks and costs. Knowledge management has, therefore, become a subject that has received a lot of attention from lawyers recently. Knowledge is a vital factor, used by almost every organization for the purpose of realizing its goals. With respect to legal organizations, knowledge is perhaps an even more fundamental requirement, as the very product on offer is legal knowledge and expertise. When a law firm tries to improve effectiveness and efficiency, knowledge management is increasingly the tool. [47] Essential questions when applying this tool are: 'What knowledge is available to whom?' and 'Where is it needed the most?' People have to be able to learn from each other. This makes communication essential, and, therefore, facilities such as databases and knowledge-based systems will increasingly play a role.

Knowledge management and information technology are closely connected. [48] The role of IT in collecting, assessing, applying and disseminating knowledge has become considerable. In the legal field, for instance, we see that case law databanks have already become indispensable. Together with traditional sources of information, these databanks are an important tool for legal knowledge management. They can play a role in direct knowledge sharing.

At the state level, there is an important new use for jurimetrical case analysis. In modern democratic states there is an increasing demand for transparency of the functioning of state authorities and of the judiciary as a part of those. Monitoring and auditing are state functions that have strongly increased in importance. The instruments to perform these functions with respect to the judiciary will be empirical, quantitative and systematic research. Jurimetrical analysis is especially useful in the hands of those who have the task of monitoring the powers in the Trias Politica. This monitoring function will become increasingly important in the information age. [49]

The need for jurimetrical analysis will become more critical in the future, as the volume of case law is constantly expanding. Although a few years ago carrying out jurimetrics research was difficult because case law was often not available in a digital form, that obstacle has now largely been overcome. Jurimetrical analysis has, therefore, become a practical option. However, jurimetrical techniques must be constantly upgraded in order to keep step with these developments. The older form of jurimetrical analysis is still feasible where the number of cases does not exceed the hundreds. Where large numbers of cases are involved, new forms of computer assistance, i.e. conceptual retrieval systems and similar techniques, will be necessary to support the jurimetrical analysis. Fortunately, as outlined above, these techniques are in the process of being developed and promising results have been shown. Nonetheless, given their importance, surprisingly few people are active in the field.

6 Hope and resistance

With the possible exception of law and economics, there is simply not much research being done that could be labelled as jurimetrical research. As mentioned above, even in the journal Jurimetrics, the Journal of Law, Science, and Technology, most of the articles are not jurimetrical in the sense that they deal with the statistical analysis of case law or similar subjects, or even in the broader sense of our definition. The study of the form of legal texts as well as the study of legal linguistics is a neglected area. [50] So too is the quantitative empirical study of case law. Sometimes there is a glimmer of hope: the advent of the Journal of Empirical Legal Studies could be mentioned. It contains empirical research, although the term 'jurimetrics' never appears to have been used since its inception in 2004. Most of its articles deal with qualitative research rather than quantitative, but there have been some contributions relevant for jurimetrics. [51]

6.1 Law and economics

The dynamic field of law and economics offers even more hope, although it never seems to use the term 'jurimetrics' as a proper label for its main activities and, more seriously, seems to avoid jurimetrical techniques when case law is studied. This is surprising. The third level of study in jurimetrics, the pragmatic level, is concerned with the social and economic effects of the law, as well as the way in which legal demands and authorizations are made. For example the work of Van den Berg and Visscher [52] is a straightforward application in this area. The law and economics field has produced a framework for optimal legal policies when individuals behave rationally. Within society, enforcement agents should aim at an optimal level, as the benefits of more enforcement have to be weighed against the costs. In the past, these areas were mainly studied within the sociology of law, but over the last decades this new discipline of law and economics seems to have conquered this field. Law and economics is an interesting and quickly developing new discipline.

According to our definition, this research falls within the ambit of jurimetrics, but this is not how those who are active in the field see it. Based on the work of such pioneers as Posner, [53] they have built their own discipline, independent from law as well as economics. They have their own textbooks with foundations of the discipline, [54] their own Journal for Law and Economics, an American Law and Economics Review, an International Review of Law and Economics, an Encyclopaedia of Law and Economics, etc. Furthermore, there is a European Masters for Law and Economics [55] and there are a number of cooperative networks such as the European Network for Better Regulation. [56]

Although the approach of law and economics would seem to promote the empirical study of legal phenomena, there appears to be an almost oxymoronic tendency in this field to apply, on the one hand, scientific economic methods to the study of legally relevant phenomena but, on the other hand, to avoid the empirical method when analyzing case law. A recent example is Arcuri, [57] who is a strong proponent of the use of economic theories in order to improve 'contemporary legal systems', but who is disappointingly traditional [58] when she gives an overview of the relevant case law for her area of study.

6.2 Some explanations

The question that arises is why there is so much resistance to the development of jurimetrical research at universities and to the use of jurimetrical techniques in legal practice. One possible explanation could be fear. New forms of technology are often initially greeted with resistance: buying train tickets from a machine rather than from a human being over the counter, using a PIN to get money from a cash point and buying products over the Internet are a few examples from everyday life of technology that was at first greeted by many with suspicion. Conservatism, a characteristic typical of the legal profession as a whole, could be another reason. It is 'new school' versus 'old school': a classical legal education is 'old school'; 'new school' would be to use the new techniques, and integrate them in legal practice and research in order to realise innovation and progress. [59]

This resistance to jurimetrical methods is not necessarily irrational. It may be the case that lawyers do not consider it to be in their interest to switch to new methods that would require a different education. Leith and Hoey see this as a possible explanation:

Of course the real difficulty in carrying out this work is that the researchers have to have both a legal background and a mathematical one. Few lawyers have this, and often those who have the mathematical understanding have a poor feeling for law. [60]

Not only are most lawyers not familiar with mathematics, few seem to be eager to see mathematics as a part of legal education. It is also possible that lawyers do not wish to see the nature of their profession change. Arguably, it is in their own interest to keep legal knowledge individualized and personal rather than disseminated.

7 Conclusion

The world has changed, but law schools and legal professionals seem intent on turning a blind eye to science and technology. It could be expected that, in the modern competitive world, legal firms and professionals would be eager to apply the new techniques for innovation in their services. This would certainly be in accordance with what the 'New School of Law and Technology' proposes. In practice, however, the legal profession and legal services have hardly changed their modus operandi. Most lawyers are simply not familiar with quantitative, empirical or computer-supported approaches. Furthermore, they try to avoid such contact as much as possible. In some cases this reluctance could possibly be explained in terms of a perception of their self-interests. This negative attitude towards innovation will, however, turn out to be too costly. In the modern world of globalisation, innovation is essential for all organisations and those in the legal field will not be an exception.

Given the potential of the jurimetrics technology, it is very surprising that its implementation in the legal field has been so minimal. It is generally recognised that knowledge management is already an important part of modern management. Its significance will only increase. Since knowledge is such a fundamental aspect of legal services, law firms and other organisations that are active in the legal field will also need to understand how to manage this resource.

With respect to case law, the trend worldwide is to make case law available in a digital form. Conceptual retrieval systems can make case law databases into an efficient and effective tool. They are an aid to the individual in searching the database, as personalized concepts can be stored and used again. These conceptual systems are also of great use for advanced jurimetrical research. The kind of techniques dealt with in this article can be applied quite easily, even by traditional lawyers. They do not have to be familiar with the rational model or even the mathematical techniques used. Specific training in order to carry out this work is hardly necessary anymore. There are, in fact, no more rational reasons for legal practices not to apply these techniques.

Computer-aided case prediction will become a reality. It is now possible to do a quantitative analysis of a set of cases within a reasonable time. Furthermore, better techniques for validation mean a more reliable prediction of new cases. This is useful for legal knowledge management, i.e. for the efficiency of legal firms and services. Furthermore, at the state level, jurimetrical analysis is useful if not necessary to maintain the transparency of the decision-making of the judiciary. These are exciting prospects for the legal field, professional as well as academic.

Jurimetrics, the new legal science, will provide the necessary practical and theoretical structure for these developments. It comprises an empirical and mathematically supported study of the law and bases its description and explanation of human behaviour on the reliable REMP model. It is hard to conceive that jurimetrics, as an academic subject, as an instrument for monitoring state authorities and as a source of techniques for legal practice, can go on being ignored.

[1] Professor of Computers and Law, Centre for Computers and Law, Erasmus University Rotterdam, The Netherlands.

[2] Associate Professor, Centre for Computers and Law, Erasmus University Rotterdam, The Netherlands.

[3] Senior Researcher, Dutch Legal Aid Board, Utrecht, The Netherlands.

[4] (ISSN 0897-1277), published quarterly, is the journal of the American Bar Association Section of Science and Technology Law and the Center for the Study of Law, Science, and Technology at the Sandra Day O'Connor College of Law, Arizona State University. Jurimetrics was first published in 1959 under the leadership of Layman Allen as Modern Uses of Logic in Law (MULL). The current name was adopted in 1966.

[5] Loevinger L., "Jurimetrics, the next step forward" (1949) Minn. Law Review 455.

[6] Loevinger 1949, p. 490.

[7] Franken H., Maat en regel (Arnhem: Gouda Quint 1975).

[8] Franken H., Systeemtheorie en Rechtswetenschap. Preadvies voor de Vereniging voor Wijsbegeerte en het Recht Nederlands Tijdschrift voor Rechtsfilosofie en Rechtstheorie, 1982.

Franken H., "Jurist en computer: theoretische achtergronden", in A.H. de Wild and B. Eilders (eds.), Jurist en computer (Deventer: Kluwer 1983), 13-32.

[9] De Mulder R.V., Een model voor juridische informatica [A Model for the application of computer science to law], with a summary in English (Lelystad: Vermande 1984) 239.

[10] Mathematical models, however, are not necessarily quantitative.

[11] Kuhn T.S., The Structure of scientific revolutions (Chicago and London: University of Chicago Press 1970).

[12] Jensen M. and Meckling W., "The nature of man" (1994) 7:2 Journal of Applied Corporate Finance 4-19.

[13] Meckling W. and Jensen M., "Specific and general knowledge and organizational structure", in M. Jensen (ed.), Foundations of organizational strategy (Cambridge, Mass: Harvard University Press 1998).

[14] Fama E. and Jensen M., "Separation of ownership and control", in Jensen (ed.), ibid., and (1983) Journal of Law and Economics 26. Available at SSRN: or DOI: 10.2139/ssrn.94034.

[15] Brickley J., Smith C. Jr. and Zimmerman J., Managerial Economics and Organisational Architecture (Boston: McGraw-Hill 2007).

[16] Guiraud P., Problèmes et Méthodes de la Statistique Linguistique (Dordrecht: Reidel 1959) and Herdan G., The Advanced Theory of Language as Choice and Chance (Berlin: Springer-Verlag 1966). For an overview of the developments in the field of quantitative linguistics, see Baayen R.H., A Corpus-based Approach to Morphological Productivity (Amsterdam: CWI Free University 1989); Bailey R.W., "Statistics and style: a historical survey", in Dolezel L. and Bailey R.W. (eds.), Statistics and Style (New York: Elsevier 1969).

[17] See for examples of the use of these characteristics for instance Kucera H. and Francis N.W., Computational Analysis of Present-day American English (Providence: Brown University Press 1967).

[18] Van Noortwijk C., Het woordgebruik meester. Een vergelijking van enkele kwantitatieve aspecten van het woordgebruik in juridische en algemeen Nederlandse teksten [Legal word use, a comparison of some quantitative aspects of the word use in legal and general Dutch texts], with a summary in English (Lelystad: Koninklijke Vermande 1995).

[19] See Van Noortwijk, ibid.

[20] The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English. For this project, only the written sources were used. See (Accessed 20 August 2009).

[21] See Van Noortwijk 1995, op. cit. 87.

[22] Erikstad O.M., "Appropriate document units for text retrieval systems", in J. Bing and K.S. Selmer, A Decade of Computers and Law (Oslo: Universitetsforlaget 1980) 220-238, 223.

[23] 'Conceptual' retrieval systems, such as developed by Smith J.C., Gelbert D., et al., "Artificial intelligence and legal discourse: the Flexlaw legal text management system" (1995) 3 Artificial Intelligence and Law 55-95 and De Mulder et al. (1994) op. cit. 733.

[24] Schubert G. (ed.), Judicial Decision-Making (New York: The Free Press of Glencoe, 1963).

[25] Nagel S., "Off-the-bench judicial attitudes", in Schubert G. (ed.), Judicial Decision-Making (New York: The Free Press of Glencoe, 1963) 30.

[26] Nagel S., "Judicial backgrounds and criminal cases", Journal of Criminal Law, Criminology and Police Science (1962) vol. 53, 333-339, 333.

[27] Ulmer S.S., "A leadership in the Michigan Supreme Court" in Schubert G. (ed.), Judicial Decision-Making (New York: The Free Press of Glencoe, 1963) 13.

[28] Holmes O.W., The Common Law (Chicago and Boston: Little Brown 1881) 1.

[29] Oliphant H., "A return to stare decisis", (1928) 14 ABA J 37.

[30] Kort F., "Simultaneous equations and Boolean algebra in the analysis of judicial decisions" (1963) 28 Law & Contemporary Problems 1.

[31] Schubert, 1963, op. cit. 133

[32] Reed Lawlor influenced Richard De Mulder's work substantially after the two met in Swansea, 1979 at a conference organized by Brian Niblett.

[33] Lawlor R.C., "Personal stare decisis" (1967) 41:1 University of Southern California Law Review, 73-118.

[34] Lawlor R.C., "Case analysis manual, applied jurimetrics". Printed in the United States of America, 1969.

[35] Lawlor R.C., "Personal stare decisis" (1967) 41:1 University of Southern California Law Review 73-118, 73. C.f. Ulmer 1967, op. cit. p. 67. Goldman S., "Behavioral approaches to judicial decision-making: towards a theory of judicial voting behavior" (1971) March Jurimetrics Journal 142.

[36] Aria Suyudi performed an impressive analysis of several hundred Supreme Court cases in Indonesia; Suyudi A., Insolvency systems and risk management in Asia. An inquiry to Indonesian judicial decision making behaviour on bankruptcy cases (1998-2002). A jurimetrical analysis (Jakarta: Center for Indonesian Law & Policies Studies, 2004).

[37] The work of the authors mentioned here can be found in the references below.

[38] De Mulder R.V., Een model voor juridische informatica [A Model for the application of computer science to law], with a summary in English (Lelystad: Vermande 1984).

[39] Combrink-Kuiters C.J.M., Kennis van zaken. Een jurimetrisch onderzoek naar rechterlijke besluitvorming inzake voogdij en omgang (Deventer: Gouda Quint 1998).

[40] In the example, for simplicity only 23 cases are taken.

[41] De Mulder R.V. and Combrink-Kuiters C.J.M., "Is a computer capable of interpreting case law?" (1996) 1 The Journal of Information Law and Technology (JILT).

[42] Wildemast C.A.M. and De Mulder R.V., "Some design considerations for a conceptual legal information retrieval system", in Grütters C. et al. (eds.), Legal Knowledge Based Systems: Information Technology & Law, Jurix 1992 (Lelystad: Koninklijke Vermande 1992) 81-92.

[43] de Vries W.S., van den Herik H.J. and Schmidt A.H.J., "Separate modelling of user-system cooperation", in Breuker J.A. et al. (eds.), Legal Knowledge Based Systems. Model-Based Legal Reasoning, Jurix 1991 (Lelystad: Koninklijke Vermande 1991) 28-39.

[44] Salton G., Automatic Text Processing; The Transformation, Analysis, and Retrieval of Information by Computer (Reading Mass: Addison-Wesley Publishing Company 1989); Bookstein A. and Klein S.T., "Information retrieval tools for literary analysis", in Tjoa A.M. and Wagner R. (eds.), Database and Expert Systems Applications (DEXA), Proceedings of the International Conference in Vienna, Austria: 1990, 1-7.

[45] Bookstein A. and Klein S.T., ibid.

[46] De Mulder & Combrink (1996), op. cit.

[47] Apistola M. and Oskamp A., "Knowledge management for law practice: do we really need it?", (2002) Proceedings of the 17th Bileta Annual Conference, (Amsterdam: Bileta 2002) 2

[48] Gottschalk P., "Use of IT for knowledge management in law firms", The Journal of Information, Law and Technology (JILT), 1999 (3). paragraph 2.3.

[49] De Mulder R.V., "The digital revolution: from trias to tetras politica" in Snellen I.Th.M. and van de Donk W.B.H.J. (eds.), Public Administration in an Information Age (Amsterdam: IOS Press 1998) 47.

[50] Mattila H., Comparative Legal Linguistics (Aldershot, U.K.: Ashgate 2006).

[51] See for instance Evans M., et al., "Recounting the courts? Applying automated content analysis to enhance empirical legal research" (2007) 4:4 Journal of Empirical Legal Studies 1007-1039; Spencer B.D., "Estimating the Accuracy of Jury Verdicts" (2007) 4:2 Journal of Empirical Legal Studies 305-329.

[52] Van den Berg R. and Visscher L., "Optimal enforcement of safety law", in: De Mulder R.V. (ed.), Mitigating Risk in the Context of Safety and Security (Rotterdam: Erasmus University 2008).

[53] Posner R.A., Economic Analysis of Law (6th edition) (New York: Aspen Publishers 2003).

[54] For example, Cooter R.D. and Ulen T.S., Law and Economics (4th Edition) (Boston: Pearson Addison Wesley 2004).

[55] Supported by the European Union as an 'Erasmus Mundus' programme.

[56] (Accessed 090820).

[57] Arcuri A., Governing the Risks of Ultra-hazardous Activities. Challenges for Contemporary Legal Systems (Rotterdam: Erasmus University 2005).

[58] 'Hermeneutic' is an appropriate qualification of her method here.

[59] C.f. Old School means: fear of new technology, not using it, regulating against it. New School means: use new technology in a rational way, try to innovate, let new technology have its impact on society as well as on norms and rules.

[60] Leith P. and Hoey A., The computerised lawyer (2nd edition) (London: Springer-Verlag 1998) 212.