Current Issues in Research Access to Public Register Databases

Philip Leith [1]

Cite as: Leith, P., 'Current Issues in Research Access to Public Register Databases', European Journal of Law and Technology, Vol. 2, No.2, 2011


Research thrives on - indeed can hardly do without - data. It is the raw information upon which better understanding and better knowledge are based. Indeed, one assumption underpinning some visions of Electronic Government - such as Wiki-Gov - is the idea that data which has been collected for a specific government task can be re-used in ways which further public service goals through citizen oversight and input (through "open data sets").

There are very many public service collections of data which comprise Public Registers which the researcher might wish to utilise, but which are presently difficult or impossible to access. With the development of database techniques and ever more data being collected and processed in digital formats, the potential pool of data is growing considerably. Importantly, public registers usually relate to individuals, giving valuable insights which cannot be easily gleaned in other ways.

This kind of information includes public registers which, it can be argued, consist of "public information" of the sort which might be accessible under the Freedom of Information Act or through the Re-Use of Public Sector Information Directive. Should researchers have the possibility of piggy-backing upon these Re-Use and Freedom of Information rights to access the underlying public register data set as a whole? If so, what are the limitations upon using these sources? The primary questions are: Where most such data sets have a value to the agency responsible for their collection, is that agency willing to make them freely available? Where many public record data sets will have information relating to individuals, how can access be enabled without infringing data protection law? And, is the current Freedom of Information regime suitable for enabling access to this information?

This article looks to examine how public these public registers are in terms of research usage. It is clear that public registers have already been commoditized (with more commoditization currently being planned) and are used in a variety of ways. Their contents have not, to date, been openly available. Will - in these days of open data - they become an important and novel resource for the researcher?

Part I - Context

1. Introduction

Research based upon data sources is too frequently missing from law. A field which is focused upon the interplay of two parties in the courtroom and which has a belief that the best methodology to assess truth is through aggressive questioning, is perhaps not a field which sees such research as a vital part of the toolbox. A contrary view would be that research is indeed important - perhaps vital - and particularly so when legislation is being used as a tool for social engineering, a use which is itself usually contentious. The rise of "information law", particularly within the framework of the Information Society over the past couple of decades is surely such a contentious example of social engineering, as the EU has attempted to control the use of personal data in a domain where personal data has significant commercial value.

This article arose from an interest in privacy and data protection and how one might investigate the data protection regime using alternative data sources. [2] There are, of course, a number of traditional methods - interview, questionnaire, analysis of Information Commissioner Notices and decisions from the Information Rights Tribunal - but new technology also offers new research paths: the collection of data as part of the data controller notification process (which is a trans-European collection of information and thus even more useful) is a potentially constructive resource to help look at what was happening in the data protection field: e.g. what information is being collected?; what information is not being collected through registration?; would statistical analysis of the data indicate factors which were hidden by other methods? ("letting the data speak" as one distinguished statistics colleague used to say). The "Register of Data Controllers" is a public register available in each country across Europe, in that it can be examined by members of the public at will. [3] However, the method of presentation in the UK has been limited and allowed only basic searching for information where one is presumed to know the subject of that information (e.g. if you wish to view the register entry for Marks and Spencer). Since the information is clearly stored electronically, a more useful research methodology appeared to be to carry out processing of the entire register using some of the data mining techniques which are now common in business.

There are, of course, many, many collections of public sector data which could be available for the detached researcher (as opposed to the researcher who is "approved" and enabled with privileged access) and which might be used in a whole host of research fields - think of social geographers who might wish access to detailed information from government databases, for example, or who might even wish to tie criminal records to geographical patterns. Should these sources be seen as "information held by a public body" and be easily available through Freedom of Information requests? If not, what factors should be taken into account to disable the researcher's desire to access raw data?

It is important to note that the rationale for desiring access to these accumulations of data is that it opens up new ways of doing research: the kinds of tactics which the commercial world has been using for some years to understand the consumer and the marketplace (e.g. data mining) become a possible route to new forms of research methodologies. [4] These new forms require access to the digital raw data - simply having access to small amounts of data or rights to inspect visually is not sufficient for these new kinds of research methods. My own research has, in the past, been substantially improved by having access to open data sources so I am aware that this is not a theoretical benefit but an actual one. [5]

Access to these sources of information could offer a kind of Athenian democracy which is vastly different from the system we have now: presently information is collected by government and commerce and they know about us, but we know very little about our fellow citizens. In Athens, the citizen knew about other citizens. Is this the ideal for E-Gov and participation in this new transparent world? Is this the natural evolution of the digital revolution?

2. Gov 2.0 and Data

Morison has effectively outlined the nature of Gov 2.0 in the context of public administration when he writes of the transformative power of technology and access to information:

'These ideas - of interactivity, user generated content and qualitatively new levels and forms of information - combine to suggest important new possibilities. These will often involve using new information technology but Gov 2.0 is wider than this alone. In the context of government the 2.0 approach coincides with an ongoing, wider process of modernisation of government. Although this is less eye-catching than Labour's constitutional reform package, it has been more far-reaching and radical. It is has sought to re-engineer public services, re-construct ideas of the public, the citizen and the consumer, and govern through these ideas in new market-based citizenship models that privilege consumer power as a means of securing equality and participation through the exercise of choice. This is a large-scale, new political project and it is producing a new technology for governing.' [6]

What is this transforming technology which offers more to radical reconstruction of the constitution than the more typical topics of interest to the administrative and public lawyer? Basically, it is the idea that data can be "freed" and can become the basis of knowledge which reflects back upon the process of government, improving it and making the public more active participants in the process. The essential quality of this technology is, ironically, that it is not technology based; rather it is information based. The technology simply enables access and it is the information which becomes available and which can then be processed in a variety of ways.

The UK Government's own Gov 2.0 provision ("opening up government") demonstrates the basic operation. Data is provided in the form of "data sets" by government agencies and made available to the citizen, who can produce programs (sometimes as "apps") which will present the raw data in more meaningful ways (e.g. diagrammatically). Thus information about crime has been codified into a uniform format for all police forces in England and Wales, containing street information. A developer can then draw this information down and transform it - perhaps by linking to mapping software - to show a graphic representation of the incidence of crime across the country. Of course, any problems with the underlying raw data will be reflected in the mapped data, but if there are problems with that data then public accessibility might highlight this and lead to an improvement in the data collection itself. The system thus becomes more self-reflective than if it were out of public view. Essentially the public is able to use the same kinds of tools as police forces throughout the world have been using for many years - though without the information pertaining to the individual, of course.
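The "draw down and transform" step can be pictured with a minimal sketch. It assumes, purely for illustration, that a crime data set arrives as a CSV file with one row per recorded incident; the column names and sample rows below are invented and do not describe the actual published format. The output is the kind of per-street count which mapping software could then plot.

```python
import csv
import io
from collections import Counter

# Hypothetical extract: column names and rows are invented for illustration.
SAMPLE_CSV = """month,force,street,category
2011-01,Force A,High Street,burglary
2011-01,Force A,High Street,vehicle crime
2011-01,Force A,Mill Lane,burglary
2011-01,Force B,Station Road,burglary
"""

def incidents_per_street(csv_text):
    """Count recorded incidents for each (force, street) pair."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[(row["force"], row["street"])] += 1
    return counts

street_counts = incidents_per_street(SAMPLE_CSV)
for (force, street), n in sorted(street_counts.items()):
    print(f"{force} | {street} | {n}")
```

Nothing in such a table identifies an individual, which is the sense in which the public version differs from the tools available to the police themselves.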

Currently the UK's site is relatively sparse, but the government is encouraging potential users of data to request more, [7] indicating clearly that this is a process which has positive support from government rather than a tepid introduction of a "good idea". [8] In the US the equivalent web site has substantially more material available, both in terms of raw data and also in "apps" which have been produced within government and by external users. [9] In Europe, too, there is a general move towards opening up data. Generally, it appears that all advanced countries are taking similar steps towards "open government". As Morison points out, the underlying rationale for this kind of participation and access is that it puts the citizen into the position of consumer, who has access to data about services offered and who can make considered choices about these. It is the continuation of the "Modernizing Government" [10] approach which Labour introduced, when they attempted to make public service more responsive to what they saw as the failure of the public services to change outdated operational methods.

There are technical issues around the nature of the data which are important for actual research methodologies - i.e. the interlinking between data elements and/or data sets where more linkage means more powerful analytical techniques can be used - but in this article we pass over these, simply assuming that all forms of open data are useful for research. Shadbolt gives a readable overview of these more technical issues. [11]

Commentators such as Beth Noveck have given academic credence to the process of opening up data to public scrutiny by suggesting that bureaucracy is by its very nature isolated and paternalistic and scrutiny and/or input offers positive advantages. Open government, in her view, offers a better route to good public service:

'There are far too few opportunities to collaborate in governance. With new technology, government could articulate a problem and then work with the public to coordinate a solution among and across government institutions and with nonprofit organizations, businesses, and individuals. Instead of only promulgating a law or a regulation to mandate safety in school science labs, for example, it is now possible to organize a volunteer corps to survey the labs and distribute goggles. Instead of enacting sweeping policies on broadband deployment, technology might make it possible to map Internet penetration locally and devise more targeted technical and legal approaches. Instead of prescribing the solution, the government might offer a prize to elicit ten new solutions.' [12]

Noveck was an advisor to President Obama after his election, where the President's first day of office was marked by a commitment to Open Government. [13] One example of Noveck's activity in the field is the initiation of the Peer to Patent project at New York Law School, encouraging involvement by the public in order to find prior art to which examiners at the USPTO may not have access. This project is highly interesting, providing a demonstration of interaction between examiner and public, but also raising issues of how successful such partnerships can actually be in breaking down barriers between the public and the "isolated" bureaucrat. [14]

3. Limitations on Open Government

Making open the data which arises from public records is an ideal. In practice though, there are limitations upon what the researcher might access. It is important to recognize that while open government is being supported positively in all advanced countries, there are a number of elements which are particularly relevant to the European context: data protection and re-use/copyright/database differences. Europe has Data Protection legislation which requires protection of personal information. Since the view of what personal information actually is has been cast very widely indeed (including information about car servicing, for example [15]) and indeed differs throughout the European Community [16], we can see that simply taking raw data from public registers and making it available could infringe these data protection rights and be seen as infringing privacy. For example, Privacy International, in its Big Brother Awards 2005, shortlisted, 'The Land Registry. For openly placing details of all house purchases and purchasers online for a fee.'

Early examples of the research access problems which arise from data protection rights came from medical registers: for example, cancer registration is a vital tool in epidemiological investigation of cancer but when there is only one incidence of a rare cancer located in a relatively wide geographical area, it is possible to link the cancer register to an individual. [17] Further, the information collected as part of that particular register is quite detailed, covering ethnic origin, employment status, etc., of the individual sufferer - all of which is of import to the epidemiologist. [18] Cancer statistics are available through the Open Government site, but clearly the publicly available information does not contain all this detail. Whereas we may expect some limitation upon what is public, the primary users of such data have raised issues about the effect of data protection upon research: it has long been argued by some epidemiologists that the data protection regime has seriously affected their work, and that journalists have preferential rights:

'Society seems willing to accept that, in the interests of wider public good, journalism may sometimes invade individuals' privacy and do them harm, but it is not prepared to offer epidemiology an equal measure of tolerance.' [19]

Although data protection legislation contains a research exemption, [20] it is not only epidemiologists who have raised concerns. Erdos has recently suggested that the field of social research is being seriously threatened by the data protection regime. [21] The underlying problem is that research is necessarily - in the social sciences - about the individual (how can we know about society if we don't know about individuals?). If we are to understand the individual in society as part of researching the impact of social engineering by law, can we put the individual's right before that of the researcher's goal of increasing our knowledge?

A recent report by O'Hara for the Cabinet Office [22] contrarily suggests that privacy should be of higher concern in EGov and that data protection issues should be highlighted by such open data sources, with anonymization of data routinized. Since public registers, of course, will always contain data which relates to specific individuals, this is obviously a debate which will continue: between the value to researchers of data which is particularised and that which has particulars removed. [23]
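One common form of routinized anonymization, small-count suppression, bears directly on the rare cancer example given earlier. The sketch below is an illustration only: the threshold of five, the field names and the records are all assumptions made for the example, not a description of any register's actual disclosure rules. Counts are tabulated by region and diagnosis, and any cell below the threshold is masked rather than published.

```python
from collections import Counter

def suppressed_counts(records, keys, threshold=5):
    """Tabulate records by the given keys, masking counts below threshold.

    The threshold of 5 is an assumed value chosen for illustration.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

# Invented records: six common diagnoses and one rare one in the same region.
records = (
    [{"region": "North", "diagnosis": "common"} for _ in range(6)]
    + [{"region": "North", "diagnosis": "rare"}]
)
table = suppressed_counts(records, keys=("region", "diagnosis"))
```

The single rare case disappears from the published table, which protects the individual but also removes exactly the particularised detail that the epidemiologist values.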

Another problem for those wishing to utilise open government data, and one which has been well aired in various locations, is the copyright problem. In the US copyright legislation specifically excludes federal information from copyright protection. [24] In Europe, copyright for government data is treated in the same way as that of commercial or artistic works, that is as a privatised commodity. Access to information through Freedom of Information does not simultaneously allow commercial re-use of information or republication - one may be able to access a document, for example, but that does not mean that it can be freely published. The primary reason for opposition to government-asserted copyright [25] is that copyright restrictions make use of that data expensive and thus difficult for the non-commercial user and researcher - e.g. Ordnance Survey in the UK has valuable map data and the business model to support Ordnance Survey requires income from selling map data. Access to this data, it has been argued by critics of the current regime, is essential in order to make open government work. OS has resolutely argued that the data is too valuable and that its business model would not allow the data to be given away freely. [26] Many early innovative information projects failed to take off due to the costs of licensing this map data, [27] but more recently free-to-user resources such as Google maps have enabled developments where individual licensing of OS data is not required. The debate continues over access to and re-use of this data but the typical current model of use of mapping data involves licensing. [28]

In essence, the topic of this paper is intrinsically related to the two aspects of data protection and allowing re-use (by declining to exert government copyright/encouraging low or no licensing costs). If open government is really to succeed - even if the optimists such as Noveck are overstating the advantages [29] - it seems clear that more data sets than are currently available must be made accessible free from data protection and copyright restraints. If open government succeeds, then will researchers also have a wider source of data to analyse?

4. Information in Public Registers

It is difficult to define just what a public register is, or to provide a template which sets out what is included, and where and when access is allowed. Most public registers are set up to fulfil some public function - which function is not always explicitly stated - and access and contents are described in the originating legislation. As a typical example which will probably be mirrored by legislation across all of Europe, the UK Licensing Act 2003 requires "licensing authorities" (i.e. who license selling of alcohol) to keep a register:

'8 Requirement to keep a register

  1. Each licensing authority must keep a register containing
    • a record of each premises licence, club premises certificate and personal licence issued by it,
    • a record of each temporary event notice received by it,
    • the matters mentioned in Schedule 3, and
    • such other information as may be prescribed.
  2. Regulations may require a register kept under this section to be in a prescribed form and kept in a prescribed manner.
  3. Each licensing authority must provide facilities for making the information contained in the entries in its register available for inspection (in a legible form) by any person during office hours and without payment.
  4. If requested to do so by any person, a licensing authority must supply him with a copy of the information contained in any entry in its register in legible form.
  5. A licensing authority may charge such reasonable fee as it may determine in respect of any copy supplied under subsection (4).
  6. The Secretary of State may arrange for the duties conferred on licensing authorities by this section to be discharged by means of one or more central registers kept by a person appointed pursuant to the arrangements.
  7. The Secretary of State may require licensing authorities to participate in and contribute towards the cost of any arrangements made under subsection (6).' [30]

There is no mention of online access to this register in this Act (despite being enacted in 2003) or in the Regulations referred to in s.8(2), [31] but most licensing authorities now offer some mode of online searching of their licensing register at no cost - the access is frequently primitive, and sometimes appears as a PDF file but can be accessed via external search engines. Some registers will have information which is available to the public and some which is not - and, of course, cultural aspects will be relevant across jurisdictions. [32]

With the rise of electronic search, there has been concern in some quarters about access and usage of these public records. In one of the earliest UK cases, Robertson v Wakefield, [33] the applicant was unhappy that commercial use was being made of the electoral register, which in the UK has been public. His argument was that data protection law gave him a right to opt out of giving consent for certain functions and this right was being denied through the sale of the register (which sale by the public authority was required by other legislation). The electoral register has important democratic functions, including who might be a donor to a political party, [34] and who might vote in elections, [35] so it could certainly be argued that collection of data is for a specific purpose. Robertson's argument was that to use it for a different purpose required his consent, an argument which was successful. Giving primacy to the data protection regime's requirement to consent is problematic for researchers: socio-legal researchers have become used to consent forms prior to carrying out interviews, but after Robertson, any use of information relating to an individual from a public register could be conceived as similarly requiring consent. The register is - post-Robertson - made available in several forms:

  • for visual inspection of the full, local register
  • commercially available digital version which includes only those who do not object to their details being sold, or used for non-specific purposes. This can be used for marketing.
  • commercially available digital version including all eligible to vote (for uses such as fraud protection, which under data protection legislation do not require consent).

Information is thus commercially available from various sellers based upon the electoral register or which incorporates register information along with other derived information. One online provider, The UK Electoral Roll, [36] provides a number of packages covering those eligible to vote and who have not requested exclusion from commercial re-use. [37] Other companies allow free searching of their materials with more detailed information available at charge. For example, one such site was able to provide free and accurate personal information concerning this author with little difficulty:

  • Name
  • Address
  • Phone number
  • Date of birth
  • Family members resident with author
  • Occupation ("Law teacher")

If registered with the site, marriage details would also be available and perhaps also information linking to siblings and parents. The site advises it can: '[g]ive you much more information including age guide co-occupants, length of occupancy, property prices, neighbours, Director Reports and more!' Linking in other data (from private or public databases) such as maps, together with access to Google Street View, enables others to see whether one's garden is well kept and house well painted. Some may view such systems as invasions of privacy since the availability of information from a number of these differing registers can build up a picture of an individual which is currently only available to commercial concerns - e.g. firms such as Experian or dunnhumby, which have very large databases covering credit information, bank information, shopping patterns, etc. for practically the whole UK population (and which are used by government, too, when it requires information on individuals).

Public registers cover a whole range of information. Licensing records, planning applications, [38] civil court records and, in some US States, prison records (and photographs) [39] are all made available for some specific public purpose, but also have other research uses when combined. Public registers are thus important sources of information. In very many ways these are an important basis upon which to build the EU's "Information Society" project, as personal information becomes a product which enables commercialization or use in everyday government tasks. To the researcher, the existence of linked data sets built upon public register data made available for Open Gov or Re-Use goals could potentially transform the quality of their research outputs.

5. Re-Use of Public Sector Information

The Re-Use Directive was an important step by the EU, taken to try to overcome the copyright problem. [40] The original aim was to do so by modelling the system in the US (where Federal information is not copyrightable) and encourage the development of new commercial products based upon public information free from copyright. However, opposition from European governments revised this and allowed copyright to remain, but encouraged re-use of copyrighted information. The recitals to the Directive clearly outline the new aim:

'(2) The evolution towards an information and knowledge society influences the life of every citizen in the Community, inter alia, by enabling them to gain new ways of accessing and acquiring knowledge.

(3) Digital content plays an important role in this evolution. Content production has given rise to rapid job creation in recent years and continues to do so. Most of these jobs are created in small emerging companies.

(4) The public sector collects, produces, reproduces and disseminates a wide range of information in many areas of activity, such as social, economic, geographical, weather, tourist, business, patent and educational information.

(5) One of the principal aims of the establishment of an internal market is the creation of conditions conducive to the development of Community-wide services. Public sector information is an important primary material for digital content products and services and will become an even more important content resource with the development of wireless content services. Broad cross-border geographical coverage will also be essential in this context. Wider possibilities of re-using public sector information should inter alia allow European companies to exploit its potential and contribute to economic growth and job creation.'

Development of Re-Use in the first years of the Directive's life was not spectacular. More recently, however, we have seen a considerable movement from governments - partly as eGov becomes acceptable practice - to loosen the constraints upon re-use of data through various licensing schemes and access systems.

It is important to note that not all registers produced by public bodies are publicly accessible as of right and in totality, but often there are a variety of access methods possible depending upon the context. For example, in the UK, the primary purpose of the vehicle licensing register is to keep details to verify that tax has been paid on vehicle use. Having such a valuable collection of data, though, has meant that the licensing authorities have been prepared to make re-use or related access available to commercial and other interests. [41] Members of approved trade organisations such as The British Parking Association, The Association of British Investigators and The Finance and Leasing Association (FLA) can have direct electronic access to the register. The public can also request information about the person who the vehicle is registered to (if they have the registration details of a car with which they were involved in an accident) but have to persuade the registration authorities that they have "reasonable cause" and their use of this data is lawful. Access to information about a vehicle (e.g. when it was first registered) is viewed as non-problematic and offered cheaply by the licensing authority. This demonstrates the commercial value of data held by government, and also that government is happy to share this data in ways in which the person registering a car is most probably unaware.

6. Freedom of Information and Access Limitations

Freedom of Information legislation across Europe has had a mixed history, with some countries having well developed law (Scandinavian countries in particular from the 18th century), some with recent legislation (the UK), and some with no such access method (e.g. Spain, although an access to information law is currently being considered). In Access Info Europe v Council of the European Union, [42] dealing with the practice of blocking out names of countries from documents dealing with negotiations, the General Court demonstrated that Europe, too, must follow the openness path:

'If citizens are to be able to exercise their democratic rights, they must be in a position to follow in detail the decision-making process' and 'to have access to all relevant information.'

In simplified terms, we need to keep in mind that access to information concerning others [43] through the two available routes is not identical:

  • Freedom of information access is based upon openness of government
  • Public sector re-use access is based upon commercial processing of data

Both offer the researcher access to data for uses which are not the formal goals of these regimes. Thus the researcher need not be concerned with the practice of government (think of Durkheim using government data on suicide for his theory of anomie) or with reprocessing data for commercial profit in order to utilise data gleaned by either method.

The impact of the data protection regime affects both of these routes, as we have already pointed out. However, data protection has been viewed by the legislators as less significant in terms of public access to documentation/freedom of information, since this is not about re-use of personal information in a commercial context. The Data Protection Directive recitals acknowledge this difference by stating:

'72. Whereas this Directive allows the principle of public access to official documents to be taken into account when implementing the principles set out in this Directive.'

However, in the implementation to UK national legislation this "taking into account" has not meant that the data protection principles are over-ridden. In the FoI Act, s.40(2) excludes third-party access to personal data when that access would contravene any of the data protection principles, and s.41 excludes information which is confidential (e.g. most salary information). [44] There is a further "limitation" to access which is to be frequently found in practice. Section 21 of the FoI Act provides an absolute exemption to the duty to communicate information, that is, that if the information is available from other sources or there are means of getting that information, then FoI is not an appropriate route. The purpose, presumably, is that government departments should not be inundated with requests for information when the requestor can go elsewhere. The basic test of whether s.21 applies or not, is whether the information is "reasonably accessible":

21 Information accessible to applicant by other means

(1) Information which is reasonably accessible to the applicant otherwise than under section 1 is exempt information.

(2) For the purposes of subsection (1)

(a) information may be reasonably accessible to the applicant even though it is accessible only on payment, and

(b) information is to be taken to be reasonably accessible to the applicant if it is information which the public authority or any other person is obliged by or under any enactment to communicate (otherwise than by making the information available for inspection) to members of the public on request, whether free of charge or on payment.

(3) For the purposes of subsection (1), information which is held by a public authority and does not fall within subsection (2)(b) is not to be regarded as reasonably accessible to the applicant merely because the information is available from the public authority itself on request, unless the information is made available in accordance with the authority's publication scheme and any payment required is specified in, or determined in accordance with, the scheme.

Various guideline documents have been produced to elucidate just what this might mean. For example, the UK Information Commissioner's (ICO) guidance suggests that if mobility or distance is an issue, then the public authority should consider providing that information. [45] The ICO suggests that,

'[t]he main consideration is likely to be whether the authority wishes to charge a fee in accordance with the Fees Regulation. These provide that authorities are able to recover the cost of disbursements such as photocopying and postage.'

Fees for dataset provision should logically be low: there is no printing involved, postage costs are irrelevant (if emailed), and the material is easy to extract by means of standard query languages. [46] Given that datasets are usually set out with standard field format, there should be no requirement to scan documents for personal information which may be affected by the data protection requirements (such fields can simply be excluded from the supplied dataset). Section 21, though, is potentially problematic to the researcher who needs a data set and who does not have a mobility or disability issue. It effectively allows the data holder to limit the amount of information, or to make it available in such a form that it is of little or no use to the researcher. We see examples of this below.
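The point about excluding fields can be illustrated with a minimal sketch. It assumes a register held in a relational table; the table name, the column names and the choice of which columns count as personal data are all hypothetical and are not taken from any actual register. The sketch shows only that omitting designated fields at export time is a single query rather than a document-by-document redaction exercise.

```python
import csv
import io
import sqlite3

# Hypothetical column names treated as personal data (an assumption for illustration).
PERSONAL_FIELDS = {"owner_name", "owner_address"}


def export_without_personal_fields(conn, table, personal_fields):
    """Return the table as CSV text, omitting the named personal-data columns."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    kept = [c for c in columns if c not in personal_fields]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(kept)
    writer.writerows(conn.execute(f"SELECT {', '.join(kept)} FROM {table}"))
    return buffer.getvalue()


# A tiny in-memory stand-in for a public register (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE register (entry_id INTEGER, entry_type TEXT, "
    "year_registered INTEGER, owner_name TEXT, owner_address TEXT)"
)
conn.executemany(
    "INSERT INTO register VALUES (?, ?, ?, ?, ?)",
    [
        (1, "trawler", 1998, "A. Smith", "1 Harbour Road"),
        (2, "ferry", 2004, "B. Jones", "2 Quay Street"),
    ],
)

dataset = export_without_personal_fields(conn, "register", PERSONAL_FIELDS)
print(dataset)
```

Whether a given field is in fact personal data is a legal judgment which the code cannot make; the sketch only shows that, once that judgment is made, the extraction cost is negligible.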

6. Access to Electronic Materials under Freedom of Information

Clearly, the ideal for the researcher is that when access to the data is achieved, it is in a form which is processable by digital means. Thus offers data sets, rather than printed documents. Re-use, too, will provide digital data sets which save the effort and cost of rekeying data. Freedom of Information rights do not specify that the researcher can request the "document" in any specific format, which differs from those rights relating to access to environmental data. Environmental data access is regulated in the UK by The Environmental Information Regulations 2004, which specify in s.4 that:

4. - (1) Subject to paragraph (3), a public authority shall in respect of environmental information that it holds

(a) progressively make the information available to the public by electronic means which are easily accessible; and

(b) take reasonable steps to organize the information relevant to its functions with a view to the active and systematic dissemination to the public of the information.

In s.6 of the EI Regulations, it is required that the applicant's preferred format should be taken into account. The FoI Act does not explicitly say that information should be made available in electronic format, nor that public authorities should work towards making materials available in such format, and neither does it provide a "preferred format" option for requestors of information. The difference between the two arises because the EI Regulations implement the Aarhus Convention: [47]

'3. Each Party shall ensure that environmental information progressively becomes available in electronic databases which are easily accessible to the public through public telecommunications networks.'

This raises the question of whether a similar obligation is implicit (rather than explicit) in the FoI Act idea of "publication scheme" or whether in order to utilise online access, the relevant section of the FoI Act requires amendment. We have already seen that many public authorities have taken a pro-active path in providing online access to their public registers, and certainly it could be argued that any definition of "information" must include electronic information and that requires an electronic means of access. S.1 of the FoI Act states:

'(1) Any person making a request for information to a public authority is entitled

(a) to be informed in writing by the public authority whether it holds information of the description specified in the request, and

(b) if that is the case, to have that information communicated to him.'

A common sense reading of this would imply that if the information is in electronic format, then it should be provided in electronic format: there is no limiting definition of what "information" actually is, and it is clear that in the current legal context, it should be read widely. Even if "information" had been replaced by "document" in the original Act there would be no reason to think that this is a limiting concept either since in European public law the meaning of "document" is widely construed with Art 3(a) of The EU Regulation on Public Access to Documents (1049/2001) taking a much more accurate view of reality when it describes what a "document" is in the current technical framework:

'"document" shall mean any content whatever its medium (written on paper or stored in electronic form or as a sound, visual or audiovisual recording) concerning a matter relating to the policies, activities and decisions falling within the institution's sphere of responsibility', i.e. a 'non-paper' is a 'document'.

The wide ranging nature of this view of a document has been emphasised by The European Ombudsman in Public Access to Information in EU Databases [48] when he stated:

'This definition of "document" clearly has the purpose of grasping the electronic reality of modern administration. The definition implies a "content" contained in a "medium".'

We see below that not all UK public authorities utilise this common sense reasoning concerning the format of the information which should be accessible.

A related point is that databases have - intrinsically - no structure at the "query level". By this I mean that while the individual fields in the database are structured, the actual search mechanism does not reside at the field level: how one views the interrelationships between the data elements in a database is purely down to the way that they are searched ("queried" in database terminology). In a Freedom of Information request made by the author to the Information Commissioner, [49] the request was refused because they did not "hold the information":

'Our first ground for declining to supply the information requested is that we do not hold this information. We neither can nor do routinely extract this information from our system. We certainly hold the raw data which would form the basis of the analysis you request, indeed this information is available on the public register of data controllers, but we do not hold results of the analysis because it is an analysis we never perform ourselves.'

In this wider view of electronic information - outlined above - there is no suggestion that access to content should be constrained by the manner in which a public authority sets up its database queries: that would effectively prevent the public being properly informed (which is the role of the Freedom of Information Act) and would provide loopholes which would prevent many kinds of analytical insights into how a public authority acts. The ICO clearly - in my view at the time - held the information I had requested, and a member of the public has - unless exemptions such as confidentiality, commercial nature etc. apply - the right to receive that information. Since the data which I had requested did not contain any information which was exempt, my interpretation was that the content of the database should have been made available in order to be "accessible" under the Act. If the public authority is not prepared to carry out querying of the database itself (for whatever reason [50]) in order to answer a request for information, should it simply accept that it must make the database available as a whole for the requestor to run those queries on the raw data him or herself?
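The claim that an "analysis" is no more than a query over raw data which is already held can be sketched as follows. The table, its columns and its rows are invented for illustration and do not reproduce the structure of the Register of Data Controllers; the point is only that the same stored rows answer different questions depending on how they are queried.

```python
import sqlite3

# Hypothetical stand-in for a register of data controllers (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE controllers (reg_no TEXT, sector TEXT, purpose TEXT)")
conn.executemany(
    "INSERT INTO controllers VALUES (?, ?, ?)",
    [
        ("Z1", "retail", "marketing"),
        ("Z2", "retail", "staff administration"),
        ("Z3", "health", "marketing"),
    ],
)

# Only these columns may be used for grouping (guards the f-string below).
GROUPABLE = {"sector", "purpose"}


def count_by(conn, field):
    """Count register entries grouped by one field of the raw data."""
    if field not in GROUPABLE:
        raise ValueError(f"cannot group by {field!r}")
    rows = conn.execute(f"SELECT {field}, COUNT(*) FROM controllers GROUP BY {field}")
    return dict(rows)


# Two different "analyses" drawn from the same raw data.
by_sector = count_by(conn, "sector")
by_purpose = count_by(conn, "purpose")
print(by_sector)
print(by_purpose)
```

Neither result is stored anywhere in the database before the query is run, which is the sense in which an authority can say that it does not "hold" the analysis while holding everything needed to produce it.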

Part II Access in Practice

The goals of Freedom of Information and Re-Use are quite clear: public oversight and enabling commercial information products, both of which might make data more accessible to the researcher. In practice, however, the recipients of requests have not always been so happy to produce materials as required, particularly in the earlier days of the regimes. More recently access has much improved with well considered publication schemes becoming the means of making information available, and with government licensing of data being enabled through simplified licensing schemes - the "one-click" scheme in the UK and its newer models. But not all access has been made simple, and we look briefly here at examples of public registers which - a researcher might think - should be accessible under both Freedom of Information and under Re-use, but where access has not been easily enabled and indeed has been hindered.

7. Examples: Ship Registers, Census & House Sales

As a non-legal example of access where the public authority holding the register justifies a refusal to communicate the contents of a public register database via the s.21 exemption, we can look at the UK's Registry of Shipping and Seamen, which holds information on a variety of ships and boats in UK ownership, log books, and records relating to individual seamen. The information might be relevant to a host of researchers, from transport historians to researchers into commercial transport. Early records are held by the Public Records Office and are available for inspection. [51] It can be seen that a collection of more recent digitised data such as this could be valuable both to the commercial sector and to various research communities. How might research access be enabled? Unfortunately, access is not enabled by the public authority, since it suggests that the information is reasonably available (at charge) and thus exempt through s.21. For the researcher, the charges would be considerable: 'Transcript of Registry', 'Closed Transcript of Registry' or 'Historical Transcript of Registry' naming the vessel and its official number have a statutory fee of £21 per vessel for a current transcript and £32 per vessel for a closed or historic transcript. Given that the register contains information on around 1,600 large vessels, [52] the costs for the information as a whole would be considerable. Although a request was made to determine Re-Use charges from the Register of Shipping, at the time of writing no reply had been received.
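The scale of those charges can be made concrete with a short calculation. It is a rough sketch which assumes that exactly one transcript is bought for each of the roughly 1,600 large vessels mentioned above and that the fees are in pounds sterling; the real total would depend on how many vessels and which kinds of transcript a researcher actually needed.

```python
# Figures taken from the text above; the unit (pounds sterling) is an assumption.
VESSELS = 1600       # approximate number of large vessels on the register
FEE_CURRENT = 21     # statutory fee per current transcript
FEE_CLOSED = 32      # statutory fee per closed or historic transcript


def total_cost(vessels, fee_per_vessel):
    """Cost of buying one transcript for every vessel at a flat per-vessel fee."""
    return vessels * fee_per_vessel


lower = total_cost(VESSELS, FEE_CURRENT)
upper = total_cost(VESSELS, FEE_CLOSED)
print(f"Between {lower:,} and {upper:,} for one transcript per vessel")
```

On these assumptions the cost of assembling the register piecemeal lies between 33,600 and 51,200, for information which the authority already holds as a single digital collection.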

A different model is that of UK house sales data where the requirement to state the price paid is a registration requirement. Land Registry, the public authority, has utilised a variety of models - selling data to the consumer themselves, making data available to re-users who either charge for information or make the information free through advertising. The Land Registry approach is in line with that of the Re-Use Directive, in that it focuses on commercial re-users of data, rather than the needs of the researcher who has little intent to commercialise the data. It seems unlikely that Land Registry would do anything but use the s.21 exemption to refuse a Freedom of Information request for the underlying database of 22 million records as a whole.

An example of where the public register has time and access limitations placed upon it is the UK Census. Records have been collected and digitised, and because of the amount of data they contain they have been of great interest to researchers. For example, since family research is an important element of census research, one approach has been to utilise various resellers of information: The National Archives and its commercial partners have digitised early census records and work from a commercial model where basic searching is free, but fuller information is charged. On a different level, more up-to-date information is provided by the government's UK National Statistics service and is also available to researchers (where licensing has been agreed with the data providers). The data is thus made available in usable formats, but for the public only early data is available. The result is that the public-oriented access enables individuals to be located, while the academic model is for large scale processing for social research purposes. The model is made more complex because digitisation of the early records was carried out in partnership with the commercial partners of the National Archives, and because the census document is not made public until 100 years after the census date.

This Census example also highlights a potential problem when a public authority enters commercial partnerships with others: is the data public data? It is derived from public records, but it is not clear where the ownership of that data resides. Similarly, prior to the inception of BAILII [53], the UK Court Service had entered commercial partnerships with suppliers of legal services which meant that judgments were effectively privatised and hidden from public view unless the end user was willing to pay the commercial entity for access. The rise - in parallel to eGov - of commercial partnerships has meant that such ownership questions may arise again.

8. More detail: Accessing the Data Protection Register

The Data Protection Register is available online and can be searched; however, the search facility is basic, allowing searching only on the fields: registration number, name, address, postcode, organisation sub-division. The input to the search fields must be accurate - maks and spencer will not be interpreted as marks and spencer, for example, though spelling correction in search engines is today usually considered an essential facility. For someone who learned to program in the 1970s with teletype input and pre-graphic screens, the ICO's approach to information provision certainly brings back memories. As a research tool into who is holding information and how they are describing their processing of that data, it is pretty useless. A request was thus made to the Information Commissioner for a copy of the full database of data controllers who were registered through the Notification process. This was refused but:

' to provide assistance (and as a discretionary disclosure) we have provided a complete list of data controllers and reference numbers. This list has been provided in text format (compressed to reduce the size of the file) and includes names and reference numbers. Any further information can be obtained by searching the online register using the details provided.'

Such an offering is, of course, not useful for any meaningful data mining or other research purposes. A request for review was made, highlighting the following points:

  • The data was not personal or sensitive personal data: It is not clear how many entities are registered on the Register of Data Controllers but may be around 377,000. [54]
  • This register contains - to my knowledge - no information relating to the personal or sensitive personal data on individuals. It is thus a relatively objective block of data which might be amenable to providing information on the kinds of processing and/or descriptions of processing of personal and sensitive data.

The view of "information" taken by the ICO:

'The ICO has taken the most limited approach to the definition of "information" which is possible, and is one which seems to deny the development of digital technology. By providing a very, very small window upon a corpus of information, the ICO makes the claim that the information is accessible when in fact only partial information is being provided.'

That the EIR should be a model for FoI access:

'If access to electronic data was required by the ICO in its Decision Notice under EIR 6(1) then I cannot see why it should not be made available under FoI Section 11.'

The Decision Notice by the Information Commissioner supported the original view of the Information Commissioner. [55] In outline the Notice answered the points:

  • The information was reasonably accessible since there is no obligation on the ICO to provide information which is already published.
  • Arguments based upon good practice in access to European public documents are "misplaced".
  • FoI access differs from EIR access and the latter should not be used as a model for FoI access.

One reason for requesting a copy of the register was that it would demonstrate the view which the ICO took towards the role of the FoI Act and how it should be interpreted as the culture of openness of data and open government developed. Essentially, they were not prepared to allow the Act to be dragged into the digital age: s.21 was being used by the agency with initial responsibility for the correct operation of the Freedom of Information Act to deny access to information under that Act. If the ICO was refusing to allow information to be squeezed out of it, then the researcher could hardly expect other agencies to be more co-operative. The original expectation of receiving the data through a FoI request was low (given their previous response to a request) and the intention had been to move towards the Information Tribunal. Unfortunately this was not possible. [56] Would the tribunal have taken a different approach?

Part III - Improving the Legal Context for Researchers

Freedom of Information requests seem to be being hindered by s.21 - where, to protect the underlying database, the public authority need only set up a primitive form of access to that database and inform researchers that access is available and the terms of the Act are being met. This was, effectively, the approach taken by the Information Commissioner's Office in the UK. There is, though, a larger context which is having an increasing effect upon these access limitations. A growing pressure from individuals and groups requesting information, a government keen to be seen as being "open", and a desire from the EU to move the Information Society project along so that commercial entities can profit from re-using government data and information are all beginning to have an effect. In particular, the proposals for revisions to the Freedom of Information Act and the general cultural landscape promise a better future than past for researchers who wish to utilise public record databases.

9. Dataset Revision to the Freedom of Information Act

The Protection of Freedoms Bill, a bill designed - amongst other things - by the coalition government to remove some of the "authoritarian" elements of the previous Labour government's legislative programme, includes changes to the Freedom of Information Act which encourage public authorities to make data sets more freely available and also to make the form of supply closer to that of the Environmental Information Regulations. S.11 of the Bill includes:


'Where

(a) an applicant makes a request for information to a public authority in respect of information that is, or forms part of, a dataset held by the public authority, and

(b) on making the request for information, the applicant expresses a preference for communication by means of the provision to the applicant of a copy of the information in electronic form, the public authority must, so far as reasonably practicable, provide the information to the applicant in an electronic form which is capable of re-use.'

In many ways, this Bill seems to go some way towards meeting the criticisms of the current FoI Act and how it has been interpreted - that is, it takes a holistic view of "information" and introduces the concept of "dataset" into the legislation. Of course, one might suggest that this was already implicit in the notion of "information" under the original Act. A dataset is defined in the Bill as factual information, held in electronic format, which has been 'obtained or recorded for the purpose of providing a public authority with information in connection with the provision of a service by the authority or the carrying out of any other function of the authority.' It should also not have been 'organised, adapted or otherwise materially altered since it was obtained or recorded.' The Bill also conflates Re-Use with Freedom of Information, requiring datasets (under the Freedom of Information definition) to be made available for Re-Use under the typical licensing terms (previously known as "one-click" and now even less onerous). Since many licences are currently free [57] this effectively means that electronic information will be provided with very few limitations or costs to the researcher.

Such access is certainly to be welcomed - giving much support to the idea that government should provide data for analysis. However, it may be too early to think that all problems are over. In particular, the "problem" of s.21 may well remain: if the material is available through other channels, it does not seem that the Bill's proposed revision or the concept of dataset will be sufficiently useful to allow access to public registers where information is being provided on a charged for or free individual query basis. More clarity in the Bill over just what the purpose of s.21 in the FoI Act is, and how it will be balanced by these new elements would have been useful.

10. Cultural Changes

Given that the purpose of the FoI Act was to provide information to the public, it is somewhat ironic that after my request for access to the Register of Data Controllers was made and refused, the ICO decided that it would make the information available anyway. It is not clear what we are to make of this - the primary Act enabling access is used to deny access, but the public authority responsible for FoI guidelines decides to consider making the information available outwith that Act. As the Decision Notice was being prepared which would confirm that the database was not available under the FoI Act, a consultation was set up to look into producing a register in electronic format, with particular concern about "sole traders":

'However, a number of entries on the register relate to individuals, such as sole traders, and there are therefore data protection considerations. For example, is it fair that data collected for a statutory purpose is made available in a form that could make it more widely available and usable?' [58]

The arguments I had made appeared to have been accepted as the basis for the consultation, but the new problem (not raised in the Decision Notice) of data protection was put forward: that of sole traders. It is not clear why sole traders are problematic: they are traders and the information about them refers to their professional capacity. The ICO has been in the past keen to prosecute individuals for non-notification, even though it has been in the ICO's powers to exclude these from the notification process. [59] It is difficult to produce a rational explanation of exactly what the ICO is up to - except that it is having substantial difficulties in coping with the digital age (also seen in the current "cookie" problem [60]). Public bodies have certainly been - following the ICO practice - utilising s.21 of the FoI Act ('Information accessible to applicant by other means') to deny researchers access to the information as a whole.

The result of the ICO consultation has been that the full Register is now available for research use. Access under the FoI Act would be preferable, of course, rather than upon the whim of whichever ICO happens to be in place at a given time, since the history of access to public documents has been that a right to access has been more important in developing openness than a right to consider allowing access.

Whatever the view of the ICO, governments are now supporting more open and transparent approaches to information. Currently the UK has a public consultation underway on "Making Open Data Real" [61] and a recent UK proposal has been the Public Data Corporation:

'A Public Data Corporation will bring benefits in three areas. Firstly and most importantly it will allow us to make data freely available, and where charging for data is appropriate to do so on a consistent basis. It will be a centre where developers, businesses and members of the public can access data and use it to develop internet applications, inform their business decisions or identify ways to run public services more efficiently. Some of this work is already taking place but there is huge potential to do more. Secondly, it will be a centre of excellence where expertise in collecting, managing, storing and distributing data can be brought together. This will enable substantial operational synergies. Thirdly, it can be a vehicle which will attract private investment.' [62]

The PDC approach appears to be that - found in the Protection of Freedoms Bill - of breaking down the barriers between freedom of information and re-use and is certainly to be welcomed. For the researcher, whether this conflation will allow rights to access data freely (or at low cost) where Re-Users must pay (i.e. whether there will be a distinction between commercial and non-commercial use), will be an important element to keep in view.

11. Conclusion: The future of access to public registers

We are beginning to see that research access to public registers is changing: from a near Victorian attitude to access (think of clerks making handwritten "extracts" which are charged to produce income to cover the staff time involved) towards one where there is a presumption that non-personal (or sometimes personal) information which is collected by Government should generally be made available on request in the form of a dataset. Since most of this information is now being harvested electronically and stored in that format, it is a trivial exercise to then enable access to the information at almost no further cost. Along with the technical framework which reduces transaction costs there has been a rise in the belief that information should be available to the public, where possible. Provision of datasets through is one clear example of this. Re-Use is another example.

But there are problems for the researcher, the inclusion of personal information in public registers being the primary one in a context where privacy is seen as a need; and a secondary one being s.21 of the Freedom of Information Act which - being an absolute exemption - is being used, not for its original aim of encouraging users to find other sources of information before they approach public authorities, but for limiting access to either small chunks of information or information which is charged at high rates. Third, is a charging mechanism under Re-Use - is public data to be seen as a commodity to improve governmental income?

Access is but one aspect of public registers, of course. As important as access is the content and form of the register - do they include material which is best designed to elucidate the nature of the public authority's task? In the case of the ICO's register of data controllers, this is probably not the case - it appears to have been designed more as a source of revenue than as a source of information about who is doing what with data.

However, perhaps when fuller access is enabled to researchers, the next step will be the demand that registers themselves are made useful to users outwith the needs of the public authority itself. A register dealing with data processing could, for example, contain information on quantities of records being processed, types of processing etc. [63] Such information would really provide a window into understanding the data protection regime.

[1] Philip Leith is Professor of Law at Queen's University Belfast.

[2] See Leith P., "Squeezing Information out of the Information Commissioner: Mapping and measuring through online public registers", (2006) 3:4 SCRIPTed 389 for an earlier attempt to carry out the same investigation.

[3] Art. 18, Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data

[4] See, for example, Mobility, Data Mining and Privacy: Geographic Knowledge Discovery, Giannotti F. & Pedreschi D. (Eds), 2008, Springer, where the collection of data from various resources (e.g. phone data) is used to understand the movement of populations based upon the movement of individuals.

[5] Research into the examination of software patents for Leith, P, 2007, Software and Patents in Europe, Cambridge UP, was substantially improved by having online public access to the correspondence file for the European Patent Office.

[6] Gov 2.0: Towards a User Generated State, The Modern Law Review, Vol. 73, Issue 4, pp. 551-577, July 2010

[7] We therefore encourage you to engage with us and give us your feedback and input on what datasets you would like to see released. We know that by doing this, creative people like yourselves will be able to build innovative new applications and websites which will make a real difference to people in their everyday lives. Francis Maude, Minister for the Cabinet Office

[8] See Cabinet Office, The Power of Information: An independent review by Ed Mayo and Tom Steinberg (2007).

[9] The US budget allocation to open government initiatives has, though, been much reduced for year 2011 onwards. It is not clear how this will affect the provision of data through or whether this is a temporary reduction in funding.

[10] Modernising Government, March 1999, Cm 431, Stationery Office, London.

[11] Shadbolt, N., 2011, Towards a EU Data Portal

[12] Noveck, B.S., p xiii, Wiki Government: How Technology Can Make Government Better, Democracy Stronger, and Citizens More Powerful, Brookings Institution Press, 2009.

[13] Memorandum for the Heads of Executive Departments and Agencies, Transparency and Open Government at

[14] See Leith, P., E-Participation and E-Participants: solving the patent "crisis", forthcoming.

[15] See Article 29 Data Protection Working Party, WP 136, Opinion 4/2007, on the concept of personal data Example No. 6: car service record The service register of a car held by a mechanic or garage contains the information about the car, mileage, dates of service checks, technical problems, and material condition. This information is associated in the record with a plate number and an engine number, which in turn can be linked to the owner. Where the garage establishes a connection between the vehicle and the owner, for the purpose of billing, information will "relate" to the owner or to the driver. If the connection is made with the mechanic that worked on the car with the purpose of ascertaining his productivity, this information will also "relate" to the mechanic.

[16] For example tax payments are viewed as public in Sweden, but highly sensitive private information in the UK.

[17] Registries in the UK are publicly funded. Mapped instances are available via the National Cancer Intelligence network at See also,

[18] The Office for National Statistics is responsible for the collection of this data. A standardized methodology has been followed which was set by various international health agencies.

[19] The ethics of data utilisation: a comparison between epidemiology and journalism, Westrin C-G & Nilstun T., BMJ, Volume 308, 19 February 1994.

[20] Art. 13(2).

[21] Erdos, D., Information and Communications Technology Law, Vol. 20, No. 2, June 2011, 83-101.

[22] O'Hara, K. (2011) Transparent Government, Not Transparent Citizens: A Report on Privacy and Transparency for the Cabinet Office.

[23] See Leith, P., The socio-legal context of Privacy, International Journal of Law in Context (2006), 2: 105-136 for a critical view on privacy vs social utility.

[24] Title 17, § 105. Subject matter of copyright: United States Government works: Copyright protection under this title is not available for any work of the United States Government, but the United States Government is not precluded from receiving and holding copyrights transferred to it by assignment, bequest, or otherwise.

[25] I include the database right within this concept of copyright. Clearly most of the data in databases will be raw data which is under the level required for copyright protection, so the sui generis right is the most effective protection for this.

[26] Part of the modernization programme in government has been to set up "agency" departments which have a degree of autonomy from government, and are expected to fund their own operations: Ordnance Survey is an independent non-ministerial government department with Executive Agency status operating as a Trading Fund under the Ordnance Survey Trading Fund Order 1999 (SI 1999/965). Ordnance Survey is accountable to Parliament through the Secretary of State in Communities and Local Government (CLG). CLG will approve the policy and financial framework within which Ordnance Survey operates and the Corporate Business Plan and Agency Performance Monitors.

[27] One EU funded project in which I was a partner was unable to move further than experimental stage because of an inability to fund a license for map data. See information on Add-Wijzer project in Beyond PPGIS: Legislative maps and semantic web supporting democratic processes, Peters R., van Engers T., Wilson F., at

[28] For example, central purchase of a license has allowed all government services in England and Wales to access data without further cost. See Public Sector Mapping Agreement (PSMA) for England and Wales at

[29] Leith P., E-Participation and E-Participants: solving the patent "crisis", forthcoming.

[30] Further information has been required in The Licensing Act 2003 (Licensing authority's register) (other information) Regulations 2005

[31] The Licensing Act 2003 (Licensing authority's register) (other information) Regulations 2005

[32] In California, it is possible to require that one's marriage record is not made publicly available, and in France death certificates are not part of any public register. In Spain, judgments from the criminal court have names removed.

[33] Robertson, R (on the application of) v City Of Wakefield Metropolitan Council [2001] EWHC Admin 915 (16th November, 2001)

[34] The Electoral Commission v City of Westminster Magistrates' Court [2009] EWHC 78 (Admin)

[35] For example, Spain's complaint re: Gibraltar. Case C-145/04. RVJ.


[37] 'People not entitled to vote include members of the House of Lords, foreign nationals resident in the UK (other than Commonwealth citizens or citizens of the Irish Republic), some patients detained under mental health legislation, sentenced prisoners and people convicted within the previous five years of corrupt or illegal election practices. Only registered voters who are listed on the edited electoral roll can be found using our searches.'

[38] See online systems such as which allows viewing of all letters, plans, etc. These UK documents have, of course, always been available for inspection in person, though inspection frequently requires the inspecting individual to provide information about themselves.

[39] For example, see the "Find an Offender" at Georgia Department of Corrections at which provides full details, including a recent image of each inmate.

[40] Directive 2003/98/EC of the European Parliament and of the Council of 17 November 2003 on the re-use of public sector information

[41] See the ICO's response to the consultation on the use of this information: Release of Vehicle Keeper Data from the UK Vehicle Registers: Department for Transport Consultation, JB/LS 30/3/06. At

[42] Case T-233/09

[43] Access to information about oneself is via the data protection regime.

[44] And for Re-Use, too - despite the view of the Article 29 Working Party that it should not - data protection significantly affects access rights (following on from Robertson v Wakefield): 'A re-use of personal data envisaged under the re-use Directive is, as opposed to the two cases mentioned above [right of access as first party, FoI] intended as input for commercial activities, thus presents an economic asset for business, which neither has the human rights nor the transparency aspect.' Article 29 Data Protection Working Party, 10936/03/EN, WP 83, Opinion 7/2003 on the re-use of public sector information and the protection of personal data - Striking the balance, 12 December 2003.

[45] Freedom of Information Act Awareness Guidance No. 6: Information Reasonably Accessible to the Applicant by Other Means, Information Commissioner's Office. 2006.

[46] More problematic is access to documentation which requires searching and analysis. See The Freedom of Information and Data Protection (Appropriate Limit and Fees) Regulations 2004 No. 3244

[47] Convention On Access To Information, Public Participation In Decision-Making And Access To Justice In Environmental Matters done at Aarhus, Denmark, on 25 June 1998. Available at

[48] Strasbourg 2008

[49] Leith P., "Squeezing Information out of the Information Commissioner: Mapping and measuring through online public registers", (2006) 3:4 SCRIPTed 389

[50] Which could include many reasons - for example, they use external IT services and the terms of the service contract did not provide for the handling of such day-to-day requests.

[51] Lloyd's provides an overview of this information

[52] MCA Annual Report 2009-10.

[53] Leith, P. & Fellows, C., Enabling Free On-line Access to UK Law Reports: The Copyright Problem. I. J. Law and Information Technology, 2010: 72-93

[54] Prior to the rise of fees for larger data users, the standard fee was £35, which has been divided into income generated from the 2009/10 accounts.

[55] FS503342323, 28th March 2011.

[56] Deadlines (especially short 28 day ones) are not my forte.

[57] See the Open Government Licence at

[58] Information Commissioner's website.

[59] See Leith P., "Squeezing Information out of the Information Commissioner: Mapping and measuring through online public registers", (2006) 3:4 SCRIPTed 389 for a discussion of this.

[60] Where the ICO's own advice on implementation of DIRECTIVE 2009/136/EC appears to be less than helpful: ...we do not intend to issue prescriptive lists on how to comply. You are best placed to work out how to get information to your users, what they will understand and how they would like to show that they consent to what you intend to do.

[61] Francis Maude, Minister for the Cabinet Office. At Accessed June 2011.

[62] Francis Maude, Minister for the Cabinet Office. At Accessed June 2011.

[63] See Leith P., "Squeezing Information out of the Information Commissioner: Mapping and measuring through online public registers", (2006) 3:4 SCRIPTed 389