Pospisil, Skrob: Actual trends in improvement of risk area security using combined methods for biometrical subject identification

Actual trends in improvement of risk area security using combined methods for biometrical subject identification.

Radek Pospisil [1], Milan Skrob [2]

Cite as: Pospisil R. & Skrob M., "Actual trends in improvement of risk area security using combined methods for biometrical subject identification", European Journal of Law and Technology, Vol. 4, No. 2, 2013.


The article describes the components of the project Improvement of risk area security using combined methods for biometrical identification of subjects, which addresses the identification of subjects through multi-factor biometric identification. It presents the basic idea of the project, the methodology used for biometric identification of subjects, the knowledge acquired and the formal possibilities of the proposed algorithms.

Since these are modern methods, the article gives an overview of current mechanisms for biometric identification of subjects in video and audio recordings. This is now a very sensitive issue that touches not only on technology but also on the closely related areas of law, privacy protection and personality rights.

1. Introduction

Currently used biometric identification methods for video and audio sources show a number of limitations that prevent them from being accepted as sufficiently reliable for use in security systems. The project Improvement of risk area security using combined methods for biometrical subject identification was founded to improve the acquisition of information about subjects from video and audio data.

The key idea of the project is to combine detection and identification methods in order to achieve higher efficiency of subject detection and identification.

As priority areas, we have chosen identification based on face recognition [5], identification of speech utterances, detection of the human body [4], emotion analysis based on a person's behaviour and speech signal, and detection of common sounds. Each area is processed by a specialised detector whose output is transferred to a uniform database system, where evaluation, pairing and subsequent data-mining take place. The structure is shown in Figure 1.

The project is developed primarily in the Czech Republic and must therefore comply with Czech legislation. The recording, storage and handling of biometric identifiers fall within the remit of the Office for Personal Data Protection, in accordance with Act No. 101/2000 Coll.

According to Paragraph 5, Item 1, Letter e) of Act No. 101/2000 Coll., on the Protection of Personal Data and on Amendment to Some Acts, as amended, personal data may be preserved only for the period of time that is necessary for the purpose of their processing. After expiry of this period, personal data may only be preserved for state statistical service purposes, and for scientific and archival purposes. When using personal data for these purposes, it is necessary to respect the right to protection of the private and personal life of the data subject from unauthorised interference and to make personal data anonymous as soon as possible.

Within the database, video and audio data is converted into templates to meet the aforementioned requirements of Paragraph 5, Item 1, Letter e) of Act No. 101/2000 Coll., on the Protection of Personal Data and on Amendment to Some Acts.

By comparing these templates, it is possible to determine whether or not a person corresponds to a previously detected template and, thus, to one of the descriptions listed in the database. Templates are not reversible: the original image or sound data cannot be recreated from them. In principle, the mechanism is analogous to hash codes [15]. It is not necessary to save video or audio samples to the database itself, because the algorithms can work with online data held in memory (once a particular image has been processed, it can be permanently discarded).
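The comparison of templates can be illustrated with a minimal sketch. It assumes, purely for illustration, that a template is a fixed-length vector of numbers and that two templates match when their cosine similarity reaches a threshold. The function names, the vector contents and the threshold of 0.9 are hypothetical and are not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("templates must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_match(template, stored, threshold=0.9):
    """Return the id of the best-matching stored template, or None."""
    best_id, best_score = None, threshold
    for subject_id, candidate in stored.items():
        score = cosine_similarity(template, candidate)
        if score >= best_score:
            best_id, best_score = subject_id, score
    return best_id


# Hypothetical stored templates (assumed three-element vectors).
stored_templates = {
    "subject-1": [0.9, 0.1, 0.3],
    "subject-2": [0.1, 0.8, 0.5],
}
print(find_match([0.88, 0.12, 0.31], stored_templates))  # close to subject-1
print(find_match([0.0, 0.0, 1.0], stored_templates))     # no template is close
```

Only the vectors are compared; no image or sound sample is needed at this stage, which is the property the text relies on.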


Figure 1: Schematic principle of combined methods for biometric subject identification

Detection algorithms process the input audio and video sources and save any relevant extracted data to a common database, where the data are subsequently paired through a comparison layer. Data are stored in an abstract form, where possible in a form usable for the specific identification of the subject. There is no need to store the actual biometric data, such as photos, video or sound recordings, since these are not required for subsequent processing and data-mining.

The project idea is based on data abstraction. This means that the detection algorithms analyse the selected image and sound scene and store the information they detect. This information may include the occurrence of a person, hair colour, shirt colour, trouser colour, voice emotion, environmental sounds, body height, head trajectory and body trajectory. An individual item of data has little semantic value on its own; however, applying data-mining to the collected data can yield extensive information that helps to detect illegal behaviour or to search for lost persons.

1.1 Case study - Lost child

For example, when a child is lost who has been seen in the area covered by the biometric identification system, the operator can simply use the user application to search for a girl approximately 110 cm tall who was wearing a blue shirt and a red skirt and was last seen playing in the playground on 09/08/2012. The request is processed by the data-mining layer, which returns any information found on the likely occurrence of the girl after 09/08/2012.

The system can thus greatly facilitate the work of security forces, who would otherwise have to go through dozens of hours of recordings. With the combined use of biometric identification technology, a considerable number of recordings can be eliminated, because only a fraction of them contains data matching the description.
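A query of this kind can be sketched over abstract detection records as follows. The record fields, camera names and the 10 cm height tolerance are hypothetical, and the date 09/08/2012 is assumed to be written day first (9 August 2012); the project's actual schema is not given in this article.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Observation:
    """One abstract record produced by the detectors (hypothetical schema)."""
    seen_at: datetime
    camera: str
    height_cm: float
    shirt_colour: str
    lower_colour: str


def search(observations, since, height_cm, tolerance_cm, shirt, lower):
    """Return observations from `since` onwards that fit the description."""
    hits = [
        o for o in observations
        if o.seen_at >= since
        and abs(o.height_cm - height_cm) <= tolerance_cm
        and o.shirt_colour == shirt
        and o.lower_colour == lower
    ]
    return sorted(hits, key=lambda o: o.seen_at)  # oldest first


records = [
    Observation(datetime(2012, 8, 9, 15, 5), "playground", 111.0, "blue", "red"),
    Observation(datetime(2012, 8, 9, 15, 40), "park-gate", 108.5, "blue", "red"),
    Observation(datetime(2012, 8, 9, 15, 45), "park-gate", 172.0, "blue", "black"),
    Observation(datetime(2012, 8, 8, 10, 0), "playground", 110.0, "blue", "red"),
]

for hit in search(records, datetime(2012, 8, 9), 110.0, 10.0, "blue", "red"):
    print(hit.seen_at.isoformat(), hit.camera)
```

The search touches only abstract attributes; the recordings themselves are not needed to answer it.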

1.2 Case study - Dangerous area

A city contains places that can be a threat to young children (for example a remote lake or a place with dangerous animals). The security service marks such a place and its surroundings as potentially dangerous for children and passes this information through the user application to the detection algorithms. If a child comes close to the danger point, the combined biometric identification system detects the situation and alerts the security staff, who can then prevent the potential injury to the child.
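The zone check in this case study can be sketched as a simple geometric test. The sketch assumes that subject positions are available in metres on a ground plane, that a danger zone is a circle, and that another detector has already decided whether the subject is a child (for example from estimated height); all names and values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DangerZone:
    """Circular area marked as dangerous for children (assumed shape)."""
    name: str
    x_m: float
    y_m: float
    radius_m: float


def zones_entered(zones, x_m, y_m, is_child):
    """Return the names of danger zones that contain a child's position."""
    if not is_child:
        return []
    return [
        z.name for z in zones
        if math.hypot(x_m - z.x_m, y_m - z.y_m) <= z.radius_m
    ]


zones = [DangerZone("lake", 100.0, 50.0, 20.0)]
print(zones_entered(zones, 110.0, 55.0, is_child=True))   # inside the zone
print(zones_entered(zones, 150.0, 50.0, is_child=True))   # outside the zone
print(zones_entered(zones, 110.0, 55.0, is_child=False))  # adult, no alert
```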

1.3 Case study - Subject position

Thanks to combined biometric subject identification, we will be able to detect the position of the subject, for example whether the subject is in a vertical or a horizontal position. If an elderly subject is walking down the street and suddenly changes from a vertical to a horizontal position, the system will detect this condition and mark it as potentially dangerous. The competent security service will then be notified that someone is lying on the street, probably with a health problem.
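One simple way to approximate this check is to look at the bounding box of the detected body in consecutive frames. The sketch below assumes that a box taller than it is wide means a vertical posture and the opposite means a horizontal posture; this rule and the function names are illustrative assumptions, not the project's method.

```python
def posture(width, height):
    """Classify a bounding box by its aspect ratio (assumed rule)."""
    return "vertical" if height >= width else "horizontal"


def detect_fall(boxes):
    """Return the index of the first frame in which the posture changes
    from vertical to horizontal, or None if no such change occurs."""
    previous = None
    for index, (width, height) in enumerate(boxes):
        current = posture(width, height)
        if previous == "vertical" and current == "horizontal":
            return index
        previous = current
    return None


# (width, height) of the body bounding box in pixels, one pair per frame.
walking_then_lying = [(40, 120), (42, 118), (110, 45), (115, 40)]
print(detect_fall(walking_then_lying))
```

A practical detector would probably also require the horizontal posture to persist for some time before raising an alert, to limit false alarms.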

1.4 Case study - Sound

In sound analysis we are working, among other things, on the detection of undesirable sounds and audio abnormalities. These abnormalities include gunshots, car crashes, shattering glass and the spray-painting of walls on houses and monuments. If an identifiable abnormality occurs in the monitored area, the information is transmitted to the voice data descriptor, which forwards it for subsequent processing by the database system. It is thus possible to react very quickly to an undesirable situation such as a collision between two cars.

2. Detection Mechanisms

The implemented methods are able to monitor what is happening in the scene and provide a systematic overview. For an operator working with hundreds of CCTV cameras, it is of course necessary that the system generates only a small percentage of false alarms. The combined biometric identification system therefore uses selection and verification algorithms so that the overall solution is not burdened with a higher error rate than is absolutely necessary.

2.1 Face detection

With subject detection and identification from video sources (CCTV systems, video, still images), we face a number of particular problems and limitations, namely:

  • image distortion
  • low image resolution
  • camera view angle
  • camera position

For face detection it is appropriate that the camera is placed at the eye level of a subject; the face is thereby not rotated in relation to the axis of the camera and the image has sufficient detail.

Table 1 shows three types of shots that must be processed by the face detection algorithm. Views are ordered from the scene that is easiest to process to the most difficult one.


Table 1: Success rate of the face detector depending on camera location

Face detection is based on the Viola-Jones object detector, working with grey-scale images [5]. The advantages of the solution are speed, robustness to lighting variations and adequate reliability. The detector was extended with a new diagonal feature type and with equalisation of the image luminance histogram. This method can achieve high accuracy and sensitivity.

Once significant points in the image have been determined, face tracking can start and the resulting data are transmitted through a face descriptor into the database. Face tracking is implemented in principle according to the block diagram shown in Figure 2.
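The speed of the Viola-Jones detector comes largely from the integral image, which allows the sum of any rectangle of pixels to be read with four lookups. The sketch below shows this standard building block together with a basic two-rectangle Haar-like feature. It is a generic illustration: it does not reproduce the project's diagonal feature or its histogram equalisation, and the function names and the tiny test image are assumptions.

```python
def integral_image(image):
    """Return the summed-area table with an extra zero row and column."""
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def rect_sum(table, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def two_rect_feature(table, top, left, height, width):
    """Horizontal two-rectangle feature: left half minus right half."""
    half = width // 2
    return (rect_sum(table, top, left, height, half)
            - rect_sum(table, top, left + half, height, half))


# Grey-scale image with a dark left half and a bright right half.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
table = integral_image(image)
print(rect_sum(table, 0, 0, 3, 4))          # sum of all pixels
print(two_rect_feature(table, 0, 0, 3, 4))  # strongly negative at this edge
```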


Figure 2: Face tracking algorithm

The face tracking algorithm remains functional even when the face angle is so large that the detection capability of the Viola-Jones object detector is limited (Table 1). In this case, the position can be predicted by tracking the subject and keeping the focus on the facial area.

2.2 Body detection

For body detection, the HOG (Histograms of Oriented Gradients) algorithm can be used, which is designed to detect objects in static images and video sequences. The advantage of the algorithm is its invariance to geometric and photometric transformations [4]. Its limitation is a high computational cost: detection in a single image frame takes around 268 ms (measured in the project laboratory). The HOG algorithm was therefore supplemented by background modelling using a mixture of Gaussian distributions, which made it possible to achieve a ten-fold increase in detection efficiency [4].


Figure 3: Background modelling for the detection of a human body

Figure 3 shows the output of the background modelling process, that is, the output binary image [4]. This image was then subjected to cluster analysis using k-means clustering, which makes it possible to achieve the aforementioned decrease in computational cost.
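The clustering step can be sketched as plain k-means (Lloyd's algorithm) applied to the coordinates of the foreground pixels of the binary image. The sketch assumes that each resulting cluster marks a candidate region, so that a more expensive detector would only need to examine those regions; this reading, the fixed initial centroids and the tiny mask are illustrative assumptions.

```python
def foreground_points(mask):
    """Return (row, col) coordinates of non-zero pixels in a binary mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def k_means(points, centroids, iterations=10):
    """Plain k-means starting from the given initial centroids."""
    centroids = [tuple(map(float, c)) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        updated = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                updated.append((sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, clusters


def bounding_box(cluster):
    """Return (min_row, min_col, max_row, max_col) of a cluster."""
    rows = [p[0] for p in cluster]
    cols = [p[1] for p in cluster]
    return min(rows), min(cols), max(rows), max(cols)


# Binary foreground mask with two separate blobs.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
points = foreground_points(mask)
centroids, clusters = k_means(points, centroids=[(0, 0), (3, 5)])
print([bounding_box(c) for c in clusters])
```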

2.3 Voice and sound detection

Audio analysis can be divided into voice analysis and the analysis of other sounds [3], [5], [6]. The input audio signal is pre-processed and divided into segments of approximately 32 ms with an overlap of 50%. In this way we obtain the approximately stationary signal required for subsequent calculations.

To recognise subjects by their speech, we work with and verify several speech features [6], [9], namely:
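The segmentation described above can be sketched as follows. The 32 ms frame length and the 50% overlap come from the text; the sampling rate of 16 kHz and the function name are assumptions made for the example.

```python
def frame_signal(samples, sample_rate_hz, frame_ms=32.0, overlap=0.5):
    """Split samples into fixed-length frames with fractional overlap."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    hop = int(round(frame_len * (1.0 - overlap)))
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame length and hop must be positive")
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


signal = list(range(16000))  # one second of dummy samples at 16 kHz
frames = frame_signal(signal, 16000)
print(len(frames[0]), frames[1][0], len(frames))
```

At 16 kHz a 32 ms frame holds 512 samples, and with 50% overlap each frame starts 256 samples after the previous one. Trailing samples that do not fill a whole frame are dropped in this sketch.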

  • MFCC (Mel-Frequency Cepstral Coefficients)
  • LFCC (Linear Frequency Cepstral Coefficients)
  • PLP (Perceptual Linear Prediction coefficients)
  • LPCC (Linear Prediction Cepstral Coefficients)
  • ACW (Adaptive Component Weighted coefficients)
  • TDCT (Temporal Discrete Cosine Transform)
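The difference between the first two features in the list lies in how the filter bank is spaced along the frequency axis: on the mel scale for MFCC and linearly for LFCC. The sketch below uses one widely used formula for the mel scale, m = 2595 log10(1 + f / 700), to compute filter centre frequencies for both spacings. The function names and the choice of three filters between 0 and 4000 Hz are illustrative assumptions; a full MFCC or LFCC computation involves further steps that are not shown.

```python
import math


def hz_to_mel(frequency_hz):
    """Convert a frequency in Hz to mels (common 2595/700 formula)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def centre_frequencies(low_hz, high_hz, count, scale="mel"):
    """Return `count` filter centre frequencies in Hz, evenly spaced
    on the chosen scale between the two band edges."""
    if scale == "mel":
        low, high, to_hz = hz_to_mel(low_hz), hz_to_mel(high_hz), mel_to_hz
    elif scale == "linear":
        low, high, to_hz = float(low_hz), float(high_hz), float
    else:
        raise ValueError("scale must be 'mel' or 'linear'")
    step = (high - low) / (count + 1)
    return [to_hz(low + step * i) for i in range(1, count + 1)]


print(centre_frequencies(0, 4000, 3, scale="linear"))
print([round(f) for f in centre_frequencies(0, 4000, 3, scale="mel")])
```

The mel-spaced centres are denser at low frequencies, which is where the two feature types differ most.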

3. Base Verification

Since the available detection algorithms for the face [2] [7] [12], the human body [4] [8] [13] [14] and voice [6] [9] have a variable detection success rate depending on the placement of cameras and microphones, a test scene was prepared which provides input data to verify the detection mechanisms and biometric subject identification. We thus created, under controlled conditions, a mechanism for testing the algorithms and subsequently optimising them. The laboratory environment is shown schematically in Figure 4.


Figure 4: The scene for algorithm testing

Camera points (blue points in Figure 4) and microphone points (green points in Figure 4) can be placed at different heights (from 0.5 to 2.5 m) and at different angles to simulate distortion of the image and scene. The test scene allowed us to create a knowledge database of the characteristic movements of different subjects, and made it possible to verify and modify the biometric subject identification algorithms in order to increase their efficiency. Our knowledge database currently contains data on the movement of dozens of subjects of different age and gender. Each person participated in the project knowing that the recordings are for scientific purposes, and each person was recorded four times, with several weeks or months between recordings.

The captured video and audio recordings were then tagged manually. The purpose of tagging is to provide a user-defined description of the scenes that can serve as a reference when verifying the functionality of the algorithms and of the overall model of combined biometric subject identification. The principle of scene description (tagging of subjects) is shown in Figure 5.


Figure 5: Tagging reference records
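One common way to use manually tagged regions as a reference is to compare them with detector output by intersection over union (IoU). The sketch below assumes that both tags and detections are axis-aligned boxes given as (x1, y1, x2, y2) and that a tag counts as found when some detection overlaps it with an IoU of at least 0.5. The metric, the threshold and the names are assumptions; the article does not state how the comparison is performed in the project.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def detection_rate(tagged, detected, min_iou=0.5):
    """Fraction of tagged boxes matched by at least one detection."""
    if not tagged:
        return 0.0
    found = sum(
        1 for t in tagged
        if any(iou(t, d) >= min_iou for d in detected)
    )
    return found / len(tagged)


tagged_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]  # manual reference tags
detected_boxes = [(1, 1, 10, 10)]                  # detector output
print(detection_rate(tagged_boxes, detected_boxes))
```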

Data processing and recording of the project participants in the Czech Republic is subject to Act No. 101/2000 Coll. In accordance with Section 9 a) of the Act, each of the participants (subjects) must give their express consent to the processing of sensitive data. Express consent can be provided electronically. The procedure for granting consent over the Internet must be designed so that data subjects can express their will in a way that minimises error. Giving consent by a single button click is therefore considered unacceptable: consent has to be divided into at least two separate acts performed by the data subject, which on the Internet is called the "double click principle". Our project handles consent to the processing of personal data by means of a form that each participant signs, thereby agreeing to the processing of their personal data.

Act No. 101/2000 Coll. does not further specify the processing time; however, it must be objectively, organisationally and technically justified. In our case, processing ends on the date on which the project's research and development terminates.

4. Common Database System

All information is stored in an MS SQL database and the whole system is tested on MS SQL 2008 R2.

In stress tests of the designed database structure, we tested the system throughput for stored templates and for data processed by the detection algorithms (Figure 1). Initially, we considered two possibilities:

  • Store main templates to a varchar data type

  • Store main templates to a varbinary data type

Through the stress tests (Table 2), we concluded that it is appropriate to store the data in the form in which it is used by the detection algorithms, namely as the varchar data type. The tests also verified that the bandwidth of the designed infrastructure is sufficient: the descriptor output for one newly detected record can be stored in approximately 1 to 2 ms. These values are sufficient for general use, and the system can be deployed in full operation.
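The two storage options can be illustrated with a small round-trip example. It uses the SQLite module from the Python standard library as a stand-in for MS SQL, with a TEXT column playing the role of varchar and a BLOB column the role of varbinary. The table names, the template values and the text encoding are assumptions, and the example says nothing about the timings measured in the project.

```python
import sqlite3
import struct

template = [0.25, -1.5, 3.0, 0.125]  # hypothetical template values

# Text form: comma-separated decimal numbers.
as_text = ",".join(repr(v) for v in template)
# Binary form: little-endian 64-bit floats.
as_bytes = struct.pack(f"<{len(template)}d", *template)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE template_text (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE template_blob (id INTEGER PRIMARY KEY, data BLOB)")
conn.execute("INSERT INTO template_text (data) VALUES (?)", (as_text,))
conn.execute("INSERT INTO template_blob (data) VALUES (?)", (as_bytes,))

text_row = conn.execute("SELECT data FROM template_text").fetchone()[0]
blob_row = conn.execute("SELECT data FROM template_blob").fetchone()[0]
conn.close()

# Both forms have to be decoded before the values can be compared.
from_text = [float(v) for v in text_row.split(",")]
from_blob = list(struct.unpack(f"<{len(blob_row) // 8}d", blob_row))
print(from_text == template, from_blob == template)
```

If the detection algorithms already produce and consume the text form, storing it unchanged avoids the extra conversion step shown for the binary form.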

Table 2: Database system bandwidth test

Command type    Local SQL server (30,000 rows / 1 row)    Remote SQL server (30,000 rows / 1 row)
SELECT
INSERT
DELETE



4.1 Data structure mechanism

The data structure of the common biometric database is designed so that it can support the tasks that are important for the use of a combined biometric identification system.

Through long-term testing, verification and modification of the database structure and descriptors, we concluded that the successful creation of a combined biometric identification system requires a mechanism that allows us to perform the following tasks:

  • a) assign and delete templates[3] that represent the biometric identity of a subject
  • b) merge subjects into groups that each represent one particular person
  • c) assign, modify and delete an identity[4] belonging to a particular subject

These procedures were identified as critical to maintaining data consistency in the database. Since the computational and detection algorithms are not infallible, it must be possible to correct the stored data and the links between them. The structure used for storing and connecting biometric data and their dependencies is shown schematically in Figure 6.


Figure 6: Data structure

Sub data are complementary to the master biometric data, which are stored as a main data record in the database. While the main data hold information such as the occurrence of a subject at a particular time and place, any number of linked sub data items hold information such as trajectory, body height or eye colour. The subject is identified on the basis of a specific template belonging to them, and the entire multi-factor biometric record is therefore linked directly to that template at the database level. This mechanism corresponds to act a) above and makes it easy to correct incorrectly assigned templates: when the ownership of a template changes, all related multi-factor biometric records change automatically.

Another crucial point is the subject - multi subject binding (act b)). This binding is needed because of possible false detections in the image, where the detection algorithms treat a detected person as a new, unknown subject and create a new data structure in the database. A tool is therefore needed that allows several subjects to be linked into one, since several of these subjects may in fact be one and the same person.

The last critical task is act c), in which detected subjects are linked to a specific identity (a description of the person). The system user is responsible for adding the description of the person (e.g. assigning the subject an internal number, name, description, contact address or various notes). The identity itself has no significant effect on the multi-factor biometric identification; it serves mainly as a guide for users of the system.

Maintaining the correctness of acts a), b) and c) is primarily the task of the detection algorithms and descriptors, but the stored data can also be edited by a user. To monitor the activity of the algorithms and to allow user editing, we created the DBMan application, which is used to manage our database of biometrically identified subjects (URL http://www.jimi.cz/dbman/).
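The three acts can be sketched with a small in-memory model. The class, its method names and the identifiers are hypothetical, and the sketch ignores sub data and persistence; it only shows how templates, subjects, merged persons and identities might refer to each other so that corrections propagate.

```python
class BiometricStore:
    """Toy in-memory model of templates, subjects, persons and identities."""

    def __init__(self):
        self.template_owner = {}  # template id -> subject id
        self.person_of = {}       # subject id -> person (group) id
        self.identity = {}        # person id -> descriptive fields

    # act a): assign and delete templates
    def assign_template(self, template_id, subject_id):
        self.person_of.setdefault(subject_id, subject_id)
        self.template_owner[template_id] = subject_id

    def delete_template(self, template_id):
        self.template_owner.pop(template_id, None)

    # act b): merge subjects that are in fact the same person
    def merge_subjects(self, keep, duplicate):
        target, source = self.person_of[keep], self.person_of[duplicate]
        for subject, person in list(self.person_of.items()):
            if person == source:
                self.person_of[subject] = target
        if source != target and source in self.identity:
            moved = self.identity.pop(source)
            self.identity.setdefault(target, moved)

    # act c): assign, modify and delete an identity
    def set_identity(self, subject_id, **fields):
        self.identity.setdefault(self.person_of[subject_id], {}).update(fields)

    def delete_identity(self, subject_id):
        self.identity.pop(self.person_of[subject_id], None)

    def lookup(self, template_id):
        """Return (person id, identity or None) for a template."""
        person = self.person_of[self.template_owner[template_id]]
        return person, self.identity.get(person)


store = BiometricStore()
store.assign_template("t1", "s1")
store.assign_template("t2", "s2")  # detector wrongly created a second subject
store.merge_subjects("s1", "s2")   # correction: s1 and s2 are one person
store.set_identity("s2", name="Person A")
print(store.lookup("t1"))
print(store.lookup("t2"))
```

Because records reach a person only through the template and subject links, reassigning a template or merging two subjects changes the result of every later lookup without touching the records themselves.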


Figure 7: Database manager for access to a multi-factor biometric database

Figure 7 shows an example of the DBMan application interface, specifically a comparison of the theoretical and detected movement trajectory of the subject, together with the rectangular areas where the face was detected. The application gives us important insights into the correctness of the proposed algorithms and lets us adjust the mechanisms to increase the efficiency of the system.

5. Conclusion

We are currently optimising the algorithms and the descriptor-database binding. Within the project, we reviewed the possibilities of biometric subject identification and the advantages of combining it with subsequent data-mining. We are therefore able to handle all of the areas listed in Section 1 and to work with recorded data that describe events in the video and audio scene.

We believe that the final system will be able to handle very difficult problems, such as quickly scanning a large number of input data sources, and will thus be very helpful in dealing with the situations described in Sections 1.1 and 1.2.


[4] SMIRG, O., SMEKAL, Z., MICA, I., CIKA, P. Methods for recognition and separation of the human body. Elektrorevue [online]. 2010, vol. 4, p. 6. ISSN 1213-1539. Available from URL: http://www.elektrorevue.cz/en/articles/analogue-technics/0/a-methods-for-recognition-and-separation-human-body-1/

[5] PRINOSIL, J., MICA, I. Efektivní detekce významných bodů částí obličeje. Elektrorevue [online]. 2010, vol. 57, p. 5. ISSN 1213-1539. Available from URL: http://www.elektrorevue.cz/cz/clanky/zpracovani-signalu/15/efektivni-detekce-vyznamnych-bodu-casti-obliceje/

[6] MEKYSKA, J., FAUNDEZ-ZANUY, M., SMEKAL, Z., FÀBREGAS, J. Fast and Efficient Approaches in Text-dependent Speaker Recognition Systems. Elektrorevue [online]. 2011, vol. 1, p. 6. ISSN 1213-1539. Available from URL: http://www.elektrorevue.cz/en/articles/analogue-technics/0/fast-and-efficient-approaches-in-text-dependent-speaker-recognition-systems/

[7] SMIRG, O., MIKULKA, J., FAUNDEZ-ZANUY, M., GRASSI, M., MEKYSKA, J. Gender recognition using PCA and DCT of Face Images. Advances in Computational Intelligence [online]. 2011. DOI: 10.1007/978-3-642-21498-1_28. Available from URL: http://www.utee.feec.vutbr.cz/files/mikulka/publikace/2011_Gender_recognition_using_pca_and_dct_of_face_images.pdf

[8] SMIRG, O., SMEKAL, Z., FAUNDEZ-ZANUY, M. Measurement of joint position for human recognition by locomotion. Elektrorevue [online]. 2011, vol. 2, p. 4. ISSN 1213-1539. Available from URL: http://elektrorevue.cz/en/articles/analogue-technics/0/measurement-of-joint-position-for-human-recognition-by-locomotion/

[9] MICA, I., ATASSI, H., PRINOSIL, J., NOVAK, P. Voice activity detection under the highly fluctuant recording conditions of call centres. Advances in Communications, Computers, Systems, Circuits and Devices. pp. 334-336. ISBN: 978-960-474-250-9.

[10] FAUNDEZ-ZANUY, M., MEKYSKA, J., ESPINOSA-DURO, V. On the focusing of thermal images. Pattern Recognition Letters. 2011, vol. 32, issue 11, pp 1548-1557. Available from URL: http://www.sciencedirect.com/science/article/pii/S0167865511001358

[11] MEKYSKA, J., FAUNDEZ-ZANUY, M., SMEKAL, Z., FABREGAS, J. Score Fusion in Text-Dependent Speaker Recognition Systems. Lecture Notes in Computer Science. 2011, vol. 6800, pp 120-132. Available from URL: http://www.springerlink.com/content/m5046265837416l2/?MUD=MP

[12] PRINOSIL, J. Blind Face Indexing in Video. Proceedings of the 34th International Conference on Telecommunications and Signal Processing TSP 2011. Budapest, Hungary, 2011, vol. 4. ISBN 978-1-4577-1411-5.

[13] MEKYSKA, J., FONT-ARAGONES, X., FAUNDEZ-ZANUY, M., HERNANDEZ-MINGORANCE, R., MORALES, A., FERRER-BALLESTER, M.A. Thermal hand image segmentation for biometric recognition. Proceedings of 45th annual IEEE International Carnahan Conference on Security Technology. Mataró, Barcelona, 2011, vol. 26-30. ISBN 978-1-4577-0902-9.

[14] Software Optimisation of the Stereoscopic Acquisition Chain. Proceedings of the 2011 6th International Conference on Teleinformatics. Dolní Morava, 2011, vol. 5. ISBN 978-80-214-4231-3.

[15] VAN ROMPAY, B., Analysis and Design of Cryptographic Hash Functions, Mac Algorithms and Block Ciphers. Ph.D. Thesis, June 2004.

[16] PYSZKO, P. Security of MSSQL. The Faculty of Electrical Engineering and Communication Brno University of Technology, 2012. 48 pages. Bachelor Thesis.

[1] JIMI CZ, a.s., Praha, Czech Republic

[2] JIMI CZ, a.s., Praha, Czech Republic

[3] Template is not a backward reproducible biometric subject marker (face, body or voice)

[4] Identity is specific information about the subject (especially name and surname)