An Eye Tracking System for High Performance Augmented Reality Applications

Kemal Doğuş TÜRKAY – MS Student, Medical Informatics, METU

Kerem ÇALIŞKAN – PhD Student, Medical Informatics, METU


A Human Machine Interface (HMI) system is implemented that tracks the eyes and places augmented images/scenes in the scene at the point of gaze. The system works with an infrared camera connected to a frame grabber. The streaming data captured from the camera is processed and the gaze coordinates are computed. The system is fundamentally an augmented reality (AR) system, combining real-world and computer-generated data by means of eye tracking: computer-generated objects are placed over real-world imagery according to the point of gaze. The core of this application, in particular the tracking part, can serve as a basis for many other projects, leading to new scenarios and innovations in the field, so the implementation is kept modular for future reuse.


In recent years eye tracking has become very popular owing to usability testing and commercial applications. A wide variety of disciplines use eye tracking techniques, including cognitive science, psychology (notably psycholinguistics and the visual world paradigm), human-computer interaction (HCI), marketing research and medical research (neurological diagnosis). Specific applications include tracking eye movement in language reading, in music reading, and in the perception of advertising [1]. Thanks to new processor and computing technologies, multi-core programming has become a reality, and many applications take advantage of multiple cores to increase their performance. We worked on an Intel® CPU with two cores to exploit a multi-core programming structure. Intel® TBB [2] (Threading Building Blocks, an open-source multithreading library) was used to multithread the application. A Matrox Morphis frame grabber was used for capturing the camera images, and the image buffer was displayed using Microsoft DirectX libraries. Owing to the developments and tools mentioned, we can readily combine eye tracking and augmented reality applications: while one core grabs and processes the images for face and eye tracking, the second core handles the AR part. This gives us the ability to render more realistic scenes for the AR part.


Eye tracking is the process of measuring either the point of gaze or the motion of an eye relative to the head, and an eye tracker is a device or piece of software for measuring eye positions and eye movements. Video-based eye trackers are the most widely used designs. A camera focuses on one or both eyes and records their movements as the viewer looks at some kind of stimulus. Most modern eye trackers use contrast to locate the center of the pupil and use infrared or near-infrared non-collimated light to create a corneal reflection. The vector between these two features can be used to compute the gaze intersection with a surface after a calibration for the individual.

Two general types of eye tracking techniques are used: bright pupil and dark pupil. The difference between them is the location of the illumination source with respect to the optics. If the illumination is coaxial with the optical path, the eye acts as a retroreflector as the light reflects off the retina, creating a bright pupil effect similar to red eye. If the illumination source is offset from the optical path, the pupil appears dark.

Bright pupil tracking creates greater iris/pupil contrast, allowing for more robust eye tracking across all iris pigmentation, and greatly reduces interference caused by eyelashes and other obscuring features. It also allows for tracking in lighting conditions ranging from total darkness to very bright. Bright pupil techniques are, however, not effective for tracking outdoors, as extraneous IR sources interfere with monitoring.


Augmented reality is a field of computer research which deals with the combination of real-world and computer-generated data. Most of today's AR applications are concerned with live video streams that are processed and augmented by the addition of computer-generated data.


Before eye tracking technology entered AR systems, gaze direction was estimated from head pose. There are applications which use gaze tracking to disambiguate multimodal 3D interaction in immersive VR and AR environments.

Figure 1. Student-teacher collaborative object selection scenarios: (a) Selection of a local object, (b) Selection of a remote object


As a result of the rapid improvement of computing technology, processors with multiple cores are on the market at reasonable prices and are now nearly ubiquitous. Software developers can program for concurrency in their applications to take full advantage of multi-core processors, and Intel® Threading Building Blocks (TBB) exists to help achieve this goal.

Intel® TBB is a popular C++ template library that simplifies the development of software applications running in parallel, which is the key to exploiting any multi-core computer. It extends C++ for parallelism in an easy-to-use and efficient manner, adding parallel programming facilities for C++ programmers and using generic programming to be efficient.

Intel® TBB includes algorithms, highly concurrent containers, locks and atomic operations, a task scheduler and a scalable memory allocator. These components can be used individually or all together to ease C++ development for multi-core. TBB provides an abstraction for parallelism that avoids the low-level programming inherent in the direct use of threading packages such as POSIX threads (pthreads) or Windows threads: it lets programmers express tasks instead of threads.


Intel's Integrated Performance Primitives (IPP) is a library of multi-core-ready, optimized software functions for multimedia and data processing applications. The library takes advantage of processor advances including MMX, SSE, SSE2, SSE3, SSE4 and multi-core processors. IPP functions include:

  • Video Decode/Encode
  • Audio Decode/Encode
  • JPEG/JPEG2000
  • Computer Vision
  • Cryptography
  • Data Compression
  • Image Color Conversion
  • Image Processing
  • Ray Tracing/Rendering
  • Signal Processing
  • Speech Coding
  • Speech Recognition
  • String Processing
  • Vector/Matrix Mathematics


In our application we have the following steps:

  • Image acquisition.
  • Face tracking.
  • Eye tracking.
  • Displaying.

7.1. Image Acquisition

In order to acquire image sequences from the IR camera, we used a Matrox Morphis frame grabber. After the images are acquired from the frame grabber, they are transferred to image buffers, which are kept in the Intel UIC architecture.

7.2. Face Tracking

Since our eye tracking algorithm, "Starburst", runs efficiently on closely framed face images, we track the face first in order to improve the algorithm's efficiency and to lower the computational load. We used Haar-like features to detect the face.

Figure 2. Face Tracking

7.3. Eye Tracking

After locating the face, we run the "Starburst" algorithm on it. The algorithm proceeds as follows:

1) Input: eye image, face image.

2) Output: point of gaze.

3) Procedure:

4) Detect the corneal reflection.

5) Localize the corneal reflection.

6) Remove the corneal reflection.

7) Iterative detection of candidate feature points.

8) Apply RANSAC to find the feature point consensus set.

9) Determine the best-fitting ellipse using the consensus set.

10) Model-based optimization of the ellipse parameters.

11) Apply calibration to estimate the point of gaze.


Figure 3. (a) Original image, (b) The image with corneal reflection removed after noise reduction.

Figure 4.

7.4. Displaying

At this point we have the point of gaze, the ultimate goal of this project. We used Microsoft's DirectX graphics libraries to place the computer-generated data, which can be either 2D or 3D, and to display the final image.

All the steps in our approach run concurrently on a dual-core Intel processor, because the application needs to be real-time in order to provide full immersion for the user of the system.


In our work we proposed an approach that tracks eyes reasonably well, and it can certainly be improved further. As a starting point we implemented an architecture that is independent of the frame grabber used; other frame grabbers can be supported with only a few additional lines of code. By implementing a face tracking algorithm, the input image of the eye tracking routines was kept small in size, so the algorithm runs faster with the help of Intel's multi-core processor and multi-core-enabled threading libraries. Microsoft's DirectX graphics libraries were chosen in order to display 3D computer-generated data easily.

As a result of the project we achieved a smooth interaction method between computer and user. This work is not yet a fully optimized and accurate solution for our aim, but it can be improved with additional effort, and it showed us that such a system can be used efficiently in commercial and research settings.




[3] Istvan Barakonyi, Helmut Prendinger, Dieter Schmalstieg, Mitsuru Ishizuka: Cascading Hand and Eye Movement for Augmented Reality Videoconferencing.

[4] Alex Poole, Linden J. Ball: Eye Tracking in Human-Computer Interaction and Usability Research: Current Status and Future Prospects.

[5] Carlos H. Morimoto, Myron Flickner: Real-Time Multiple Face Detection Using Active Illumination.

[6] Zhiwei Zhu, Qiang Ji: Eye and Gaze Tracking for Interactive Graphic Display.
