Common to many of these systems is the requirement to calibrate the templates or hand model to suit each individual user. They also tend to have high computational requirements, taking several seconds per frame on a conventional workstation or requiring expensive multiprocessor hardware for a real-time implementation.
Our approach differs from these general systems in an important respect: we wish only to recover the line along which the hand is pointing, to be able to specify points on a ground plane. This considerably reduces the degrees of freedom which we need to track. Furthermore, because the hand must be free to move about as it points to distant objects, it will occupy only a relatively small fraction of the pixel area in each image, reducing the number of features that can be distinguished.
In this case it is not unreasonable to insist that the user adopt a rigid gesture. For simplicity, the familiar `pistol' pointing gesture was chosen. The pointing direction can now be recovered from the image of the index finger, although the thumb is also prominent and can be usefully tracked. The rest of the hand, which has a complicated and rather variable shape, is ignored. This does away with the need to calibrate the system to each user's hand.
The tracker's motion is restricted to 2D affine transformations in the image plane, which ensures that it keeps its shape whilst tracking the fingers in a variety of poses [15]. This approach is suitable for tracking planar objects under weak perspective; however, it also works well with fingers, which are approximately cylindrical.
The sampling points are assigned fixed affine coordinates; their image positions are then determined by the tracker's local origin and two basis vectors. Together these are described by six parameters, which change over time as the hand is tracked.
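Concretely, a sampling point with fixed affine coordinates (u, v) appears in the image at x = origin + u b1 + v b2. The following minimal sketch (in Python with NumPy; the function name and example data are hypothetical, not taken from the original system) illustrates this six-parameter parameterisation:

```python
import numpy as np

def project_points(affine_coords, origin, b1, b2):
    """Map the contour's canonical (u, v) coordinates into the image.

    affine_coords : (N, 2) fixed affine coordinates of the sampling points.
    origin        : (2,) the tracker's local origin in the image.
    b1, b2        : (2,) the two basis vectors.
    The six tracked parameters are origin (2), b1 (2) and b2 (2).
    """
    basis = np.column_stack([b1, b2])        # 2x2 matrix [b1 | b2]
    return origin + affine_coords @ basis.T  # (N, 2) image positions

# With the identity state the canonical shape is reproduced unchanged.
canonical = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 0.5]])
pts = project_points(canonical,
                     origin=np.array([120.0, 80.0]),
                     b1=np.array([1.0, 0.0]),
                     b2=np.array([0.0, 1.0]))
```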
Figure 3: The finger-tracking active contour, (a) in its canonical frame and (b) after an affine transformation in the image plane (to track a rigid motion of the hand in 3D). It is the index finger which defines the direction of pointing; the thumb is observed to facilitate the tracking of longitudinal translations, which would otherwise be difficult to detect.
The offsets are used to estimate, in a least-squares sense, the affine transformation (translation, rotation, scale and shear) of the active contour model. A first-order temporal filter predicts the future position of the contour, improving its real-time tracking performance. The filter is biased to favour rigid motions in the image and limits the rate at which the tracker can change scale; these constraints represent prior knowledge of how the hand's image is likely to change, and they increase the reliability with which it can be tracked.
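A minimal sketch of such a fit is given below, under the simplifying assumption that a full 2D measurement is available at each sampling point (the actual tracker measures offsets along the contour normals, which leads to a similar linear system). The constant-velocity predictor with uniform damping is only a crude stand-in for the biased temporal filter described above; all names here are hypothetical:

```python
import numpy as np

def fit_affine(affine_coords, measured):
    """Least-squares estimate of the six affine parameters.

    Solves measured_i ~= origin + u_i * b1 + v_i * b2 over all points i.
    affine_coords : (N, 2) canonical (u, v) coordinates.
    measured      : (N, 2) measured image positions.
    """
    ones = np.ones((len(affine_coords), 1))
    A = np.hstack([ones, affine_coords])             # rows: [1, u, v]
    P, *_ = np.linalg.lstsq(A, measured, rcond=None) # P is (3, 2)
    origin, b1, b2 = P
    return origin, b1, b2

def predict(params, prev_params, damping=0.8):
    """First-order prediction of the 6-vector of tracker parameters.

    The damping factor is a placeholder for the filter's bias towards
    rigid image motion and its limit on the rate of scale change.
    """
    return params + damping * (params - prev_params)
```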
To extract the hand's direction of pointing, we estimate the orientation of the index finger; the base of the thumb is tracked merely to resolve an aperture problem [17] induced by the finger's long thin shape.
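If the canonical frame is chosen so that the index finger lies along the first basis vector (an assumption made here purely for illustration), the finger's image-plane orientation can be read directly off the tracked state:

```python
import numpy as np

def pointing_direction(b1):
    """Unit direction and orientation (radians) of the index finger,
    assuming the canonical frame aligns the finger with basis vector b1.
    Measurements at the base of the thumb constrain translation along
    this direction, which the finger's long parallel edges alone cannot
    fix (the aperture problem)."""
    d = b1 / np.linalg.norm(b1)
    return d, np.arctan2(d[1], d[0])
```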