This method for computing the indicated point proves robust in the presence of tracker uncertainties and noise. Its accuracy depends on the geometry of the stereo cameras, and is best when the cameras are separated by an angle of at least 30 degrees. The system does not require camera calibration, because all calculation takes place in the image and ground planes. By tracking at least four reference points on the plane, the system could be made invariant to camera movement.
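The computation itself is not spelled out in this section; the following is a minimal sketch of one uncalibrated formulation consistent with the description above, in which a plane-to-plane projectivity is estimated from four tracked reference points in each view and the imaged finger axis from each camera is mapped onto the ground plane and intersected. All function names (plane_homography, to_ground, indicated_point) and the choice of finger-axis endpoints are hypothetical, not taken from the paper.

```python
import numpy as np

def plane_homography(image_pts, ground_pts):
    """Estimate the 3x3 projectivity mapping image-plane points to
    ground-plane points from four (or more) correspondences (DLT).
    Hypothetical helper; no camera calibration is involved."""
    A = []
    for (u, v), (x, y) in zip(image_pts, ground_pts):
        A.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        A.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    # Null vector of A (smallest singular value) gives the homography.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def to_ground(H, u, v):
    """Map an image point through the projectivity to ground-plane coordinates."""
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w

def line_through(p, q):
    """Homogeneous line through two 2-D points (cross product of lifted points)."""
    return np.cross([*p, 1.0], [*q, 1.0])

def indicated_point(H_left, H_right, tip_l, base_l, tip_r, base_r):
    """Project the imaged finger axis from each view onto the ground plane
    and intersect the two resulting ground-plane lines (assumed formulation)."""
    l_left = line_through(to_ground(H_left, *base_l), to_ground(H_left, *tip_l))
    l_right = line_through(to_ground(H_right, *base_r), to_ground(H_right, *tip_r))
    x, y, w = np.cross(l_left, l_right)
    return x / w, y / w
```

Because each homography is re-estimated from the four tracked plane points, a sketch along these lines would also be unaffected by camera movement, as noted above.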
The main problem for this system is tracking a pointing hand reliably in stereo. At present, this is possible only in an environment with strong contrast between the hand and the background. Our current system also requires the index finger and thumb to be kept rigid throughout operation. Tracking speed is limited by our hardware (a single Sun SPARCstation) and could be improved with faster computing or image-processing equipment. Such improvements would allow more sophisticated tracking mechanisms (such as stochastic deformable models [18]) to be incorporated, permitting more degrees of freedom for hand gesturing.
Although subjective pointing direction depends upon eye position as well as hand position, it is not necessary to model this phenomenon. Instead, given feedback about the objective pointing direction (e.g. a robot following the pointing hand in real time), the operator can align the hand with any desired object on the working plane. Points can then be indicated with sufficient accuracy to guide simple pick-and-place operations that might otherwise have been specified with a teach pendant.