Unfortunately, I think that is just because the Leap is fitting fairly sparse observed data to a more detailed skeletal model. Perhaps referencing positions to the hand basis will make things easier? For example, here's an excerpt from a Java Processing sketch that transforms the joint positions into the hand's frame of reference and then draws an ellipse at each finger joint (everything except the ellipse drawing and the normalize helper is part of the Leap API):
// Build a transform from world coordinates into the hand's frame of reference:
// the hand basis gives the rotation, the palm position gives the origin, and
// rigidInverse() flips it so it maps world points into the hand frame.
Frame frame = controller.frame();
Hand hand = frame.hands().get(0);
currentHand = hand; // sketch-level field tracking the most recent hand

Matrix handTransform = hand.basis();
handTransform.setOrigin(hand.palmPosition());
handTransform = handTransform.rigidInverse();

for (Finger finger : hand.fingers()) {
    for (Bone.Type boneType : Bone.Type.values()) {
        Bone bone = finger.bone(boneType);
        // Joint at the base of this bone, expressed in the hand frame.
        Vector transformed = handTransform.transformPoint(bone.prevJoint());
        Vector normalized = normalize(transformed);
        ellipse(normalized.getX(), normalized.getZ(), 5, 5);
        if (boneType == Bone.Type.TYPE_DISTAL) { // also draw the tip of the distal phalanx
            Vector transformedN = handTransform.transformPoint(bone.nextJoint());
            Vector normalizedN = normalize(transformedN);
            ellipse(normalizedN.getX(), normalizedN.getZ(), 5, 5);
        }
    }
}
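The normalize call isn't part of the Leap API; it's a sketch helper that maps hand-frame millimeters onto the Processing canvas. A minimal version, assuming a 300 mm range centered on the palm (an arbitrary value to tune), might look like:

// Hypothetical helper: map hand-frame mm to canvas pixels.
// The 300 mm range is an assumed value, not a Leap constant.
Vector normalize(Vector v) {
    float range = 300.0f;
    float x = width * (v.getX() / range + 0.5f);
    float z = height * (v.getZ() / range + 0.5f);
    return new Vector(x, 0, z);
}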
This isolates the movement of the fingers from the movement of the hand. You still get some coupled movement from the other fingers, but I think it should be easier to find the minimum motion you can reliably detect as intentional.
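For example, a rough motion gate along those lines could compare each fingertip's hand-frame position against the previous frame. The 8 mm threshold and the prevTips bookkeeping here are my own assumptions to tune, not anything from the Leap SDK:

// Hypothetical gate: a fingertip that moves more than MIN_INTENTIONAL_MM
// relative to the hand frame between frames is treated as intentional motion.
final float MIN_INTENTIONAL_MM = 8.0f; // assumed threshold; calibrate empirically
HashMap<Integer, Vector> prevTips = new HashMap<Integer, Vector>();

boolean isIntentionalMotion(Hand hand, Matrix handTransform) {
    boolean moved = false;
    for (Finger finger : hand.fingers()) {
        Vector tip = handTransform.transformPoint(finger.tipPosition());
        Vector prev = prevTips.get(finger.id());
        if (prev != null && tip.distanceTo(prev) > MIN_INTENTIONAL_MM) {
            moved = true; // this fingertip moved relative to the hand, not with it
        }
        prevTips.put(finger.id(), tip);
    }
    return moved;
}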
On another level, though, even if the Leap's detection were flawless, will your users be flawless in their performance of your gestures? Given the variability in how still people are willing to hold their hands when they tap, there will still be a smallest reliable tap distance, and I suspect it will be larger than the unwanted motion shown in your video.
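If you want a number rather than a guess, you could even calibrate that per user: ask them to hold a deliberately still hand, record the per-frame fingertip displacements in the hand frame, and set the tap threshold above that noise floor. A sketch of that idea (the 3-sigma margin is an assumption, not a Leap recommendation):

// Hypothetical calibration: given per-frame fingertip displacements (hand-frame mm)
// recorded while the user held still, pick a threshold above the noise floor.
float thresholdFromStillness(ArrayList<Float> stillDisplacements) {
    float mean = 0, var = 0;
    for (float d : stillDisplacements) mean += d;
    mean /= stillDisplacements.size();
    for (float d : stillDisplacements) var += (d - mean) * (d - mean);
    var /= stillDisplacements.size();
    return mean + 3 * sqrt(var); // mean + 3 sigma, using Processing's sqrt()
}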