Hi SiMoNBB, I ran into a similar problem when trying to recognize multiple finger taps as a single gesture. I'm a C++ developer, but I think the general idea should also work for you.
My workaround was to add a new gesture layer that uses a buffer to hold all the native tap gestures detected within a short time window. Once the time window runs out, the buffer is processed to count the tap gestures it holds, and an X-finger tap gesture is triggered, where X is the tap count for that window. This way, you can trigger a single tap gesture based on multiple previously recognized taps, so you can recognize, for example, a one-finger tap, a two-finger tap, etc. as if they were simple gestures. All of this can also be generalized to the other native gestures (swipes, circles, screen taps, and so on).
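Here is a rough sketch of that buffering idea in C++. It does not use the real SDK types: TapEvent, TapBuffer, addTap and update are names I made up for illustration, and I'm assuming timestamps are in microseconds. Treat it as a starting point, not as my actual implementation.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a native tap gesture reported by the device.
struct TapEvent {
    int pointableId;           // which finger or tool tapped (assumed field)
    std::int64_t timestampUs;  // detection time in microseconds (assumed unit)
};

// Collects native taps for a fixed time window, then reports how many arrived.
class TapBuffer {
public:
    using Callback = std::function<void(int tapCount)>;

    TapBuffer(std::int64_t windowUs, Callback onMultiTap)
        : windowUs_(windowUs), onMultiTap_(std::move(onMultiTap)) {}

    // Call whenever the device reports a native tap gesture.
    // The first tap stored in an empty buffer opens the time window.
    void addTap(const TapEvent& tap) { taps_.push_back(tap); }

    // Call once per frame with the current time.
    // When the window has run out, emits one X-finger tap and resets.
    void update(std::int64_t nowUs) {
        if (taps_.empty()) return;
        if (nowUs - taps_.front().timestampUs < windowUs_) return;

        const int count = static_cast<int>(taps_.size());
        taps_.clear();  // reset before the callback so it sees an empty buffer
        if (onMultiTap_) onMultiTap_(count);
    }

private:
    std::int64_t windowUs_;
    Callback onMultiTap_;
    std::vector<TapEvent> taps_;
};
```

In your frame callback you would call addTap for each native tap found in the frame and then call update with the frame timestamp. If the callback receives 2, treat it as a two-finger tap, and so on.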
The drawback of this approach is that it adds some latency to the tap recognition, since the gesture is not triggered as soon as the device detects it but a few milliseconds later (i.e., after the gesture buffer is processed). In practice that is not a major problem, because the buffer's time window is (or should be) small enough relative to the device frame rate that the latency is almost imperceptible.
Hope this helps!
Cheers,
Matias