RaceAssist: Gesture-Based Racing Game Controller using MediaPipe + PyAutoGUI

Siddhant Bali, an aspiring tech entrepreneur, is an Undergraduate Research Scholar at IIIT Delhi, currently pursuing a B.Tech in Computer Science Engineering with a focus on design (CSD). Excelling in college activities and event management, Siddhant's entrepreneurial spirit propels him into innovative ventures. Connect on LinkedIn or reach out at siddhant22496@iiitd.ac.in for more info.
Download & Try Now
RaceAssist | Plug & Play Edition
Overview
Repo Link: https://github.com/kintsugi-programmer/race-ist
RaceAssist is a vision-based steering system that lets users control racing games with hand gestures, with no physical controller or keyboard. Built using Python, MediaPipe, and PyAutoGUI, this lightweight interface turns your webcam into a racing game controller.
Design Evolution and Zone Mapping for RaceAssist: A Hand Gesture-Controlled Racing Interface Using MediaPipe + PyAutoGUI
The system supports multiple control schemes across 9 evolving versions, each exploring a different interaction paradigm, from basic wrist tracking and gesture recognition to multi-key zones, flicker-resistant logic, and parallel processing pipelines.
Designed to run on low-spec consumer hardware using only a standard webcam, RaceAssist offers an accessible, cost-effective alternative to traditional game controllers, eliminating the need for specialized sensors, GPUs, or proprietary equipment.
Whether you're a gamer, developer, or HCI researcher, RaceAssist offers a smooth, intuitive, and extensible gesture-based experience for both simulation and real-time control.
What's Included?
| Component | Description |
| --- | --- |
| RaceAssist.exe | Plug-and-play Windows executable (no setup needed) |
| run.py (v1-v9) | |
| requirements.txt | All required Python packages (MediaPipe, OpenCV, etc.) |
| RaceAssist.png | Visual overview of gesture zones across versions |
| RaceAssist.excalidraw | Editable diagram file (UI logic flow) |
| README.md | This documentation |
Research and Analysis on RaceAssist: A Vision-Based, Gesture-Driven Game Control System for Real-Time Steering Interfaces Using MediaPipe and PyAutoGUI
Abstract
In an era increasingly driven by touchless interaction and immersive computing, RaceAssist explores an intuitive and low-cost alternative to traditional game controllers through real-time gesture-based input systems. This research investigates the development and evolution of RaceAssist, a modular, vision-powered interface enabling users to control racing games using only their hand gestures, captured through a standard webcam and interpreted using MediaPipe for landmark detection and PyAutoGUI for simulated keystrokes.
The project presents a comparative study of nine evolving control models, ranging from basic zone-based wrist detection to advanced multi-threaded pipelines and gesture-recognition-enhanced input. Key challenges such as gesture ambiguity, input flickering, and detection latency are systematically addressed through novel solutions including turn decay logic, 2-hand brake state inference, and parallel control architecture.
The study evaluates each version across metrics of usability, responsiveness, and cognitive load, contributing insights into designing effective gesture-based HCI systems. The final version (v9) integrates a robust 3x3 control grid with intelligent input decay and flicker suppression, showing high promise for real-world applicability in both recreational gaming and experimental HCI setups.
This research aims to demonstrate how accessible hardware and open-source CV tools can be leveraged to create fluid, natural, and expressive interfaces, especially relevant for users in accessibility contexts, human-robot interaction prototypes, or low-cost simulation environments. Future work includes exploring analog gesture mapping, AI-based gesture classification, and VR/XR integrations, advancing toward adaptive, personalized gesture interaction systems.
Phase 1: Foundation & Prototyping
v1 | Zone-Based Wrist Steering
Features
Uses both wrist positions (if available)
Screen divided into:
Left / Center / Right (for steering)
Top (Nitro)
One hand = Reverse
No hand = No key pressed
Actions:
Left, Right, Straight, Reverse, Nitro, None
Pros
Very intuitive zone-based layout
Minimal computation (only wrist coordinates)
Both hands together = clean intent detection
Cons
Holding both hands in position becomes tiring
No gesture type (e.g., fist) recognition
No reverse with both hands low
No dynamic steering (discrete zones only)
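The v1 rules above can be sketched as a single classification function. This is a minimal illustration, not the repository's code: the zone boundaries, the name `classify_v1`, and the choice to steer by the midpoint between the two wrists are all assumptions. Coordinates are normalized to the frame, with `y` growing downward as in image coordinates.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # normalized (x, y); y grows downward

# Hypothetical zone boundaries; the source does not state the actual values.
LEFT_EDGE, RIGHT_EDGE, TOP_EDGE = 0.4, 0.6, 0.3


def classify_v1(wrists: List[Point]) -> str:
    """Map detected wrist positions to one of the six v1 actions."""
    if not wrists:
        return "None"      # no hand: no key pressed
    if len(wrists) == 1:
        return "Reverse"   # one hand: reverse
    # Two hands: use the midpoint between the wrists (an assumption).
    mid_x = (wrists[0][0] + wrists[1][0]) / 2
    mid_y = (wrists[0][1] + wrists[1][1]) / 2
    if mid_y < TOP_EDGE:
        return "Nitro"
    if mid_x < LEFT_EDGE:
        return "Left"
    if mid_x > RIGHT_EDGE:
        return "Right"
    return "Straight"
```

Because the function only compares coordinates, it stays cheap enough to run on every frame, which matches the "minimal computation" point above.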
v2 | Fist Recognition-Based Steering
Features
Detects fist gestures using the `index_tip - index_mcp` distance
Actions determined based on:
Y-coordinates of 2 fists → Nitro, Left, Right
1 fist → Straight
No fist → Brake
Draws a line between 2 fists for feedback
Pros
Fist recognition adds gesture clarity
Prevents accidental movement (must clench fists)
Brake when hands are relaxed
Cons
Requires accurate fist recognition
Sensitive to small hand variations
Difficult for some users to clench both fists constantly
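A rough sketch of the fist check and the resulting actions follows. The landmark indices 5 (index MCP) and 8 (index tip) follow the MediaPipe Hands landmark layout; everything else is an assumption made for illustration: the threshold values, the names `is_fist` and `classify_v2`, and the steering-wheel style rule that compares the heights of the two fists.

```python
import math
from typing import List, Sequence, Tuple

Landmark = Tuple[float, float]  # normalized (x, y); y grows downward

INDEX_MCP, INDEX_TIP = 5, 8     # MediaPipe Hands landmark indices
FIST_THRESHOLD = 0.08           # hypothetical; tune per camera and distance
TILT_MARGIN = 0.1               # hypothetical height difference for a turn
NITRO_LINE = 0.3                # hypothetical top zone boundary


def is_fist(hand: Sequence[Landmark], threshold: float = FIST_THRESHOLD) -> bool:
    """A hand counts as a fist when the index tip is curled close to its knuckle."""
    return math.dist(hand[INDEX_TIP], hand[INDEX_MCP]) < threshold


def classify_v2(hands: Sequence[Sequence[Landmark]]) -> str:
    fists: List[Landmark] = [h[INDEX_MCP] for h in hands if is_fist(h)]
    if not fists:
        return "Brake"
    if len(fists) == 1:
        return "Straight"
    left, right = sorted(fists[:2])  # order the two fists by x
    if left[1] < NITRO_LINE and right[1] < NITRO_LINE:
        return "Nitro"
    if left[1] - right[1] > TILT_MARGIN:   # left fist lower: turn left
        return "Left"
    if right[1] - left[1] > TILT_MARGIN:   # right fist lower: turn right
        return "Right"
    return "Straight"                      # level fists (assumed fallback)
```

A fixed threshold on a normalized distance is sensitive to how far the hand is from the camera, which is one way the "sensitive to small hand variations" drawback listed above can show up.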
v3 | Advanced Grid-Zone Wrist Control
Features
Zones:
Left & Right (for steering)
Top (Nitro)
Bottom (Reverse)
Any hand in bottom zone = Reverse
Any hand in left/right = Directional steer
Both hands up = Nitro
No hand = Brake
Pros
Flexible zone design
Clear logic:
bottom = reverse
top = nitro
Supports both one-hand and two-hand control
Highly responsive
Cons
No gesture recognition (just wrist location)
No fine control (zone only, no analog steering)
Slightly more logic-heavy than v1
Phase 2: Systemization & Expansion
v4 | Parallel Processing with Threads
Features
4 threads:
`capture_frames`: webcam feed
`detect_hands`: process landmarks
`control_action`: apply zone-based logic
`display_output`: draw interface
Zones:
Top = Nitro
Bottom = Reverse
Left / Right / Center = Steering
Key logic matches v3 but split for performance
Pros
Fast & responsive due to parallelism
Non-blocking architecture; useful for higher FPS
Clean separation of concerns
Cons
Requires thread-safe resource handling (`coords_lock`)
Slightly higher memory & CPU usage
Complex to debug compared to single-threaded versions
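The four-stage layout can be sketched with Python's `threading` and `queue` modules. Note the substitution: this sketch hands data between stages through `queue.Queue` objects rather than a shared variable guarded by `coords_lock`, because queues make the example deterministic and self-contained. The frame dictionaries, the zone boundaries, and `run_pipeline` are hypothetical stand-ins for the webcam, the landmark model, and the main loop.

```python
import queue
import threading
from typing import Iterable, List

STOP = object()  # sentinel that tells the next stage to shut down


def capture_frames(source: Iterable[dict], frames: queue.Queue) -> None:
    """Stage 1: push frames from the source (a webcam in the real project)."""
    for frame in source:
        frames.put(frame)
    frames.put(STOP)


def detect_hands(frames: queue.Queue, coords: queue.Queue) -> None:
    """Stage 2: extract the wrist x-coordinate (stand-in for landmark inference)."""
    while (frame := frames.get()) is not STOP:
        coords.put(frame["wrist_x"])
    coords.put(STOP)


def control_action(coords: queue.Queue, actions: queue.Queue) -> None:
    """Stage 3: apply zone logic; a real version would also send key events."""
    while (x := coords.get()) is not STOP:
        actions.put("Left" if x < 0.4 else "Right" if x > 0.6 else "Straight")
    actions.put(STOP)


def display_output(actions: queue.Queue, log: List[str]) -> None:
    """Stage 4: record what would be drawn on the overlay."""
    while (action := actions.get()) is not STOP:
        log.append(action)


def run_pipeline(source: Iterable[dict]) -> List[str]:
    frames, coords, actions = queue.Queue(), queue.Queue(), queue.Queue()
    log: List[str] = []
    stages = [
        threading.Thread(target=capture_frames, args=(source, frames)),
        threading.Thread(target=detect_hands, args=(frames, coords)),
        threading.Thread(target=control_action, args=(coords, actions)),
        threading.Thread(target=display_output, args=(actions, log)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return log
```

With a live camera, the control stage usually wants only the newest coordinates rather than every queued frame, which is where a lock-protected shared value becomes the more natural choice.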
v5 | Basic Single-Hand Zone Control
Features
One hand only
3 vertical zones:
Left = 'a'
Center = 'w'
Right = 'd'
No gesture or vertical detection
Uses wrist `x` position only
Pros
Extremely lightweight and minimal
Easy to understand and extend
Ideal for demos or fallback mode
Cons
No Nitro or Reverse
No multi-hand or multi-key combos
Not immersive for racing gameplay
v6 | Smart Dual-Direction Control
Features
Combines horizontal and vertical wrist zones
Zones:
Top Center = Forward (`w`)
Bottom Center = Reverse (`s`)
Left = `a + w` (Left+Forward)
Right = `d + w` (Right+Forward)
Multi-key press logic using Python `set` operations
Pros
Smooth forward turning: `a + w`, `d + w`
Natural vertical + horizontal division
Works well with only one hand
Cons
No Nitro support
Requires precise control around center split
Could confuse users with two key outputs unless well trained
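The set-based multi-key logic can be illustrated as follows: keep the set of keys currently held, compare it with the set the current zone asks for, and send only the differences. `KeyState` and its callback parameters are hypothetical names; in a real run, `pyautogui.keyDown` and `pyautogui.keyUp` could be passed in as the two callbacks.

```python
from typing import Callable, Set


class KeyState:
    """Track held keys and emit only the press/release events that changed."""

    def __init__(self, key_down: Callable[[str], None], key_up: Callable[[str], None]):
        self.held: Set[str] = set()
        self._key_down = key_down
        self._key_up = key_up

    def update(self, wanted: Set[str]) -> None:
        for key in sorted(self.held - wanted):   # held but no longer wanted
            self._key_up(key)
        for key in sorted(wanted - self.held):   # wanted but not yet held
            self._key_down(key)
        self.held = set(wanted)


# Usage with a recording backend instead of real key events:
events = []
keys = KeyState(lambda k: events.append(("down", k)),
                lambda k: events.append(("up", k)))
keys.update({"a", "w"})   # Left zone: a + w
keys.update({"d", "w"})   # Right zone: w stays held, only a and d change
keys.update(set())        # hand lost: release everything
```

Moving from the Left zone to the Right zone never releases `w` here, so forward motion is not interrupted during a turn change.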
Phase 3: Stability & Realism
v7 | 2D Grid-Based Stable Steering
Features
Simple vertical (L/R) and horizontal (Straight/Reverse) split
4 Zones:
Left → `a`
Right → `d`
Top Middle → `w`
Bottom Middle → `s`
Only 1 key pressed at a time
Pros
Very stable for 1-hand use
Minimal key flickering
Easy to learn and use
Cons
No Nitro
No multi-key turn+forward combinations
No 2-hand brake detection
v8 | 3x3 Grid with Combined Controls
Features
Full 3x3 zone grid:
X: Left, Center, Right
Y: Top (Nitro), Center (Straight), Bottom (Reverse)
Allows combinations like:
Top-Left → `shift + a`
Center-Left → `w + a`
Bottom-Right → `s + d`
Pros
Powerful control: supports all combinations
More expressive zones for racing turns
Nitro supported
Cons
Flickering possible when zone is unclear
Slightly unstable in fast hand transitions
No 2-hand brake support
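One way to express the 3x3 grid is as two independent lookups, one per axis, whose results are combined into a key set. The row and column keys below are inferred from the three example combinations above; the equal-thirds boundaries and the names `band` and `grid_keys` are assumptions.

```python
from typing import Set

ROW_KEYS = ("shift", "w", "s")   # top = Nitro, center = Straight, bottom = Reverse
COL_KEYS = ("a", None, "d")      # left, center, right


def band(value: float, lo: float = 1 / 3, hi: float = 2 / 3) -> int:
    """Return 0, 1, or 2 for the first, middle, or last third of the frame."""
    return 0 if value < lo else 2 if value > hi else 1


def grid_keys(x: float, y: float) -> Set[str]:
    """Combine the row key with the column key (if any) for a wrist position."""
    keys = {ROW_KEYS[band(y)]}
    side = COL_KEYS[band(x)]
    if side is not None:
        keys.add(side)
    return keys
```

A wrist hovering near `1/3` or `2/3` will alternate between bands from frame to frame, which is the flickering noted in the cons above.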
v9 | Smart 3x3 with Turn Decay Logic
Features
Same 3x3 layout as v8
Adds:
2-hand Brake Mode → presses `space`
Flicker-resistant turn decay using a state machine
Only triggers new turns if the cooldown has passed
Prevents constant release-repress of `a`/`d`
Pros
Stable + responsive
Realistic flick behavior (short burst, then decays)
Nitro, Reverse, and precise control supported
Ideal for real racing simulation
Single-hand operation reduces strain, enabling longer and more comfortable gameplay sessions.
Cons
Slightly more complex logic
Hand must exit zones clearly to reset decay
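The turn decay behavior can be sketched as a small state machine. This is an interpretation of the description above rather than the v9 source: the class name `TurnDecay`, the `burst` and `cooldown` durations, and the rule that the hand must return to Center before a new turn is armed are assumptions. Time is passed in explicitly so the logic can be tested without a clock.

```python
from typing import Optional

TURN_KEYS = {"Left": "a", "Right": "d"}


class TurnDecay:
    """Hold a turn key for a short burst, then release it until re-armed."""

    def __init__(self, burst: float = 0.25, cooldown: float = 0.15):
        self.burst = burst            # seconds a turn key stays held
        self.cooldown = cooldown      # minimum gap between two bursts
        self.active: Optional[str] = None
        self.started = 0.0
        self.last_end = float("-inf")
        self.armed = True             # re-armed when the hand reaches Center

    def update(self, zone: str, now: float) -> Optional[str]:
        """Return the turn key to hold this frame, or None."""
        key = TURN_KEYS.get(zone)     # None for Center
        if self.active is not None:
            if key == self.active and now - self.started < self.burst:
                return self.active    # burst still running
            self.active = None        # burst decayed or hand left the zone
            self.last_end = now
        if key is None:
            self.armed = True
            return None
        if self.armed and now - self.last_end >= self.cooldown:
            self.active, self.started, self.armed = key, now, False
            return key
        return None
```

Keeping the hand in the Left zone therefore produces one burst of `a` and then nothing, and jitter across the boundary cannot restart the burst until the cooldown has elapsed.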
Final Thoughts
v1: Great for fast prototyping and simplicity.
v2: Adds gesture sophistication (fist control) but may struggle with detection.
v3: Provides robust, flexible control using vertical & horizontal zones, suitable for real gameplay.
v4: Best for responsive gameplay, modular systems, or integration with game engines.
v5: Best for quick testing, educational examples, and when you want barebones logic.
v6: Best for simplified, immersive 1-hand steering games (like mobile or low-input games).
v7: Best as a demo for kids or beginners.
v8: Best for full steering + Nitro combo gameplay.
v9: Best for realistic gameplay, with decay and brake support.
Challenges Faced & How They Were Addressed
1. Single Process Consumes All Key Input
Problem: Regardless of how many keys were sent via `pyautogui`, only one key was effectively recognized at a time in terminal-based or focused game windows.
Why it happens: Most terminal or native Windows processes buffer only one keystroke at a time. Also, `pyautogui` simulates key events sequentially in the same thread, which isn't truly parallel.
How I addressed it:
Introduced simulated key-holding (e.g., hold `w` + tap `a`) instead of toggling.
Added interleaved multi-press loops in `v8` and cooldown decay logic in `v9` for more natural control.
Future fix: implement parallel input injection via tools like `pynput`, `autopy`, or native OS key injection APIs.
2. Detection Latency Even on High-End Laptops
Problem: Hand tracking and landmark processing (especially via MediaPipe) lagged even on high-performance systems.
Root Cause:
Real-time webcam + landmark model inference on CPU is expensive.
Frame drops occur due to sequential logic (acquire → process → act → display).
How I addressed it:
In v4, introduced multi-threading (capture, detect, control, display) using Python `threading`.
Reduced `max_num_hands` to 1 when possible to cut computation by ~40%.
Future fix: use GPU-based inference via MediaPipe with TensorFlow GPU, or shift to OpenVINO / ONNX.
3. Varying Environment Light & Backgrounds
Problem: Different lighting conditions affected detection accuracy and stability.
Fixes Tried:
Set detection thresholds such as `min_detection_confidence=0.7`, `min_tracking_confidence=0.7`.
Added visual guidance lines (zones, wrist dots) to help the user adjust hand positions.
Future fix: integrate background-agnostic tracking models or depth sensors (e.g., Intel RealSense).
4. Directional Intensity Is Too Binary
Problem: Left/right turns are quantized, meaning you're either turning or you're not; there's no in-between.
Effect: Sudden jumps can cause flickering, especially near zone boundaries.
Fix:
In `v9`, introduced a cooldown + decay mechanism so rapid hand jitter doesn't retrigger the same action.
Future fix:
Introduce a "steering stabilizer": use smoothing techniques like moving average or Kalman filter on `wrist.x`.
Convert position to analog signal → gradual turning (e.g., a=light, aa=hard left).
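As one possible form of the steering stabilizer, an exponential moving average (EMA) damps frame-to-frame jitter in `wrist.x` before zone logic sees it. The class name `EmaSmoother` and the default `alpha` are assumptions; a smaller `alpha` smooths more but adds lag.

```python
from typing import Optional


class EmaSmoother:
    """Exponential moving average over a stream of wrist x-coordinates."""

    def __init__(self, alpha: float = 0.3):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x   # first sample initializes the average
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value
        return self.value
```

The smoothed value would then be passed to the zone classifier in place of the raw coordinate, so a single noisy frame near a boundary no longer flips the zone.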
5. No Natural Recovery Mechanism (Left to Straight to Right)
Problem: Transitioning from left to center and then to right feels abrupt, with no counterbalancing inertia.
Solution Concept:
Implement a momentum model where zone transition logic includes "direction recovery":
If coming from 'Left', don't trigger 'Right' unless passed through 'Center' for N frames.
Future idea: use virtual steering wheel state, which smoothly rotates and settles back to center over time.
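The direction recovery rule could look like the sketch below. It is only a concept illustration: `DirectionRecovery`, the default of 3 frames, and the choice to require consecutive Center frames are assumptions, and a suppressed turn is reported as "Center".

```python
from typing import Optional


class DirectionRecovery:
    """Block an opposite turn until the hand has rested in Center for N frames."""

    def __init__(self, n_frames: int = 3):
        self.n_frames = n_frames
        self.last_turn: Optional[str] = None
        self.center_run = 0   # consecutive Center frames seen so far

    def update(self, zone: str) -> str:
        if zone == "Center":
            self.center_run += 1
            return "Center"
        is_opposite = self.last_turn not in (None, zone)
        blocked = is_opposite and self.center_run < self.n_frames
        self.center_run = 0   # leaving Center breaks the run
        if blocked:
            return "Center"   # suppress the abrupt reversal
        self.last_turn = zone
        return zone
```

Returning to the same direction after a brief pass through Center is never blocked; only a reversal has to wait.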
Future Prospects & Feature Pipeline
| Feature | Description |
| --- | --- |
| Building v10: Adaptive Decay Tuning | Enhance v9's flicker handling by implementing logarithmic decay in steering flicker strength, reducing overreaction while preserving responsiveness. |
| Steering Stabilizer | Apply smoothing on hand positions to prevent jitter (Kalman or EMA). |
| True Parallel Keypress Engine | Replace pyautogui with pynput, autopy, or C++ native DLL key injection for true multi-key support. |
| Analog Steering Intensity | Use wrist.x value to simulate analog turning strength (light turn vs. hard turn). |
| VR/XR Mode | Integrate with OpenXR or Unity to use gestures in immersive racing environments. |
| Modular Configurator | Add UI to let users define their own zones and gestures (drag-and-drop grid designer). |
| Telemetry + HUD | Show real-time hand position, detected action, reaction time, and frame rate overlay. |
| AI-based Gesture Model | Replace handcrafted rules with a model trained on gesture sequences (LSTM or Transformer). |
| Mobile Camera Input | Stream camera from phone to PC via Wi-Fi (e.g., IP Webcam) for more flexible control. |
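For the analog steering item, one possible starting point is a function that converts `wrist.x` into a signed strength with a dead zone around the center. The name `steering_strength` and the dead zone size are assumptions, and how the strength is turned into key events (for example by pulsing a key) is left open.

```python
def steering_strength(x: float, dead_zone: float = 0.1) -> float:
    """Map a normalized wrist x in [0, 1] to a turn strength in [-1, 1].

    Negative values mean left, positive values mean right, and positions
    within the dead zone around the center produce no turn at all.
    """
    clamped = min(max(x, 0.0), 1.0)
    offset = (clamped - 0.5) * 2              # -1 at far left, +1 at far right
    if abs(offset) <= dead_zone:
        return 0.0
    sign = 1.0 if offset > 0 else -1.0
    return sign * (abs(offset) - dead_zone) / (1 - dead_zone)
```

The dead zone keeps a hand resting near the middle from producing a constant slight turn.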
We're actively looking for collaborators from the fields of computer vision, HCI, and game development to help shape the next phase. If you have ideas, improvements, or just enthusiasm for gesture-based interaction, we'd love to build with you!
LICENSE
MIT License
Copyright (c) 2025 Siddhant Bali
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Developed At
Built with ❤️ by Kintsugi Dev Studio, combining computer vision, system design, and human-centered interaction.









