This project implements an LLM-powered, voice-controlled drone system capable of executing both discrete flight commands and continuous real-time object tracking.
It combines OpenAI's GPT function calling, Whisper speech-to-text, YOLOv8 + OpenCV vision, and multi-threaded flight control to deliver a fully conversational drone experience.
- Conversational Voice Control — Issue natural language commands like:
"take off""turn around 180 degrees""follow the man in the gray shirt"
- Agentic LLM Decision-Making — The OpenAI agent decides whether to:
  - Send discrete commands (e.g., move up, rotate, land)
  - Activate continuous follow mode (see the function-calling sketch after this list)
- Computer Vision Tracking — YOLOv8 + OpenCV for robust, occlusion-resilient target following
- Multi-threaded Execution — Heartbeat thread, video streaming, and continuous movement updates
- Hardware Agnostic — Built for DJI Tello but adaptable to other drone platforms
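
For illustration, here is a minimal sketch of how that agent decision could be wired with OpenAI function calling. The tool names (`discrete_command`, `follow_target`) and their schemas are assumptions for the example, not necessarily what `openAPI.py` actually defines.

```python
# Illustrative sketch of LLM tool routing; the tool names and schemas here
# are assumptions, not necessarily the ones defined in openAPI.py.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "discrete_command",  # hypothetical tool name
            "description": "Execute one flight command (takeoff, land, move, rotate).",
            "parameters": {
                "type": "object",
                "properties": {
                    "action": {
                        "type": "string",
                        "enum": ["takeoff", "land", "move_up", "move_down", "rotate"],
                    },
                    "value": {"type": "integer", "description": "cm or degrees"},
                },
                "required": ["action"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "follow_target",  # hypothetical tool name
            "description": "Enter continuous tracking mode for the described target.",
            "parameters": {
                "type": "object",
                "properties": {
                    "target": {"type": "string", "description": "e.g. 'person in the red shirt'"}
                },
                "required": ["target"],
            },
        },
    },
]

def route_command(transcript: str):
    """Let the model choose between a discrete action and follow mode."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any function-calling-capable model works
        messages=[{"role": "user", "content": transcript}],
        tools=TOOLS,
    )
    return response.choices[0].message.tool_calls
```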
```
.
├── Drone+OpenAI/
│   ├── voice_transcriber.py   # Whisper-based speech-to-text
│   ├── openAPI.py             # OpenAI agent, command parsing, and action routing
│   ├── drone_controller.py    # Direct flight control logic for discrete commands
│   └── skytrack.py            # YOLOv8 + OpenCV continuous tracking mode
│
├── Yolo Model/
│   ├── requirements.txt       # Python dependencies
│   └── yolov8n.pt             # YOLO model weights
│
└── README.md
```
```
git clone https://github.com/elijahtab/SkyPilot.git
```

Windows:
```
python -m venv myenv
.\myenv\Scripts\activate
```

macOS/Linux:
```
python3 -m venv myenv
source myenv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Ensure your DJI Tello (or compatible drone) is powered on and connected to your machine via WiFi. Because the Tello acts as its own access point, your machine's primary WiFi adapter must join the Tello's network while a hotspot or secondary WiFi card stays connected to an actual internet source (the OpenAI API calls still need connectivity).
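
As a quick sanity check that the Tello link is up, here is a minimal sketch assuming the `djitellopy` package (a common Python client for the Tello; the repo may use a different one):

```python
# Minimal connectivity check, assuming the djitellopy package is installed
# (pip install djitellopy). Run while connected to the Tello's WiFi.
from djitellopy import Tello

tello = Tello()
tello.connect()                             # SDK-mode handshake over UDP
print(f"Battery: {tello.get_battery()}%")   # a reply confirms the link is alive
```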
```
python openAPI.py
```
This starts the Whisper-based transcriber and routes each recognized command into the LLM pipeline.

Examples:
"take off", "move up 100 cm", "rotate 90 degrees", "follow the person in the red shirt"

Say "land" to initiate landing.
1. Voice Input — `voice_transcriber.py` uses Whisper to transcribe your speech into text.
2. LLM Command Parsing — `openAPI.py` sends the text to an OpenAI agent, which decides whether the command is:
   - A discrete action (sent to `drone_controller.py`)
   - A continuous tracking mode (sent to `skytrack.py`)
3. Drone Execution
   - Discrete commands: altitude changes, rotations, land/takeoff
   - Continuous tracking: YOLOv8 + OpenCV locks onto the target and maintains smooth pursuit (see the follow-loop sketch below)
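
To make the pursuit behavior concrete, here is a heavily simplified follow-loop sketch, assuming `djitellopy` and `ultralytics` are installed; the real logic in `skytrack.py` (occlusion handling, smoothing, threading) is more involved.

```python
# Heavily simplified follow-loop sketch, assuming djitellopy and ultralytics;
# skytrack.py's real logic (occlusion handling, smoothing) is more involved.
import cv2
from djitellopy import Tello
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
tello = Tello()
tello.connect()
tello.streamon()

while True:
    frame = tello.get_frame_read().frame
    results = model(frame, classes=[0], verbose=False)  # class 0 = person
    boxes = results[0].boxes
    if len(boxes) > 0:
        x1, y1, x2, y2 = boxes.xyxy[0].tolist()  # take the first detection
        cx = (x1 + x2) / 2
        # Proportional yaw control: steer so the target stays centered.
        error = cx - frame.shape[1] / 2
        yaw = int(max(-100, min(100, 0.25 * error)))
        tello.send_rc_control(0, 0, 0, yaw)
    else:
        tello.send_rc_control(0, 0, 0, 0)  # hover if the target is lost
    cv2.imshow("skytrack sketch", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

tello.land()
```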
- Test Detection on Sample Image:
  ```
  python test_yolo.py
  ```
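
If `test_yolo.py` is missing from your checkout, a minimal equivalent would look like the sketch below, assuming `ultralytics` is installed and a local `sample.jpg` (hypothetical filename) exists:

```python
# Minimal stand-in for test_yolo.py, assuming ultralytics is installed and
# a sample.jpg exists in the working directory (hypothetical filename).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model("sample.jpg")
for box in results[0].boxes:
    print(results[0].names[int(box.cls)], box.xyxy.tolist())
```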
- Multi-threading — Required for stable flight control, heartbeat signals, and smooth tracking (see the keep-alive sketch after this list)
- Smooth Follow — Fine-tuning speed, acceleration, and update intervals to avoid overshooting
- Vision Model Choice — Switching between fast OpenCV trackers and robust YOLOv8
- Voice Isolation — Minimizing false triggers in noisy environments
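
As an illustration of the heartbeat requirement above (the Tello auto-lands if it hears no command for about 15 seconds), a minimal keep-alive thread, assuming `djitellopy`, might look like:

```python
# Minimal keep-alive sketch, assuming djitellopy; the repo's heartbeat thread
# may use a different query or interval.
import threading
import time

def heartbeat(tello, interval=5.0):
    """Periodically query the drone so it never hits the idle-landing timeout."""
    while True:
        tello.get_battery()  # any SDK query counts as activity
        time.sleep(interval)

# Start as a daemon thread so it exits with the main flight-control program:
# threading.Thread(target=heartbeat, args=(tello,), daemon=True).start()
```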
- Integrate GPS and onboard sensors for outdoor and large-area tracking
- Expand follow mode to multiple object classes
- Enhance voice pipeline with custom hotword detection
