Selenium-AI-Agentic

Objective

Transform natural language instructions into automated browser interactions using Selenium, powered by Google Gemini’s generative language API.

Generative AI Integration
Utilizes Google Gemini API to convert user prompts into executable Selenium command scripts.
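As a rough sketch of this step, the helpers below build a request body for the Gemini `generateContent` REST endpoint and pull the generated text out of a response. The prompt wording, the model name `gemini-1.5-flash`, and the function names are assumptions made for illustration, not taken from Selenium-AI-Agentic.py.

```python
# Assumed endpoint layout for the Gemini REST API; verify against current docs.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
)

# Hypothetical prompt wrapper; the project's actual prompt text is not shown in this README.
PROMPT_TEMPLATE = (
    "Convert the instruction into browser commands, one per line, using only "
    'OPEN("url"), TYPE("selector", "text"), CLICK("selector"), WAIT(seconds).\n'
    "Instruction: {instruction}"
)


def build_request(instruction, model="gemini-1.5-flash"):
    """Return (url, payload) for a generateContent call. The model name is an assumption."""
    url = GEMINI_URL.format(model=model)
    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, payload


def extract_script(response_json):
    """Join the text parts of the first candidate; return '' if none exist."""
    candidates = response_json.get("candidates", [])
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts).strip()
```

The payload would then be sent as JSON in a POST request with the API key attached (for example in the `x-goog-api-key` header), which is left out here so the sketch stays free of network calls.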
Browser Automation Layer
Employs Selenium WebDriver with ChromeDriver and applies anti-detection techniques (e.g., user-agent spoofing, webdriver masking).
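One common way to set this up is sketched below, assuming Selenium 4 and a local Chrome install. The user-agent string and the function names are illustrative assumptions; the project's actual options may differ.

```python
# Hypothetical user-agent string used only for illustration.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_chrome_arguments(user_agent=DEFAULT_USER_AGENT):
    """Command-line switches: spoof the user agent and hide the automation flag."""
    return [
        f"--user-agent={user_agent}",
        "--disable-blink-features=AutomationControlled",
    ]


def create_driver(user_agent=DEFAULT_USER_AGENT):
    """Start Chrome with the switches above and mask navigator.webdriver."""
    # Imported lazily so build_chrome_arguments works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments(user_agent):
        options.add_argument(argument)
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)

    driver = webdriver.Chrome(options=options)
    # Runs before every page script, so pages see navigator.webdriver as undefined.
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"},
    )
    return driver
```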
Command Parsing & Execution
Supports a simple DSL with commands:
- OPEN("url") — navigate to a URL
- TYPE("selector", "text") — input text into a DOM element
- CLICK("selector") — click on an element
- WAIT(seconds) — pause execution for given seconds
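A minimal parser for the four commands above might look like the following. It is a sketch, not the project's parser: the regular expressions assume one command per line and double-quoted arguments that contain no embedded double quotes, and `parse_script` is a hypothetical name.

```python
import re

# One pattern per DSL command; arguments are captured as named groups.
COMMAND_PATTERNS = {
    "OPEN": re.compile(r'^OPEN\("(?P<url>[^"]*)"\)$'),
    "TYPE": re.compile(r'^TYPE\("(?P<selector>[^"]*)",\s*"(?P<text>[^"]*)"\)$'),
    "CLICK": re.compile(r'^CLICK\("(?P<selector>[^"]*)"\)$'),
    "WAIT": re.compile(r'^WAIT\((?P<seconds>\d+(?:\.\d+)?)\)$'),
}


def parse_script(script):
    """Turn a multi-line script into a list of (command_name, args_dict) tuples."""
    commands = []
    for line_number, raw_line in enumerate(script.splitlines(), start=1):
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        for name, pattern in COMMAND_PATTERNS.items():
            match = pattern.match(line)
            if match:
                args = match.groupdict()
                if name == "WAIT":
                    args["seconds"] = float(args["seconds"])
                commands.append((name, args))
                break
        else:
            # No pattern matched this line.
            raise ValueError(f"Line {line_number}: unrecognized command {line!r}")
    return commands
```

For example, `parse_script('OPEN("https://example.com")\nWAIT(2)')` yields an `OPEN` entry followed by a `WAIT` entry whose `seconds` value is a float. Rejecting unknown lines early keeps malformed model output from reaching the browser.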
Reliability & Logging
Implements error handling, exponential-backoff retries on API calls, and detailed logging to both file and console.
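The retry and logging behaviour could be sketched as below. The logger name, the log file name `agent.log`, the attempt count, and the base delay are all assumptions chosen for illustration.

```python
import logging
import time

logger = logging.getLogger("selenium_ai_agent")  # hypothetical logger name


def configure_logging(log_path="agent.log"):
    """Send log records to both a file and the console."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[logging.FileHandler(log_path, encoding="utf-8"), logging.StreamHandler()],
    )


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay, then 2x, 4x, ... before retrying."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as error:
            if attempt == max_attempts:
                logger.error("Giving up after %d attempts: %s", attempt, error)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d failed (%s); retrying in %.1fs", attempt, error, delay)
            sleep(delay)
```

Passing `sleep` as a parameter is a design choice that lets the backoff schedule be checked without actually waiting.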
Persistence & Traceability
Saves prompt-response pairs with timestamps in JSON format for audit trails and debugging.
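A simple way to keep such an audit trail is to append each pair to a JSON array on disk, as in this sketch. The file name `history.json`, the record field names, and `save_interaction` are assumptions; the project's actual layout is not described here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_interaction(prompt, response, path="history.json"):
    """Append one timestamped prompt-response record to a JSON array file."""
    history_file = Path(path)
    if history_file.exists():
        records = json.loads(history_file.read_text(encoding="utf-8"))
    else:
        records = []
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC, ISO 8601
        "prompt": prompt,
        "response": response,
    }
    records.append(record)
    history_file.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return record
```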
Configuration Management
Loads API keys and runtime parameters from config.json, falling back to sensible defaults when values are missing.
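The fallback behaviour can be illustrated with a small loader that starts from defaults and overlays whatever config.json provides. The key names and default values below are hypothetical; the real config.json may use different ones.

```python
import json
from pathlib import Path

# Hypothetical keys and defaults, for illustration only.
DEFAULT_CONFIG = {
    "api_key": "",
    "model": "gemini-1.5-flash",
    "max_retries": 3,
    "timeout_seconds": 30,
}


def load_config(path="config.json"):
    """Return defaults overlaid with values from the JSON file, if it is readable."""
    config = dict(DEFAULT_CONFIG)
    config_file = Path(path)
    if config_file.exists():
        try:
            loaded = json.loads(config_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            loaded = {}  # malformed file: keep the defaults
        if isinstance(loaded, dict):
            config.update(loaded)
    return config
```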

- The user enters a high-level natural language instruction.
- The instruction is wrapped in a prompt and sent to the Google Gemini API.
- The returned command script is parsed and executed step by step in the Chrome browser.
- All interactions and errors are logged and persisted.
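The execution step of this workflow could be sketched as a small dispatcher over already-parsed commands. It assumes commands arrive as `(name, args)` tuples and that selectors are CSS selectors; both are assumptions for illustration, as is the name `execute_commands`.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def execute_commands(driver, commands, sleep=time.sleep):
    """Run each (name, args) command against a WebDriver-like object, in order."""
    executed = 0
    for name, args in commands:
        if name == "OPEN":
            driver.get(args["url"])
        elif name == "TYPE":
            driver.find_element(CSS_SELECTOR, args["selector"]).send_keys(args["text"])
        elif name == "CLICK":
            driver.find_element(CSS_SELECTOR, args["selector"]).click()
        elif name == "WAIT":
            sleep(args["seconds"])
        else:
            raise ValueError(f"Unknown command: {name}")
        executed += 1
    return executed
```

Because the function only relies on `get` and `find_element`, it works with a real Selenium driver and with a stub object in tests.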

This design enables flexible, AI-driven browser automation suitable for a wide range of tasks requiring natural language control.
