
Master-Frank/midscene-java


Midscene Java


AI-powered automation framework for Web and Android with natural language-driven UI operations - Java version


🌟 Project Overview

Midscene Java is an AI-powered framework for UI automation on Web and Android platforms. It is the Java implementation of Midscene Python and inherits its core philosophy: making automation as simple as speaking.

🎯 Core Features

  • Natural Language Operations - Describe the intended operation in everyday language, and the AI interprets and executes it
  • Intelligent Element Locating - Combines multiple locating strategies, selects the most suitable one automatically, and adapts to page changes
  • Structured Data Extraction - Use natural language to extract complex structured data
  • Intelligent Assertion Verification - Describe the condition to verify in natural language, and the AI judges whether it holds
  • Multi-Platform Support - Unified interface supports Web and Android platforms
  • Visual Debugging - Detailed execution screenshots and decision process recording
  • Code Optimization and Refactoring - Systematically refactored for more modular and maintainable code

🏗️ Project Structure

midscene-java/
├── packages/
│   ├── core/               # Core module, providing Agent and AI engine
│   ├── web/                # Web automation module
│   │   ├── playwright/     # Playwright implementation
│   │   └── selenium/       # Selenium implementation
│   ├── android/            # Android automation module
│   ├── cli/                # Command line tool
│   ├── examples/           # Example code
│   ├── playground/         # Development testing environment
│   └── tests/              # Test cases
├── apps/                   # Application examples
├── docs/                   # Project documentation and optimization plans
└── wiki/                   # Project wiki documentation

🚀 Quick Start

Prerequisites

  • Java 17+
  • Maven 3.6+ or Gradle 7.0+
  • Browser (Chrome/Firefox/Edge, for Web automation)
  • An AI model API key (from one of OpenAI, Claude, Qwen, or Gemini)

Installation

Add Midscene Java dependencies to your pom.xml file:

<dependencies>
    <!-- Core module -->
    <dependency>
        <groupId>com.midscene</groupId>
        <artifactId>midscene-core</artifactId>
        <version>0.1.1</version>
    </dependency>
    
    <!-- Web automation modules (choose as needed) -->
    <dependency>
        <groupId>com.midscene</groupId>
        <artifactId>midscene-web-playwright</artifactId>
        <version>0.1.1</version>
    </dependency>
    <dependency>
        <groupId>com.midscene</groupId>
        <artifactId>midscene-web-selenium</artifactId>
        <version>0.1.1</version>
    </dependency>
    
    <!-- Android automation module (choose as needed) -->
    <dependency>
        <groupId>com.midscene</groupId>
        <artifactId>midscene-android</artifactId>
        <version>0.1.1</version>
    </dependency>
</dependencies>

Configure AI Model

Create an application.properties or application.yml file to configure the AI model:

# application.properties
midscene.ai.provider=openai
midscene.ai.model=gpt-4-vision-preview
midscene.ai.api-key=your_openai_api_key_here
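
Before starting a run, it can help to confirm that the three keys above are actually present. The following is a minimal sketch using only java.util.Properties. The ModelConfig class and its load method are hypothetical names for illustration and are not part of Midscene Java, which may read this file in its own way.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for the midscene.ai.* keys; not a Midscene Java class. */
final class ModelConfig {
    final String provider;
    final String model;
    final String apiKey;

    private ModelConfig(String provider, String model, String apiKey) {
        this.provider = provider;
        this.model = model;
        this.apiKey = apiKey;
    }

    /** Reads the three keys and fails fast if any of them is missing or blank. */
    static ModelConfig load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read configuration", e);
        }
        return new ModelConfig(
                require(props, "midscene.ai.provider"),
                require(props, "midscene.ai.model"),
                require(props, "midscene.ai.api-key"));
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Same content as the application.properties sample above
        String sample = String.join("\n",
                "midscene.ai.provider=openai",
                "midscene.ai.model=gpt-4-vision-preview",
                "midscene.ai.api-key=your_openai_api_key_here");
        ModelConfig config = ModelConfig.load(new StringReader(sample));
        System.out.println(config.provider + " / " + config.model);
    }
}
```

A check like this reports a missing key by name at startup, before any browser or device session is opened.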

Example Code

Web Automation Example

package com.example;

import com.midscene.core.Agent;
import com.midscene.web.playwright.PlaywrightPage;
import com.midscene.web.playwright.PlaywrightUIContextProvider;
import com.microsoft.playwright.Playwright;
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.Page;

public class SearchExample {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            // Create browser instance
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            
            // Create PlaywrightPage wrapper
            PlaywrightPage playwrightPage = new PlaywrightPage(page);
            
            // Create Agent
            Agent agent = new Agent(new PlaywrightUIContextProvider(playwrightPage));
            
            // Navigate to website
            page.navigate("https://www.baidu.com");
            
            // Use natural language for search
            agent.aiAction("Type 'Java tutorial' in the search box");
            agent.aiAction("Click the search button");
            
            // Verify search results
            agent.aiAssert("The page displays search results for Java tutorials");
            
            System.out.println("✅ Search operation completed!");
            
            // Close browser
            browser.close();
        }
    }
}

Data Extraction Example

package com.example;

import com.midscene.core.Agent;
import com.midscene.web.playwright.PlaywrightPage;
import com.midscene.web.playwright.PlaywrightUIContextProvider;
import com.microsoft.playwright.Playwright;
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.Page;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExtractExample {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            
            PlaywrightPage playwrightPage = new PlaywrightPage(page);
            Agent agent = new Agent(new PlaywrightUIContextProvider(playwrightPage));
            
            // Visit news website
            page.navigate("https://news.example.com");
            
            // Extract structured data
            Map<String, Object> schema = new HashMap<>();
            schema.put("articles", List.of(
                Map.of(
                    "title", "News title",
                    "time", "Publish time",
                    "summary", "News summary"
                )
            ));
            
            Map<String, Object> newsData = agent.aiExtract(schema);
            
            // Output results. The cast is unchecked because newsData is a Map<String, Object>.
            @SuppressWarnings("unchecked")
            List<Map<String, String>> articles = (List<Map<String, String>>) newsData.get("articles");
            for (Map<String, String> article : articles) {
                System.out.println("📰 " + article.get("title"));
                System.out.println("⏰ " + article.get("time"));
                System.out.println("📄 " + article.get("summary") + "\n");
            }
            
            browser.close();
        }
    }
}
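
The cast applied to the result of aiExtract in the example above is unchecked, so a response with an unexpected shape only fails later, when a field is read. One possible alternative is to validate the shape while converting it to typed records. The sketch below assumes only what the example shows: the result is a Map<String, Object> whose "articles" entry is a list of maps. Article and ArticleMapper are hypothetical names and are not Midscene Java types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical typed view of one extracted article. */
record Article(String title, String time, String summary) {}

/** Hypothetical helper that checks the extraction result at runtime. */
final class ArticleMapper {
    private ArticleMapper() {}

    /** Converts the "articles" entry into records, rejecting unexpected shapes early. */
    static List<Article> toArticles(Map<String, Object> extracted) {
        Object raw = extracted.get("articles");
        if (!(raw instanceof List<?> items)) {
            throw new IllegalArgumentException("Expected a list under 'articles' but got: " + raw);
        }
        List<Article> articles = new ArrayList<>();
        for (Object item : items) {
            if (!(item instanceof Map<?, ?> fields)) {
                throw new IllegalArgumentException("Expected a map for each article but got: " + item);
            }
            articles.add(new Article(
                    text(fields, "title"),
                    text(fields, "time"),
                    text(fields, "summary")));
        }
        return articles;
    }

    /** Missing or null fields become empty strings; other values are converted with toString(). */
    private static String text(Map<?, ?> fields, String key) {
        Object value = fields.get(key);
        return value == null ? "" : value.toString();
    }
}
```

With such a helper, the output loop could iterate over ArticleMapper.toArticles(newsData) and call article.title() instead of casting and reading map keys.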

Android Automation Example

package com.example;

import com.midscene.core.Agent;
import com.midscene.android.AndroidDevice;
import com.midscene.android.AndroidUIContextProvider;

import java.util.concurrent.CompletableFuture;

public class AndroidExample {
    public static void main(String[] args) {
        // Connect to Android device
        AndroidDevice device = new AndroidDevice();
        CompletableFuture<Void> connectFuture = device.connect();
        connectFuture.join(); // Wait for connection to complete
        
        try {
            // Create Agent
            Agent agent = new Agent(new AndroidUIContextProvider(device));
            
            // Launch application
            agent.aiAction("Launch the settings app");
            
            // Perform operations
            agent.aiAction("Tap on the Wi-Fi option");
            agent.aiAssert("The Wi-Fi settings page is open");
            
            System.out.println("✅ Android automation operation completed!");
        } finally {
            device.disconnect();
        }
    }
}
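
The Android example waits for the device with connectFuture.join(), which keeps blocking if the future never completes. Unless the library applies its own timeout (not stated here), it may be worth bounding the wait. The sketch below uses CompletableFuture.orTimeout from the standard library (Java 9+). ConnectTimeouts and awaitOrFail are hypothetical names and are not part of Midscene Java.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for waiting on a connection future with an upper bound. */
final class ConnectTimeouts {
    private ConnectTimeouts() {}

    /**
     * Waits for the future to complete, but gives up once the limit has passed.
     * Note that orTimeout completes the given future itself exceptionally on timeout.
     */
    static void awaitOrFail(CompletableFuture<Void> connectFuture, Duration limit) {
        try {
            connectFuture.orTimeout(limit.toMillis(), TimeUnit.MILLISECONDS).join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                throw new IllegalStateException("Device did not connect within " + limit, e);
            }
            throw e; // Any other failure is passed through unchanged
        }
    }
}
```

In the example above, the two lines that create and join connectFuture could then become ConnectTimeouts.awaitOrFail(device.connect(), Duration.ofSeconds(30)), where the 30-second limit is an arbitrary choice.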

📖 Documentation

🆚 Comparison with Traditional Tools

| Feature | Traditional Automation Tools | Midscene Java |
|---|---|---|
| Learning Curve | Steep, requires learning complex APIs | Gentle, natural language driven |
| Code Readability | Obscure and hard to understand | Intuitive and easy to understand |
| Maintenance Cost | High, requires extensive modifications for page changes | Low, AI automatically adapts to changes |
| Element Locating | Manual selector writing | AI intelligent locating |
| Error Handling | Manual handling of various exceptions | AI automatic retry and recovery |
| Cross-Platform | Requires learning different tools | Unified interface |
| Code Quality | Varies by project | Systematically refactored, modular design |

🤝 Contribution Guidelines

We welcome all forms of contribution: bug reports, feature requests, documentation improvements, and code.

How to Contribute

  1. Fork this repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Create a Pull Request

Development Environment Setup

# Clone the repository
git clone https://github.com/Master-Frank/midscene-java.git
cd midscene-java

# Build the project
mvn clean install

# Run tests
mvn test

Code Standards

  • Follow the Conventional Commits convention for commit messages
  • Add corresponding test cases for new features
  • Add JavaDoc documentation for public APIs
  • Keep code modular, avoid overly long methods

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📝 Credits

Thanks to the Midscene project (https://github.com/web-infra-dev/midscene) for inspiration and technical reference.

📞 Contact Us


⭐ If this project helps you, please give us a star!
