santoshkumarradha (Member) commented Jan 12, 2026

Summary

This PR adds multimodal support to the AgentField SDK behind a unified UX pattern. Image, audio, and video generation, as well as audio transcription, all go through consistent .ai_*() methods, and the provider is selected automatically from the model prefix.

Key Features

Unified Multimodal UX

# Image generation - automatic provider routing
result = await app.ai_generate_image("A sunset", model="fal-ai/flux/dev")      # → Fal
result = await app.ai_generate_image("A sunset", model="dall-e-3")              # → LiteLLM
result = await app.ai_generate_image("A sunset", model="openrouter/google/...")  # → OpenRouter

# Audio generation (TTS)
result = await app.ai_generate_audio("Hello", model="tts-1")                    # → LiteLLM
result = await app.ai_generate_audio("Hello", model="fal-ai/kokoro/...")        # → Fal

# Video generation (NEW)
result = await app.ai_generate_video("A cat", model="fal-ai/minimax-video/...")  # → Fal

# Audio transcription (NEW)
result = await app.ai_transcribe_audio(url, model="fal-ai/whisper")             # → Fal

Changes

types.py

  • Added fal_api_key field to AIConfig for Fal.ai API key
  • Added video_model field with default fal-ai/minimax-video/image-to-video
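As a rough illustration of how a config-level default such as video_model might be consumed, here is a minimal sketch. AIConfigSketch and resolve_video_model are hypothetical stand-ins, not the SDK's actual AIConfig or its internals; only the two field names and the default model string come from this PR, and the None default for fal_api_key is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AIConfigSketch:
    """Hypothetical stand-in for AIConfig; only the two fields from this PR."""

    fal_api_key: Optional[str] = None  # assumed default, not confirmed by the PR
    video_model: str = "fal-ai/minimax-video/image-to-video"


def resolve_video_model(config: AIConfigSketch, model: Optional[str] = None) -> str:
    """Prefer an explicitly passed model, else fall back to the config default."""
    return model if model is not None else config.video_model


config = AIConfigSketch(fal_api_key="your-fal-key")
print(resolve_video_model(config))
print(resolve_video_model(config, model="fal-ai/minimax-video/..."))
```

An explicit model argument wins, so callers can override the default per call without touching the shared config.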

agent_ai.py

  • Added _fal_provider lazy property for cached FalProvider instance
  • Updated ai_with_vision() to route fal-ai/ and fal/ prefixed models to FalProvider
  • Updated ai_with_audio() to route Fal TTS models to FalProvider
  • Added ai_generate_video() method for video generation
  • Added ai_transcribe_audio() method for speech-to-text
  • Updated docstrings with Fal examples
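The lazy _fal_provider property described above could look roughly like the following sketch. FalProviderStub and AgentAISketch are hypothetical stand-ins used only to show the caching pattern; the real FalProvider constructor and the AgentAI attribute names are not shown in this PR and are assumed here.

```python
from typing import Optional


class FalProviderStub:
    """Hypothetical stand-in for FalProvider; counts how often it is built."""

    instances = 0

    def __init__(self, api_key: Optional[str]) -> None:
        FalProviderStub.instances += 1
        self.api_key = api_key


class AgentAISketch:
    """Hypothetical stand-in for AgentAI showing only the lazy property."""

    def __init__(self, fal_api_key: Optional[str] = None) -> None:
        self._fal_api_key = fal_api_key
        self._fal_provider_instance: Optional[FalProviderStub] = None

    @property
    def _fal_provider(self) -> FalProviderStub:
        # Build the provider on first access, then reuse the cached instance.
        if self._fal_provider_instance is None:
            self._fal_provider_instance = FalProviderStub(self._fal_api_key)
        return self._fal_provider_instance
```

With this shape, agents that never call a Fal model never construct the provider, and repeated Fal calls share one instance.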

media_providers.py (from previous commits)

  • Added FalProvider with full implementation:
    • generate_image() - Flux, SDXL, Recraft models
    • generate_audio() - Fal TTS models
    • generate_video() - MiniMax, Kling, Luma models
    • transcribe_audio() - Whisper, Wizper models

Tests

  • Added comprehensive test suite in tests/test_media_providers.py (30 tests)
  • Tests for AIConfig new fields, provider routing, new methods

Provider Routing Summary

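| Model Prefix  | Provider           | Modalities                        |
|---------------|--------------------|-----------------------------------|
| fal-ai/, fal/ | FalProvider        | image, audio, video, transcribe   |
| openrouter/   | OpenRouterProvider | image                             |
| (default)     | LiteLLMProvider    | image, audio (TTS)                |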
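The routing rule summarized above can be sketched as a simple prefix check. route_provider is a hypothetical helper written for illustration; the SDK's actual routing lives inside the .ai_*() methods and may differ in detail. It returns provider names as strings rather than provider objects.

```python
def route_provider(model: str) -> str:
    """Return the provider name for a model identifier, based on its prefix."""
    if model.startswith(("fal-ai/", "fal/")):
        return "FalProvider"
    if model.startswith("openrouter/"):
        return "OpenRouterProvider"
    # Anything without a recognized prefix falls through to LiteLLM.
    return "LiteLLMProvider"


for model in ("fal-ai/flux/dev", "fal/flux", "openrouter/google/...", "dall-e-3", "tts-1"):
    print(f"{model} -> {route_provider(model)}")
```

Because LiteLLM is the fallback, existing calls that pass unprefixed model names keep their previous behavior.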

Testing

  • tests/test_media_providers.py - 30 tests passing
  • Syntax validation and import checks

Checklist

  • I updated documentation where applicable.
  • I added or updated tests (30 new tests in test_media_providers.py).
  • I updated CHANGELOG.md (or this change does not warrant a changelog entry).

🤖 Generated with Claude Code

- Add _extract_image_from_data() helper for various image formats
- Add _find_images_recursive() for generalized fallback detection
- Extract images from message.images (OpenRouter/Gemini pattern)
- Handle data URLs with base64 extraction
- Add recursive fallback search for edge cases
- Add dedicated methods for image and audio generation
- Clearer naming than ai_with_vision/ai_with_audio
- Full documentation with examples
- Uses AIConfig defaults for model selection
- image_model is an alias for vision_model
- Provides clearer naming for image generation model config
- Backwards compatible - vision_model still works
- MediaProvider abstract base class for unified media generation
- FalProvider: Fal.ai integration for flux-pro, f5-tts, etc.
- LiteLLMProvider: DALL-E, Azure, and LiteLLM-supported backends
- OpenRouterProvider: Gemini and other OpenRouter image models
- Provider registry with get_provider() and register_provider()
- Easy to add custom providers by subclassing MediaProvider
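A provider registry of the kind described in the last commit notes might be sketched as follows. The names MediaProvider, get_provider(), and register_provider() come from the commit message, but their signatures here are assumptions, and EchoProvider is a made-up provider used only to demonstrate subclassing.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class MediaProvider(ABC):
    """Assumed shape of the abstract base class; the real one has more methods."""

    @abstractmethod
    async def generate_image(self, prompt: str, **kwargs: Any) -> Dict[str, Any]:
        ...


_PROVIDERS: Dict[str, Type[MediaProvider]] = {}


def register_provider(name: str, provider_cls: Type[MediaProvider]) -> None:
    """Register a provider class under a name (signature is an assumption)."""
    if not issubclass(provider_cls, MediaProvider):
        raise TypeError("provider_cls must subclass MediaProvider")
    _PROVIDERS[name] = provider_cls


def get_provider(name: str, **kwargs: Any) -> MediaProvider:
    """Instantiate a registered provider by name (signature is an assumption)."""
    try:
        provider_cls = _PROVIDERS[name]
    except KeyError:
        raise ValueError(f"Unknown provider: {name!r}") from None
    return provider_cls(**kwargs)


class EchoProvider(MediaProvider):
    """Hypothetical custom provider that just echoes the prompt back."""

    async def generate_image(self, prompt: str, **kwargs: Any) -> Dict[str, Any]:
        return {"provider": "echo", "prompt": prompt}


register_provider("echo", EchoProvider)
print(asyncio.run(get_provider("echo").generate_image("A sunset")))
```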
github-actions bot (Contributor) commented Jan 12, 2026

Performance

| SDK    | Memory | Δ | Latency | Δ    | Tests | Status |
|--------|--------|---|---------|------|-------|--------|
| Python | 9.0 KB | - | 0.47 µs | +34% |       |        |

✓ No regressions detected

santoshkumarradha and others added 4 commits January 12, 2026 17:25
- Use subscribe_async() for queue-based reliable execution
- Support fal image size presets (square_hd, landscape_16_9, etc.)
- Add video generation with generate_video() method
- Add audio transcription with transcribe_audio() method
- Support all major fal models: flux/dev, flux/schnell, flux-pro
- Add video models: minimax-video, luma-dream-machine, kling-video
- Improve documentation with examples
- Add seed, guidance_scale, num_inference_steps parameters
- Add fal_api_key and video_model to AIConfig
- Add _fal_provider lazy property to AgentAI
- Route fal-ai/ and fal/ prefixed models to FalProvider in:
  - ai_with_vision() for image generation
  - ai_with_audio() for TTS
- Add ai_generate_video() method for video generation
- Add ai_transcribe_audio() method for speech-to-text
- Update docstrings with Fal examples
- Add comprehensive tests for media providers

Unified UX pattern:
- app.ai_generate_image("...", model="fal-ai/flux/dev")  # Fal
- app.ai_generate_image("...", model="dall-e-3")        # LiteLLM
- app.ai_generate_video("...", model="fal-ai/minimax-video/...")
- app.ai_transcribe_audio(url, model="fal-ai/whisper")

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
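To make the image size presets and optional parameters from the commit above concrete, here is a minimal sketch of building the arguments for a Fal image request. build_fal_image_arguments is a hypothetical helper, not the PR's code. Only square_hd and landscape_16_9 are named in the commit; the other preset names are an assumption. The resulting dict is the kind of payload that would be handed to the Fal client's subscribe_async() call, which this sketch deliberately does not invoke.

```python
from typing import Any, Dict, Optional, Tuple, Union

# square_hd and landscape_16_9 come from the commit notes; the rest are assumed.
FAL_IMAGE_SIZE_PRESETS = frozenset(
    {
        "square_hd",
        "square",
        "portrait_4_3",
        "portrait_16_9",
        "landscape_4_3",
        "landscape_16_9",
    }
)


def build_fal_image_arguments(
    prompt: str,
    image_size: Union[str, Tuple[int, int]] = "square_hd",
    seed: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    num_inference_steps: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a request payload, accepting a preset name or a (width, height) pair."""
    size: Any
    if isinstance(image_size, str):
        if image_size not in FAL_IMAGE_SIZE_PRESETS:
            raise ValueError(f"Unknown image size preset: {image_size!r}")
        size = image_size
    else:
        width, height = image_size
        size = {"width": int(width), "height": int(height)}

    arguments: Dict[str, Any] = {"prompt": prompt, "image_size": size}
    optional = {
        "seed": seed,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
    }
    # Only send parameters the caller actually set, so model defaults apply otherwise.
    arguments.update({key: value for key, value in optional.items() if value is not None})
    return arguments


print(build_fal_image_arguments("A sunset", image_size="landscape_16_9", seed=42))
```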
- Add TYPE_CHECKING import for MultimodalResponse forward reference (F821)
- Remove unused width/height/content_type variables in FalProvider (F841)
- Remove unused sys/types imports in tests (F401)
- Remove unused result variable in test (F841)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
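The F821 fix in the commit above follows a common pattern: a name used only in annotations is imported under TYPE_CHECKING so linters and type checkers can resolve it without a runtime import. A minimal sketch, assuming a hypothetical module path for MultimodalResponse and a made-up summarize function:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by linters and type checkers only; never executed at runtime.
    # The module path is hypothetical; the PR does not state where the class lives.
    from agentfield.multimodal_response import MultimodalResponse


def summarize(response: "MultimodalResponse") -> str:
    """The quoted annotation is a forward reference resolved by the import above."""
    return f"{type(response).__name__} received"


print(summarize(object()))
```

Without the guarded import, the quoted name is undefined as far as the linter is concerned, which is what F821 reports. Keeping the import under TYPE_CHECKING also avoids circular imports between modules.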
Remove unused result assignment in test_ai_generate_video_uses_default_model.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@santoshkumarradha santoshkumarradha merged commit 5f781b8 into main Jan 12, 2026
20 checks passed