Santosh/multimodal #144
Merged
Conversation
- Add `_extract_image_from_data()` helper for various image formats
- Add `_find_images_recursive()` for generalized fallback detection
- Extract images from `message.images` (OpenRouter/Gemini pattern)
- Handle data URLs with base64 extraction
- Add recursive fallback search for edge cases
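The two helpers above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function names mirror the commit message, but the regex, signatures, and traversal details are assumptions.

```python
import base64
import re
from typing import Any, List, Optional


def extract_image_from_data_url(data: str) -> Optional[bytes]:
    """Decode image bytes from a base64 data URL, or None if it is not one."""
    match = re.match(r"data:image/[\w.+-]+;base64,(.+)", data, re.DOTALL)
    if match is None:
        return None
    try:
        return base64.b64decode(match.group(1))
    except ValueError:  # covers binascii.Error (bad padding etc.)
        return None


def find_images_recursive(obj: Any) -> List[bytes]:
    """Fallback: walk nested dicts/lists and collect every decodable data URL."""
    found: List[bytes] = []
    if isinstance(obj, str):
        image = extract_image_from_data_url(obj)
        if image is not None:
            found.append(image)
    elif isinstance(obj, dict):
        for value in obj.values():
            found.extend(find_images_recursive(value))
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            found.extend(find_images_recursive(item))
    return found
```

The recursive walk is what makes the edge-case fallback work: whether the provider nests images under `message.images` or somewhere else entirely, any reachable data URL is recovered.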
- Add dedicated methods for image and audio generation
- Clearer naming than `ai_with_vision`/`ai_with_audio`
- Full documentation with examples
- Uses AIConfig defaults for model selection
- `image_model` is an alias for `vision_model`
- Provides clearer naming for image generation model config
- Backwards compatible - `vision_model` still works
- `MediaProvider` abstract base class for unified media generation
- `FalProvider`: Fal.ai integration for flux-pro, f5-tts, etc.
- `LiteLLMProvider`: DALL-E, Azure, and LiteLLM-supported backends
- `OpenRouterProvider`: Gemini and other OpenRouter image models
- Provider registry with `get_provider()` and `register_provider()`
- Easy to add custom providers by subclassing `MediaProvider`
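The base-class-plus-registry pattern described above might look like this. A sketch only: `get_provider()` and `register_provider()` are named in the commit, but the registry key scheme, method signatures, and the `DummyProvider` are illustrative assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class MediaProvider(ABC):
    """Abstract base every media backend implements (only one method shown;
    the PR also lists audio, video, and transcription methods)."""

    @abstractmethod
    def generate_image(self, prompt: str, model: str, **kwargs: Any) -> Dict[str, Any]:
        """Return provider-specific result metadata for an image request."""


_REGISTRY: Dict[str, MediaProvider] = {}


def register_provider(name: str, provider: MediaProvider) -> None:
    """Make a provider instance available under a lookup key."""
    _REGISTRY[name] = provider


def get_provider(name: str) -> MediaProvider:
    """Fetch a registered provider, failing loudly for unknown names."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown media provider: {name!r}")
    return _REGISTRY[name]


class DummyProvider(MediaProvider):
    """Stand-in provider demonstrating how a custom subclass plugs in."""

    def generate_image(self, prompt: str, model: str, **kwargs: Any) -> Dict[str, Any]:
        return {"provider": "dummy", "model": model, "prompt": prompt}


register_provider("dummy", DummyProvider())
```

Subclassing plus `register_provider()` is what makes the system extensible: a third-party backend needs no changes to the SDK itself.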
Performance: ✓ No regressions detected
- Use `subscribe_async()` for queue-based reliable execution
- Support fal image size presets (square_hd, landscape_16_9, etc.)
- Add video generation with `generate_video()` method
- Add audio transcription with `transcribe_audio()` method
- Support all major fal models: flux/dev, flux/schnell, flux-pro
- Add video models: minimax-video, luma-dream-machine, kling-video
- Improve documentation with examples
- Add seed, guidance_scale, num_inference_steps parameters
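A request to a fal image model combines the preset and the optional tuning parameters listed above. The sketch below shows one plausible way to assemble that arguments dict; `square_hd` and `landscape_16_9` come from the commit message, while the remaining preset names and the helper itself (`build_fal_image_args`) are assumptions for illustration.

```python
from typing import Any, Dict, Optional

# square_hd and landscape_16_9 appear in the commit message above;
# the other preset names are assumptions based on fal's documented presets.
FAL_IMAGE_SIZE_PRESETS = {
    "square_hd", "square", "portrait_4_3", "portrait_16_9",
    "landscape_4_3", "landscape_16_9",
}


def build_fal_image_args(
    prompt: str,
    image_size: str = "square_hd",
    seed: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    num_inference_steps: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the arguments dict for a fal image request, validating the preset."""
    if image_size not in FAL_IMAGE_SIZE_PRESETS:
        raise ValueError(f"unknown image size preset: {image_size!r}")
    args: Dict[str, Any] = {"prompt": prompt, "image_size": image_size}
    # Only include optional tuning knobs the caller actually set,
    # so the backend's defaults apply otherwise.
    if seed is not None:
        args["seed"] = seed
    if guidance_scale is not None:
        args["guidance_scale"] = guidance_scale
    if num_inference_steps is not None:
        args["num_inference_steps"] = num_inference_steps
    return args
```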
- Add `fal_api_key` and `video_model` to AIConfig
- Add `_fal_provider` lazy property to AgentAI
- Route `fal-ai/` and `fal/` prefixed models to FalProvider in:
  - `ai_with_vision()` for image generation
  - `ai_with_audio()` for TTS
- Add `ai_generate_video()` method for video generation
- Add `ai_transcribe_audio()` method for speech-to-text
- Update docstrings with Fal examples
- Add comprehensive tests for media providers
Unified UX pattern:
- app.ai_generate_image("...", model="fal-ai/flux/dev") # Fal
- app.ai_generate_image("...", model="dall-e-3") # LiteLLM
- app.ai_generate_video("...", model="fal-ai/minimax-video/...")
- app.ai_transcribe_audio(url, model="fal-ai/whisper")
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
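The lazy `_fal_provider` property mentioned above can be sketched as below. This is hypothetical: the cache attribute name, the constructor signature, and the `FalProviderStub` stand-in are all assumptions; only the property name comes from the commits.

```python
from typing import Optional


class FalProviderStub:
    """Stand-in for the real FalProvider (constructor signature assumed)."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


class AgentAI:
    """Sketch of the lazy, cached provider property described in the commits."""

    def __init__(self, fal_api_key: str = "") -> None:
        self.fal_api_key = fal_api_key
        self._fal_provider_cache: Optional[FalProviderStub] = None

    @property
    def _fal_provider(self) -> FalProviderStub:
        # Construct the provider on first access, then reuse the instance,
        # so apps that never touch Fal pay no setup cost.
        if self._fal_provider_cache is None:
            self._fal_provider_cache = FalProviderStub(self.fal_api_key)
        return self._fal_provider_cache
```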
- Add TYPE_CHECKING import for MultimodalResponse forward reference (F821)
- Remove unused width/height/content_type variables in FalProvider (F841)
- Remove unused sys/types imports in tests (F401)
- Remove unused result variable in test (F841)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove unused result assignment in test_ai_generate_video_uses_default_model.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
This PR adds comprehensive multimodal support to the AgentField SDK with a unified UX pattern. All media generation (image, audio, video, transcription) works through consistent `.ai_*()` methods with automatic provider routing based on model prefix.

Key Features
Unified Multimodal UX
Changes
types.py
- Add `fal_api_key` field to AIConfig for the Fal.ai API key
- Add `video_model` field with default `fal-ai/minimax-video/image-to-video`

agent_ai.py
- Add `_fal_provider` lazy property for a cached FalProvider instance
- Update `ai_with_vision()` to route `fal-ai/` and `fal/` prefixed models to FalProvider
- Update `ai_with_audio()` to route Fal TTS models to FalProvider
- Add `ai_generate_video()` method for video generation
- Add `ai_transcribe_audio()` method for speech-to-text

media_providers.py (from previous commits)
- `FalProvider` with full implementation:
  - `generate_image()` - Flux, SDXL, Recraft models
  - `generate_audio()` - Fal TTS models
  - `generate_video()` - MiniMax, Kling, Luma models
  - `transcribe_audio()` - Whisper, Wizper models

Tests
- `tests/test_media_providers.py` (30 tests)

Provider Routing Summary
- `fal-ai/`, `fal/` prefixes → FalProvider
- `openrouter/` prefix → OpenRouterProvider
- all other models → LiteLLMProvider (default)

Testing
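The prefix routing can be sketched as a simple dispatch on the model string. The prefixes come from the PR; the function name and the returned labels are illustrative, not the SDK's actual identifiers.

```python
def route_provider(model: str) -> str:
    """Pick a provider label from the model prefix (labels are illustrative)."""
    if model.startswith(("fal-ai/", "fal/")):
        return "fal"
    if model.startswith("openrouter/"):
        return "openrouter"
    # Everything else (e.g. dall-e-3) falls through to the LiteLLM backend.
    return "litellm"
```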
- `tests/test_media_providers.py` - 30 tests passing

Checklist
- Updated `CHANGELOG.md` (or this change does not warrant a changelog entry).

🤖 Generated with Claude Code