Santosh/multimodal #144
Merged
Conversation
- Add `_extract_image_from_data()` helper for various image formats
- Add `_find_images_recursive()` for generalized fallback detection
- Extract images from `message.images` (OpenRouter/Gemini pattern)
- Handle data URLs with base64 extraction
- Add recursive fallback search for edge cases
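The two helpers above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function names mirror the commit message, but the regex, signatures, and traversal details are assumptions.

```python
import base64
import re
from typing import Any, List, Optional


def extract_image_from_data_url(data: str) -> Optional[bytes]:
    """Decode image bytes from a base64 data URL, or None if it is not one."""
    match = re.match(r"data:image/[\w.+-]+;base64,(.+)", data, re.DOTALL)
    if match is None:
        return None
    try:
        return base64.b64decode(match.group(1))
    except ValueError:  # covers binascii.Error (bad padding etc.)
        return None


def find_images_recursive(obj: Any) -> List[bytes]:
    """Fallback: walk nested dicts/lists and collect every decodable data URL."""
    found: List[bytes] = []
    if isinstance(obj, str):
        image = extract_image_from_data_url(obj)
        if image is not None:
            found.append(image)
    elif isinstance(obj, dict):
        for value in obj.values():
            found.extend(find_images_recursive(value))
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            found.extend(find_images_recursive(item))
    return found
```

The recursive walk is what makes the edge-case fallback work: whether the provider nests images under `message.images` or somewhere else entirely, any reachable data URL is recovered.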
- Add dedicated methods for image and audio generation
- Clearer naming than `ai_with_vision`/`ai_with_audio`
- Full documentation with examples
- Uses AIConfig defaults for model selection
- `image_model` is an alias for `vision_model`
- Provides clearer naming for image generation model config
- Backwards compatible - `vision_model` still works
- `MediaProvider` abstract base class for unified media generation
- `FalProvider`: Fal.ai integration for flux-pro, f5-tts, etc.
- `LiteLLMProvider`: DALL-E, Azure, and LiteLLM-supported backends
- `OpenRouterProvider`: Gemini and other OpenRouter image models
- Provider registry with `get_provider()` and `register_provider()`
- Easy to add custom providers by subclassing `MediaProvider`
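The base-class-plus-registry pattern described above might look like this. A sketch only: `get_provider()` and `register_provider()` are named in the commit, but the registry key scheme, method signatures, and the `DummyProvider` are illustrative assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class MediaProvider(ABC):
    """Abstract base every media backend implements (only one method shown;
    the PR also lists audio, video, and transcription methods)."""

    @abstractmethod
    def generate_image(self, prompt: str, model: str, **kwargs: Any) -> Dict[str, Any]:
        """Return provider-specific result metadata for an image request."""


_REGISTRY: Dict[str, MediaProvider] = {}


def register_provider(name: str, provider: MediaProvider) -> None:
    """Make a provider instance available under a lookup key."""
    _REGISTRY[name] = provider


def get_provider(name: str) -> MediaProvider:
    """Fetch a registered provider, failing loudly for unknown names."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown media provider: {name!r}")
    return _REGISTRY[name]


class DummyProvider(MediaProvider):
    """Stand-in provider demonstrating how a custom subclass plugs in."""

    def generate_image(self, prompt: str, model: str, **kwargs: Any) -> Dict[str, Any]:
        return {"provider": "dummy", "model": model, "prompt": prompt}


register_provider("dummy", DummyProvider())
```

Subclassing plus `register_provider()` is what makes the system extensible: a third-party backend needs no changes to the SDK itself.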
Performance: ✓ No regressions detected
- Use `subscribe_async()` for queue-based reliable execution
- Support fal image size presets (square_hd, landscape_16_9, etc.)
- Add video generation with `generate_video()` method
- Add audio transcription with `transcribe_audio()` method
- Support all major fal models: flux/dev, flux/schnell, flux-pro
- Add video models: minimax-video, luma-dream-machine, kling-video
- Improve documentation with examples
- Add seed, guidance_scale, num_inference_steps parameters
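A request to a fal image model combines the preset and the optional tuning parameters listed above. The sketch below shows one plausible way to assemble that arguments dict; `square_hd` and `landscape_16_9` come from the commit message, while the remaining preset names and the helper itself (`build_fal_image_args`) are assumptions for illustration.

```python
from typing import Any, Dict, Optional

# square_hd and landscape_16_9 appear in the commit message above;
# the other preset names are assumptions based on fal's documented presets.
FAL_IMAGE_SIZE_PRESETS = {
    "square_hd", "square", "portrait_4_3", "portrait_16_9",
    "landscape_4_3", "landscape_16_9",
}


def build_fal_image_args(
    prompt: str,
    image_size: str = "square_hd",
    seed: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    num_inference_steps: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the arguments dict for a fal image request, validating the preset."""
    if image_size not in FAL_IMAGE_SIZE_PRESETS:
        raise ValueError(f"unknown image size preset: {image_size!r}")
    args: Dict[str, Any] = {"prompt": prompt, "image_size": image_size}
    # Only include optional tuning knobs the caller actually set,
    # so the backend's defaults apply otherwise.
    if seed is not None:
        args["seed"] = seed
    if guidance_scale is not None:
        args["guidance_scale"] = guidance_scale
    if num_inference_steps is not None:
        args["num_inference_steps"] = num_inference_steps
    return args
```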
- Add `fal_api_key` and `video_model` to AIConfig
- Add `_fal_provider` lazy property to AgentAI
- Route `fal-ai/` and `fal/` prefixed models to FalProvider in:
  - `ai_with_vision()` for image generation
  - `ai_with_audio()` for TTS
- Add `ai_generate_video()` method for video generation
- Add `ai_transcribe_audio()` method for speech-to-text
- Update docstrings with Fal examples
- Add comprehensive tests for media providers
Unified UX pattern:
- app.ai_generate_image("...", model="fal-ai/flux/dev") # Fal
- app.ai_generate_image("...", model="dall-e-3") # LiteLLM
- app.ai_generate_video("...", model="fal-ai/minimax-video/...")
- app.ai_transcribe_audio(url, model="fal-ai/whisper")
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
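The lazy `_fal_provider` property mentioned above can be sketched as below. This is hypothetical: the cache attribute name, the constructor signature, and the `FalProviderStub` stand-in are all assumptions; only the property name comes from the commits.

```python
from typing import Optional


class FalProviderStub:
    """Stand-in for the real FalProvider (constructor signature assumed)."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


class AgentAI:
    """Sketch of the lazy, cached provider property described in the commits."""

    def __init__(self, fal_api_key: str = "") -> None:
        self.fal_api_key = fal_api_key
        self._fal_provider_cache: Optional[FalProviderStub] = None

    @property
    def _fal_provider(self) -> FalProviderStub:
        # Construct the provider on first access, then reuse the instance,
        # so apps that never touch Fal pay no setup cost.
        if self._fal_provider_cache is None:
            self._fal_provider_cache = FalProviderStub(self.fal_api_key)
        return self._fal_provider_cache
```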
- Add TYPE_CHECKING import for MultimodalResponse forward reference (F821)
- Remove unused width/height/content_type variables in FalProvider (F841)
- Remove unused sys/types imports in tests (F401)
- Remove unused result variable in test (F841)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove unused result assignment in test_ai_generate_video_uses_default_model.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
This PR adds comprehensive multimodal support to the AgentField SDK with a unified UX pattern. All media generation (image, audio, video, transcription) works through consistent `.ai_*()` methods with automatic provider routing based on model prefix.

Key Features
Unified Multimodal UX
Changes
types.py
- Add `fal_api_key` field to AIConfig for the Fal.ai API key
- Add `video_model` field with default `fal-ai/minimax-video/image-to-video`

agent_ai.py
- Add `_fal_provider` lazy property for a cached FalProvider instance
- Update `ai_with_vision()` to route `fal-ai/` and `fal/` prefixed models to FalProvider
- Update `ai_with_audio()` to route Fal TTS models to FalProvider
- Add `ai_generate_video()` method for video generation
- Add `ai_transcribe_audio()` method for speech-to-text

media_providers.py (from previous commits)
- `FalProvider` with full implementation:
  - `generate_image()` - Flux, SDXL, Recraft models
  - `generate_audio()` - Fal TTS models
  - `generate_video()` - MiniMax, Kling, Luma models
  - `transcribe_audio()` - Whisper, Wizper models

Tests
- `tests/test_media_providers.py` (30 tests)

Provider Routing Summary
- `fal-ai/`, `fal/` prefixes → FalProvider
- `openrouter/` prefix → OpenRouterProvider
- all other models → LiteLLMProvider (default)

Testing
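The prefix routing can be sketched as a simple dispatch on the model string. The prefixes come from the PR; the function name and the returned labels are illustrative, not the SDK's actual identifiers.

```python
def route_provider(model: str) -> str:
    """Pick a provider label from the model prefix (labels are illustrative)."""
    if model.startswith(("fal-ai/", "fal/")):
        return "fal"
    if model.startswith("openrouter/"):
        return "openrouter"
    # Everything else (e.g. dall-e-3) falls through to the LiteLLM backend.
    return "litellm"
```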
- `tests/test_media_providers.py` - 30 tests passing

Checklist
- Updated `CHANGELOG.md` (or this change does not warrant a changelog entry).

🤖 Generated with Claude Code