fix(vertex_ai): improve passthrough endpoint url parsing and construction (#17402) #17526
Conversation
Force-pushed the branch from cb2c137 to 8a4f674
Hi @krrishdholakia 👋

This PR fixes an issue where LiteLLM does not correctly load `vertex_project` and `vertex_location` for Vertex AI passthrough when using the google-genai Python SDK. If the user does not explicitly provide `vertex_project` and `vertex_location`, the SDK request will not contain these values. In that case, LiteLLM should fall back to the values configured in the YAML:

```yaml
use_in_pass_through: true
vertex_project: ...
vertex_location: ...
```

The current behavior ignores these config values, resulting in malformed passthrough URLs. This PR ensures LiteLLM correctly loads the configured project and location so that Vertex AI passthrough works as documented. We rely on Vertex passthrough in production, so it would be great if this could be reviewed or assigned to another maintainer. Thanks again for the support!
vertex_location=vertex_location,
)

if vertex_project is None or vertex_location is None:
We should make sure the user has access to the model before allowing the request to go through.

Can you add the extraction logic here:

model = get_model_from_request(request_data, route)
Or some version of that logic in your code block? Maybe extract the model and just run can_key_call_model to confirm valid access before proceeding.
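A minimal version of that suggestion might look like the sketch below. The names mirror the helpers mentioned in the review, but the extraction logic and signatures here are assumptions, not LiteLLM's real `get_model_from_request`/`can_key_call_model`:

```python
class ModelAccessError(Exception):
    """Raised when an API key is not permitted to call the requested model."""

def check_model_access(request_data: dict, route: str, key_allowed_models: set) -> str:
    """Extract the model from a passthrough request and verify the key may call it.

    Deliberately simplified: prefer an explicit "model" field in the body,
    otherwise take the last path segment of the route (minus any ":action").
    """
    model = request_data.get("model") or route.rsplit("/", 1)[-1].split(":")[0]
    if model not in key_allowed_models:
        raise ModelAccessError(f"key has no access to model '{model}'")
    return model

allowed = {"gemini-1.5-pro"}
route = "projects/p/locations/l/publishers/google/models/gemini-1.5-pro:generateContent"
print(check_model_access({}, route, allowed))
# -> gemini-1.5-pro
```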
Sorry, I missed this question about whether the key has permission to access the model. I will fix it.
I noticed that the original pass_through_endpoints logic did not check whether the key had permission to call the model. I have fixed this issue in the new PR #17970.
Add a test that verifies `_base_vertex_proxy_route` uses `get_available_deployment` for proper load balancing instead of `get_model_list`. This ensures the correct deployment is selected from the router and vertex credentials are properly fetched. Also refactor the implementation to:
- Use `get_available_deployment` instead of `get_model_list`
- Add error handling for deployment retrieval
- Improve code structure with a try-except block
Add dedicated methods to filter and select deployments for pass-through endpoints:
- Implement `get_available_deployment_for_pass_through()` to ensure only deployments with `use_in_pass_through=True` are considered
- Implement `async_get_available_deployment_for_pass_through()` for async operations
- Add a `_filter_pass_through_deployments()` helper method to filter by the `use_in_pass_through` flag
- Update the vertex pass-through route to use the new dedicated method

This ensures pass-through endpoints respect the `use_in_pass_through` configuration and apply the load balancing strategy only to configured deployments. Add comprehensive tests to verify filtering and load balancing behavior.
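The filter-then-balance flow those commits describe can be sketched over plain deployment dicts. The function names and the rpm-weighted random pick are illustrative; the real router operates on its own model-list structures and routing strategies:

```python
import random
from typing import Dict, List

def filter_pass_through_deployments(deployments: List[Dict]) -> List[Dict]:
    """Keep only deployments that explicitly opted in via use_in_pass_through."""
    return [
        d for d in deployments
        if d.get("litellm_params", {}).get("use_in_pass_through") is True
    ]

def pick_deployment(deployments: List[Dict]) -> Dict:
    """Weighted random choice over the filtered deployments, using the
    configured rpm as the weight (a stand-in for the router's strategy)."""
    eligible = filter_pass_through_deployments(deployments)
    if not eligible:
        raise ValueError("No deployments configured with use_in_pass_through=True")
    weights = [d.get("litellm_params", {}).get("rpm", 1) for d in eligible]
    return random.choices(eligible, weights=weights, k=1)[0]

deployments = [
    {"model_name": "gemini-a", "litellm_params": {"use_in_pass_through": True, "rpm": 90}},
    {"model_name": "gemini-b", "litellm_params": {"use_in_pass_through": True, "rpm": 10}},
    {"model_name": "gemini-c", "litellm_params": {"rpm": 100}},  # not opted in
]
print(sorted(d["model_name"] for d in filter_pass_through_deployments(deployments)))
# -> ['gemini-a', 'gemini-b']
```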
Force-pushed the branch from d80f9ae to 5e7bb0c
Title
fix(vertex_ai): improve passthrough endpoint url parsing and construction and deployment filtering
Relevant issues
Fixes #17402
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the `tests/litellm/` directory
- I have run `make test-unit`

Type
🐛 Bug Fix
Changes
URL Parsing and Construction Improvements

litellm/llms/vertex_ai/common_utils.py:
- Strip `/v1/` and `/v1beta1/` version prefixes from the requested route to prevent double versioning in the target URL.

litellm/proxy/pass_through_endpoints/llm_passthrough_endpoints.py:
- Update `_base_vertex_proxy_route` to improve project and location resolution. If `vertex_project` or `vertex_location` cannot be parsed from the URL, it now attempts to extract the model ID and look up the corresponding deployment in the `llm_router` to find the configured `vertex_project` and `vertex_location`.

tests/test_litellm/llms/vertex_ai/test_vertex_ai_common_utils.py:
- Add tests for `construct_target_url` verifying correct handling of `/v1/` and `/v1beta1/` prefixes.

Pass-Through Deployment Filtering

litellm/router.py:
- Add `get_available_deployment_for_pass_through()` to ensure only deployments configured with `use_in_pass_through=True` are returned for pass-through endpoint selection.
- Add `async_get_available_deployment_for_pass_through()` for async operations with the same filtering behavior.
- Add a `_filter_pass_through_deployments()` helper method to filter deployments by the `use_in_pass_through` flag.

litellm/proxy/pass_through_endpoints/llm_passthrough_endpoints.py:
- Update `_base_vertex_proxy_route` to use the new `get_available_deployment_for_pass_through()` method instead of `get_available_deployment()` so pass-through filtering is applied consistently.

tests/test_litellm/proxy/pass_through_endpoints/test_vertex_passthrough_load_balancing.py:
- Add tests for `get_available_deployment_for_pass_through()`:
  - `test_get_available_deployment_for_pass_through_filters_correctly()` verifies correct filtering of pass-through deployments.
  - `test_get_available_deployment_for_pass_through_no_deployments()` verifies proper error handling when no pass-through deployments exist.
  - `test_get_available_deployment_for_pass_through_load_balancing()` verifies load balancing respects deployment RPM weights.
  - `test_async_get_available_deployment_for_pass_through()` verifies async functionality.

Summary
This PR improves the Vertex AI pass-through endpoint handling in two main areas:
URL Parsing & Configuration: Properly parses model IDs from URLs and looks up vertex_project and vertex_location from router deployments when not present in the URL.
Deployment Filtering: Implements dedicated pass-through deployment selection methods that ensure only deployments explicitly configured for pass-through are used, while maintaining proper load balancing across them.
These changes ensure pass-through endpoints are more robust and respect deployment configuration, while enabling proper load balancing for pass-through requests.
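As an illustration of the double-versioning fix, stripping an already-present API version from the requested route before joining it to a versioned base URL avoids URLs like `.../v1/v1/projects/...`. This is a sketch under stated assumptions; the real `construct_target_url` in `common_utils.py` handles more cases:

```python
import re

def strip_version_prefix(route: str) -> str:
    """Drop a leading /v1/ or /v1beta1/ from the requested route so the
    version segment is not duplicated when joined to a versioned base URL."""
    return re.sub(r"^/(?:v1|v1beta1)/", "/", route)

base = "https://us-central1-aiplatform.googleapis.com/v1"
route = "/v1/projects/my-proj/locations/us-central1/publishers/google/models/gemini-1.5-pro:generateContent"
print(base + strip_version_prefix(route))
```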