krisxia0506 (Contributor) commented Dec 5, 2025

Title

fix(vertex_ai): improve passthrough endpoint url parsing and construction and deployment filtering

Relevant issues

Fixes #17402

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
(screenshots of passing tests attached)

Type

🐛 Bug Fix

Changes

URL Parsing and Construction Improvements

  1. litellm/llms/vertex_ai/common_utils.py:

    • Added get_vertex_model_id_from_url helper function to extract the model ID from Vertex AI URLs.
    • Updated construct_target_url to correctly handle and strip /v1/ and /v1beta1/ version prefixes from the requested route to prevent double versioning in the target URL.
  2. litellm/proxy/pass_through_endpoints/llm_passthrough_endpoints.py:

    • Updated _base_vertex_proxy_route to improve project and location resolution. If vertex_project or vertex_location cannot be parsed from the URL, it now attempts to extract the model ID and look up the corresponding deployment in the llm_router to find the configured vertex_project and vertex_location.
  3. tests/test_litellm/llms/vertex_ai/test_vertex_ai_common_utils.py:

    • Added unit tests for get_vertex_model_id_from_url covering valid and invalid URLs.
    • Added unit tests for construct_target_url verifying correct handling of /v1/ and /v1beta1/ prefixes.
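To illustrate the two URL helpers described above, here is a minimal sketch (not the PR's actual implementation; the regexes and the strip_version_prefix name are assumptions for illustration):

```python
import re

def get_vertex_model_id_from_url(url):
    """Extract the model ID from a Vertex AI URL, e.g.
    .../publishers/google/models/gemini-1.5-pro:generateContent -> "gemini-1.5-pro".
    Returns None when the URL has no /models/ segment."""
    match = re.search(r"/models/([^:/]+)", url)
    return match.group(1) if match else None

def strip_version_prefix(route):
    """Strip a leading /v1/ or /v1beta1/ from the requested route so the
    version is not duplicated when the route is appended to a target URL
    that already carries a version segment."""
    return re.sub(r"^/v1(beta1)?/", "/", route)
```

For example, a route of /v1/projects/p/locations/l would be reduced to /projects/p/locations/l before being joined onto the versioned Vertex AI base URL.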

Pass-Through Deployment Filtering

  1. litellm/router.py:

    • Implemented get_available_deployment_for_pass_through() method to ensure only deployments configured with use_in_pass_through=True are returned for pass-through endpoint selection.
    • Implemented async_get_available_deployment_for_pass_through() for async operations with the same filtering behavior.
    • Added _filter_pass_through_deployments() helper method to filter deployments by the use_in_pass_through flag.
    • Both methods respect the configured routing strategy (simple-shuffle, least-busy, latency-based-routing, usage-based-routing) while only considering pass-through enabled deployments.
  2. litellm/proxy/pass_through_endpoints/llm_passthrough_endpoints.py:

    • Updated _base_vertex_proxy_route to use the new get_available_deployment_for_pass_through() method instead of get_available_deployment() to ensure pass-through filtering is applied consistently.
  3. tests/test_litellm/proxy/pass_through_endpoints/test_vertex_passthrough_load_balancing.py:

    • Updated existing async test to verify use of get_available_deployment_for_pass_through().
    • Added test test_get_available_deployment_for_pass_through_filters_correctly() to verify correct filtering of pass-through deployments.
    • Added test test_get_available_deployment_for_pass_through_no_deployments() to verify proper error handling when no pass-through deployments exist.
    • Added test test_get_available_deployment_for_pass_through_load_balancing() to verify load balancing respects deployment RPM weights.
    • Added test test_async_get_available_deployment_for_pass_through() to verify async functionality.
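The filtering idea behind _filter_pass_through_deployments can be sketched as follows (the deployment dicts are simplified stand-ins for the router's model-list entries, not the real Deployment type):

```python
def _filter_pass_through_deployments(deployments):
    """Keep only deployments whose litellm_params explicitly set
    use_in_pass_through=True; all other deployments are invisible to
    the pass-through selection methods."""
    return [
        d for d in deployments
        if d.get("litellm_params", {}).get("use_in_pass_through") is True
    ]

deployments = [
    {"model_name": "gemini", "litellm_params": {
        "model": "vertex_ai/gemini-1.5-pro", "use_in_pass_through": True}},
    {"model_name": "gemini", "litellm_params": {
        "model": "vertex_ai/gemini-1.5-flash"}},  # not pass-through enabled
]
eligible = _filter_pass_through_deployments(deployments)  # keeps only the first entry
```

The configured routing strategy then runs over this filtered list, so load balancing only ever picks among pass-through-enabled deployments.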

Summary

This PR improves the Vertex AI pass-through endpoint handling in two main areas:

  1. URL Parsing & Configuration: Properly parses model IDs from URLs and looks up vertex_project and vertex_location from router deployments when not present in the URL.

  2. Deployment Filtering: Implements dedicated pass-through deployment selection methods that ensure only deployments explicitly configured for pass-through are used, while maintaining proper load balancing across them.

These changes ensure pass-through endpoints are more robust and respect deployment configuration, while enabling proper load balancing for pass-through requests.

krisxia0506 (Contributor Author) commented:

Hi @krrishdholakia 👋

This PR fixes an issue where LiteLLM does not correctly load vertex_project and vertex_location for Vertex AI passthrough when using the google-genai Python SDK.

In the SDK, if the user does not explicitly provide vertex_project and vertex_location, the outgoing request will not contain these values. In this case, LiteLLM should fall back to the values configured in the YAML:

use_in_pass_through: true
vertex_project: ...
vertex_location: ...

The current behavior ignores these config values, resulting in passthrough URLs like:

projects/None/locations/None/...

This PR ensures LiteLLM correctly loads the configured project and location so that Vertex AI passthrough works as documented.
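The fallback described here can be sketched roughly as below (resolve_vertex_credentials is a hypothetical name; in the PR the logic lives in _base_vertex_proxy_route and matches deployments via the llm_router):

```python
def resolve_vertex_credentials(url_project, url_location, model_id, deployments):
    """If project/location could not be parsed from the request URL, fall
    back to the values configured on a matching pass-through deployment."""
    if url_project is not None and url_location is not None:
        return url_project, url_location
    for d in deployments:
        params = d.get("litellm_params", {})
        if not params.get("use_in_pass_through"):
            continue
        # Match the deployment whose configured model contains the model ID
        # extracted from the URL (e.g. "gemini-1.5-pro").
        if model_id and model_id not in params.get("model", ""):
            continue
        return (url_project or params.get("vertex_project"),
                url_location or params.get("vertex_location"))
    return url_project, url_location

deployments = [{"model_name": "gemini", "litellm_params": {
    "model": "vertex_ai/gemini-1.5-pro", "use_in_pass_through": True,
    "vertex_project": "my-project", "vertex_location": "us-central1"}}]
resolved = resolve_vertex_credentials(None, None, "gemini-1.5-pro", deployments)
```

With this fallback, a request that omits project and location resolves to the configured values instead of producing a projects/None/locations/None URL.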

We rely on Vertex passthrough in production, so it would be great if this could be reviewed or assigned to another maintainer.
Happy to provide additional tests or documentation updates if needed.

Thanks again for the support!

Review thread on the following diff lines:

    vertex_location=vertex_location,
    )

    if vertex_project is None or vertex_location is None:
A reviewer (Contributor) commented:

We should make sure the user has access to the model before allowing the request to go through.

Can you add the extraction logic here:

model = get_model_from_request(request_data, route)

The reviewer added:

Or some version of that logic in your code block? Maybe extract the model and just run can_key_call_model to confirm valid access before proceeding.
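A toy version of the access check being suggested (the helper name and the allowed-models shape are assumptions for illustration; LiteLLM's real can_key_call_model has its own signature and error handling):

```python
def key_can_call_model(model, allowed_models):
    """Toy access check: a key may call a model when its allowed-model
    list is empty (unrestricted), contains the wildcard "*", or lists
    the model explicitly."""
    if not allowed_models:
        return True
    return "*" in allowed_models or model in allowed_models

# Extract the model from the request, then gate the pass-through call:
# reject the request early if the key is not allowed to call that model.
unrestricted = key_can_call_model("gemini-1.5-pro", [])
explicit = key_can_call_model("gemini-1.5-pro", ["gemini-1.5-pro"])
denied = key_can_call_model("gemini-1.5-pro", ["gpt-4"])
```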

krisxia0506 (Contributor Author) replied:

Sorry, I missed the question about whether the key has permission to access the model. I will fix it.

krisxia0506 (Contributor Author) replied:

I noticed that the original pass_through_endpoints logic did not check whether the key had permission to call the model. I have fixed this in the follow-up PR #17970.

Add a test that verifies _base_vertex_proxy_route uses
get_available_deployment for proper load balancing instead of
get_model_list. This ensures the correct deployment is selected
from the router and vertex credentials are properly fetched.

Also refactor the implementation to:
- Use get_available_deployment instead of get_model_list
- Add error handling for deployment retrieval
- Improve code structure with a try-except block

Add dedicated methods to filter and select deployments for pass-through endpoints:
- Implement get_available_deployment_for_pass_through() to ensure only deployments with use_in_pass_through=True are considered
- Implement async_get_available_deployment_for_pass_through() for async operations
- Add _filter_pass_through_deployments() helper method to filter by use_in_pass_through flag
- Update vertex pass-through route to use the new dedicated method

This ensures pass-through endpoints respect the use_in_pass_through configuration and apply proper load balancing strategy only to configured deployments.

Add comprehensive tests to verify filtering and load balancing behavior.


Development

Successfully merging this pull request may close these issues.

[Bug]: use_in_pass_through does not pass project/location to Vertex AI — project_name=None and location=None
