
Conversation

@tttennnnn
Collaborator

No description provided.

@gemini-code-assist

Summary of Changes

Hello @tttennnnn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement by adding multiturn benchmarking capabilities, allowing for a more comprehensive evaluation of model performance in conversational contexts. It integrates a new run_multiturn function to orchestrate these benchmarks across various providers and models, ensuring that relevant performance metrics are collected and visualized. Additionally, a small but important refactoring was made to improve variable naming within the provider interface for better code clarity.

Highlights

  • Multiturn Benchmarking Feature: A new run_multiturn method has been introduced in benchmark_main.py to enable multiturn benchmarking for evaluating model performance over sequential interactions (see the sketch after this list).
  • Provider-Specific Multiturn Calls: The run_multiturn method now includes conditional logic to call provider.perform_multiturn, with a special case for the vLLM provider to pass an additional vllm_ip argument.
  • Metric Plotting for Multiturn: The newly added run_multiturn function incorporates logic to plot various performance metrics such as response_times, timetofirsttoken, totaltime, and timebetweentokens, depending on whether streaming is enabled.
  • Variable Renaming for Clarity: A minor refactoring was performed in provider_interface.py where the variable gen_tokens was renamed to target_tokens within the data_handler function to improve code readability.
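
For orientation, here is a rough, non-authoritative sketch of how these pieces appear to fit together, reconstructed from the diff excerpts quoted in the review below. Attribute names such as self.providers, self.models, self.num_requests, self.verbosity, and self.vllm_ip are assumptions.

# Sketch of a Benchmark method; not the exact code from the PR.
def run_multiturn(self):
    """Run the multiturn benchmark for every configured provider/model pair."""
    # Time between each request in seconds (hardcoded in this PR)
    time_interval = 15

    for provider_name, provider in self.providers.items():
        for model in self.models:
            if provider_name == "vLLM":
                # The vLLM provider currently receives its IP as an extra argument
                provider.perform_multiturn(
                    model, time_interval, self.streaming,
                    self.num_requests, self.verbosity, self.vllm_ip,
                )
            else:
                provider.perform_multiturn(
                    model, time_interval, self.streaming,
                    self.num_requests, self.verbosity,
                )

    # The relevant metric plots are then saved via self.plot_metrics(...)
    # (see the streaming/non-streaming block quoted in the review below).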


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a new run_multiturn benchmark capability. While the feature addition is a good step, there are several areas for improvement. The most critical issue is that the new perform_multiturn method is not defined in the ProviderInterface, which will likely lead to runtime errors. Additionally, the new method continues a problematic pattern of special-casing the vLLM provider, violating polymorphism principles. I've also identified code duplication in the metrics plotting logic and a hardcoded time interval that should be made configurable. Addressing these points will improve the robustness, maintainability, and flexibility of the benchmarking framework.

Comment on lines +280 to +283
if provider_name == "vLLM":
    provider.perform_multiturn(model, time_interval, self.streaming, self.num_requests, self.verbosity, self.vllm_ip)
else:
    provider.perform_multiturn(model, time_interval, self.streaming, self.num_requests, self.verbosity)


critical

The perform_multiturn method is called on provider instances, but it is not defined in the ProviderInterface abstract base class. This will cause an AttributeError at runtime for any provider that does not implement this method. To ensure all providers adhere to the expected contract, you should add perform_multiturn to ProviderInterface, likely as an @abstractmethod.

Additionally, the special handling for vLLM by checking provider_name and passing an extra vllm_ip argument violates the Liskov Substitution Principle and indicates a design issue. This pattern is repeated across run, run_trace, and now run_multiturn. A better approach would be to configure the vLLM provider with its IP address during its initialization. This would allow all provider methods to be called with a consistent signature, eliminating the need for these if/else blocks.
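
A minimal sketch of the suggested direction, assuming the abstract base class lives in provider_interface.py and that concrete providers can accept configuration in their constructors (class and parameter names below are illustrative, not taken from the repository):

from abc import ABC, abstractmethod

class ProviderInterface(ABC):
    @abstractmethod
    def perform_multiturn(self, model, time_interval, streaming, num_requests, verbosity):
        """Every concrete provider must implement the multiturn benchmark entry point."""

class VLLMProvider(ProviderInterface):
    def __init__(self, vllm_ip):
        # The IP is supplied once at construction time instead of on every call.
        self.vllm_ip = vllm_ip

    def perform_multiturn(self, model, time_interval, streaming, num_requests, verbosity):
        # Use self.vllm_ip internally; the signature now matches the interface.
        ...

With the IP stored on the provider instance, run(), run_trace(), and run_multiturn() could all call provider.perform_multiturn(model, time_interval, self.streaming, self.num_requests, self.verbosity) without checking provider_name.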

"""

# Time between each request in seconds
time_interval = 15


medium

The time_interval is hardcoded to 15 seconds. This reduces the flexibility of the multi-turn benchmark. Consider making this a configurable parameter, either by passing it to run_multiturn or by adding it as an attribute to the Benchmark class, initialized from the configuration file.
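
One possible shape for this, assuming the Benchmark class already receives a configuration mapping at construction (the time_interval key and its default are assumptions):

class Benchmark:
    DEFAULT_TIME_INTERVAL = 15  # seconds between requests

    def __init__(self, config):
        # Read the interval from the configuration, falling back to the current default.
        self.time_interval = config.get("time_interval", self.DEFAULT_TIME_INTERVAL)

    def run_multiturn(self, time_interval=None):
        # An explicit argument overrides the configured value for one-off runs.
        time_interval = self.time_interval if time_interval is None else time_interval
        ...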

Comment on lines +286 to +292
if not self.streaming:
    self.plot_metrics("response_times", "response_times")
else:
    # Save all the relevant metrics plots when streaming is true
    self.plot_metrics("timetofirsttoken", "timetofirsttoken")
    self.plot_metrics("response_times", "totaltime")
    self.plot_metrics("timebetweentokens", "timebetweentokens")


medium

This block of code for plotting metrics is duplicated in run(), run_trace(), and now in run_multiturn(). To improve maintainability and reduce redundancy (following the DRY principle), you should extract this logic into a private helper method, for example, _plot_results(), and call it from all three run* methods.
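
A small helper along these lines, reusing the exact calls from the excerpt above (the name _plot_results follows the reviewer's suggestion), might look like:

# Private helper on the Benchmark class; sketch only.
def _plot_results(self):
    """Save the metric plots shared by run(), run_trace(), and run_multiturn()."""
    if not self.streaming:
        self.plot_metrics("response_times", "response_times")
        return
    # Save all the relevant metrics plots when streaming is true
    self.plot_metrics("timetofirsttoken", "timetofirsttoken")
    self.plot_metrics("response_times", "totaltime")
    self.plot_metrics("timebetweentokens", "timebetweentokens")

Each run* method would then end with a single self._plot_results() call.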
