base: main
Feat/multiturn #126
Conversation
Summary of Changes (posted by Gemini Code Assist): This pull request introduces a significant enhancement by adding multiturn benchmarking capabilities, allowing for a more comprehensive evaluation of model performance in conversational contexts.
Code Review
This pull request introduces a new run_multiturn benchmark capability. While the feature addition is a good step, there are several areas for improvement. The most critical issue is that the new perform_multiturn method is not defined in the ProviderInterface, which will likely lead to runtime errors. Additionally, the new method continues a problematic pattern of special-casing the vLLM provider, violating polymorphism principles. I've also identified code duplication in the metrics plotting logic and a hardcoded time interval that should be made configurable. Addressing these points will improve the robustness, maintainability, and flexibility of the benchmarking framework.
```python
if provider_name == "vLLM":
    provider.perform_multiturn(model, time_interval, self.streaming, self.num_requests, self.verbosity, self.vllm_ip)
else:
    provider.perform_multiturn(model, time_interval, self.streaming, self.num_requests, self.verbosity)
```
The perform_multiturn method is called on provider instances, but it is not defined in the ProviderInterface abstract base class. This will cause an AttributeError at runtime for any provider that does not implement this method. To ensure all providers adhere to the expected contract, you should add perform_multiturn to ProviderInterface, likely as an @abstractmethod.
Additionally, the special handling for vLLM by checking provider_name and passing an extra vllm_ip argument violates the Liskov Substitution Principle and indicates a design issue. This pattern is repeated across run, run_trace, and now run_multiturn. A better approach would be to configure the vLLM provider with its IP address during its initialization. This would allow all provider methods to be called with a consistent signature, eliminating the need for these if/else blocks.
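Both suggestions can be sketched together. This is a hypothetical illustration, not the PR's actual code: the class names `ProviderInterface` and `VLLMProvider`, and the `perform_multiturn` body, are assumptions based on the review; only the method name and call signature come from the diff above.

```python
# Hypothetical sketch: declare perform_multiturn on the interface and bind
# the vLLM IP at construction time, so callers need no provider-specific branch.
from abc import ABC, abstractmethod


class ProviderInterface(ABC):
    @abstractmethod
    def perform_multiturn(self, model, time_interval, streaming, num_requests, verbosity):
        """Run a multiturn benchmark against this provider."""


class VLLMProvider(ProviderInterface):
    def __init__(self, vllm_ip: str):
        # The IP is stored here instead of being threaded through every call.
        self.vllm_ip = vllm_ip

    def perform_multiturn(self, model, time_interval, streaming, num_requests, verbosity):
        # Placeholder body; the real implementation would issue requests to self.vllm_ip.
        return f"multiturn benchmark of {model} against {self.vllm_ip}"
```

With the IP configured at initialization, the caller collapses to a single polymorphic line, `provider.perform_multiturn(model, time_interval, self.streaming, self.num_requests, self.verbosity)`, for every provider.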
```python
# Time between each request in seconds
time_interval = 15
```
As noted in the review summary, the 15-second time interval between requests is hardcoded. Making it configurable (for example, a constructor or CLI parameter with 15 as the default) would let users vary request pacing per benchmark run.
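A minimal sketch of how the hardcoded interval could be exposed as a parameter. The class name `MultiturnBenchmark` is hypothetical; only the value 15 and its meaning come from the snippet above.

```python
# Hypothetical sketch: promote the hardcoded interval to a constructor
# parameter, keeping 15 seconds as the backward-compatible default.
class MultiturnBenchmark:
    def __init__(self, time_interval: float = 15.0):
        # Seconds to wait between consecutive requests.
        self.time_interval = time_interval
```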
```python
if not self.streaming:
    self.plot_metrics("response_times", "response_times")
else:
    # Save all the relevant metrics plots when streaming is true
    self.plot_metrics("timetofirsttoken", "timetofirsttoken")
    self.plot_metrics("response_times", "totaltime")
    self.plot_metrics("timebetweentokens", "timebetweentokens")
```
This block of code for plotting metrics is duplicated in run(), run_trace(), and now in run_multiturn(). To improve maintainability and reduce redundancy (following the DRY principle), you should extract this logic into a private helper method, for example, _plot_results(), and call it from all three run* methods.
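The suggested extraction could look like the sketch below. The class name `BenchmarkRunner` is an assumption, and `plot_metrics` is stubbed to record calls for illustration; the branch logic itself is taken verbatim from the duplicated block above.

```python
# Hypothetical sketch of the DRY refactor: _plot_results centralizes the
# duplicated plotting block so run(), run_trace(), and run_multiturn()
# can each call it once.
class BenchmarkRunner:
    def __init__(self, streaming: bool):
        self.streaming = streaming
        self.calls = []

    def plot_metrics(self, metric: str, filename: str) -> None:
        # Stand-in for the real plotting code; records what would be plotted.
        self.calls.append((metric, filename))

    def _plot_results(self) -> None:
        if not self.streaming:
            self.plot_metrics("response_times", "response_times")
        else:
            # Save all the relevant metrics plots when streaming is true
            self.plot_metrics("timetofirsttoken", "timetofirsttoken")
            self.plot_metrics("response_times", "totaltime")
            self.plot_metrics("timebetweentokens", "timebetweentokens")
```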
No description provided.