[feature] Add prefix_kv_cache transfer between dp rankers. #1093
Merged
34 commits
153a409
fix
962ef2b
fix
0cd2b86
add enable_dp_prompt_cache_fetch
WANDY666 0667d71
free_radix_cache_to_get_enough_token instead of skip
WANDY666 5a1e22d
add use_openai_api, port, concurrency, history-turns, max-total-token…
WANDY666 7551fb7
support pd split
WANDY666 a3c44be
add mem_queues
WANDY666 1323ce1
little update
WANDY666 b72b7ac
fix
518b8f1
fix
da2eb3d
use node_nccl_group
WANDY666 737218e
delete shm_kv_indexes add shared_kv_indexes to reduce shared memory u…
WANDY666 b0743a7
layer into triton op
WANDY666 26879ea
fix multiple visits to fd
WANDY666 725eec3
fix pd mem_manager get failed
WANDY666 dad0b83
fix release other shm_reqs
WANDY666 6cc9982
add use_for_pd_trans to avoid duplicate name overwriting
WANDY666 a97df66
minor change
WANDY666 47212f0
add test.py
hiworldwzj cbb1b84
improve mem_manager
hiworldwzj b548946
write mem manager to shm
hiworldwzj 6270e0b
fix
hiworldwzj e1769a3
fix
hiworldwzj c4f780f
fix
1e1e18a
fix
c0f567e
fix
WANDY666 53852d2
fix position_ids empty
WANDY666 4e9a6c5
fix
WANDY666 704830f
add news
WANDY666 796f036
update readme
WANDY666 1fe6cfa
update readme
WANDY666 0e13fb9
fix
956796c
fix
4cc963e
fix
These stride parameters (`input_stride_1`, `input_stride_2`) are either unused or their values are not used after being cast. This makes the kernel signature more complex than necessary. The same applies to `output_stride_1` and `output_stride_2` on lines 206-207. For improved clarity and maintainability, it's recommended to remove these unused parameters. Consequently, the call to this kernel in `kv_trans_for_dp` should be updated to pass only the required strides (e.g., `output.stride(0)`) instead of unpacking all strides with `*output.stride()`.
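
To illustrate the suggestion, here is a minimal pure-Python sketch, not the PR's actual Triton kernel. It assumes each row's trailing dimensions are contiguous, so only the leading stride is needed to locate a row. The function name `copy_rows_by_index` and all of its parameters are hypothetical and do not come from the PR.

```python
from typing import List, Sequence


def copy_rows_by_index(
    src: Sequence[int],
    src_stride_0: int,
    dst: List[int],
    dst_stride_0: int,
    src_rows: Sequence[int],
    dst_rows: Sequence[int],
    row_size: int,
) -> None:
    """Copy selected rows between two flat buffers.

    Hypothetical helper: only the leading stride of each buffer is passed,
    on the assumption that the elements within a row are stored contiguously.
    """
    for s, d in zip(src_rows, dst_rows):
        src_off = s * src_stride_0  # start of the source row
        dst_off = d * dst_stride_0  # start of the destination row
        dst[dst_off:dst_off + row_size] = src[src_off:src_off + row_size]


# Three source rows of four elements each, two destination rows.
src = list(range(12))
dst = [0] * 8
copy_rows_by_index(src, 4, dst, 4, src_rows=[2, 0], dst_rows=[0, 1], row_size=4)
print(dst)  # [8, 9, 10, 11, 0, 1, 2, 3]
```

With a signature like this, a caller would pass a single value per buffer (the analogue of `output.stride(0)`) rather than unpacking every stride. If the trailing dimensions were not contiguous, the inner strides would still be required.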