[The TensorRT-LLM result](https://github.com/PolyAI-LDN/pheme#a100-gpu--100m-pheme-variant) is impressive. If it holds up, streaming is not needed at all.