
Feature Request - Add Sliding Window Memory Scheduling #128

Open
githuba9f5404 opened this issue Oct 5, 2024 · 0 comments

This would allow even larger models to run on smaller distributed clusters. I think Llama 3.1 405B Instruct might be possible on my 8 x Raspberry Pi 4B 8GB array (specs here: #122).
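
For what it's worth, here's roughly what I have in mind (a minimal Python sketch of sliding-window layer scheduling; `load_weights`, `free_weights`, and `apply_layer` are hypothetical stand-ins, not anything from this project's actual API):

```python
from collections import deque

class SlidingWindowScheduler:
    """Keep only a fixed window of transformer layers resident in RAM.

    Hypothetical sketch: load_weights/free_weights stand in for whatever
    I/O the project would actually use (mmap, network fetch, etc.).
    """

    def __init__(self, window_size, load_weights, free_weights):
        self.window_size = window_size
        self.load_weights = load_weights  # layer_idx -> weights
        self.free_weights = free_weights  # release a layer's memory
        self.resident = deque()           # (layer_idx, weights), oldest first

    def get_layer(self, layer_idx):
        # Reuse the layer if it is already resident.
        for idx, weights in self.resident:
            if idx == layer_idx:
                return weights
        # Evict the oldest layer once the window is full.
        if len(self.resident) >= self.window_size:
            _, old_weights = self.resident.popleft()
            self.free_weights(old_weights)
        weights = self.load_weights(layer_idx)
        self.resident.append((layer_idx, weights))
        return weights

def forward(scheduler, hidden, num_layers, apply_layer):
    # Stream layers through the window: only window_size layers
    # occupy memory at once, trading speed for model capacity.
    for i in range(num_layers):
        hidden = apply_layer(scheduler.get_layer(i), hidden)
    return hidden
```

The obvious cost is latency: every pass re-loads most layers from storage, so it would only make sense when the model simply doesn't fit otherwise.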

I want to clarify that I have little to no idea what I'm doing, and this request might be impossible or very difficult for a number of reasons. I'm just a guy with an array of raspis who wants to run local LLMs. Either way, I wanted to bring your attention to this other project, since their approach seems similar, interesting, and possibly able to increase the capabilities of this project. In my limited experience it is much slower (since it runs at full precision), but some of its features, like pre-splitting the model so each worker node downloads only its own section once, might make sense here as well (see the sketch below).
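
To make the pre-splitting idea concrete: I picture each worker being assigned a contiguous slice of layers up front, so it only ever downloads (and caches) its own shard of the checkpoint. A tiny sketch (`shard_layers` is my own made-up helper, not code from either project):

```python
def shard_layers(num_layers: int, num_workers: int) -> list[range]:
    """Split layers into contiguous, near-equal shards, one per worker."""
    base, extra = divmod(num_layers, num_workers)
    shards, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)  # spread the remainder
        shards.append(range(start, start + size))
        start += size
    return shards

# Llama 3.1 405B has 126 transformer layers; across my 8 Pis this
# gives six workers 16 layers each and the last two 15 each.
print(shard_layers(126, 8))
```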

Thank you for continuing to advance and push the boundaries of edge AI; as a novice hobbyist, I greatly appreciate your expert efforts.
