I was using Tinker to run basic SFT on datasets like No Robots and Tulu 3 with Llama-3.1-8B. After training (LoRA rank 32, batch size 128, learning rate 1e-4), I merged the adapter weights into the base model and uploaded the merged model to Hugging Face. I run inference via 4/8-bit quantized MLX locally and vLLM on Modal on an L4 GPU.
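For reference, a minimal sketch of the merge-and-upload step using PEFT, assuming the adapter has already been exported to a standard PEFT directory; the local path and Hub repo name are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER_DIR = "adapters/no-robots-lora"                      # hypothetical local path
HUB_REPO = "your-username/llama-3.1-8b-instruct-no-robots"   # hypothetical repo name

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Load the LoRA adapter on top of the base model, then fold the weights in.
merged = PeftModel.from_pretrained(base, ADAPTER_DIR).merge_and_unload()

# Upload the merged model and tokenizer to Hugging Face.
merged.push_to_hub(HUB_REPO)
tokenizer.push_to_hub(HUB_REPO)
```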
But not anymore: I ran out of my $150 in free credits, so now I'm trying to train on Modal using Unsloth QLoRA with W&B for visibility, then upload the weights to Hugging Face (a minimal sketch of that setup follows the model list below).
llama-3.1-8b-instruct-no-robots
llama-3.1-8b-instruct-no-robots-mlx
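A rough sketch of the planned Unsloth QLoRA run with W&B logging, assuming the same dataset and similar hyperparameters as the Tinker runs; the model repo, batch sizes, and output paths are illustrative, not final.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# Load the base model in 4-bit for QLoRA.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank 32, matching the Tinker runs).
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("HuggingFaceH4/no_robots", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=8,
        gradient_accumulation_steps=16,   # effective batch size 128
        learning_rate=1e-4,
        num_train_epochs=1,
        output_dir="outputs",
        report_to="wandb",                # W&B logging
    ),
)
trainer.train()
```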
On extremely short prompts, my fine-tunes produce gibberish responses. I'm currently trying to figure out whether this is due to the training hyperparameters or the model itself. A simple "Hello" turns into thousands of words of the model hinting keywords to itself and then generating text that exactly matches the dataset.
On longer prompts, however, the model generates more coherent responses.
I want to explore RL (maybe the SFT checkpoint alone is supposed to be weak?).
I want to try PPO + RLHF and GRPO + RLVR to see if I can get a better instruct model and, potentially, a coding model. Obviously, this requires a ton of compute and resources. I'm interested in scaling RL and tackling problems like the ones in this paper from Moonshot.
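As a very rough sketch of what a GRPO + RLVR loop could look like, here is a toy example with trl's GRPOTrainer and a verifiable exact-match reward. The task dataset, reward logic, and SFT checkpoint path are placeholders, not what I'll actually use.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def exact_match_reward(completions, answer, **kwargs):
    # Verifiable reward: 1.0 if the completion contains the reference answer, else 0.0.
    return [1.0 if a in c else 0.0 for c, a in zip(completions, answer)]

# Example verifiable task: grade-school math with a reference answer per prompt.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {
    "prompt": x["question"],
    "answer": x["answer"].split("####")[-1].strip(),
})

trainer = GRPOTrainer(
    model="outputs/merged-sft-checkpoint",   # hypothetical SFT checkpoint path
    reward_funcs=exact_match_reward,
    args=GRPOConfig(output_dir="grpo-outputs", num_generations=8),
    train_dataset=dataset,
)
trainer.train()
```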
local -> local vLLM setup to test
modal -> Modal vLLM setup to test
train -> Tinker training setup
tinker_cookbook -> Tinker cookbook git submodule
client.py -> Simple client to test the vLLM server (a sketch of the idea below)
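A minimal sketch of the kind of client this could be: hitting the OpenAI-compatible endpoint that `vllm serve` exposes. The URL and model name are placeholders for the local or Modal deployment.

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server defaults to port 8000 with the /v1 prefix.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct-no-robots",   # placeholder served model name
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```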