
How to Fine-Tune an LLM Using NEO

Fine-tune a Hugging Face model on a custom dataset with full SFT—no code, just a single prompt. NEO plans and runs the entire pipeline on your GPU node.


Problem Statement

We asked NEO to fine-tune Qwen 3.5 4B on the Qwen3-Coder-Next-1800x dataset using full supervised fine-tuning (SFT), not LoRA, and to run the whole pipeline autonomously from one natural-language prompt in the VS Code extension, with no hand-written training code. Environment setup, data loading and ChatML formatting, model loading with the right precision and device mapping, training configuration, the training run itself, and checkpoint saving were all to be handled by NEO on a connected GPU node.


Overview

Open NEO in your VS Code extension, connect to a GPU node, and paste a single prompt. NEO reads the prompt, plans the full training pipeline, and executes every step autonomously on your GPU node—no code required.

Example prompt:

“Finetune https://huggingface.co/Qwen/Qwen3.5-4B using https://huggingface.co/datasets/Crownelius/Qwen3-Coder-Next-1800x. I want full fine-tuning SFT, not LoRA.”


Step 1: Give NEO a Single Prompt

This is the entire user-facing workflow. Open NEO in your VS Code extension, connect to a GPU node, and paste the prompt above. NEO plans and runs the full training pipeline autonomously. You don’t need to write a single line of code. You can also watch the walkthrough on Google Drive if you prefer.


Step 2: NEO Sets Up the Environment

NEO automatically provisions the training environment. It detects your available hardware, installs the correct package versions, and verifies that CUDA is accessible before doing anything else.

What gets installed and why:

Package        Purpose
transformers   Model loading and tokenization
trl            SFTTrainer for supervised fine-tuning
accelerate     Multi-GPU support and mixed-precision training
datasets       Pulling and handling Hugging Face datasets
torch          PyTorch with CUDA support
sentencepiece  Tokenizer dependency for Qwen models

No manual pip installs, no CUDA debugging, no version conflicts to sort out.
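NEO's internal environment check isn't exposed, but the idea can be sketched in a few lines of Python. The package list mirrors the table above; `missing_packages` is an illustrative helper, not part of NEO:

```python
import importlib.util

# Packages the training pipeline needs, per the table above.
REQUIRED = ["transformers", "trl", "accelerate", "datasets", "torch", "sentencepiece"]

def missing_packages(packages):
    """Return the subset of packages that cannot be imported."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

missing = missing_packages(REQUIRED)
if missing:
    print(f"Install before training: {missing}")
else:
    print("All training dependencies present")
```

A check like this runs before any GPU work starts, so a missing dependency fails fast instead of crashing mid-training.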


Step 3: NEO Loads and Formats the Dataset

NEO pulls the dataset directly from Hugging Face. The dataset used here is Crownelius/Qwen3-Coder-Next-1800x, which contains around 1,800 high-quality coding instruction–response pairs curated for Qwen-family models.

NEO converts every sample into ChatML format automatically, so the model sees inputs exactly the way it was trained to expect. No preprocessing, no schema mapping, no format errors on your end.
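ChatML is a plain-text chat markup that wraps each turn in `<|im_start|>`/`<|im_end|>` role markers. A minimal sketch of the conversion, assuming the dataset's fields are named `instruction` and `response` (the real column names, and the system prompt, are assumptions here):

```python
def to_chatml(instruction: str, response: str,
              system: str = "You are a helpful coding assistant.") -> str:
    """Render one instruction-response pair as a ChatML training sample."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n{response}<|im_end|>\n"
    )

sample = to_chatml("Write a function that reverses a string.",
                   "def reverse(s):\n    return s[::-1]")
print(sample)
```

Because Qwen models are pretrained and instruction-tuned on this exact layout, matching it avoids the silent quality loss that comes from training on a mismatched template.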


Step 4: NEO Loads the Model

NEO downloads Qwen 3.5 4B and loads it in bfloat16 precision, which halves weight memory relative to fp32 without hurting numerical stability.
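A quick back-of-the-envelope calculation shows why bfloat16 matters here, and why weights are only part of the memory story for full SFT:

```python
# Weights only: bytes per parameter is what bfloat16 halves.
params = 4e9                           # 4B parameters
fp32_gb = params * 4 / 1024**3         # 4 bytes/param in fp32
bf16_gb = params * 2 / 1024**3         # 2 bytes/param in bf16
print(f"Weights: {fp32_gb:.1f} GB fp32 vs {bf16_gb:.1f} GB bf16")

# Full SFT also keeps gradients plus Adam's two fp32 moment buffers,
# which is why a 40 GB GPU and gradient checkpointing matter even at 4B.
adam_states_gb = params * 8 / 1024**3  # m and v, 4 bytes each
print(f"Adam optimizer states alone: {adam_states_gb:.1f} GB")
```

The weights alone fit comfortably in bf16 (~7.5 GB), but optimizer state and activations are what push a full-SFT run toward the 40 GB class of GPU.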


Step 5: NEO Configures the Training Run

NEO sets every training parameter without any input from you.

Parameter              Value            Why
Epochs                 3                Enough to learn the dataset without overfitting 1,800 samples
Batch size per GPU     2                Safe for A100 40GB VRAM
Gradient accumulation  8 steps          Effective batch of 16 without extra VRAM cost
Learning rate          2e-5             Standard for full SFT on instruction datasets
LR scheduler           Cosine           Smooth decay that avoids late-stage overfitting
Warmup                 5% of steps      Stabilizes early training before full LR kicks in
Precision              bfloat16 + tf32  Speed and stability on Ampere and Hopper GPUs
Max sequence length    2,048 tokens     Covers nearly all samples in the dataset

This is full SFT: every weight in the 4B model is updated on every step. The result is one standalone checkpoint with no adapter files and no merging required.
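The table above maps onto Hugging Face training arguments roughly as follows. This is a sketch: key names mirror TrainingArguments / trl SFTConfig fields, but NEO's exact configuration isn't shown and trl field names vary slightly between versions:

```python
# Training configuration from the table, as a plain dict (illustrative).
config = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "learning_rate": 2e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "bf16": True,
    "tf32": True,
    "max_seq_length": 2048,
}

# Gradient accumulation multiplies the effective batch without extra VRAM:
# gradients from 8 micro-batches of 2 are summed before one optimizer step.
effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])
print(f"Effective batch size per optimizer step: {effective_batch}")
```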


Step 6: NEO Runs Training and Streams Logs

NEO launches the training job on the GPU node and streams live logs back to your VS Code terminal through the extension. You can watch everything in real time.

What to watch in the logs: the training loss (it should fall steadily across all three epochs), the learning rate (a short warmup followed by cosine decay), and the step and epoch counters that track overall progress.

Rough training time (3 epochs, 1,800 samples):

GPU            Approx. time
A100 40GB      25–40 minutes
H100 80GB      12–20 minutes
2× A100 40GB   12–20 minutes
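These times follow from a simple step count:

```python
import math

# Step count implied by the run: ~1,800 samples, effective batch 16
# (2 per GPU x 8 accumulation steps), 3 epochs. Exact counts shift
# slightly if samples are dropped or packed during preprocessing.
samples, effective_batch, epochs = 1800, 16, 3
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs
print(f"{steps_per_epoch} steps/epoch, {total_steps} total optimizer steps")
```

Around 340 optimizer steps is a short run, which is why even a single A100 finishes in well under an hour.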

If your VS Code window disconnects mid-run, training continues on the node and NEO re-attaches automatically when you reconnect.


Pipeline Architecture Overview

Stage           What NEO does
1. Prompt       User pastes a single prompt in VS Code; NEO parses model, dataset, and training type (full SFT)
2. Environment  Provisions the GPU node; installs transformers, trl, accelerate, datasets, torch, sentencepiece; checks CUDA
3. Data         Pulls the Hugging Face dataset; converts samples to ChatML format
4. Model        Loads Qwen 3.5 4B in bfloat16; device_map=auto, gradient checkpointing, padding token set
5. Training     Runs SFTTrainer with configured epochs, batch size, LR, scheduler; streams logs to VS Code
6. Checkpoint   Saves the full model directory (config, tokenizer, model.safetensors) on the node

Step 7: NEO Saves the Final Checkpoint

When training finishes, NEO saves a complete model directory on the GPU node. Everything needed for inference is in one place.

qwen-coder-sft/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
├── special_tokens_map.json
├── model.safetensors
└── training_args.bin

From there you can download the checkpoint to your machine, push it to Hugging Face, or run inference on the node with vLLM, Ollama, llama.cpp, or the standard Transformers pipeline.
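Before downloading or pushing, it can be worth confirming the directory is complete. A small sketch, with file names taken from the tree above (`training_args.bin` is omitted since inference doesn't need it):

```python
from pathlib import Path

# Files inference backends expect to find in the checkpoint directory.
EXPECTED = [
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "model.safetensors",
]

def missing_files(checkpoint_dir: str) -> list[str]:
    """Return expected files absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED if not (root / name).exists()]

print(missing_files("qwen-coder-sft"))  # empty list once training has finished
```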


Why Full SFT Instead of LoRA?

LoRA trains a small set of adapter matrices while keeping most of the model frozen. It’s faster and cheaper, but you need a merge step before deployment, it can underfit on complex or diverse coding tasks, and merged models sometimes show quality degradation compared to a well-trained full SFT run.

Full SFT updates every weight in the model. The output is a single .safetensors file that is the model—nothing else attached. For a coding model you plan to use daily, the quality difference is noticeable on the tasks it was trained for.
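The scale difference between the two approaches is easy to quantify. The numbers below are illustrative assumptions (rank-16 adapters on four attention projections across 36 layers of hidden size 2,560), not Qwen 3.5 4B's real architecture:

```python
# LoRA trains two low-rank factors (d x r and r x d) per adapted matrix,
# so each square d x d projection contributes 2 * d * r trainable params.
d_model, rank, layers, mats_per_layer = 2560, 16, 36, 4  # assumed shapes
lora_params = layers * mats_per_layer * 2 * d_model * rank
full_params = 4_000_000_000
print(f"LoRA: {lora_params/1e6:.1f}M trainable "
      f"({100 * lora_params / full_params:.2f}% of full SFT's 4B)")
```

Under these assumptions LoRA touches well under 1% of the weights, which is exactly why it is cheaper and also why it can underfit on diverse coding data.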


Hardware Requirements

Component    Minimum       Recommended
GPU          A100 40GB     H100 80GB or 2× A100
System RAM   64 GB         128 GB
Storage      100 GB free   200 GB free

Repository & Artifacts

This page describes a VS Code + GPU node workflow run with NEO. There is no standalone showcase GitHub repository—the primary artifact is the fine-tuned checkpoint NEO writes on your node (see Step 7).

Upstream sources (public):

Model: https://huggingface.co/Qwen/Qwen3.5-4B
Dataset: https://huggingface.co/datasets/Crownelius/Qwen3-Coder-Next-1800x

