Ask HN: Best Setup for LLM training and inference with $30k?

1 points by behnamoh 7 months ago

I have a budget ($30k) which I want to use to purchase a rig to train and inference language models. I've looked at a few options.

- M2/M3 Ultra (maybe 2x for +$20k):

It seems these are good for inference with relatively high bandwidth (800 GB/s) and lots of unified RAM.

But some libraries (like bitsandbytes) aren't available for Apple Silicon yet, making it challenging/impossible to train transformer models from scratch on these machines.

Finetuning using MLX seems to be possible though.

Main advantage: I can actually buy one and get it in a few days.

- GPU clusters (like 8x5090 at $2000 MSRP + motherboard, etc.)

I'm not familiar with HBMs and other enterprise options, but a lot of people at r/localllama seem to like 3090/4090 rigs, especially 3090 since it supports nv-link (I've heard that 2x4090 would "halve" the bandwidth?!)

5090 seems to have some driver issues now, and the fact that most libraries haven't migrated to CUDA 12 might limit it (at least in short term).

Main problem: Totally over-priced and outright impossible to even purchase one. And the power consumption is going to be an issue.

What are your thoughts? I'm interested in doing LLM research as well (modifying LLM architecture, training simple transformers from scratch, fine tuning, etc.)

bigyabai 7 months ago

Spend $50 experimenting with both architectures in a VPS and buy whichever you preferred.