Ask HN: 2x Arc A770 or 1x Radeon 7900 XTX for llama.cpp

5 points by danielEM 14 hours ago

I can't find an "apples-to-apples" performance comparison on QwQ 32B (4-bit). Can anyone help me decide which solution to pick?

From what I've dug up so far, it looks like a dual Arc A770 setup is supported by llama.cpp. I've also seen reports that llama.cpp on top of IPEX-LLM is the fastest way to run inference on Intel cards.
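For reference, llama.cpp's native path for Intel Arc is its SYCL backend. A minimal build sketch, assuming the Intel oneAPI toolkit is installed at its default location (paths and flags follow llama.cpp's SYCL build docs; check them for your setup):

```shell
# Load the oneAPI environment (icx/icpx compilers, SYCL runtime).
source /opt/intel/oneapi/setvars.sh

# Configure llama.cpp with the SYCL backend enabled.
cmake -B build -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

# Build the binaries (llama-cli, llama-bench, etc.).
cmake --build build --config Release
```

With both A770s visible to the SYCL runtime, llama.cpp can split a model's layers across the two cards, which is what makes a 32B 4-bit model feasible on 2x16 GB.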

On the other end there is the more expensive 7900 XTX, for which AMD claims (Jan '25) that inference is faster than on a 4090.

So - what is the state of the art as of today, and how do the two compare (apples to apples)? What is the tokens/s difference?
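For anyone able to test both setups: llama.cpp ships a llama-bench tool that makes this comparison directly, as long as the same GGUF file and flags are used on each machine. A sketch (the model filename is illustrative, not a real download):

```shell
# Run the identical benchmark on each box: same quantized model,
# 512-token prompt processing, 128-token generation, all layers
# offloaded to GPU (-ngl 99). Output is a table with tokens/s.
./llama-bench -m qwq-32b-q4_k_m.gguf -p 512 -n 128 -ngl 99
```

The prompt-processing (pp) and token-generation (tg) rows it prints are the two numbers worth comparing, since the backends (SYCL vs. ROCm/HIP) can differ a lot between the two.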

runjake 14 hours ago

I don't know, but you'll probably find a better answer here:

https://www.reddit.com/r/LocalLLaMA/

Using the search gave me a bunch of threads, but here's one:

https://www.reddit.com/r/LocalLLaMA/comments/1ip6c9e/looking...

  • danielEM 13 hours ago

    I've seen these threads (I actually did quite a bit of digging before coming to HN), but my problem is that if I'm to make a decision, I'd want to see how both solutions compare directly, with exactly the same model on the most recent drivers. Results from different models, or results older than 2-3 months, aren't reliable in this case.

laweijfmvo 13 hours ago

How about a 3090?

  • danielEM 13 hours ago

    A 3090 is about twice as expensive as a 7900 XTX, so it's out of my budget at the moment. I'm EU-based and my absolute top limit is 1000 USD. A 7900 XTX costs around that, and a dual Arc A770 setup costs about 30% less.

    • laweijfmvo 13 hours ago

      A used 3090 seems to go for $800-$850 on eBay.