Dell's version of the DGX Spark fixes pain points

(jeffgeerling.com)

91 points | by thomasjb 6 hours ago

12 comments

kristianp 2 hours ago
I know it's just a quick test, but llama 3.1 is getting a bit old. I would have liked to see a newer model that can fit, such as gpt-oss-120, (gpt-oss-120b-mxfp4.gguf), which is about 60gb of weights (1).
(1) https://github.com/ggml-org/llama.cpp/discussions/15396
[-]
- eurekin 1 hour ago
  Correct, most of r/LocalLlama moved onto next gen MoE models mostly. Deepseek introduced few good optimizations that every new model seems to use now too. Llama 4 was generally seen as a fiasco and Meta haven't made a release since
  [-]
  - fragmede 39 minutes ago
    What are some of the models people are using? (Rather than naming the ones they aren't.)
jasoneckert 5 hours ago
I've got the Dell version of the DGX Spark as well, and was very impressed with the build quality overall. Like Jeff Geerling noted, the fans are super quiet. And since I don't keep it powered on continuously and mainly connect to it remotely, the LED is a nice quick check for power.
But the nicest addition Dell made in my opinion is the retro 90's UNIX workstation-style wallpaper: https://jasoneckert.github.io/myblog/grace-blackwell/
[-]
- mapontosevenths 30 minutes ago
  I've had mine for a while now, and never actually connected a monitor to it. Now I'll have to. Thanks. :)
- ranger_danger 5 hours ago
  I just want a standard, affordable mini PC that looks like this one. Or better yet, with the brown accents normally found on recent PowerEdge systems.
  https://www.fsi-embedded.jp/contents/uploads/2018/11/DELLEMC...
  [-]
  - storus 5 hours ago
    Zotac has a bunch of x64 mini PCs that use a similar hexagonal styling.
Tepix 5 hours ago
You can get two Strix Halo PCs with similar specs for that $4000 price. I just hope that prompt preprocessing speeds will continue to improve, because Strix Halo is still quite slow in that regard.
Then there is the networking. While Strix Halo systems come with two USB4 40Gbit/s ports, it's difficult to
a) connect more than 3 machines with two ports each
b) get more than 23GBit/s or so per connection, if you're lucky. Latency will also be in the 0.2ms range, which leaves room for improvement.
Something like Apple's RDMA via Thunderbolt would be great to have on Strix Halo…
[-]
- benreesman 10 minutes ago
  NVFP4 (and to a lesser extent, MXFP8) work, in general. In terms of usable FLOPS the DGX Spark and the GMTek EVO-X2 both lose to the 5090, with NCCL and OpenMPI set up the DGX is still the nicest way to dev for our SBSA future. Working on that too, harder problem.
- coder543 5 hours ago
  As you allude, the prompt processing speeds are a killer improvement of the Spark which even 2 Strix Halo boxes would not match.
  Prompt processing is literally 3x to 4x higher on GPT-OSS-120B once you are a little bit into your context window, and it is similarly much faster for image generation or any other AI task.
  Plus the Nvidia ecosystem, as others have mentioned.
  One discussion with benchmarks: https://www.reddit.com/r/LocalLLaMA/comments/1oonomc/comment...
  If all you care about is token generation with a tiny context window, then they are very close, but that’s basically the only time. I studied this problem extensively before deciding what to buy, and I wish Strix Halo had been the better option.
  [-]
  - plagiarist 14 minutes ago
    Could I get your thoughts on the Asus GX10 vs. spending on GPU compute? It seems like one could get a lot of total VRAM with better memory bandwidth and make PCIe the bottleneck. Especially if you already have a motherboard with spare slots.
    I'm trying to better understand the trade offs, or if it depends on the workload.
- Aurornis 5 hours ago
  The primary advantage of the DGX box is that it gives you access to the nVidia ecosystem. You can develop against it almost like a mini version of the big servers you're targeting.
  It's not really intended to be a great value box for running LLMs at home. Jeff Geerling talks about this in the article.
  [-]
  - cmrdporcupine 4 hours ago
    Exactly this. I'm not sure why people keep drumming the "a Mac or Strix Halo is faster/cheaper" drum. Different market.
    If I want to do hobby / amateur AI research or do stuff with fine tuning models etc, learn the tooling. I'm better off with the DG10 than AMD or Apple's systems.
    The Strix Halo machines look nice. I'd like one of those too. Especially if/when they ever get around to getting it into a compelling laptop.
    But I ordered the ASUS Ascent DG10 machine (since it was more easily available for me than the other versions of these) because I want to play around with fine tuning open weight models, learning tooling, etc.
    That and I like the idea of having a (non-Apple) Aarch64 linux workstation at home.
    Now if the courier would just get their shit together and actually deliver the thing...
    [-]
    - mapontosevenths 4 hours ago
      I have this device, it's exactly as you say. This is a device for AI research and development. My buddies mac ultra beats it squarely for inference workloads, but for real tinkering it can't be beat.
      I've used it to fine tune 20+ models in the last couple of weeks. Neither a Mac or Strix Halo even try to compete.
    - lostmsu 28 minutes ago
      I got ASUS ROG Flow Z13 128G with Ryzen AI 395, and I am able to train nanoGPT with little effort. On Windows (haven't tried Linux), where ROCm was just released recently.
      See https://news.ycombinator.com/item?id=46052535
alecco 4 hours ago
IMHO DGX Spark at $4,000 is a bad deal with only 273 GB/s bandwidth and the compute capacity between a 5070 and a 5070 TI. And with PCIe 5.0 at 64 GB/s it's not such a big difference.
And the 2x 200 GBit/s QSFP... why would you stack a bunch of these? Does anybody actually use them in day-to-day work/research?
I liked the idea until the final specs came out.
[-]
- BadBadJellyBean 2 hours ago
  I think the selling point is the 128GB of unified system memory. With that you can run some interesting models. The 5090 maxes out at 32GB. And they cost about $3000 and more at the moment.
  [-]
  - alecco 2 hours ago
    1. /r/localllama unanimously doesn't like the Spark for running models
    2. and for CUDA dev it's not worth the crazy price when you can dev on a cheap RTX and then rent a GH or GB server for a couple of days if you need to adjust compatibility and scaling.
    [-]
    - BadBadJellyBean 1 hour ago
      I am not on reddit. What are they saying?
      [-]
      - mapontosevenths 32 minutes ago
        It isn't for "running models." Inference workloads like that are faster on a mac studio, if that's the goal. Apple has faster memory.
        These devices are for AI R&D. If you need to build models or fine tune them locally they're great.
        That said, I run GPT-OSS 120B on mine and it's 'fine'. I spend some time waiting on it, but the fact that I can run such a large model locally at a "reasonable" speed is still kind of impressive to me.
        It's REALLY fast for diffusion as well. If you're into image/video generation it's kind of awesome. All that compute really shines when for workloads that aren't memory speed bound.
kachapopopow 5 hours ago
Dell fixing issues instead of creating new ones? That's a new one for me. Would rather still not deal with their firmware updaters thought.
[-]
- cjbgkagh 5 hours ago
  Give them a chance, I’m sure they’ll add new issues in one of their monthly bios updates.
  [-]
  - kachapopopow 4 hours ago
    nothing beats perfectly good vendor firmware updates packaged in an obscenely complicated bash file that just extracts the tool and runs it while performing unnecessary and often broken validation that only runs on hardware that is part of their ecosystem (ex: dell nic on non dell chassis).
    [-]
    - BadBadJellyBean 2 hours ago
      On linux I use fwupdmgr to upgrade the firmware on my dell laptop. Not sure if that works for servers though.
npalli 2 hours ago
Seems you are paying the Dell tax of 15%. The same setup is $4K from NVidia, Lenovo and $3K for 1TB at Asus.
https://www.dell.com/en-us/shop/desktop-computers/dell-pro-m...
postalrat 53 minutes ago
Spark's biggest paint point is the price. Does it fix that?
nightski 56 minutes ago
It's a product without a purpose.
dagaci 3 hours ago
A nice little AI review with comparison of the CPU/Power Draw & Networking would be interested in seeing a fine-tuning comparison too. I think pricing was missing also.
[-]
- geerlingguy 1 hour ago
  I've been working on fine tuning testing, it's something I hope to set up for comparison against the Mac Studio and Framework Desktop clusters soon.
cat_plus_plus 2 hours ago
I have a slightly cheaper similar box, NVIDIA Thor Dev Kit. The point is exactly to avoid deploying code to servers that cost half a million dollars each. It's quite capable in running or training smart LLMs like Qwen3-Next-80B-A3B-Instruct-NVFP4. So long as you don't tear your hair out first figuring out pecularities and fighting with bleeding edge nightly vLLM builds.
[-]
- echion 45 minutes ago
  > training smart LLMs like Qwen3-Next-80B-A3B-Instruct-NVFP4
  Sounds interesting; can you suggest any good discussions of this (on the web)?
barelysapient 2 hours ago
Great article but would be nice to see how larger models work.
colordrops 4 hours ago
I assume they didn't fix the memory bandwidth pain point though.
[-]
- llm_nerd 4 hours ago
  The memory bandwidth limitation is baked into the GB10, and every vendor is going to be very similar there.
  I'm really curious to see how things shift when the M5 Ultra with "tensor" matmul functionality in the GPU cores rolls out. This should be a multiples speed up of that platform.
  [-]
  - storus 4 hours ago
    My guess is M5 Ultra will be like DGX Spark for token prefill and M3 Ultra for token generation, i.e. the best of both worlds, at FP4. Right now you can combine Spark with M3U, the former streaming the compute, lowering TTFT, the latter doing the token generation part; with M5U that should no longer be necessary. However given RAM prices situation I am wondering if M5U will ever get close to the price/performance of Spark + M3U we have right now.
    [-]
    - echion 44 minutes ago
      > you can combine Spark with M3U, the former streaming the compute, lowering TTFT, the latter doing the token generation part
      Are you doing this with vLLM, or some other model-running library/setup?
      [-]
      - coder543 41 minutes ago
        They're probably referencing this article: https://blog.exolabs.net/nvidia-dgx-spark/
- cat_plus_plus 2 hours ago
  At least for transformers, it can be kind of fixed with MOE + NVFP4 for small working set despite large resident size.