Sharing actual GPU core and VRAM utilization metrics for a query on 10 LLM models

medicis123 9 hours ago

We ran it on the WoolyAI Acceleration Service: https://docs.woolyai.com/getting-started/running-your-first-...

There are some interesting insights just looking at these numbers.

Environment Details

Wooly Client: Linux non-GPU container running PyTorch scripts for all ten models.
Models were downloaded using the Hugging Face Transformers library from vendor-specific repositories.
Each model was executed 20 times using the same script to collect average Wooly Credits for both GPU core and VRAM usage.

Models Tested

Llama-3.2-1B
Llama-3.2-1B-Instruct
Llama-3.2-3B
Llama-3.2-3B-Instruct
Mistral-7B-Instruct
Falcon3-7B-Instruct
Llama-3.1-8B-Instruct
Llama-3.1-8B
Dolly-v2-12B
Llama-2-13B-Chat-HF

PyTorch Script

    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch

    # Fix the random seed so repeated runs are comparable
    torch.manual_seed(100000)

    # Model name or path
    model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"

    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda")

    # Input text
    input_text = "What is the capital of United States of America"

    # Tokenize input text and move the tensors to the model's device
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

    # Generate, decode, and print the output on each iteration
    for z in range(1, 10):
        # Unpack input_ids and attention_mask as keyword arguments
        outputs = model.generate(**inputs, max_length=100)
        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(generated_text)

GPU Core and Memory Utilization Metrics

Model                    Core Wooly Credits Used    VRAM Wooly Credits Used
Llama-3.2-1B             46000                      31072
Llama-3.2-1B-Instruct    94868                      60964
Llama-3.2-3B             195936                     84715
Llama-3.2-3B-Instruct    502448                     258125
Mistral-7B-Instruct      525689                     397181
Falcon3-7B-Instruct      136094                     26528
Llama-3.1-8B             283458                     167515
Llama-3.1-8B-Instruct    574872                     403934
Dolly-v2-12B             767108                     342877
Llama-2-13B-Chat-HF      313809                     120067
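
For anyone who wants to slice these numbers, here is a rough Python sketch that uses only the figures above (model names as in the Models Tested list). It computes the core-to-VRAM credit ratio per model and how much each Instruct variant used relative to its base model. The helper names (`core_to_vram_ratio`, `instruct_vs_base`) are made up for illustration and are not part of any Wooly tooling, and the sketch assumes the credit counts are directly comparable across models.

```python
# Figures copied from the post: model -> (core credits, VRAM credits)
credits = {
    "Llama-3.2-1B": (46000, 31072),
    "Llama-3.2-1B-Instruct": (94868, 60964),
    "Llama-3.2-3B": (195936, 84715),
    "Llama-3.2-3B-Instruct": (502448, 258125),
    "Mistral-7B-Instruct": (525689, 397181),
    "Falcon3-7B-Instruct": (136094, 26528),
    "Llama-3.1-8B": (283458, 167515),
    "Llama-3.1-8B-Instruct": (574872, 403934),
    "Dolly-v2-12B": (767108, 342877),
    "Llama-2-13B-Chat-HF": (313809, 120067),
}


def core_to_vram_ratio(table):
    """Return {model: core credits / VRAM credits} (hypothetical helper)."""
    return {name: core / vram for name, (core, vram) in table.items()}


def instruct_vs_base(table):
    """For each base model that has a '<name>-Instruct' sibling in the table,
    return {base: (core multiplier, VRAM multiplier)} (hypothetical helper)."""
    result = {}
    for name, (core, vram) in table.items():
        sibling = name + "-Instruct"
        if sibling in table:
            i_core, i_vram = table[sibling]
            result[name] = (i_core / core, i_vram / vram)
    return result


ratios = core_to_vram_ratio(credits)
multipliers = instruct_vs_base(credits)

# Print models from the highest core-to-VRAM ratio to the lowest
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} core/VRAM = {ratio:.2f}")

# Print how much more the Instruct variants used than their base models
for base, (core_x, vram_x) in multipliers.items():
    print(f"{base}: Instruct used {core_x:.2f}x core, {vram_x:.2f}x VRAM")
```

Only three base/Instruct pairs exist in this data set (Llama-3.2-1B, Llama-3.2-3B, Llama-3.1-8B), so the second loop is a small comparison rather than a trend.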