There are some interesting insights just looking at these numbers.
Environment Details
Wooly Client: Linux non-GPU container running PyTorch scripts for all ten models
Models were downloaded using the Hugging Face Transformers library from vendor-specific repositories.
Each model was executed 20 times using the same script to collect average Wooly Credits for both GPU core and VRAM usage.
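The averaging step can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical `run_once` callable that executes the benchmark script once and returns the (core, VRAM) Wooly Credits reported for that run. How credits are read back from WoolyAI is not described here, so that part is left to the caller.

```python
from statistics import mean
from typing import Callable, Tuple


def average_credits(
    run_once: Callable[[], Tuple[float, float]], runs: int = 20
) -> Tuple[float, float]:
    """Call run_once `runs` times and return the mean (core, VRAM) credits.

    run_once is hypothetical: it should run the benchmark script once and
    return the (core_credits, vram_credits) reported for that single run.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Collect one (core, vram) sample per run
    samples = [run_once() for _ in range(runs)]
    core = mean(s[0] for s in samples)
    vram = mean(s[1] for s in samples)
    return core, vram
```

Keeping the measurement behind a callable means the same averaging code works for all ten models; only the function that launches the script and reads the credits changes.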
Models Tested
Llama-3.2-1B
Llama-3.2-1B-Instruct
Llama-3.2-3B
Llama-3.2-3B-Instruct
Mistral-7B-Instruct
Falcon3-7B-Instruct
Llama-3.1-8B-Instruct
Llama-3.1-8B
Dolly-v2-12B
Llama-2-13B-Chat-HF
PyTorch Script
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Fix the random seed so any sampling is reproducible across runs
torch.manual_seed(100000)

# Model name or path (swap in each of the models listed above)
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Load tokenizer and model; device_map="cuda" places the weights on the GPU
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda")

# Input text
input_text = "What is the capital of United States of America"

# Tokenize input text and move the tensors to the model's device
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate, decode and print the output; range(1, 10) runs 9 generations
for _ in range(1, 10):
    # ** unpacks input_ids and attention_mask as keyword arguments
    outputs = model.generate(**inputs, max_length=100)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(generated_text)
GPU Core and Memory Utilization Metrics
Llama-3.2-1B: Core Wooly Credits Used - 46000, VRAM Wooly Credits Used - 31072
Llama-3.2-1B-Instruct: Core Wooly Credits Used - 94868, VRAM Wooly Credits Used - 60964
Llama-3.2-3B: Core Wooly Credits Used - 195936, VRAM Wooly Credits Used - 84715
Llama-3.2-3B-Instruct: Core Wooly Credits Used - 502448, VRAM Wooly Credits Used - 258125
Mistral-7B-Instruct: Core Wooly Credits Used - 525689, VRAM Wooly Credits Used - 397181
Falcon3-7B-Instruct: Core Wooly Credits Used - 136094, VRAM Wooly Credits Used - 26528
Llama-3.1-8B: Core Wooly Credits Used - 283458, VRAM Wooly Credits Used - 167515
Llama-3.1-8B-Instruct: Core Wooly Credits Used - 574872, VRAM Wooly Credits Used - 403934
Dolly-v2-12B: Core Wooly Credits Used - 767108, VRAM Wooly Credits Used - 342877
Llama-2-13B-Chat-HF: Core Wooly Credits Used - 313809, VRAM Wooly Credits Used - 120067
We ran the script on the WoolyAI Acceleration Service: https://docs.woolyai.com/getting-started/running-your-first-...
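To make the comparisons in these numbers explicit, here is a small sketch that loads the averages listed above and computes a few ratios. The helper names (`instruct_ratio`, `vram_share`) are illustrative, and pairing a model with its "-Instruct" variant by name is an assumption. Adding core and VRAM credits into a single total is only a rough combined measure, since the two may not be priced on the same scale.

```python
from typing import Tuple

# Average Wooly Credits from the list above: model -> (core, VRAM)
CREDITS = {
    "Llama-3.2-1B": (46000, 31072),
    "Llama-3.2-1B-Instruct": (94868, 60964),
    "Llama-3.2-3B": (195936, 84715),
    "Llama-3.2-3B-Instruct": (502448, 258125),
    "Mistral-7B-Instruct": (525689, 397181),
    "Falcon3-7B-Instruct": (136094, 26528),
    "Llama-3.1-8B": (283458, 167515),
    "Llama-3.1-8B-Instruct": (574872, 403934),
    "Dolly-v2-12B": (767108, 342877),
    "Llama-2-13B-Chat-HF": (313809, 120067),
}


def instruct_ratio(base: str) -> Tuple[float, float]:
    """Return (core, VRAM) credits of `base`-Instruct divided by those of `base`."""
    b_core, b_vram = CREDITS[base]
    i_core, i_vram = CREDITS[base + "-Instruct"]
    return i_core / b_core, i_vram / b_vram


def vram_share(model: str) -> float:
    """Fraction of a model's combined credits that came from VRAM."""
    core, vram = CREDITS[model]
    return vram / (core + vram)


if __name__ == "__main__":
    for base in ("Llama-3.2-1B", "Llama-3.2-3B", "Llama-3.1-8B"):
        core_x, vram_x = instruct_ratio(base)
        print(f"{base}: Instruct uses {core_x:.2f}x core, {vram_x:.2f}x VRAM credits")
    # Rank all models by combined (core + VRAM) credits, lowest first
    for name, (core, vram) in sorted(CREDITS.items(), key=lambda kv: sum(kv[1])):
        print(f"{name:24s} total={core + vram:>9d}  vram_share={vram_share(name):.2f}")
```

On these figures, the Instruct variants of Llama-3.2-1B, Llama-3.2-3B and Llama-3.1-8B each used roughly 2.0x to 2.6x the core credits of their base models, and Falcon3-7B-Instruct had the smallest VRAM share of the ten.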