Show HN: ToolFuzz – Automated Testing for Agent Tools

1 points by imilev 7 hours ago

Hi HN,

I’m Ivan, and I developed ToolFuzz, a fuzzing framework for agent tools, as part of my thesis (with the help of my advisors). I wanted ToolFuzz to be something people actually use, so it is available as an open-source library. The repository is on GitHub: https://github.com/eth-sri/ToolFuzz

What does ToolFuzz do? ToolFuzz detects two types of errors in agent tools:

1. Runtime failures – Using taint analysis and fuzzing, ToolFuzz generates problematic arguments that trigger exceptions in the tool. We then use LLMs to turn those arguments into prompts that might lead an agent to make the same problematic tool invocations.

2. Correctness failures – ToolFuzz generates synonymous prompts and checks that the tool return values and the agent's responses stay consistent across them. If the results are inconsistent, that’s a strong signal of a correctness issue.
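
To make the two checks concrete, here is a minimal sketch in Python. It is not ToolFuzz's API or implementation: `lookup_population`, `toy_agent`, `fuzz_runtime_failures` and `check_consistency` are hypothetical names invented for illustration. Plain random strings stand in for the taint-guided fuzzing, and a hard-coded toy agent with hand-written prompts stands in for the LLM parts.

```python
import random
import string


def lookup_population(city: str) -> int:
    """Hypothetical agent tool: returns the population of a known city."""
    table = {"zurich": 430_000, "geneva": 200_000}  # illustrative numbers only
    return table[city.strip().lower()]  # raises KeyError on unknown input


def fuzz_runtime_failures(tool, trials=200, seed=0):
    """Call the tool with random string arguments and collect those that raise."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        length = rng.randint(0, 12)
        arg = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            tool(arg)
        except Exception as exc:  # any exception counts as a runtime failure
            failures.append((arg, type(exc).__name__))
    return failures


def toy_agent(prompt: str):
    """Stand-in for an agent: treats the last word of the prompt as the city.

    Deliberately brittle, so that rephrasing the question changes the answer.
    """
    city = prompt.rstrip("?").split()[-1]
    try:
        return lookup_population(city)
    except KeyError:
        return None


def check_consistency(answer_fn, prompts):
    """Run synonymous prompts and report whether all answers agree."""
    results = {p: answer_fn(p) for p in prompts}
    return len(set(results.values())) <= 1, results


if __name__ == "__main__":
    failures = fuzz_runtime_failures(lookup_population)
    print(f"{len(failures)} crashing arguments found")

    prompts = [
        "What is the population of Zurich?",
        "How many people live in Zurich?",
        "Zurich has how many inhabitants?",
    ]
    consistent, results = check_consistency(toy_agent, prompts)
    print("consistent:", consistent)
    for prompt, answer in results.items():
        print(f"  {prompt!r} -> {answer}")
```

In this toy setup the third prompt yields a different answer from the first two, which is the kind of inconsistency the correctness check is meant to flag.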

Would love to hear your feedback on this!

My Background:

Hey all, I am a master's student in Machine Intelligence at ETH Zurich. I have always been super interested in software testing, programming languages and software engineering, and for the last two years also in AI/ML and AI agents. When it came time to do my thesis, I wanted to work on something related to testing and AI agents, as they are the hype at the moment (I am also super hyped about agentic systems in general). That is how I found a super cool topic: testing agent tools. At the moment there are a bunch of benchmarks but no real testing framework, probably because this is more of a software engineering topic (my guess), which made it perfect for me. I wanted to do something more applied that can be useful for people who build actual software, rather than pure research. So this is why and how ToolFuzz was made.

I hope you’ll try it out and share your feedback! My goal is for this to be more than a scientific paper or thesis: a tool that’s genuinely useful to people. All feedback is greatly appreciated. This is my first time releasing something for others to use, and I’m really excited to interact with everyone!