Show HN: Droidrun – LLM Agent for Android

1 point by nodueck 11 hours ago

Hi HN,

I'm Nikolai, a software engineer and co-founder of DroidRun. DroidRun is an LLM-based agent that leverages the Android Accessibility Tree for precise control and understanding of UI elements. It works on real phones and emulators, and it's open source.

How it started:

Our co-founder Niels Schmidt (you’ll see him in the demos) coded a prototype and shared a quick video. It went viral: about 50k views on X in under 2 hours. That moment pushed us to go all-in on DroidRun, and soon after we open-sourced it.

How it works:

Most agents rely on screenshots alone for context. We use screenshots too, but we also feed the Accessibility Tree into the LLM, which gives the model structural, hierarchical, and spatial metadata about UI elements.

Here’s an example:

Screenshot of a real UI: https://imgur.com/a/ePRLpyv

And a matching accessibility JSON snippet:

  {
    "index": 3,
    "resourceId": "com.android.settings:id/search_action_bar",
    "className": "LinearLayout",
    "text": "search_action_bar",
    "bounds": "42, 149, 1038, 338",
    "children": [
      {
        "index": 4,
        "resourceId": "com.android.settings:id/search_bar_title",
        "className": "TextView",
        "text": "In Einstellungen suchen",
        "bounds": "189, 205, 768, 282",
        "children": []
      }
    ]
  }

We also annotate UI regions in screenshots with their index numbers and match them to nodes in the tree. This gives the agent a deep understanding of what’s on screen, even across different device types like tablets.

This allows for better generalization across devices and screen sizes. Agents can act with greater confidence and fewer hallucinations.
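To make the number-matching concrete: a common pattern is to flatten the tree into one indexed line per element, so the numbers drawn on the screenshot line up with structured rows in the prompt. This is a minimal illustrative sketch, not DroidRun's actual code; `flatten` is a hypothetical helper.

```python
def flatten(node, out=None):
    # Walk the accessibility tree depth-first, emitting one indexed line per
    # element so screenshot annotations ([3], [4], ...) can be matched back
    # to structural metadata in the LLM prompt.
    if out is None:
        out = []
    out.append(f'[{node["index"]}] {node["className"]} '
               f'"{node["text"]}" @ ({node["bounds"]})')
    for child in node["children"]:
        flatten(child, out)
    return out

# The example tree from above, trimmed to the fields the prompt needs.
tree = {
    "index": 3, "className": "LinearLayout", "text": "",
    "bounds": "42, 149, 1038, 338",
    "children": [{
        "index": 4, "className": "TextView",
        "text": "In Einstellungen suchen",
        "bounds": "189, 205, 768, 282", "children": [],
    }],
}

for line in flatten(tree):
    print(line)
# [3] LinearLayout "" @ (42, 149, 1038, 338)
# [4] TextView "In Einstellungen suchen" @ (189, 205, 768, 282)
```

The bounds let the agent translate "tap element [4]" into concrete screen coordinates, which is where the hallucination reduction comes from: the model picks from an enumerated list instead of guessing pixel positions.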

Current Status:

- Ranked #1 on the AndroidWorld benchmark until recently (the leaderboard has become highly competitive)

- Supports real devices and emulators

- Strong performance on simple and complex UI tasks

- Gemini 2.5 Pro works best so far, but we’re iterating fast

What's next:

We’re working on a cloud platform where you can run prompts on Android devices without any setup. Think of an LLM controlling a phone in the cloud, ready to run your automations.

Looking for:

- Feedback from HN

- Collaborators who love Android, LLMs, and agents

- OSS contributors