What a great looking tool.
Amazingly generous that it’s open source. Let’s hope the author can keep building it, but if they need to fund their existence there is precedent - lots of folks pay for Superwhisper. People pay for quality software.
In a past tech cycle Apple might’ve hired the author, acquired the IP and lovingly stewarded the work into a bundled OS app. Not something to hope for lately. So just going to hope the app lives for years to come and keeps improving the whole way.
Yep, I think whoever builds the GUI for LLMs will be the next Jobs/Gates/Musk and a Nobel Prize winner (I think it’ll solve alignment by putting millions of eyes on the internals of LLMs), because computers only became popular once the GUI-based OS appeared. I just started an Ask HN for people (myself included) to share their AI safety ideas, both crazy and not: https://news.ycombinator.com/item?id=43332593
What you describe seems to be OpenAI’s “moat”. They are currently the farthest ahead in app UX for nontechnical users and in brand recognition. It doesn’t matter if they are 10% behind Anthropic in frontier model quality if Claude for Mac is a shitty Electron app.
I miscommunicated: I meant new 3D, game-like UIs. There could be a whole new OS full of apps that represent multimodal LLMs in human-familiar ways. All of today’s UIs are what I consider command-line-like: they’re like a strict librarian who only spits out quotes; no one lets you truly enter the library. We need better 3D and even “4D” long-exposure-photo-like UIs.
Claude for Mac works quite well for me, and with these MCP servers it's now looking better than ever. And regarding Electron: I have seen (and created for myself) awesome apps that would never have existed without it.
What, in your mind, is the main advantage of using the app over the website?
The only thing missing from the MCP component of Claude Desktop is a better interface for discovering, enabling and configuring different MCP servers.
The ideal UX isn’t a secret: audio with AR for context
I’m bullish on an AirPods-with-cameras experience
I’ve read the same thing about cryptocurrencies for so long (it needs a proper GUI to take off)
I miscommunicated: I meant new 3D, game-like UIs. There could be a whole new OS full of apps that represent multimodal LLMs in human-familiar ways. All of today’s UIs are what I consider command-line-like: they’re like a strict librarian who only spits out quotes; no one lets you truly enter the library. We need better 3D and even “4D” long-exposure-photo-like UIs.
I got what you mean. People have said cryptocurrencies are one UX revolution away from mainstream adoption since their inception. The reality was, and is, that it’s a solution in search of a problem.
Who said that, and how does it relate to what I wrote? You’re majorly straw-manning what I proposed.
Looks great, kudos for making it open-source! Yet as with any app that has access to my local file system, what instantly comes to mind is "narrow permissions" / the principle of least privilege.
It'd be great if the app would only have read access to my files, not full disk permission.
As an end-user, I'm highly concerned that files might get deleted or data shared via the internet.
So ideally, Sidekick would have only "read" permissions and no internet access. (This really applies to any app with full-disk read access.)
Also: why does it say Apple Silicon is required? I can run Llama.cpp and Ollama on my Intel Mac.
> It'd be great if the app would only have read access to my files, not full disk permission.
I'm running it right now and macOS didn't ask for any permissions at all which afaik means that it cannot access most of my personal folders and definitely not the full disk. Am I missing something?
> I'm running it right now and macOS didn't ask for any permissions at all which afaik means that it cannot access most of my personal folders and definitely not the full disk. Am I missing something?
This is only true if it uses App Sandbox, which is mandatory for apps distributed through the App Store, but not necessarily for everything else.
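If you want to check this yourself rather than infer it from the absence of prompts, the Security framework can report whether a process carries the App Sandbox entitlement. A minimal Swift sketch, purely illustrative and not code from Sidekick:

```swift
import Foundation
import Security

// Ask the Security framework whether this process was launched with the
// App Sandbox entitlement. The "no permission prompts" observation only
// implies containment when this returns true.
func isSandboxed() -> Bool {
    guard let task = SecTaskCreateFromSelf(nil) else { return false }
    let value = SecTaskCopyValueForEntitlement(
        task, "com.apple.security.app-sandbox" as CFString, nil
    )
    return (value as? Bool) == true
}

print(isSandboxed() ? "App Sandbox is on" : "Not sandboxed")
```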
Comments like this are what turn me off about this website. Entitled, much?
Apart from the M1 comment, I think the commenter voices a reasonable concern and isn't hostile or antagonistic
He complimented the app and made a decent suggestion.
Neat. It would be nice to provide an option to use an API endpoint without downloading an additional local model. I have several models downloaded via ollama and would prefer to use them without additional space being taken up by the default model.
From the README:
> Optionally, offload generation to speed up generation while extending the battery life of your MacBook.
The screenshot shows an example; it mentions OpenAI and gpt-4o.
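If the remote option works the way most of these apps do, it is just an OpenAI-compatible chat-completions request, which also means any local server speaking that API (Ollama, llama.cpp's server) can stand in for OpenAI. A rough Swift sketch under that assumption; the URL, key and model name are placeholders, not Sidekick's actual code:

```swift
import Foundation

// Minimal OpenAI-compatible /v1/chat/completions call. Point the URL at
// api.openai.com or at a local server exposing the same API shape.
struct ChatRequest: Codable {
    struct Message: Codable { let role: String; let content: String }
    let model: String
    let messages: [Message]
}

func complete(_ prompt: String) async throws -> String {
    var req = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    req.httpMethod = "POST"
    req.setValue("application/json", forHTTPHeaderField: "Content-Type")
    req.setValue("Bearer YOUR_API_KEY", forHTTPHeaderField: "Authorization")
    req.httpBody = try JSONEncoder().encode(
        ChatRequest(model: "gpt-4o", messages: [.init(role: "user", content: prompt)])
    )
    let (data, _) = try await URLSession.shared.data(for: req)
    // Full response decoding elided; the reply text sits at choices[0].message.content.
    return String(decoding: data, as: UTF8.self)
}
```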
Looks super neat!
Somewhat related: one issue I have with projects like these is that everyone seems to be bundling the UX/app with the core ... pardon my ignorance, "LLM app interface". E.g. we have a lot of abstractions for LLMs themselves, such as Llama.cpp, but it feels like we lack abstractions for things like what Claude Code does, or this RAG implementation, or whatever.
I.e. these days it seems like a lot of the magic in a quality implementation is built on top of a good LLM: a secondary layer which is just as important as the LLM itself. The prompt engineering, etc.
Are there any attempts to generalize this? Is it even possible? I feel like I keep seeing good ideas that get locked behind an app wall with no way to swap them out. We've got tons of options for abstracting the LLMs themselves, but I've not seen anything that tackles this (though I've also not been looking).
Does it exist? Does this area have a name?
On macOS, look at things like Msty.app (and of course LM Studio)?
They are pluggable across more than just LLM itself.
I went with Msty because I didn't want to run Docker, and it's been rock solid for my needs.
I'm not sure there's a market in LLM "middleware" per se. Look at the market segments:
• B2C: wants vertically-integrated tools that provide "middleware" plus interface. Doesn't want to dick around. Often integrates their own service layer as well (see e.g. "character chat" apps); but if not, aims for a backend-integration experience that "knows about" the quirks of each service+model family, effectively commoditizing them. The ultimate aim of any service-generic app of this type is likely to provide a "subscription store" where you purchase subscriptions to inference services through the app, never visiting the service provider itself.
• B2B (think "using agents to drive pseudo-HFT bots for trades in fancy financial instruments that can't be arbitraged through dumb heuristics"): has a defined use-case and wants to control every detail of both "middleware" and backend together. Vertically integrates their own solution — on-prem inference cluster + custom-patched inference engine + business logic that synthesizes the entire prompt at every step. Doesn't bother with the "chat" abstraction other than as part of several-shot prompting.
• B2B2C: wants "scaling an inference cluster + engine + model deployment" to be Somebody Else's Problem; thinks of an "app agent experience" as the deliverable they want to enable their business customers to achieve through their product or service; and thus thinks of "middleware" as their problem / secret sauce — the thing they will build to enable "app agent experiences" to be created with the least business-customer effort possible. The "middleware" is where these B2B2C businesses see themselves making money. Thus, these B2B2C businesses aren't interested in paying some other middleman for a hosted "generic middleware framework as a service" solution; they're interested in being the only middleman, that captures all of the margin. They're interested in library frameworks they can directly integrate into their business layer.
---
For an analogy, think of the "middleware" of an "easy website builder" service like Squarespace/Wix/etc. You can certainly find vertically-integrated website-builder services; and you can also find slightly-lower-level library components to do what the "middleware part" of these website-builder services do. But you can't find full-on website-builder frameworks (powerful enough that the website-builder services actually use them) — let alone a white-labelable headless-CMS + frontend library "website builder builder" — let alone again, a white-labelable headless-CMS "website builder builder" that doesn't host its own data, but lets you supply your own backend.
Why?
Because B2C businesses just want Squarespace itself (a vertically-integrated solution); B2B businesses don't want an "easy website builder", they want a full-on web-app framework that allows them to control both the frontend and backend; and B2B2C businesses want to be "the Squarespace of X" for some vertical X, using high-ish-level libraries to build the highest-level website-building functionality, while keeping all of that highest-level glue code to themselves, as their proprietary "secret sauce." (Because if they didn't keep that highest-level code proprietary, it would function as a "start your own competitor to our service in one easy step" kit!)
---
The only time when the "refined and knowledge-enriched middleware abstraction layer-as-a-Service — but backend-agnostic!" approach tends to come up is to serve the use-case of businesspeople within B2B orgs, who want to be able to ask high-level questions or drive high-level operations without first needing to get a bespoke solution built by the engineering arm of said org. This is BI software (PowerBI), ERP software (NetSuite), CRM software (Salesforce), etc.
The weird / unique thing about LLMs, is that I don't think they... need this? The "thing about AI", is precisely that you can simply sit an executive in front of a completely-generic base-model chat prompt, and they can talk their way into getting it to do what they want — without an engineer there to gather + formalize their requirements. (Which is not to say that the executive can get the LLM to build software, correctly, to answer their question; but rather, that the executive can ask questions that invoke the agent's inbuilt knowledge and capabilities to — at least much of the time — directly answer the executive's question.)
For LLMs, the "in-context learning" capability mostly replaces "institutional knowledge burned into a generic middleware." Your generic base-model won't know everything your domain-specialist employees know — but, through conversation, it will at least be able to know what you know, and work with that. Which is usually enough. (At least, if your goal was to get something done on your own without bothering people who have better things to be doing than translating your question into SQL. If your goal is to work around the need for domain expertise, though... well, I don't think any "middleware" is going to help you there.)
In short: the LLM B2C use-case is also the LLM "B2Exec" use-case — they're both most-intuitively solved through vertical integration "upward" into the backend service layer. (Which is exactly why there was a wave of meetings last week, of businesspeople asking whether they could somehow share a single ChatGPT Pro $200/mo subscription across their team/org.)
When I bought my new MBP, I was wondering whether to upgrade the memory to 48GB, figuring it's increasingly likely I'll run local models during this laptop's 3-4 year cycle. So I took the leap and upgraded the memory.
Hoping that these kinds of tools will run well in these scenarios.
Some other alternatives (a little more mature / feature-rich):
anythingllm https://github.com/Mintplex-Labs/anything-llm
openwebui https://github.com/open-webui/open-webui
lmstudio https://lmstudio.ai/
Any recommendations for resources that compare these or provide more context in that regard?
Does it actually highlight the citations when opening the cited docs? Or were the highlights in the screenshot just there by chance?
This is not local, but uses the Tavily cloud (https://tavily.com/) ?!
Sure seems to. https://github.com/search?q=repo%3Ajohnbean393%2FSidekick%20...
This is getting murky.
As a suggestion to the author: please try to make it verifiably local-only, with an easy-to-set option.
Tavily search is an option, disabled by default.
Maybe the author could make a note of that in the README.
Just downloaded it and mucked about. It definitely works without the cloud, because it works while I'm offline. Looking at the code, it looks like an opt-in feature where you can provide your API key to Tavily.
That said, it seems built toward "Cheat on your homework" and doesn't reliably surface information from my notes, so I uninstalled it.
Looks nice, and I greatly appreciate the local only or local first mode.
The readme says:
> Give the LLM access to your folders, files and websites with just 1 click, allowing them to reply with context.
…
> Context aware. Aware of your files, folders and content on the web.
Am I right in assuming that this works only with local text files and that it cannot integrate with data sources in Apple’s apps such as Notes, Reminders, etc.? It could be a great competitor to Apple Intelligence if it could integrate with apps that primarily store textual information (but unfortunately in their own proprietary data formats on disk and with sandboxing adding another barrier).
Can it use and search PDFs, RTF files and other formats as “experts”?
The Apple data you mention has APIs for feeding it into LLMs if you wish. Someone just has to write the integration.
(I wrote one of those Apple API SDKs)
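Reminders, for instance, is reachable through EventKit, so the bridge is mostly plumbing plus a privacy prompt. A hedged Swift sketch of flattening open reminders into plain text that could be fed to an LLM as context; this is illustrative only, not the SDK mentioned above, and uses the macOS 14+ full-access API:

```swift
import EventKit

// Flatten open reminders into plain text suitable as LLM context.
// Requires a Reminders usage description in Info.plist.
func remindersAsContext() async throws -> String {
    let store = EKEventStore()
    guard try await store.requestFullAccessToReminders() else {
        throw NSError(domain: "RemindersAccess", code: 1)
    }
    let predicate = store.predicateForReminders(in: nil) // all reminder lists
    let reminders: [EKReminder] = await withCheckedContinuation { cont in
        _ = store.fetchReminders(matching: predicate) { cont.resume(returning: $0 ?? []) }
    }
    return reminders
        .filter { !$0.isCompleted }
        .map { "- " + ($0.title ?? "Untitled") }
        .joined(separator: "\n")
}
```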
> Am I right in assuming that this works only with local text files
One of the screen shots shows a .xlsx in the “Temporary Resources” area.
Also: I haven’t checked, but for a “Local-first” app, I would expect it to leverage the OS’s Spotlight text importers and run them on files it can’t natively process.
Some interesting features. I'm working on a similar native app with Qt, so it will support Linux, macOS and Windows out of the box. I might open source it as well.
https://www.get-vox.com
What differentiates this from Open WebUI? How did you design the RAG pipeline?
I had a project in the past where I had hundreds of PDF / HTML files of industry safety and fatality reports which I was hoping to simply "throw in" and use with Open WebUI, but I found it wasn't effective at this even in RAG mode. I wanted to ask it questions like "How many fatalities occurred in 2020 that involved heavy machinery?", but it wasn't able to provide such broad aggregate data.
I think this is a fundamental issue with naive RAG implementations: they aren't accurate enough for pretty much anything
Ultimately, the quality of OCR on PDFs is where we are bottlenecked as an industry. And not just in the text characters, but in understanding and feeding the LLM the structured object relationships we see in tables and graphs. Intuitive for a human, very error-prone for RAG.
That's a real issue, but that's masking some of the issues further downstream, like chunking and other context-related problems. There are some clever proposals to make this work, including some of the stuff from Anthropic and Jina. But as far as I can tell, these haven't been tested thoroughly because everyone is hung up at the OCR step (as you identified).
For my purposes, all of the data was also available in HTML format, so OCR wasn't a problem. I think the issue is that the RAG pipeline doesn't take the entire corpus of knowledge into its context when making a response; it uses an index to find one or more documents it believes are relevant, then uses that small subset as part of the input.
I'm not sure there's a way to get what a lot of people want RAG to be without actually training the model on all of your data, so they can "chat with it" similar to how you can ask ChatGPT about random facts about almost any publicly available information. But I'm not an expert.
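That is basically what naive retrieval does: chunks get embedded, the top-k nearest chunks are pasted into the prompt, and everything else stays invisible to the model, which is exactly why corpus-wide aggregate questions fail. A toy Swift sketch of that retrieval step; the embedding model and all data here are stand-ins:

```swift
// Toy illustration of the retrieval step in a naive RAG pipeline.
// Vectors would come from some embedding model; everything here is a stand-in.
struct Chunk { let text: String; let vector: [Float] }

func cosine(_ a: [Float], _ b: [Float]) -> Float {
    let dot  = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let magA = a.reduce(0) { $0 + $1 * $1 }.squareRoot()
    let magB = b.reduce(0) { $0 + $1 * $1 }.squareRoot()
    return dot / (magA * magB + 1e-9)
}

// Only the k most similar chunks ever reach the prompt. A question like
// "how many fatalities in 2020 involved heavy machinery?" needs the whole
// corpus, but the model only ever sees these few chunks.
func retrieve(_ query: [Float], from corpus: [Chunk], k: Int = 5) -> [Chunk] {
    Array(corpus
        .sorted { cosine($0.vector, query) > cosine($1.vector, query) }
        .prefix(k))
}
```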
I've also observed this issue and I wonder where the industry is on it. There seem to be a lot of claims that a given approach will work here, but not a lot of provably working use cases.
The name gave me a flashback to Borland Sidekick
Was this the MS-DOS TSR app that kept running in the background and you could invoke at any time? Fond memories!
I was going to say the same thing. It had so many cool tools: a calculator, ASCII chart, notepad, calendar. And the whole idea of a TSR opened a door in my head, which hadn't seen multiple programs running at the same time until then.
That's the one. Nifty little program.
I thought of the phone with the spinning screen from the mid 2000s.
Great work! Please consider a plugin mode to support integrating with Dropbox and S3-compatible targets, where users might be storing large amounts of data off-device (but still device-accessible), as well as email providers via IMAP/JMAP.
I've been looking for something like this to query / interface with the mountain of home appliance manuals I've hung onto as PDFs - use case being that instead of having to fish out and read a manual once something breaks, I can just chat with the corpus to quickly find what I need to fix something. Will give it a shot!
An option to use a local LLM on the network without needing to download the 2GB "default model" would be great.
It's in the README https://github.com/johnbean393/Sidekick?tab=readme-ov-file#f...
Does anyone know if there is something like this or https://github.com/kevinhermawan/Ollamac for Linux? Both are built with Swift, and Swift also supports Linux!
Desktop-wise there's https://msty.app which is rather good but not open source. I'm using Open WebUI [1] with a desktop shortcut, but that's a web app.
1 - https://github.com/open-webui/open-webui
https://jan.ai is open source and works on linux.
Very cool, trying it out. I'm unable to make it do a search, though; in the Experts section it says it's deactivated in the settings, but I couldn't find a setting for it. Maybe it's model-dependent and the default model can't do it?
Nice, just needs a computer/browser use mode and thinking/agent mode. e.g. "Test this web app for me. Try creating a new account and starting a new order" etc.
This needs 164 MB of disk space. Not too bad. Thank you to the author for this.
That's just the binary. It needs at least another order of magnitude beyond that to download the model.
Why no MLX?
I think it uses Llama.cpp, which doesn't support MLX.
Looks like an awesome tool! I just found it funny that in the code interpreter demo, JavaScript is used to evaluate mathematical problems (especially the float comparison).
Does it support MCP?
Pretty slick, I've been using Ollama + https://github.com/kevinhermawan/Ollamac - not sure this provides much extra benefit. Still love to see it.
Looking forward to when there's a broad LLM API accessible in the browser via JS.
Very nice.
Trying to put this through its paces, I first set out to build my own local binary (because why not, and also because code-reading is fun when you've got your own local build) ..
But I get this far:
/Users/aa-jv/Development/InterestingProjects/Sidekick/Sidekick/Logic/View Controllers/Tools/Slide Studio/Resources/bin/marp: No such file or directory
It seems there is a hand-built binary resource missing from the repo - did anyone else do a build yet, and get past this step?
Looks like it wants https://github.com/marp-team/marp-cli
Yeah, I've manually copied that binary into place from the marp-cli package in Homebrew and now the build proceeds .. continuing as I type .. let's see what happens.
I'm immediately suspicious of such binary resources, however.
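One cheap sanity check before trusting a vendored binary is to hash it and compare against the copy Homebrew installs (which is exactly the copy dropped into place above). A small Swift sketch; the paths are examples, not the project's actual layout:

```swift
import Foundation
import CryptoKit

// Compare a vendored binary against a known-good copy by SHA-256.
func sha256(ofFileAt path: String) throws -> String {
    let data = try Data(contentsOf: URL(fileURLWithPath: path))
    return SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
}

do {
    let vendored = try sha256(ofFileAt: "Sidekick/Resources/bin/marp") // example path
    let homebrew = try sha256(ofFileAt: "/opt/homebrew/bin/marp")      // example path
    print(vendored == homebrew ? "Binaries match" : "Binaries differ")
} catch {
    print("Could not read one of the binaries: \(error)")
}
```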
> Image generation is availible on macOS 15.2 or above, and requires Apple Intelligence.
... so image generation is not fully offline?
This tool looks like it could be worth a try to me, but only if I'm sure I can run it into a mode that's fully offline.
Some features use Apple's Private Cloud Compute, but it's pretty decent in terms of security and privacy.
https://security.apple.com/blog/private-cloud-compute/
"Pretty decent" is irrelevant for proprietary code that's not my property.
The only safe option is a guarantee that data never leaves my machine, which for this app means disabling anything that has a chance of exfiltrating it.
I encourage you to try to understand how verifiable transparency works.
https://security.apple.com/blog/pcc-security-research/
I doubt my customer - on whose proprietary code I want to try running LLMs - cares :)
Or to rephrase: would you go to court with the contents of that link as evidence that you haven't inadvertently published someone else's proprietary data in some external database?
Apple Intelligence image generation is fully offline
Isn't Apple Intelligence image generation fully offline?
I don't know, I'm asking.
Only want it for some code so it looks like it can be fully offline, but it's worth being paranoid about it.
I don't think Apple has missed out on much (yet). The best LLMs (e.g. GPT-4o, Sonnet 3.7) are nowhere near being able to run locally, and they still make mistakes.
Some LLMs can run locally, but are brutally slow with small context windows.
Apple is likely waiting until you can run a really good model on device (i.e. iOS), which makes sense to me. It's not like they're losing customers over this right now.
They are playing the long game, as they always have: wait until the silicon enables it for most users. The Apple Silicon track record suggests that in a couple of years we'll get M3-Ultra-class capabilities across all Apple devices. Some day the baseline will be enough to run state-of-the-art LLMs on device.
Siri hasn't run on device for most of its existence. It's only in the last few years that Apple suddenly decided it was a priority.
All they have to show is incremental improvement over Siri. For that, quantized models are more than enough, in my opinion.
Sonnet 3.7 best? That thing is a dumpster fire. Totally useless vs 3.5.
Just checked some Genmojis created on Reddit; wow, I don't know how that got approved. I'm all for creativity and freedom, but it's 100% not Apple's brand.
And they just postponed AI Siri to 2026 after promising it for the iPhone 16. I seriously don't get how it can be that hard: a small model trained on various app APIs, a checker model that double-checks, an "approve this action" button. Not that hard.
Really cool! I hope they'll roll out MCP support so that we can add support for it in our MCP app store (https://github.com/fleuristes/fleur)
Right now only code editors and Claude support MCPs, but we'd love to see more clients like Sidekick