It’s a nice post, but “using array methods” probably shouldn’t be placed in the “Efficient Implementation” section. As often happens with high-level languages, a single plain old loop is faster than three array methods.
Similarly, if you plan to query those vectors in search, you should consider contiguous `TypedArray` storage and smaller scalars than double-precision `number`.
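For illustration, a rough sketch of what I mean (the function and names here are mine, not the article's code): one pass, no intermediate arrays, and contiguous 32-bit storage.

```typescript
// Single-loop cosine similarity over Float32Array (sketch, not the article's code).
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let sqA = 0;
  let sqB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    sqA += a[i] * a[i];
    sqB += b[i] * b[i];
  }
  return dot / Math.sqrt(sqA * sqB);
}

console.log(cosineSimilarity(new Float32Array([1, 2, 3]), new Float32Array([4, 5, 6])));
```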
I know very little about JS, but some of the amazing HackerNews community members have previously helped port SimSIMD to JavaScript (https://github.com/ashvardanian/SimSIMD), and I wrote a blog post covering some of those JS/TS-specifics, NumJS, and MathJS in 2023 (https://ashvardanian.com/posts/javascript-ai-vector-search/).
Hopefully, it should help unlock another 10-100x in performance.
Oh, this is awesome! We actually have a really good use case for this; it should improve performance in our case a lot and help us stay away from WASM a bit longer. Thanks for sharing!
Ooh, fantastic... thank you!
I'm doing a bunch of 3D rotation transforms on millions of points in JavaScript, and this looks like exactly what I need.
I'd probably use gpu.js for something like this.
As a physicist, I always found it funny that in ML people renamed 'the angle between vectors' to something as fancy-feeling as 'cosine similarity'.
It sounds pretentious, but I think it comes from having multiple ways to compare things for similarity. Using the word "similarity" says what you are using the angle for. "Angle similarity" might work just as well, until someone comes up with another angle-based calculation. "Cosine similarity" then takes on a specific meaning, indicating both how it's calculated and why.
If you really wanted to be pretentious you could invent a Dimension Insensitive Euclidean Metric and call it DIEM to make it sound like you are putting some Latin into your papers.
It is used as a distance function, so the important thing is not that it is an angle, but that it is a similarity function.
And I think today it is preferred to call it "similarity" instead of "distance" because it is not a true distance function (15 years ago we called it "cosine distance").
I have a feeling it might be because the dot product of two unit-length vectors is equal to the cosine of the angle between them... but that's just a wild guess.
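(For reference, the identity behind that guess: the general formula divides the dot product by the two magnitudes, and for unit vectors it collapses to the dot product itself.)

```latex
\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
\qquad\Longrightarrow\qquad
\cos\theta = \mathbf{a}\cdot\mathbf{b}
\quad\text{when } \lVert\mathbf{a}\rVert = \lVert\mathbf{b}\rVert = 1 .
```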
I think the point is they're often doing it on things that aren't strictly vectors by definition, in the physics sense, but rather some sack of stuff organised as if it were a vector, on the assumption that it's valid to compare angles because it's just something numeric, without much bias, that does the job.
I think what you meant to say is that they aren’t strictly geometric or “interpretable”. But they absolutely are vectors in the linear algebraic sense.
It's still a vector though.
https://insights.sei.cmu.edu/blog/translating-between-statis...
As an engineer, I find it funny when physicists (and ML bros in general) don't know about linear algebra concepts.
Huh? This is just a joke about the way it's named; what made you think I don't know about dot products?
It could be a joke, but the way you wrote it makes it sound like you didn't know it.
> in ML people renamed 'the angle between vectors' into something as fancy-feeling as 'cosine similarity'.
Cosine similarity is a name derived from linear algebra, though, not an ML renaming.
Great post, but what struck me (again, like every time I look at cosine similarity) is how unreasonably well it works. It's just one of those things that's so "weird" about our world: why would cosine similarity work in n-dimensional semantic spaces? It's so stupid simple, and it intuitively makes sense, and it works really well. Crazy cool.
I'm reminded of that old Einstein quote: "The most incomprehensible thing about the universe is that it is comprehensible."
That cosine distance works at all as a concept isn't terribly shocking, especially given our habit of norming everything. Cosine similarity on unit vectors is monotonically related to Euclidean distance, and we're using this stuff in "select the K most relevant vectors" sorts of queries, so for ranking purposes cosine similarity behaves identically to Euclidean distance.

Tack on the fact that every finite set of vectors, with your favorite metric, can be embedded in Euclidean space with at most ~41% relative error (and errors that high require somewhat special circumstances, so you'd expect most real-world data to have lower errors; plus, the worst-case error doesn't apply to every pair of points, and many pairs will have much lower error), and you're able to use normed cosine similarity somewhat reasonably on every finite set of stuff you care about, so long as you choose an appropriate embedding. All the sets of things you care about in ML are finite, and the sub-metric induced by whichever infinite set you're considering works just fine for everything we've discussed, so cosine similarity is reasonable for all practical ML purposes.
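Concretely, the monotonicity claim is just the expansion of the squared distance for unit vectors:

```latex
\lVert \mathbf{a} - \mathbf{b} \rVert^{2}
  = \lVert \mathbf{a} \rVert^{2} + \lVert \mathbf{b} \rVert^{2} - 2\,\mathbf{a}\cdot\mathbf{b}
  = 2 - 2\cos\theta
\qquad (\lVert \mathbf{a} \rVert = \lVert \mathbf{b} \rVert = 1),
```

so sorting by largest cosine similarity and sorting by smallest Euclidean distance pick out the same top-K.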
It's much more interesting that almost any set of ML-adjacent vectors can be somewhat reasonably compared via cosine distance _even without_ explicitly constructing an optimal embedding. It's not at all intuitive to me that an autoencoder's interior layer should behave well with respect to cosine similarity and not have any knots or anything warping that (associated) metric's usefulness.
> every finite set of vectors, with your favorite metric, can be embedded in euclidean space with at most ~41% relative error
I’ve never heard of this before. Do you have a reference?
I don't have any references off the top of my head, but I can state a result (which hopefully is easier to search) and then build to the thing I claimed:
- There's a well-known result that says you can embed any finite, undirected, unweighted graph (with shortest-path distance as the metric of choice) in Euclidean space with at most sqrt(2) multiplicative error (~41% relative error; see the note after this list for that conversion). The proof of that, as I recall, basically shows that the worst-case instance is any graph with a 4-vertex cycle, from which the result immediately follows.
- Construct the complete graph where each vertex is one of your (finitely many) vectors.
- Let G(0) denote that graph.
- Greedily construct a sequence of graphs G(1), G(2), ... by adding one extra vertex that splits some edge. The edge you split is the one that minimizes the skew between your metric's distances on the original set of vectors and the new distances defined by shortest paths. When more than one equally good split exists, choose one arbitrarily.
- Note that G(N) consists of better and better rational approximations of the desired distances, approaching 0 skew as N approaches infinity.
- Using that well-known result, we can embed G(N) with at most ~41% relative error, and as N grows the extra relative error approaches 0, so multiplying those together you get ~41%.
- Delete the extra vertices from G(N) to obtain the embedding for your original vectors.
- This isn't strictly necessary, but you can (I'm 90% sure -- I'd have to work through these details very carefully to be positive) obtain sqrt(2) exactly (instead of just less than sqrt(2)+epsilon for every positive epsilon) in a few ways. The one I'm most familiar with falls back on logical compactness, basically taking all the math that exists for the hyperreals and using it to create a nonstandard space holding all of these graphs. The limit approaching 0 can be phrased as a first-order statement whose consequences let you conclude that there exists an embedding which actually attains the limit. Topological arguments of some sort would probably be a more common way to go about it.
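A note on the numbers in the first bullet, since sqrt(2) and ~41% are used interchangeably above: under the usual non-contracting convention, a multiplicative error of sqrt(2) means every embedded distance lies within a factor of sqrt(2) of the true one,

```latex
d(x,y) \;\le\; \lVert f(x) - f(y) \rVert \;\le\; \sqrt{2}\, d(x,y),
\qquad \sqrt{2} - 1 \approx 0.414,
```

which is where the "~41% relative error" figure comes from.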
> behaves identically to euclidean distance
Tbh, I would argue that's also pretty surprising, as Euclidean distance is notoriously unintuitive[1] (and noisy) in higher dimensions. (I guess norming does help, so that's likely a good point.)
[1] https://bib.dbvis.de/uploadedFiles/155.pdf
No mystery there IMO. It works well because linear projections are the basic building block of modern neural networks.
I attempted to implement this on the front end of my e-commerce site, which has approximately 20,000 products (see gist [1]). My goal was to enhance search speed by performing as many local operations as possible.
The biggest performance impact came from moving to dot products.
Regrettably, the sheer size of the index of embeddings rendered it impractical to achieve the desired results.
1. https://gist.github.com/schappim/d4a6f0b29c1ef4279543f6b6740...
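For anyone curious, "moving to dot products" here just means normalizing every embedding once when the index is built, so each query-time comparison reduces to a single dot product. A rough sketch of the shape of it (made-up names and toy data, not the actual gist code):

```typescript
// Sketch only: names and data are illustrative, not taken from the gist.
// Normalizing each embedding once at index time turns every query-time
// cosine similarity into a plain dot product.
type Product = { id: string; embedding: number[] };

function normalize(v: number[]): number[] {
  const mag = Math.hypot(...v);
  return v.map((x) => x / mag);
}

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Index time: normalize once per product embedding.
const products: Product[] = [
  { id: "mug", embedding: [0.1, 0.9, 0.2] },
  { id: "cup", embedding: [0.2, 0.8, 0.1] },
];
const index = products.map((p) => ({ id: p.id, vec: normalize(p.embedding) }));

// Query time: one dot product per product, no square roots in the hot loop.
function topK(queryEmbedding: number[], k: number) {
  const q = normalize(queryEmbedding);
  return index
    .map((item) => ({ id: item.id, score: dot(q, item.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

console.log(topK([0.15, 0.85, 0.15], 1));
```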
This looks nice. I also played around over the weekend with Vue and Transformer.js to build the embeddings locally. See https://github.com/alexanderop/vue-vector-search
Those are some janky diagrams. The labels are selectable, and therefore are repeatedly highlighted and un-highlighted while dragging the vector around. The "direction only" arrow prevents you from changing the magnitude, but it doesn't prevent said magnitude from changing and it does so often because the inputs are quantized but the magnitude isn't. Multiple conventions for decimals are used within the same diagram. The second diagram doesn't center the angle indicator on the actual angle. Also the "send me feedback on X" popup doesn't respond to the close button, but then disappeared when I scrolled again so maybe it did? I'm running Chrome 134.0.6998.36 for Windows 10 Enterprise 22H2 build 19045.5487.
This whole thing looks like unreviewed AI. Stylish but fundamentally broken. I haven't had a chance to dig into the meat of the article yet, but unfortunately this is distracting enough that I'm not sure I will.
Edit: I'm digging into the meat, and it's better! Fortunately, it appears accurate. Unfortunately, it's rather repetitive. There are two paragraphs discussing the meaning of -1, 0, and +1 interleaved with multiple paragraphs explaining how cosine similarity allows vectors to be compared regardless of magnitude. The motivation is spread throughout the whole thing and repetitive, and the real-world examples seem similar, though formatted just differently enough to make it hard to tell at a glance.
To try to offer suggestions instead of just complaining... here are my recommended edits:
I'd move the simple English explanation to the top after the intro, then remove everything but the diagrams until you reach the example. I'd completely remove the explanation of vectors unless you're going to include an explanation of dot products. I really like the functional approach, but feel like you could combine it with the `Math.hypot` example (leave the full formula as a comment; the rest is identical), and with the full example (although it's missing the `Math.hypot` optimization). Finally, I feel like you could get away with just one real web programming example, though I don't know which one I'd choose. The last section about using OpenAI for embeddings and its disclaimer is already great!
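To make the `Math.hypot` suggestion concrete, this is roughly the combination I have in mind (a sketch with my own naming, not the article's code):

```typescript
// Rough sketch of combining the magnitude examples (not the article's code).
function magnitude(vec: number[]): number {
  // Full formula, left as a comment:
  // Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0))
  return Math.hypot(...vec);
}

console.log(magnitude([3, 4])); // 5
```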
Thank you for the good feedback. I tried to improve that. I was writing the blog post for myself to understand Cosine Similarity, which is why it's maybe a bit repetitive, but this is the best way for me to learn something. I get your point. Next time I will write it better. Good feedback - I love that.
Ha, when you put it that way, I can totally see why it read like that!
It looks super great now. What you have here leaves an entirely different impression, and a stylish one!
Two last suggestions:
* Now I'm thinking the "Why Cosine Similarity Matters for Modern Web Development" section belongs at the top, right after your intro.
* The angle indicator is still a bit wonky in the diagram. I might even take the "direction only" mode out entirely since, as you point out, cosine similarity is invariant to changes in magnitude.
I think the web animation is really useful for people new to the concept.
Oh, certainly, I meant remove the "Direction-only mode" toggle, not the whole animation!
I've just written a tag similarity calculation for a custom set of tagged documents, and if you're going to actually use the metrics, it's probably a good idea to fill a similarity table instead of recalculating it all on the spot.
While doing that, the next logical step is precalculating the squared magnitudes for each item; in my small test case of <1000 items, that sped the calculation up almost 300 times. The gains are not exponential, but saving that constant factor for every pair considered is not insignificant, especially on small data sets (of course, with large sets a table won't cut it due to memory limitations, and more sophisticated techniques are required).
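In sketch form (invented names and toy data, not my actual code), the idea is: compute each item's squared magnitude once, then fill the symmetric pairwise table using those cached values.

```typescript
// Sketch of the similarity-table idea with precalculated squared magnitudes.
type Item = { id: number; vec: number[] };

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function buildSimilarityTable(items: Item[]): number[][] {
  // Squared magnitude of each item, computed once instead of once per pair.
  const sqMags = items.map((item) => dot(item.vec, item.vec));
  const n = items.length;
  const table: number[][] = Array.from({ length: n }, () => new Array(n).fill(1));
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      const sim = dot(items[i].vec, items[j].vec) / Math.sqrt(sqMags[i] * sqMags[j]);
      table[i][j] = sim;
      table[j][i] = sim; // the table is symmetric, so fill both halves
    }
  }
  return table;
}

console.log(buildSimilarityTable([
  { id: 0, vec: [1, 0, 1] },
  { id: 1, vec: [0, 1, 1] },
]));
```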
Very well explained (although I'm not sure the TypeScript is necessary).
I wonder, though: isn't classic TF-IDF scoring also a type of basic vector comparison? What is the actual difference between a "vector database" and something like Elasticsearch?
I do all my work in Rust now via o3-mini-high and compile to WASM; JS is just for DOM and event handling. What is the point of building these CPU-bound functions in TS? Why not Rust -> WASM?
Because op knows TS and doesn't know Rust?
What I'm really asking is: what is the benefit of TS if we can use decent and improving coding AIs to help us write Rust, even if we're new to it or have never used it, and compile to WASM? Execution is much faster in my experience; not even comparable.
It's nice for performance, but if that's not a bottleneck, TypeScript is pretty convenient. It has better tooling for the web and compiles almost instantly. The Rust -> WASM loop can be frustratingly slow if you're exploring a new idea or prototyping.
How sad is it to want a solution but none of the journey to get there.
The future is bleak.
I liked your article. The chart with the vectors on it was cool, though kinda hard to use on mobile.
I went to the typescript tag and tried to read a few other articles and got 404 errors. Just wanted to let ya know.
Nice blog and good work!
Meanwhile in Python this is just a numpy one-liner: `np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))`.
Because you use numpy. You could just as well import cosine_similarity from sklearn.
You could also normalize both vectors (divide all components by the magnitude) and simply take the dot product?