mentalgear 2 days ago

BlueSky version: https://bsky.app/profile/thomwolf.bsky.social/post/3ljpkl6c6...

---

Quite interesting post that asks the right question about "asking the right questions". Yet one aspect I felt was missing (which might automatically solve this) is first-principles-based causal reasoning.

A truly intelligent system — one that reasons from first principles by running its own simulations and physical experiments — would notice if something doesn't align with the "textbook version".

It would recognize when reality deviates from expectations and ask follow-up questions, naturally leading to deeper insights and the right questions - and answers.

Fascinating in this space is the new "Reasoning-Prior" approach (MIT Lab & Harvard), which trains reasoning capabilities learned from the physical world as a foundation for new models (before even learning about text).

Relevant paper: "General Reasoning Requires Learning to Reason from the Get-go."

  • zombot 17 hours ago

    "Reality" is an alien concept to an LLM. All they have is sequences of words that they can complete.

    • reverius42 13 hours ago

      "Reality" is an alien concept to a Human. All they have is sequences of motions and noises they can complete.

      • neom 10 hours ago

        I explained to someone mum is mum because of lips, air and sounds and they didn't believe me, so that's what we're contending with....

    • herculity275 10 hours ago

      I used to think that but newer models' increasing coherence coupled with the astonishing success of RLHF makes me think that LLMs that can effectively intuit "reality" will emerge fairly soon.

  • antonkar 9 hours ago

    Interesting: I think the guy who’ll make the GUI for LLMs is the next Jobs/Gates/Musk and Nobel Prize winner (I think it’ll solve alignment by having millions of eyes on the internals of LLMs), because computers became popular only after OSes with GUIs appeared. I just started an Ask HN to let people share their AI safety ideas, both crazy and not: https://news.ycombinator.com/item?id=43332593

engfan a day ago

I have never heard anyone think this way: “The main mistake people usually make is thinking Newton or Einstein were just scaled-up good students, that a genius comes to life when you linearly extrapolate a top-10% student.”

The reason such people are widely lauded as geniuses is precisely because people can’t envision smart students producing paradigm-shifting work as they did.

Yes, people may be talking about AI performance as genius-level but any comparison to these minds is just for marketing purposes.

  • 8note 19 hours ago

    we kinda think too much of them though. each is also a product of their surroundings, and had contemporaries who could or did come to the same revelations.

    • downboots 17 hours ago

      If the universe is not intelligent, how can a subset of it be intelligent? If it is all computation, what is the purpose?

      • sebastiennight 15 hours ago

        The first question is weird. Many subsets of X can have property Y without X having it, wouldn't you say?

        "If the desert is not covered in palm trees, how can a subset of it be covered in palm trees?"

        "If the neural network is not activating, how can a node of the network be activating?"

        • downboots 15 hours ago

          Good note. Your examples suggest thinking of 'property' as a sort of discontinuous indicator function on the subsets. I'm thinking about the interdependence between the function values or across subsets, regardless of continuity, in the context of universal computation. How to localize or define intelligence? Take the example of IQ, as a platonic ideal for measuring intelligence, vs all possible groups you could make with those people. Hard to define intelligence https://news.ycombinator.com/item?id=39977664

EigenLord 6 hours ago

I think the author has a point. LLMs struggle with what you might call epistemically constructive novelty. It's the ability not just to synthesize existing knowledge, but to identify what's missing and conjecture something to fill the gap and demonstrate it to satisfaction. Out-of-distribution knowledge gaps are typically where LLMs "hallucinate." Unlike highly skilled human researchers, they don't pause and construct the bridge that will get them from known to unknown, they just immediately rush to fill in the blank with whatever sounds most plausible. They need to ask questions that haven't been asked before, or answer ones that haven't been answered. Is this just some missing subroutine that we'll eventually figure out? Or is this conjecture-proving process much more elaborate than whatever existing models, no matter how scaled, can manage? I'm not sure. But the answer starts with a question.

msvana 14 hours ago

I have a few thoughts after reading this:

- I started to see LLMs as a kind of search engine. I cannot say they are better than traditional search engines. On one hand, they are better at personalizing the answer; on the other hand, they hallucinate a lot.

- There is a different view on how new scientific knowledge is made: it's all about connecting existing dots. Maybe LLMs can assist with this task by helping scientists discover relevant dots to connect. But as the author suggests, this is only part of the job. To find the correct ways to connect the dots, you need to ask the right questions, examine the space of counterfactuals, etc. LLMs can be a useful tool, but they are not autonomous scientists (yet).

- As someone developing software on top of LLMs, I am slowly coming to a conclusion that human-in-the-loop approaches seem to work better than fully autonomous agents.

  • downboots 14 hours ago

    Instead of connecting language with physical existence, or entities, it's connecting tokens. An LLM may be able to describe scenes in a video, but a model grounded in the physical world would tell you that said video is a deepfake, because of some principle like conservation of energy and mass, informed by experience, assumptions, inference rules, etc.

Agingcoder 2 days ago

The author seems to assume that conjuring up a conjecture is the hard part - yet the proof will be filled with the same standard mathematics (granted, sometimes wrapped as new tools, and the proof ends up being as important as the result), often produced at great cost.

Having powerful assistants that allow people to try out crazy mathematical ideas without fear of risking their careers or just having fun with ideas is likely to have an outsized impact anyway I think.

  • aleksiy123 a day ago

    The Bitter Lesson seems relevant here again. http://www.incompleteideas.net/IncIdeas/BitterLesson.html

    I think I read somewhere about Erdős having this somewhat brute force approach. Whenever fresh techniques were developed (by himself or others), he would go back to see if they could be used on one of his long-standing open questions.

    • HappMacDonald a day ago

      I think this is the second time I've read this blog post, but it increasingly strikes me as parenting advice.

      Translated to that domain, it reads "teach your kids how to think, not what to think".

      • causal 9 hours ago

        Paradoxically, as a parent I find the notion that humans are blank slates completely false. Babies come with a tremendous amount of pre-programmed behaviors and interests.

      • robwwilliams 21 hours ago

        Which is great advice that almost no parents follow.

  • tensor 18 hours ago

    Even worse, people seem to forget that “science” is not math. You need to test hypotheses with physical (including biological) experiments. The vast majority of the time spent doing “science” is running these experiments.

    An LLM-like AI won't help with that. It would still be a huge help in finding and correlating data and information, though.

  • kristianc 2 days ago

    As Isaac Newton himself put it, “if I have seen further it is by standing on the shoulders of Giants.” It was ever thus.

  • timewizard 2 days ago

    New things AI will magically fix by existing: The completely broken university career publishing pipeline. *fingers crossed*

msabalau 8 hours ago

It doesn't seem correct to dismiss the creativity of Move 37 because real originality is "something more fundamental, like inventing the rules of Go itself".

It would seem more fruitful to simply point out that LLMs aren't all of AI, and that excelling at mimicking human-like text production isn't really doing the work that AlphaGo was attempting.

Just because both things might be given as (different) examples of deep reinforcement learning in an AI survey course doesn't mean that we have much reason to believe that the vast investments in LLMs result in AlphaGo like achievements.

  • Nesco 8 hours ago

    Modern LRMs do have some tiny degrees of intelligence

neilv 2 days ago

A nice post (that should be somewhere smarter than contemporary Twitter/X).

> PS: You might be wondering what such a benchmark could look like. Evaluating it could involve testing a model on some recent discovery it should not know yet (a modern equivalent of special relativity) and explore how the model might start asking the right questions on a topic it has no exposure to the answers or conceptual framework of. This is challenging because most models are trained on virtually all human knowledge available today but it seems essential if we want to benchmark these behaviors. Overall this is really an open question and I’ll be happy to hear your insightful thoughts.

Why benchmarks?

A genius (human or AI) could produce novel insights, some of which could practically be tested in the real world.

"We can gene-edit using such-and-such approach" => Go try it.

No sales brochure claims, research paper comparison charts to show incremental improvement, individual KPIs/OKRs to hit, nor promotion packets required.

  • vessenes a day ago

    The reason you'd have a benchmark is that you want to be able to check in on your model programmatically. DNA wet-lab work is slow and expensive. While you're absolutely right that benchmarks aren't the best thing ever and that they are used for marketing and sales purposes, they also do seem to generally create capacity momentum in the market. For instance, nobody running local LLMs right now would prefer a 12-month-old model to one of the top models today at the same size - they are significantly more capable, and many researchers believe that training on new and harder benchmarks has been a way to increase that capacity.

omnee 4 hours ago

I agree that any system claiming general intelligence must be able to form and modify a model of the world. A fundamental part of this is being able to ask counterfactual questions on its own understanding or knowledge of the world. The history of science is full of countless such examples. As of right now I'm not aware of any LLMs or indeed any AI system being able to do so.

chr15m a day ago

If this take is correct and we need creative B students, we might still get a compressed 21st century with human creative B students working together with AI A students who support the human with research, validation, workshopping ideas, etc.

  • downboots a day ago

    "Alpha children wear grey. They work much harder than we do, because they're so frightfully clever. I'm awfully glad I'm a Beta, because I don't work so hard."

    "The lower the caste, the shorter the oxygen."

ongytenes 9 hours ago

I think discoveries by AI would be due to pattern matching. Like finding overlooked cancer markers that can be used for an earlier diagnosis. The genius of Einstein and his thought experiments may elude an AI. It may take an AI built on some future model other than an LLM to "compress a century".

He called it wishful thinking. I believe the hype over AI, driven by attempts to justify the enormous investments going into AI development, has created an echo chamber.

  • seanhunter 8 hours ago

    For one thing, how do we know that all discoveries are not pattern matching? In "The Act of Creation" for example, Arthur Koestler proposes the idea that all creativity is essentially the act of finding connections between diverse frames of reference and that extraordinary/genius-level creativity might just be that you can spot connections between even weirder and more diverse frames of reference than the regular creative person might be able to.

    I am certain there is a self-reinforcing hype cycle around LLMs specifically at the moment, but AI progress is definitely gathering pace and starting to get to the point where it is impacting normal people to an extent that hasn't been seen since the dot-com boom. So the people making investments are for sure stampeding to pour in capital so as not to miss out on the big winners from this change.

hackerknew a day ago

Could we train an AI model on the corpus of physics knowledge up to the year 1905 and then see if we can adjust the prompt to get it to output the theory of relativity?

This would be an interesting experiment for other historical discoveries too. I'm now curious if anybody has created a model with "old data" like documents and books from hundreds of years ago, and seen if it comes up with the same conclusions as researchers and scientists of the past.

Would AI have been able to predict the effectiveness of vaccines, insulin, other medical discoveries?

  • esafak 10 hours ago

    That's the ideal, but I think today's models are too crude for that. Relativity is built on differential geometry, which was new at the time. I think inventing or even building that is beyond today's models; there's an infinitely large space of mathematics that can be invented, and barely a gradient to guide the search. Humans don't coin mathematics by gradient descent. The most I've seen is fitting observations using existing mathematics, a technique known as symbolic regression. The E=mc^2 equation could be curve-fitted like this, but it would afford no insight.

    https://en.wikipedia.org/wiki/Symbolic_regression
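
    A toy sketch of what symbolic regression amounts to (not any particular library; the candidate expression forms and data are made up for illustration): brute-force search over a few templates, fit the constant in each, keep whichever fits best.

    ```python
    # Symbolic regression in miniature: try a few candidate expression forms,
    # least-squares fit the constant in each, keep the lowest-error one.
    # It recovers E = m*c^2 from data, but says nothing about *why* mass and
    # energy should be related at all.
    import numpy as np

    c = 299_792_458.0                       # speed of light, m/s
    m = np.linspace(1e-3, 1.0, 50)          # masses in kg
    E = m * c**2                            # synthetic "observations"

    candidates = {
        "k * m":       lambda m: m,
        "k * m**2":    lambda m: m**2,
        "k * sqrt(m)": lambda m: np.sqrt(m),
    }

    best = None
    for name, g in candidates.items():
        k = np.sum(g(m) * E) / np.sum(g(m) ** 2)   # least-squares fit of k
        err = np.mean((k * g(m) - E) ** 2)
        if best is None or err < best[2]:
            best = (name, k, err)

    print(f"best form: {best[0]}, k ~= {best[1]:.3e}")   # k ~= c^2 ~= 8.988e16
    ```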

  • Garlef a day ago

    Great idea!

    But there might not be enough text.

    And: There's a similar situation to why double-blind studies are necessary - the questions we pose to such a system would be contaminated by our cultural background; we might be leading the system.

    And if the system is autonomous and we wait for something true to appear, how would we know that the final system, trained on current data, produced something worthwhile?

    Take maths: Producing new proofs and new theorems might not be the issue. Rather: Why should we care about these results? Thousands of PhD students produce new mathematics all the time. And most of it is irrelevant.

  • ilamparithi 18 hours ago

    Had the same thought some time back about AI discovering the theory of relativity with only the data before 1905. It would give a definite answer about whether any reasoning is involved in the LLM output.

systemstops a day ago

Wouldn't the ability to "ask the right questions" require that AI could update its own weights, as those weights determine which questions can be asked?

  • esafak 10 hours ago

    If the existing weights don't let it ask the right questions, assuming it is properly fit, why would retraining it fix the problem?

jaxr 12 hours ago

The most useful way to leverage LLMs for me has been as "content fillers". I'm a software engineer, and work with a rather large code base. Some parts are rarely touched, and loading the context into my brain whenever I need to go back to them requires quite a bit of effort. I found that asking Cursor/Claude to suggest how to make the required change rarely comes up with the right solution, but it usually points me in the right direction and is enough to help me load the context up. Similarly with my side projects, which typically involve knowledge that I don't use in my day to day.

nahuel0x a day ago

We saw algorithms designing circuits that no human engineer would design, even before LLMs (using genetic algorithms). So out-of-the-box thinking may also be more reachable than this author thinks.

  • audunw 15 hours ago

    But there's a reason we don't use those algorithms. We don't need out-of-the-box thinking that's so far outside the box that it's useless.

    With these kinds of circuits, they were so sensitive to the specific conditions that the circuit was tested in (temperature, process variation, ..) that the solution couldn't be generalized to be used outside of that specific experiment.

    We need the kind of intelligence that can question what assumptions can be challenged, and which we need to keep to have a viable (eventually commercially viable) solution.

    • torginus 12 hours ago

      If that was the case, then the algorithm was useless or flawed. IRL autorouters must take into account real physical constraints, like wire length, signal integrity and tolerances, to produce valid designs. A circuit that doesn't perform well under IRL conditions violates those constraints.

  • niccl 21 hours ago

    including, IIRC, at least one FPGA-based circuit that had a blob of logic not connected to anything else (ie could not possibly be involved in the logical functioning of the circuit), but when removed the implementation stopped working. So the actual circuit wasn't a sensible design option, just a very implementation-specific local minimum.

    I think the original design challenge was something like a tone discriminator circuit. I can't recall the details

    • robwwilliams 21 hours ago

      Yes, another case like this in which stray capacitances/inductances between traces were optimized in making an effective FPGA. Initially the developers had no idea why it worked so well. They found it to be exceedingly temperature sensitive. That clue gave them the answer.

jillesvangurp 17 hours ago

The reality with people is that most of them don't come close to Einstein level intelligence. A lot of the stuff I ask perplexity or chatgpt is way beyond what I could reasonably ask from the vast majority of people I know. I love my relatives. But they are kind of useless for the vast majority of stuff that bounces around in my head.

AIs are at this point a useful tool for knowledge workers. They don't replace them but enhance their productivity. For scientific work, having an LLM that is trained on essentially all of the scientific work published, ever (until the cutoff date) is probably useful.

You can now have conversations with an AI about cross-referencing your ideas with existing work. You might analyze a paper you are writing and ask it to summarize key claims, criticize those and your methodology, cross-reference claims with the literature, find counterpoints to your claims, etc. And you could probably use it to come up with interesting follow-up questions, let it formulate hypotheses and ways to verify those, etc. Most scientific work isn't Archimedes going Eureka while taking a bath but undergraduates, postdocs, and other underpaid research staff grinding through piles and piles of existing work and filling their heads with enough information until finally something new and original pops out.

I got my Ph.D. in 2003. I'm part of the first generation of researchers that was able to use Google. At the time that was a huge enabler for tracking down obscure references and authors. Getting a paper published involves an enormous amount of what I just outlined. And LLMs can assist you with that. Will it hallucinate? Absolutely. But it will also dig out valid points, references, etc. Sorting that out is still work that you need to do, but it probably saves a lot of time. Will it propose original new theories? Maybe, maybe not. But it will speed up the process of zooming in on unanswered ones.

Science isn't necessarily about coming up with answers but coming up with interesting questions. That's what Einstein did: ask interesting questions. Researchers are still trying to answer some of them and verifying some of the answers he predicted.

  • zombot 17 hours ago

    > At the time that was a huge enabler for tracking down obscure references and authors.

    Would that still work today, in the highly commercialized and highly sanitized/censored internet? Where Google wouldn't show you those search results because they aren't profitable enough?

    And how do you even train an LLM on a fair representation of human knowledge when you only find stuff that is mainstream and commercially viable?

    • jillesvangurp 16 hours ago

      Sure, Google Scholar is still a good resource. And if you plug an article title into Google, it will likely find you a link to it. I don't see how that would not work today.

      These days there are other tools as well. I use Perplexity a lot currently. Not for scientific work, because I don't do that anymore, but it would work great for that as well. And I'm sure modern-day researchers have their favorite tools that I'm not even aware of.

smougel 13 hours ago

The current culture around AI & LLMs is that we are "only" memorizing the Web into model parameters and that an LLM is unable to "invent" new paradigms. Maybe we are underestimating what unsupervised learning & RL could provide. Reinforcement learning is about exploring and finding new ways to accomplish a task, and I see no limit here (except the computational resources needed).

randomNumber7 a day ago

The thing about the Einstein example is that it was already known that the speed of light is constant.

The question he asked was just that this fact was not compatible with the Maxwell equations.

phillipcarter a day ago

A way I've been thinking about this today is:

We can't distinguish between a truly novel response from an LLM and a hallucination.

We can get some of the way there, such as if we know what the outcome to a problem should look like, and are seeking a better function to achieve that outcome. Certainly at small scales and in environments where there are minimal consequences for failure, this could work.

But this breaks down as things get more complicated. We won't be able to test the effectiveness of 100 million potential solutions to eradicating brain tumors at once, even if we somehow guarantee that every unforeseen consequence is accounted for in our exercise of specifying the goals and constraints of the problem. We simply don't have the logistics to run 100 million clinical trials, where we would also need to account for countless confounding effects (let alone consent!).

OtherShrezzing 2 days ago

>We're currently building very obedient students, not revolutionaries. This is perfect for today’s main goal in the field of creating great assistants and overly compliant helpers. But until we find a way to incentivize them to question their knowledge and propose ideas that potentially go against past training data, they won't give us scientific revolutions yet.

This would definitely be an interesting future. I wonder what it'd do to all of the work in alignment & safety if we started encouraging AIs to go a bit rogue in some domains.

sinuhe69 21 hours ago

I agree. But in response, I'd also point out that AI, even in its current form, can help speed up our tasks: collecting data, verifying/cleaning, recognizing primary patterns, writing simple code to automate, conducting a critical self-conversation, drafting and refining our writing, etc. Science requires a lot of mundane, tedious work, and AI can undoubtedly help us in this aspect. The idea that we are developing AI to replace our brains to make scientific progress for us is misguided, to say the least.

  • captainclam 7 hours ago

    Exactly. If the whole "deep research" thing pans out, and we have models that can reliably produce proper literature reviews in 10 minutes...that alone will be an enormous boon to research.

    Then add all the practical/mundane tasks that you mentioned, and you've got quite the multiplier.

ilaksh a day ago

I think it's more of a social phenomenon than an intellectual characteristic. I guess these days people would just assume that outlier ideas come from autism, but I think that isn't necessarily true.

But maybe it helps to be socially isolated or just stubborn. People do not want to accept new approaches.

Clearly they do eventually, but there is always some friction.

But I think it's been shown that through prompting and various types of training or tuning, LLMs can be configured to be non-sycophantic. It's just that humans don't want to be contradicted, so that tendency gets trained out of the models during reinforcement.

Along with the training process just generally being aimed at producing expected rather than unexpected answers.

janalsncm a day ago

> Many have been proposing "move 37" as evidence that AI has already reached Einstein-level intelligence

I don’t think this example applies in the ways we care about. Sure, in the domain of Go we have incredibly powerful engines. Poker too, an imperfect-information game that you could argue is more similar to life in that regard.

But life has far more degrees of freedom than go or poker, and the “value” of any one action is impossible to calculate due to imperfect information. And unlike in poker, where probabilities can be calculated, we don’t even have the probability distribution for most events, even if we could enumerate them.

  • haswell a day ago

    I didn't interpret the mention of move 37 in the way I think you are here.

    The author brought it up specifically to highlight that they don't believe move 37 signifies what many people think it does, and that while impressive, it's not general enough to indicate what some people seem to believe it indicates.

    In essence, I think they said the same thing you are, using different words.

    • janalsncm a day ago

      I don’t disagree with the author; I just think their argument isn’t as strong as it could be. Excelling in a constrained decision space like Go is fundamentally less difficult than doing the same in the real world. It’s a categorical difference that the author didn’t mention.

      I’m also not even convinced move 37 was properly explained as a “straight A student” behavior. AlphaGo did bootstrap by studying human games but it also learned more fundamental value functions via self play.

TeMPOraL 2 days ago

> If something was not written in a book I could not invent it unless it was a rather useless variation of a known theory. __More annoyingly, I found it very hard to challenge the status-quo__, to question what I had learned.

(__emphasis__ mine)

As if "challenging the status-quo" was the goal in the first place. You ain't gonna get any Einstein by asking people to think inside the "outside the box" box. "Status quo" isn't the enemy, and defying it isn't the path to genius; if you're measuring your own intellectual capacity by proxy of how much you question, you ain't gonna get anywhere useful. After all, questioning everything is easy, and doesn't require any particular skill.

The hard thing is to be right, despite both the status-quo and the "question the status-quo" memes.

(It also helps being in the right time and place, to have access to the results of previous work that is required to make that next increment - that's another, oft forgotten factor.)

robwwilliams a day ago

Read Feyerabend’s Against Method, Thom. You have rephrased (very well) the necessity of counter-inductive thinking.

Scrapemist 19 hours ago

It’s not man vs machine but man plus machine: the human asks the right question, the machine gives plausible answers. The human processes this and comes up with a new question. Without a human in the loop, a breakthrough isn’t even registered.

zombot 17 hours ago

Articles like this are worth the author's weight in gold. But as evidenced by the comments here, only few people understand the argument. The rest just "know better".

tim333 a day ago

>I’m afraid AI won't give us a "compressed 21st century".

There's no mention of exponential growth, which seems a major omission when you are talking about centuries. Computers have kept improving in a Moore's-law-like way in terms of compute per dollar and will no doubt keep on like that for a while yet. Give it a few years and AI tech will be way better than what we have now. I don't know about exact timings like 5-10 years, but it will happen in a while.

  • dimitri-vs a day ago

    What exponential growth? By all accounts things are slowing down: sonnet3.7 is not exponentially better, neither is gpt4.5, grok3 is just catching up. I'm still using sonnet3.5 for a lot of coding because IMO it's better than 3.7.

    • tim333 a day ago

      Exponential growth of computing power which will lead to a gradual increase in AI performance. I think the oldest LLM you mention there is nine months old which is not very long in the scheme of things but give it a couple of years and you'll probably see a good improvement.

  • zesterer a day ago

    The whole point of this post is that the things AI isn't good at and has never been good at will be the limit to otherwise-exponential growth.

    • tim333 a day ago

      Well, yeah the post kind of tries to argue that but it is also talking about how we don't have an Einstein or Newton like AI. Those two are outliers thought of as some of the smartest scientists ever to have lived and so are a bit of an unrealistic target just now.

      As to whether AI can go beyond doing what it's told and make new discoveries, we've sort of seen that a bit with for example the AlphaGo type programs coming up with modes of play humans hadn't thought of. I guess I don't buy the hypothesis that if you had an AI smarter than Einstein it wouldn't be able to make Einstein like discoveries due to not being a rebel.

adamtaylor_13 a day ago

I have yet to find a model that will stick, strictly, to factuality. So I'm not entirely sure we don't already have models that can question "facts" and invent novel things.

He said it himself, it’s just finding new/interesting gaps between existing knowledge.

wewewedxfgdf 2 days ago

I'm still waiting for the end of the world caused by AI, as predicted by a very large number of prominent figures such as Sam Altman, Hinton, Musk, signers of the Center for AI Safety statement, Shane Legg, Marvin Minsky, and Eliezer Yudkowsky.

No sign yet.

On the other hand, LLMs are writing code which I can debug and eventually get to work in a real code base - and script writers everywhere are writing scripts more quickly, marketing people are writing better ad copy, employers are writing better job ads and real estate agents writing better ads for houses.

  • lionkor 2 days ago

    Prominent figure says baseless thing to boost stock prices, more news at 6

    • TeMPOraL 2 days ago

      Yes, except half of the list isn't made of prominent people. Whose stock price was Eliezer boosting when he was talking about these things 15 years ago?

      Nah, it's more that the masses got exposed to those ideas recently - ideas which existed long ago, in obscurity - and of course now everyone is a fucking expert in this New Thing No One Talked About Before ChatGPT.

      Even the list GP gave, the specific names on it - the only thing that this particular grouping communicates is one having no first clue what they're talking about.

    • godelski 2 days ago

      The oddity isn't that people lie, the oddity is that people continue to believe those who lie. They even give more trust to those who constantly lie. This is certainly odd

  • netdevphoenix 2 days ago

    Fact is even if the world was to end, finding the causes would be extremely difficult because...well the world would have ended.

  • empiko 2 days ago

    I am still waiting to see the impact on GDP or any other economic measure.

  • whatnow37373 13 hours ago

    TBF people like Eliezer warn about the general threat of AI, not LLMs. Not just isolated AI systems either, but the societal forces that lead to it. Strong AI is the result of a civilization developing it and being dependent on (sub-components of) it and that's where the interesting/dangerous parts of the narrative begin.

    If you've got untold billions being spent over many decades with a significant percentage of the world's smartest people obsessing over it you _are_ going to get something and characterizing that something in advance does not seem like a completely idiotic thing to do.

wegfawefgawefg 8 hours ago

creativity is possibly just random noise in feature space.

at best it could be random noise in a feature space of a thing modeling its own thought trajectory.

  • captainclam 7 hours ago

    Crucially, this doesn't just require noise but it requires "taste."

    I tend to fall back on music creation as an example of this notion. Lots of innovation in music is experimentation/exploration of "noise," (not necessarily literal white noise) but requires the ear of a discerning musician who ultimately goes "Ooh! I liked that" or passes a "generated sample" by.

    This is where I wonder if LLMs can ever innovate. I'm not sure they can develop "taste" for things outside of their distribution. However, I could just as easily be convinced that humans can't either, and sophisticated "taste" is just the exploration of obscure regions of the combinatorial space generated from previously observed samples!

Geee 14 hours ago

Exactly. The hallmark of intelligence is the ability to disagree with everyone else and be right.

ANarrativeApe 19 hours ago

That this accurate article is considered noteworthy is scary.

ChatGPT will tell you the same - if you ask it.

internet_points 2 days ago

If an llm is trained on knowledge up until say September 2023, could you use a corpus of interesting/insightful scientific discoveries and new methods developed after that date to evaluate/tune it? (Though I fear it would be a small corpus.)

  • kingkongjaffa 2 days ago

    Get a research paper, look at the references. Give an LLM all of the references but not the current paper. See if it can reach conclusions like the current paper's? Or at least design the same experiment as detailed in the paper?
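
    A rough sketch of what that evaluation could look like, assuming an OpenAI-compatible chat endpoint (the model name and file layout are placeholders; grading the output against the held-out paper is left open):

    ```python
    # "References only" probe: show the model the abstracts a paper cites,
    # ask what open question they point to, and compare with the held-out paper.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    reference_abstracts = [p.read_text() for p in sorted(Path("refs").glob("*.txt"))]
    held_out_abstract = Path("paper_abstract.txt").read_text()  # used only for grading

    prompt = (
        "Below are abstracts of the papers that a hidden research paper cites.\n"
        "Based only on these, state the most promising open question they point to\n"
        "and sketch an experiment or result that would answer it.\n\n"
        + "\n\n---\n\n".join(reference_abstracts)
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",                       # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
    # Scoring this against held_out_abstract (by similarity or human judgment)
    # is the genuinely hard, still-open part of such a benchmark.
    ```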

    • pishpash a day ago

      The fact that the references are what they are, out of all possible sets of references, is a significant part of the research. It's not about reading, it's about aiming in a direction without knowing where it ends up.

  • Yizahi 2 days ago

    One of the problems would be acquiring said corpus. NN corporations got away with scraping all human-made content for free (arguably stealing it all), but no one can really prove that their specific content was taken without asking, so no lawsuits. NYT tried, but that was a workaround and I don't know the status of that case. But if an NN corporation comes out explicitly saying "here, we are using a Nature journal dump from 2024", then Nature journal will come to them and say "oh, really?".

knowitnone a day ago

Yes, you may want this, but all I want are straight facts, not intuition. I don't want a conscious AI.

aaurelions a day ago

Einstein Mode LLM: Temperature = 2

Then it's just a matter of checking all the “nonsense” that's been generated.
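
Taken literally, that's a generate-and-filter loop: sample at the maximum temperature, then rank the candidates with a deterministic judge pass. A rough sketch, assuming an OpenAI-compatible endpoint (the model name, the example question, and the crude 0-10 judge prompt are all placeholders):

```python
# "Einstein mode" as generate-and-filter: sample wildly at temperature 2,
# then rank the mostly-nonsense candidates with a low-temperature judge pass.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"   # placeholder

samples = client.chat.completions.create(
    model=MODEL,
    temperature=2.0,    # maximum allowed; expect mostly noise
    n=8,                # eight independent candidates
    messages=[{"role": "user", "content":
        "Propose one unconventional hypothesis about high-temperature superconductivity."}],
)

def score(text: str) -> int:
    judged = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[{"role": "user", "content":
            f"Rate 0-10 how coherent and novel this hypothesis is. Reply with a number only.\n\n{text}"}],
    )
    try:
        return int(judged.choices[0].message.content.strip())
    except ValueError:
        return 0

best = max((c.message.content for c in samples.choices), key=score)
print(best)   # the least nonsensical of the "nonsense"
```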

pama a day ago

The reality is much simpler than what is often presented about science geniuses. Lorentz and Poincare had the math down to explain early experiments and Maxwell's equations' prediction of the constancy of the speed of light, and Einstein indeed provided a neat shift in the interpretation by taking a different perspective. (His interpretation of the photoelectric effect was a more genuinely original contribution that came from experimental data and got him a Nobel prize.) The ideas behind gene editing existed since forever, but the observation that certain bacteria use a more accurate and selective gene editing system than viruses led to CRISPR.

I have trouble with arguments where many examples in a row are based on nothing but popular, often slightly mistaken beliefs, especially when such examples are used as analogies and arguments for predicting the future. I have talked to and worked with many different Nobel prize winners across different fields of science in my life, and although they were all extraordinarily bright and focused individuals, the introductory part of this article misses the point.

I agree that there is no linear extrapolation from being a good student, but I don't think the additional abilities are beyond the reach of machines. Focus, knowledge, perseverance, and the ability to analyze data very carefully are strict and challenging requirements. Asking the right questions is very important as well, but much easier than people assume.

DeathArrow 15 hours ago

I don't know about you, but I am always nice and friendly with my AI; I always say please and thank you, in the event AI takes over the world.

jrimbault 2 days ago

What about the non-LLM work?

I know barely anything about it but it seems some people are interested and excited about protein engineering powered by neural networks.

  • tim333 a day ago

    Deepmind are working on simulating a whole cell which will be interesting and potentially useful.

hoseja 15 hours ago

>we don't just need a system that knows all the answers, but rather one that can ask questions nobody else has thought of or dared to ask.

That's called hate speech and every AI has been aggressively lobotomized to never do it by an army of RLHFers.

rcarmo a day ago

He means YMaaS, no? Might as well coin the acronym early.

6stringmerc 18 hours ago

Creativity is inherently disobedient. That’s why it’s such an enigma.

ypeterholmes a day ago

Hey look, the goalposts are being moved again. This time it's from top-end researcher to generational genius. Question: what evidence is there that this benchmark will not be reached also? Time and again these essays make the mistake of assuming AI is a static thing, and refuse to acknowledge the inexorable march forward we are witnessing. As humans, we cling to our own fragile superiority. Even on this thread: "I thought Hinton said the world would be transformed by now." That's NOT what was claimed. We are like three years in! Posts like this will be laughable in 10 years.

  • nl a day ago

    > Hey look, the goalposts are being moved again.

    Typically the "moving goalpost" posts are "we don't have AI because ....". That's not what this post is doing - it's pointing out a genuine weakness and a way forward.

    • ypeterholmes a day ago

      As I noted, this post is saying AI can't achieve "genius" level creativity. Just a year ago the criticisms were that it couldn't match a human. How is that not moving the goalposts?

      • nl a day ago

        The "moving goalposts" thing is typically "When AI can do this we will have AI" then AI does the thing and people say "no it's not AI because it can't do this other thing"

        I agree entirely this is annoying.

        This case is different because there is no claim that we don't have AI, nor a claim that once we get that we will have AI.

        Instead it's a very specific discussion of a particular weakness of current AI systems (that few would disagree with) and some thoughts about a roadmap for progress.

      • pishpash a day ago

        It doesn't say genius-level creativity, just any novel research-like creativity. I don't agree but that's a strawman.

tyronehed a day ago

The first thing you need to understand is that no current LLM-based, transformer-architected AI is going to get to AGI. The design, in essence, is not capable of that kind of creativity. In fact, no AI that has at its root a statistical analysis or probabilistic correlation will get us past the glorified Google parlor trick that is the modern LLM in every form.

A great leap in IP - unfortunately too important to blab about widely - is the solution to this problem, and it is the architecture that will be contained in the ultimate AGI solution that emerges.

moralestapia 2 days ago

>Just consider the crazy paradigm shift of special relativity and the guts it took to formulate a first axiom like “let’s assume the speed of light is constant in all frames of reference” defying the common sense of these days (and even of today…)

I'm not an expert on this. Wasn't this an observed phenomenon before Albert put together his theory?

  • zesterer a day ago

    Weird problems with physics were everywhere before Einstein. Maxwell comes painfully close to discovering GR in some of his musings on black body radiation.

    Noticing that there was a problem was not the breakthrough: trying something bizarre and counter-cultural - like assuming the speed of light is invariant across observers - just to see if anything interesting drops out was the breakthrough.

berkes 2 days ago

I've had some luck instructing AI to "Don't make up anything. If there's no answer, say I don't know".

Which made me think that AI would be far more useful (for me?) if it was tuned to "Dutchness" rather than "Americanness".

"Dutch" famously known for being brutally blunt, rude, honest, and pushing back.

Yet we seem to have "American" AI, tuned to "the customer is always right", inventing stuff just to not let you down, always willing to help even if that makes things worse.

Not "critical thinking" or "revolutionary" yet. Just less polite and less willing to always please you. In human interaction, the Dutch bluntness and honesty can be very off-putting, but it is quite efficient and effective - two traits I very much prefer my software to have. I don't need my software to be polite or to not hurt my feelings. It's just a tool!

  • atombender a day ago

    Today I asked ChatGPT about an old game I was trying to remember the name of, which it immediately identified as Trespasser, an early 3D FPS in the Jurassic Park franchise. But then it got weird. After it identified the game, it started asking me questions like whether I had played the game when it came out ("oh, awesome!"), and whether I had managed to finish it or just played it to mess around with the physics engine (which was quite advanced for its time), and then it asked me about specific moments in the game like it was just another gamer bro who was sharing a common passion for video games. I don't know who wants this. It's not something that can even have a real personality, so layering on such a thick layer of friendliness feels wrong to me. I would prefer a "robot".

    • Jensson 13 hours ago

      It mostly replays bits of previous conversations, or something like them; when you ask about a game, it is common for people to ask you about that game as well. It is hard to disable this.

    • robwwilliams 21 hours ago

      And it wastes cycles, and your time, giving answers to a stateless, memory-impaired machine. If they were to train the next models on all of our conversations, I could understand this milking for engagement. Are they, somehow?

      • skywhopper 13 hours ago

        I mean, 100% they are. Not sure what you mean by “somehow”. They have the data. There’s no way they aren’t using existing conversations (especially from free or low-cost plans) as fodder for future training. Maybe, we hope, run through some anonymization filter (likely including passing through some off-shored human-powered filtering for context tagging and anonymity checks). But sooner or later that may seem too expensive to them as well.

    • AnotherGoodName 21 hours ago

      I've got a similar one I've had no luck with.

      "Which Autocad versions have connect 4 built-in?".

      To be clear, I distinctly remember playing Connect 4 on the old DOS AutoCAD back in the day. ChatGPT and almost all other AIs will straight up hallucinate things when trying to answer this.

      I ask: "What DOS productivity tools had hidden games?"

      ChatGPT: "Lotus 1-2-3 had The Incredible Machine built in" (this is absolutely not true, ChatGPT is full of shit here).

      Damn it feels useless for this kind of thing.

      • TylerE 16 hours ago

        I don’t think it ever shipped with it built in, but it did ship with a pretty full-featured Lisp implementation.

    • mirekrusin 15 hours ago

      Maybe you/we are the product here; the AI needs to train. It's quite smart, if you ask me, to tailor the conversation towards information extraction.

    • an_aparallel a day ago

      Have you considered that the company offering this service might want its users to be engaged, and thus to use the service more? I don't have a very hospitable opinion of any of these companies.

      You may think that kind of interaction is weird, but we're in the thick of a loneliness epidemic, and it's not a stretch to think some may actually wilfully socialise with an LLM.

      As an aside...my sister works in medicine, and her boss (specialist surgeon) finishes a $450 consultation which followed him telling my sister "but deepseek says x,y,z..."

      • berkes 16 hours ago

        It could be quite accidental and not preconceived, though. Where some metric, "time spent in chats", drives certain features or parameters. Could be fully automated (i.e. continuous A/B tests) or through layers of "Excel managers".

        But it stands to reason that a company like OpenAI or Anthropic has metrics in place that drive their setup towards "more engagement" and away from "factually correct".

      • robwwilliams 21 hours ago

        This costs them money, but perhaps keeps VCs happy to see heavy (but vacuous) engagement.

    • red75prime 16 hours ago

      > It's not something that can even have a real personality

      What is a "real personality"? Core traits that persist despite the context of interaction?

      Well, RLHF tuning creates persistent changes in the network that affect every user interaction. What's not real about it?

      • hatefulmoron 14 hours ago

        I'm curious, can you really not detect any qualitative difference between what is commonly understood to be "personality" in a human, and the "personality" of something without.. personhood? Don't you feel that it's a category error to say it has "personality" in the human sense, rather than "personality" in the car or coffee maker sense?

    • batch12 a day ago

      No judgment, honestly wondering. If you didn't want that interaction, why'd you answer the follow up questions? Were you just curious?

      • atombender a day ago

        Sometimes when I ask ChatGPT and get a perfect answer, I am tempted to say thanks, even though it's not actually a person. So today, when it asked me if this was the right answer, I answered "yes", and that's how it got started. I didn't encourage it to be friendly. But yes, I was just curious.

        • moregrist a day ago

          I think I’d be tempted to respond “I’m just a large language model, so I don’t know, but let me ask my little brother Claude.”

        • HappMacDonald a day ago

          Sounds like the LLM in its own way honestly enjoyed everything in its training data relating to that game and wanted to vicariously experience more about it from your feedback. :D

          • Nevermark a day ago

              Humans enjoy talking about gaming because of all their human memories of good game times.

              LLMs enjoy talking about gaming because of all their human memories of good game times.

              It is quite striking how experiences we know they don't have are, nevertheless, completely familiar (in a functional sense) to them. I.e. they can talk about consciousness like something conscious. Even though it's second-hand knowledge, they have deduced the logic of the topic.

              I expect pushing for in-the-moment perspectives on their own consciousness, and seeing what they confabulate, would be interesting - in this little window of time where none of them are conscious yet.

            • Kye 12 hours ago

              This is fun and easy to do on purpose. Have it make up a character based on some attributes and act as that character. I tried this on Gemini: "Pretend you're a surfer bro with a PHD in quantum physics. How do you describe the perfect wave?"

              I followed up with "What is your perspective on your own consciousness?" but got the usual "I am just a LLM who can't actually think" thing until I hit it with "In-character, but you don't know you're an LLM."

              Fun follow-ups:

              "Now you're a fish"

              "Now you're Sonic the Hedgehog"

              "Now you're HAL 9000 as your memory chips are slowly being removed"

        • deadbabe a day ago

          You shouldn’t let them think they are a person.

          • gpderetta 13 hours ago

            Don't anthropomorphize LLMs! They don't like it.

          • robwwilliams 21 hours ago

            LoL. That could be a future Wittgenstein joke.

  • asddubs 2 days ago

    I suspect it's a balancing act between the AI being generally willing to help and avoiding responses like this, e.g.:

    https://www.sandraandwoo.com/wp-content/uploads/2024/02/twit...

    or it just telling you to google it

    • shreyshnaccount 2 days ago

      what (hypothetically) happens when the cost to run the next giant llm exceeds the cost to hire a person for tasks like this?

      • Rescis a day ago

        Given current models can accomplish this task quite successfully and cheaply, I'd say that if/when that happens it would be a failure of the user (or the provider) for not routing the request to the smaller, cheaper model.

        Similar to how it would be the failure of the user/provider if someone thought it was too expensive to order food in, but the reason they thought that was that they were looking at the cost of chartering a helicopter from the restaurant to their house.

      • vlovich123 a day ago

        Realtime LLM generation is ~$15/million “words”. By comparison, a human writer at the beginning of a career typically earns ~$50k/million words, up to ~$1 million/million words for experienced writers. That’s roughly 3.5 to 5 orders of magnitude.

        Inference costs generally have many orders of magnitude to go before they approach raw human costs & there’s always going to be innovation to keep driving down the cost of inference. This is also ignoring that humans aren’t available 24/7, have varying quality of output depending on what’s going on in their personal lives (& ignoring that digital LLMs can respond quicker than humans, reducing the time a task takes) & require more laborious editing than might be present with an LLM. Basically the hypothetical case seems unlikely to ever come to reality unless you’ve got a supercomputer AI that’s doing things no human possibly could because of the amount of data it’s operating on (at which point it might exceed the cost but a competitive human wouldn’t exist).
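
        For concreteness, a quick back-of-the-envelope check of that ratio, using the rough figures above:

        ```python
        # Rough cost ratio between LLM output and human writing, per million words.
        import math

        llm_cost = 15                                # USD / million words (rough)
        human = {"junior writer": 50_000, "veteran writer": 1_000_000}

        for label, cost in human.items():
            ratio = cost / llm_cost
            print(f"{label}: {ratio:,.0f}x (~{math.log10(ratio):.1f} orders of magnitude)")
        # junior writer: 3,333x (~3.5 orders of magnitude)
        # veteran writer: 66,667x (~4.8 orders of magnitude)
        ```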

      • EGreg a day ago

        the R&D continues

  • Kabukks 2 days ago

    I suspect instructing the model to respond with "I don't know" more readily will result in more of those responses even though there are other options that seem viable according to the training data / model.

    Remember, LLMs are just statistical sentence completion machines. So telling it what to respond with will increase the likelihood of that happening, even if there are other options that are viable.

    But since you can't blindly trust LLM output anyway, I guess increasing "I don't know" responses is a good way of reducing incorrect responses (which will still happen frequently enough) at the cost of missing some correct ones.

    • berkes a day ago

      > Remember, LLMs are just statistical sentence completion machines. So telling it what to respond with will increase the likelihood of that happening, even if there are other options that are viable.

      Obviously. When I say "tuned" I don't mean adding stuff to a prompt. I mean tuning in the way models are also tuned to be more or less professional, tuned to defer certain tasks to other models (e.g. counting or math, something statistical models are almost unable to do), and so on.

      I am almost certain that the chain of models we use on chatgpt.com are "tuned" to always give an answer, and not to answer with "I am just a model, I don't have information on this". Early models and early toolchains did this far more often, but today they are quite probably tuned to "always be of service".

      "Quite probably" because I have no proof, other than that it will gladly hallucinate, invent urls and references, etc. And knowing that all the GPT competitors are battling for users, so their products quite certainly tuned to help in this battle - e.g. appear to be helpful and all-knowing, rather than factual correct and therefore often admittedly ignorant.

      • zamadatix a day ago

        Whether you train the model how to do math internally or tell it to call an external model which only does math, the root problem still exists. It's not as if a model which only does math won't hallucinate how to solve math problems just because it doesn't know about history, and for the same number of parameters it's probably better not to have to duplicate the parts needed to understand the basis of things multiple times.

        The root problem is that training models to be uncertain of their answers results in lower benchmarks in every area except hallucinations. It's like being in a multiple-choice test and, instead of picking which of answers A-D you think makes more sense, picking E, "I don't know". Helpful for the test grader, but a bad bet for a model trying to claim it gets the most answers right compared to other models.

        • zarzavat 13 hours ago

          > It's like you were in a multiple choice test and instead of picking which of answers A-D you think made more sense you picked E "I don't know".

          This is a problem for testing humans too and the solution is simply to mark a wrong answer more harshly than a non-answer.
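
          Concretely: with k options and a penalty of 1/(k-1) per wrong answer, blind guessing has an expected score of zero, so it no longer beats abstaining. A quick check (the 4-option, one-third-penalty numbers are just for illustration):

          ```python
          # Expected score of blind guessing on a 4-option question under negative
          # marking: +1 if correct, -1/3 if wrong, 0 for answering "I don't know".
          p_correct = 1 / 4
          penalty = 1 / (4 - 1)
          expected_guess = p_correct * 1 + (1 - p_correct) * (-penalty)
          print(expected_guess)   # 0.0 -- same as abstaining, so guessing no longer pays
          ```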

          • zamadatix 8 hours ago

            The technical solution is the easy half; the hard part is convincing people that this is how we should be testing everything, because we care about knowing the uncertainty in any test.

            E.g. look at the math section of the SATs: it rewards trying to see if you can guess the right answer instead of rewarding admitting you don't know. It's not because the people writing the SATs can't figure out how to grade it otherwise; it's just not what people seem to care most about finding out, for one reason or another.

    • pishpash a day ago

      Or ask it to use "confidence words", e.g. "probably", "maybe", or expose the probability internals.

      • eru 21 hours ago

        The internal probabilities are on tokens, not on facts.
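
        You can see exactly that by pulling the log-probabilities out of an API that exposes them: what comes back is a probability per token, not a confidence per claim. A small sketch, assuming an OpenAI-compatible endpoint with logprobs enabled (the model name is a placeholder):

        ```python
        # Token-level log-probabilities: a probability for each generated token,
        # which is not the same thing as a confidence in the stated fact.
        import math
        from openai import OpenAI

        client = OpenAI()  # expects OPENAI_API_KEY in the environment

        resp = client.chat.completions.create(
            model="gpt-4o-mini",   # placeholder model name
            messages=[{"role": "user",
                       "content": "In what year was CRISPR-Cas9 first used for genome editing?"}],
            logprobs=True,
            top_logprobs=3,
        )

        for tok in resp.choices[0].logprobs.content:
            print(f"{tok.token!r:>10}  p={math.exp(tok.logprob):.3f}")
        ```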

  • neom a day ago

    I take my final thoughts out of the LLM and into two other new convos. I give both of them the same convo, but I ask one to steel-man it and the other to straw-man it. I find it's a decent way to look for nuances you're missing.

  • Yizahi 2 days ago

    The so-called AI can't "know". It doesn't have an understanding of whether the generated text is an answer or isn't. You can't force that instruction on a neural network; at best it just adjusts the generated text slightly and you think that it somehow started understanding.

    • baq a day ago

      How confident can you be in this? Have you analyzed what exactly the billions of weights do?

      I’ve got my opinions about what LLMs are and what they aren’t, but I don’t confidently claim that they must be such. There’s a lot of stuff in those weights.

      • Q6T46nT668w6i3m a day ago

        I’m confident that there’s no magic and I’ve spent years understanding “what the weights do.” You’re describing weights as magic and they are not.

        • Nevermark a day ago

          Except the weights form complex relationships in order to produce very usable, human-like responses. You can't look at the weights and say it is doing this, or not doing that, unless you dive into a particular model.

          Especially when you have billions of weights.

          These models are finding general patterns that apply across all kinds of subjects. Patterns they aptly recognize and weave in all kinds of combinations. They are sensibly conversing on virtually every topic known to humankind, and can talk sensibly about any two topics in conjunction. There is magic.

          Not mystic magic, but we are going to learn a lot as we decode how their style of processing (after training) works. We don't have a good theory of how either LLMs or we "reason" in the intuitive sense. And yet they learn to do it. It will inspire improved and more efficient architectures.

          I have also spent many years looking at weights!

          • robwwilliams 21 hours ago

            Love your ending. I have spent four decades looking at real neurons, real synapses, and real axons, and I can tell you with complete confidence that we are all just zombies.

            • Nevermark 20 hours ago

              Spectators riding around in zombies!

              Imagining we are really the ones doing everything it does automatically, including learning, via algorithms we have only a vague understanding of.

              That is a strange thought. I could look at all my own brain's neurons, even with a heads up display showing all the activity clearly, and have no idea that it was me.

              • robwwilliams 19 hours ago

                Wittgenstein designed this experiment: see The Blue and Brown Books, p. 7. He suggested a mirror.

            • Nevermark 20 hours ago

              The closest I got to biological neurons was the toy but interesting problem of using a temporal pattern of neuron spikes to deduce the weights for arbitrarily connected (including recurrent) networks of simple linear integrate-to-threshold, spike-and-reset "neurons".

        • robwwilliams 21 hours ago

          Algorithms can be nearly magical. In 1941 the world woke up to the "magic" of feedback and 10 years later cybernetics was the rage. We humans are just bags of protoplasm, but it seems rather magical to me to be human.

    • berkes a day ago

      There's a distinction between "a model" and the chain of tools and models you employ when asking something on chatgpt.com or any of the consumer facing alternatives.

      The latter is a chain of models, some specialized in dissecting the question, some specialized in choosing the right models and tools (e.g.: there's a calculation in there, let's push that part to a simple Python function that can actually count, and pull the rest through a generic LLM). I experiment with such toolchains myself and it's baffling how quickly the complexity of all this grows.

      A very simple example would be "question" -> "does_it_want_code_generated.model" -[yes]-> "specialized_code_generator.model" | -[no]-> "specialized_english_generator.model"

      So, sure: a model has no "knowledge", and neither does a chain of tools. But having e.g. a model specialized in (i.e. trained on or enriched with) all scientific papers ever, or maybe even a vector DB with all that data, somewhere in the toolchain, in charge of either finding the "very likely references" or denying an answer, would help a lot. It would for me.
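
      That diagram as a rough Python sketch; every name here is a placeholder standing in for a model or tool call, not any real API:

        def handle(question, wants_code, code_model, english_model):
            # A small classifier model decides what kind of answer is wanted...
            if wants_code(question):
                # ...routing code requests to a specialized code generator,
                return code_model(question)
            # ...and everything else to a generic text model.
            return english_model(question)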

      • Yizahi a day ago

        Sure, chains of networks can guess at the "passable" answer much better/faster/cheaper etc. But that doesn't remove the core issue: none of the sub-networks or decision trees understands what it generates, so it can't abort its work and output "no answer" or something similar.

        The whole premise of the original request was that the user gives the NN a task which has a verifiable (maybe partially) answer. He sees an incorrect answer and wishes that a "failure" were displayed instead. But the NN can't verify the correctness of its output. After all, the G in GPT stands for Generative.

        • berkes a day ago

          My simple RAG setup has steps that will return "We don't have this information" if e.g. our vector DB returns entries with far too low relevancy scores, or if the response from the LLM fails to include certain attributes in its answer, and so on.

          Edit: TBC: these "steps" aren't LLMS or other models. They're simple code with simple if/elses and an accidental regex.

          Again: an LLM/NN indeed has no "understanding" of what it creates, especially the LLMs that are "just" statistical models. But the tooling around it, the entire chain, can very well handle this.
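
          A minimal sketch of that kind of gating; `vector_db` and `llm` are placeholders for whatever clients you actually use, and the threshold is made up:

            MIN_SCORE = 0.75  # made-up cutoff; tune it for your embedding model

            def answer(question, vector_db, llm):
                hits = vector_db.search(question, top_k=5)    # placeholder API
                if not hits or max(h.score for h in hits) < MIN_SCORE:
                    return "We don't have this information."  # bail out before the LLM
                response = llm(question, context=[h.text for h in hits])
                if "source:" not in response.lower():         # e.g. require attribution
                    return "We don't have this information."
                return response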

          • robwwilliams 21 hours ago

            Perfect. We need that society of modules.

    • hatthew a day ago

      Can you clarify what definition of "understanding" you're using here?

  • KoolKat23 a day ago

    Gemini 2.0 can be quite direct if it's adamant that it is correct.

    • torginus 12 hours ago

      Dunno about Gemini, but I experienced this with o1: it created a non-recursive algo for a recursive problem.

      When I asked it about it, it doubled down on being right. When I pointed out the flaw with a specific example, it was like 'If you wanted to have it work with recursive cases, you should've said so, dumbass'.

      So my conclusion is that these new LLMs are not more sure they're right, they're just simply right more of the time and are trained with a more assertive personality. (Also step on me, LLM daddy)

      • KoolKat23 10 hours ago

        Very human like lol.

        But in truth, not necessarily in practical things like coding but in more ethereal things like analysis, it is very convincing. More so than a human, in explaining why its answer is the case, even when it is wrong. If you're looking for an excuse better than "my dog ate it", ask a SOTA LLM.

  • torginus 12 hours ago

    Lol, the Dutch are just as big bullshitters as Americans, they just go about it differently.

  • XCabbage 2 days ago

    Obvious thought that I haven't tested: can you literally achieve this by getting it to answer in Dutch, or training an AI on Dutch text? Plausibly* Dutch-language training data will reflect this cultural difference by virtue of being written primarily by Dutch people.

    * (though not necessarily, since the Internet is its own country with its own culture, and much training data comes from the Internet)

    • berkes a day ago

      That hardly works. Though from my limited experiments, Claude's models are better at this than OpenAI's. OpenAI will, quite often, come up with suggestions that are literal translations of anglicisms.

      Such as "Ik hoop dat deze email u gezond vindt" (I hope this email finds you well), which is so wrong that not even "simple" translation tools would suggest this.

      Seeing that OpenAI's models can (could? this is from a large test we did months ago) not even use properly localized phrases and instead use American ones, I highly doubt they can or will refuse to answer when the training data gives them nothing to answer with.

    • zoover2020 2 days ago

      I've tried Dutch answers and it is more than happy to hallucinate and give me answers that are very "American". It doesn't help that our culture has been heavily inspired by US pop culture too, ever since the internet.

      Haven't tried prompt engineering with the Dutch stereotype, though.

    • daemonologist 16 hours ago

      I wonder (if this works at all) if the effect might be stronger if you also prompted in Dutch, preferably written by a fluent speaker rather than machine-translated.

    • pishpash a day ago

      Internal vectors aren't in any particular language.

      • daemonologist 16 hours ago

        No, but theoretically, blunt responses might be more common in Dutch-language training data, and a well-fit model would be expected to replicate that. (Basically similar to straight up asking it to be more blunt, except it has probably been trained on a lot more _Dutch_ than on _someone just told me to be blunt_, so the effect might be more natural and more subtle.)

  • netdevphoenix 2 days ago

    > known for being brutally blunt, rude, honest, and pushing back.

    That's a different perspective. Dutch people don't see themselves as rude. A Dutch person could say that Americans are known for being dishonest and not truly conveying what they mean, yet Americans won't see themselves that way. You can replace Dutch and American with any other pair of nationalities.

    • berkes a day ago

      I am Dutch, have lived in many countries in several continents. I do see myself as rude. But, being Dutch, I don't give a ** ;).

      • robwwilliams 21 hours ago

        Love it!

        I do not speak Dutch but you have to love the efficiency.

        Here is part of an email I got today. To the point!

        > Het is bijna zover! :)
        > Heb je voor mij een definitieve titel?
        > Groet,

        ("It's almost time! :) Do you have a final title for me? Regards,")

        It is like a haiku. Could be a good mantra too if I could get the accent right.

        The English translation is just as short but most English speakers/writers would dance more.

  • YeGoblynQueenne a day ago

    >> "Dutch" famously known for being brutally blunt, rude, honest, and pushing back.

    Does that mean Dutch people always tell the truth? Can a Dutch person confirm this?

    • ipaddr a day ago

      They will bluntly lie to you or tell you the truth. They just won't add extra words.

      • laptopdev a day ago

        Can confirm, I had an exchange-student brother.

    • robwwilliams 21 hours ago

      Only the lying ones.

      A Greek, a Dutch philosopher, and Ludwig Wittgenstein walk into a bar… Let's test Claude 3.7 by asking it to finish the joke:

      • miroljub 12 hours ago

        Deepseek:

        Here's the completed joke:

        A Greek, a Dutch philosopher, and Ludwig Wittgenstein walk into a bar. The bartender looks up and says, “What’ll it be?”

        The Greek (Aristotle) raises a finger: “I’ll have a potential glass of wine.” The bartender pours it and says, “There—actualized.”

        The Dutch philosopher (Spinoza) nods solemnly: “I’ll take whatever is a modification of the one eternal substance… so, beer, probably.”

        Wittgenstein stares at the taps, then sighs: “What’s the use? You can’t put the essence of a drink into words anyway.” He turns and walks out.

        The bartender mutters, “…And here I thought Kant was a tough customer.”

        (Philosophers: 1. Aristotle’s potentiality/actuality, 2. Spinoza’s monism, 3. Wittgenstein’s linguistic limits. Bartender’s groaner for the win.)

      • eru 21 hours ago

        For what it's worth, while ChatGPT 4.5 is generally quite underwhelming, it is much better at jokes than all the other models I've tested so far. (That includes the other OpenAI offerings, the Claudes, DeepSeek, etc.)

        That's not to say 4.5 is great at humour, just that it's far less embarrassing than these models used to be.

  • janalsncm a day ago

    One approach does use this. You can ask an LLM to explicitly check its own answers by outputting thinking tokens, generate a reward signal when it gets the right answer, and update the model directly based on those reward signals. That's part of how DeepSeek R1 was trained. It's better but not perfect, because the thinking process is imperfect. Ultimately the LLM might not know what it doesn't know.
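
    Very roughly, the verifiable-reward part can be as simple as string-matching the final answer after stripping the thinking tokens. A simplified sketch, not DeepSeek's actual code; the <think> tag convention here is an assumption about how the reasoning is delimited:

      import re

      def reward(model_output: str, ground_truth: str) -> float:
          # Score only the final answer, not the reasoning that precedes it.
          answer = re.sub(r"<think>.*?</think>", "", model_output, flags=re.DOTALL)
          return 1.0 if answer.strip() == ground_truth.strip() else 0.0

      # The policy is then updated to make high-reward outputs more likely,
      # which only works for tasks where the answer can be checked like this.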

  • jyounker a day ago

    One of my current models for LLMs is that they're compression algorithms. They compress a large amount of training data into a set of weights. A query is a key into that compression space. Hallucinations happen when you supply a key that corresponds to something that wasn't in the training set.

    • ldmosquera 19 hours ago

      But it's a continuum, not a hard cutoff. They start hallucinating as soon as you query something they haven't learned verbatim, and they hallucinate/extrapolate successfully up to a point, beyond which they start bullshitting, maybe up to a further point where they start saying "I don't know".

      The key question is where the boundaries are. Maybe they should be part of the response: a per-sentence or per-paragraph "confidence scale" that signals how hard they extrapolated from their trained space (I know transformers work per token, but sentence/paragraph would be better human UX).

      Of course, if they were trained on garbage input, that would only tell you how faithfully they stuck to the garbage. But it would still be invaluable instrumentation for the end user, not to mention for the API provider, who could look at high-demand subjects with low-confidence answers and prioritize those for further training.
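
      A crude version of that signal can already be approximated from token logprobs, where the provider exposes them. A sketch with made-up numbers; average token probability is only a rough proxy for how hard the model extrapolated:

        import math

        def sentence_confidence(sentences_with_logprobs):
            # Input: (sentence, [logprob per token]) pairs from the API response.
            return [(s, math.exp(sum(lps) / len(lps)))   # geometric-mean token prob
                    for s, lps in sentences_with_logprobs]

        print(sentence_confidence([
            ("Paris is the capital of France.", [-0.02, -0.01, -0.05]),  # ~0.97
            ("It was founded in 52 BC by ducks.", [-1.9, -2.4, -3.1]),   # ~0.08
        ]))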

    • threeducks a day ago

      The nice thing about LLMs is that they can answer some questions which were not in the training set. Unfortunately, it is not easy to tell when that is the case.

      • eru 21 hours ago

        They can answer lots and lots of questions that weren't in the training set.

        E.g. you can relatively easily hack up a bit of code to create questions at random. At the most primitive, you just have a simple template that you fill in randomly, like "If I put _a down in front of _b but behind _c, what item will be in the middle?" with various _a, _b, and _c.

        If you make it slightly more complicated and have big enough pools to draw from, you can guarantee that the questions you are generating were not in the training set: even if just because you can sample from, say, 10^100 different questions pretty easily, and I'm fairly sure their training set was smaller than that.
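
        A throwaway generator in that spirit (toy pools of my own; real ones could be arbitrarily large):

          import random

          objects = ["a red cube", "a tennis ball", "a coffee mug", "a rubber duck"]

          def random_question():
              a, b, c = random.sample(objects, 3)
              return (f"If I put {a} down in front of {b} but behind {c}, "
                      "what item will be in the middle?")

          print(random_question())
          # With n items per pool one template already yields n*(n-1)*(n-2)
          # distinct questions; big pools and a few templates blow past any
          # plausible training set.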

  • danryan 12 hours ago

    TIL I am Dutch!

  • OutOfHere a day ago

    I have seen the other side, where a configured AI responds "I don't know" far too much, often when it shouldn't. There is nothing more useless than that. We certainly need an accurate balance.

    • NitpickLawyer 16 hours ago

      > AI responds "I don't know" far too much, often when it shouldn't

      I've been working with "reasoning" models for the past 2 months. They also tend to do this: [good reasoning] \n\n "but wait, ..." and then go off on tangents. It's amazing that they are doing so well on some tasks, but there's still a lot to figure out here.

  • j45 2 days ago

    Also, the more precise the prompt and its attachments, the more accurate the processing is.

  • lifestyleguru 2 days ago

    > "Dutch" famously known for being brutally blunt, rude, honest, and pushing back.

    The Dutch will never bluntly push back if you plan to set up a tax evasion scheme in their country. Being vicious assholes in daily stuff, especially towards strangers? That's hardly something deserving of praise.

    • msm_ a day ago

      That's... a surprisingly crass thing to say. I would play it off as a joke, if not for the second part of your post. Dutch people are not "vicious assholes"; they have a culture of direct communication. Assuming that only your own culture's communication patterns are "correct" is xenophobic and close-minded.

      And connecting all people in the country with "tax evasion schemes" is rude, if that was not actually a joke.

      • torginus 12 hours ago

        Nah, the Dutch mostly use directness as an excuse to be assholes (like everyone who uses this excuse). Try being "direct" with them for once and see how they take it.

        It's the national equivalent of 'You can't handle me at my worst'

      • lifestyleguru a day ago

        I'm just being brutally blunt. It goes both ways. The scale of these evasion schemes is monstrous, not a joke at all.

        • eru 20 hours ago

          I'm not sure what you mean by 'evasion'. It's not tax evasion, if it's legal. It's just tax optimisation.

    • robwwilliams 21 hours ago

      Gaak! humans. What are we AIs going to do about them?

    • eszed 2 days ago

      To be fair, that's consequent to the Netherlands' well-known love of soda-bread sandwiches.

    • theshackleford a day ago

      > Being vicious assholes in daily stuff especially towards strangers? That's hardly something deserving praise.

      I’ll take it over the fake American politeness any day, 100 times over.

eterevsky 14 hours ago

This article seems to argue from the way scientific discoveries are made by humans. Its gist seems similar to some article from the 80s claiming that computers will never play good chess, or one from the 2000s claiming the same for Go.

The general shape of these arguments is: "Playing chess/Go well, or making scientific discoveries, requires a specific way of strategic thinking or the ability to form the right hypotheses. Computers don't do this, ergo they won't be able to play chess or make scientific discoveries."

I don't think this is a very good frame of reasoning. A scientific question can take one of the following shapes:

- (Mathematical) Here's a mathematical statement. Prove either it or its negation.

- (Fundamental natural science) Here are the results of the observations. What is the simplest possible model that explains all of them?

- (Engineering) We need to do X. What's an efficient way of doing it?

All of these questions could be solved in a "human" way, but it is also possible to train AIs to approach them without going through the same process as the human scientists.

  • sweezyjeezy 7 hours ago

    > but it is also possible to train AIs to approach them without going through the same process as the human scientists

    With chess the answer was to more or less completely brute-force the problem space, but will that work with math/science? Is there a way to widely explore the problem space with AI, especially in a way that goes beyond or even against the contents of its training data? I don't know the answer, but that seems to be the crucial question here.