No, sometimes it is just Spanish football, as it is for everything behind Cloudflare. That is the case for this blog, which is blocked right now and redirects to another page:
"El acceso a la presente dirección IP ha sido bloqueado en cumplimiento de lo dispuesto en la Sentencia de 18 de diciembre de 2024, dictada por el Juzgado de lo Mercantil nº 6 de Barcelona en el marco del procedimiento ordinario (Materia mercantil art. 249.1.4)-1005/2024-H instado por la Liga Nacional de Fútbol Profesional y por Telefónica Audiovisual Digital, S.L.U. https://www.laliga.com/noticias/nota-informativa-en-relacion..."
At least you get some message about why. I'm on Vodafone and the only thing I saw was "Por causas ajenas a Vodafone, esta web no está disponible" ("For reasons beyond Vodafone's control, this website is not available").
Fucking censorship sucks. People somehow still see Spain as a modern democracy while shit like this happens in public and everyone knows about it, yet here we are: because of football, we can't browse the web when there are matches...
Who sees Spain as a modern democracy? Only those who benefit from the rampant clientelism.
Most people outside of Spain? Just as one example, The Economist's Democracy Index ranks Spain 21st, yet we have rampant government censorship. Make that make sense.
Those lists are a joke. They show how much the priorities of the government align with those of the newspaper they are printed on.
Right, another thing you can try (if you haven't) is traveling to any country other than Spain and asking what people there think of Spain; you'll learn the same thing.
It's intentional: if people can't use the internet, they're more likely to watch the "game." For once, management might have learned something from the employees: take a dive, cry foul.
The full maxim I was taught being, “it’s either DNS or permissions”.
The fatal design flaw for the Domain Name System was failure to learn from SCSI, viz. that it should always be possible to sacrifice a goat to whatever gods are necessary to receive a blessing of stability. It hardly remains to observe that animal sacrifice is non-normative for IETF standards-track documents and the consequences for distributed systems everywhere are plainly evident.
Goats notwithstanding, I think it is splitting hairs to suggest that the phrase “it’s always DNS” is erroneously reductive, merely because it does not explicitly convey that an adjacent control-plane mechanism updating the records may also be implicated. I don’t believe this aphorism drives a misconception that DNS itself is an inherently unreliable design. We’re not laughing it off to the extent of terminating further investigation, root-cause analysis, or subsequent reliability and consistency improvement.
More constructively, also observe that the industry standard joke book has another one covering us for this circumstance, viz. “There are only two hard problems in distributed systems: 2. Exactly-once delivery 1. Guaranteed order of processing 2. Exactly-once delivery”
A SCSI bus always needed three terminations: one at either end of the cable, and a black rooster.
What is the connection with SCSI?
SCSI had a reputation of being very stable and yet very finicky. Stable in the sense that not using the CPU for transfers yielded good performance and reliability. The finicky part was the quality of the equipment (connectors, adapters, cables and terminators), which led to users having to figure out the order of connecting their devices in a chain that actually worked: "Hard drive into burner, and always the scanner last."
We used to joke that it should be called SCSl: System, Cables, Scanner last.
Why do computer engineers refuse to talk with the manufacturing graybeards who operate critical systems at scale?
The design shit I am seeing would not pass even a preliminary review at a chemical plant.
I would greatly appreciate a concrete example, search term, or book if you can think of one.
I don't know any manufacturing graybeards. Where could I meet some?
Conferences! IEEE, AIChE, IMTS, Fabtech, Automate, Productronica
Is this meant to be a defense of the DNS protocol? I’ve never assumed the meme was that the DNS protocol is flawed, but that these changes are particularly sensitive/dangerous.
At Google we noticed that the main cause of outages is config changes. Does that mean external config is dangerous? Of course not! But it does remind you to be vigilant.
Paul Tagliamonte sounds like a nice guy who has thought about these issues at length. He's reached the second level of DNS enlightenment: "There's no way it's DNS".
Finality will arrive, and Paul will internalize the knowledge.
> a DNSSEC rollout bricking prod for hours
He links to the Slack incident. But that problem wasn’t caused by a DNSSEC rollout; the problem was entirely caused by a completely botched attempt to back out of DNSSEC, by doing it the worst way possible.
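For context, the standard back-out order (per DNSSEC operational-practice guidance such as RFC 6781) is to pull the DS record at the parent first, wait out its TTL, and only then stop signing; going unsigned while the parent still publishes a DS makes validating resolvers treat the zone as bogus and return SERVFAIL. A minimal sketch of that pre-flight check, assuming the third-party dnspython package (the domain name is purely illustrative):

    # Sketch only: confirm the parent no longer publishes a DS before going unsigned.
    import dns.resolver

    def parent_still_expects_dnssec(zone: str) -> bool:
        """True if a DS record for `zone` is still visible through recursion."""
        try:
            dns.resolver.resolve(zone, "DS")
            return True
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            return False

    if parent_still_expects_dnssec("example.com"):
        print("DS still published at the parent: keep the zone signed.")
    else:
        print("No DS visible: safe(r) to drop signatures once TTLs have expired.")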
What's your point?
Truth. Unlike some people, I find it important.
What decision would people make differently knowing the extra detail you just provided?
This takes a rather narrow view of what DNS is.
For DNS as a service to work, it has to be accessible and give the right answers. It doesn't matter why it is not accessible or why it doesn't give the right answers. If it doesn't, then the service is broken.
DNS is in the unique position of sitting relatively high up in the network stack, so lots of (network) failures affect DNS as a service. It is a big distributed database, which leaves many possibilities for wrong data, and it is used by almost all applications, so a failure of DNS as a service is highly noticeable.
Finally, DNS has by nature (somewhat) centralized choke points. If you have a domain like company.com, then just about everything that company does has to go through the DNS servers for company.com. Any small failure there can have a huge effect.
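To make the choke point concrete: every name under a zone resolves through that zone's small NS set, which you can list directly. A tiny sketch, assuming the third-party dnspython package and keeping company.com as the stand-in name from above:

    # List the name servers that every lookup under company.com ultimately depends on.
    import dns.resolver

    for ns in dns.resolver.resolve("company.com", "NS"):
        print(ns.target)  # lose these few servers and "everything that company does" stalls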
So DNS is a pretty exciting field to work in.
Well sure... it could be BGP
I had the CEO and CTO of our ccTLD registry give a guest lecture to my CS students, and one question came up regarding the AWS incident.
Prior to the question, the CEO boasted 100% uptime (not just five nines), and the CTO said "We're basically 30 people maintaining a 1GB text file."
So the question was, “How come 30 people can have 100% uptime, and the biggest cloud with all of its expertise can’t? Sure, it was DNS, but are you even doing the same thing?”
And the answer was, (paraphrasing) “No, what we do is simple. They use DNS to solve all sorts of distributed problems.”
As did the CTO with all of these new record types embedding authentication. But running CoreDNS in a Kubernetes megacluster is not “maintaining a 1GB text file”.
Maintaining uptime on complex systems is hard.
That’s why the best systems have as little complexity as possible
But that doesn’t help boost your resume or get a bonus.
It's worth noting that the meme of "it was DNS", including the haiku [0], comes from the old-school sysadmin world, which had a lot more terrible DNS implementations than modern stuff (especially Active Directory, where DNS is attached to a massive, complex system that does dozens of other things and whose reliability suffers for it), so the meme is really a reflection of a harsher time.
[0] the original source of the haiku: https://www.reddit.com/r/sysadmin/comments/4oj7pv/network_so...
> but it is not the operational hazard it’s made out to be
Until you flip that DNSSEC toggle
It's always DNS, except when it's BGP.
This is a beautifully designed page.
I wish it had a little bit more padding on mobile, but I agree otherwise
It could also be gamma rays, or a variety of problems that seem to appear and disappear between chairs and keyboards.
Memes are jokes. People taking jokes as anything more is the problem.
Resolver limitations, as opposed to server or protocol issues, are in my view the main reason why "it is always DNS".
It’s DNS far too often in large part because Linux has the default behaviour of having a singular (“the”) name server.
If you configure multiple, this is not the same as in Windows, macOS, iOS, or even Android. It's not a pair of redundant servers; it's a sequential list of logically layered configurations, like an override.
An outage of the primary will cause repeated timeouts until it comes back up. Contrast this with every other operating system, none of which seeks to emulate forever the specific network setup of some 1970s Berkeley computer lab: there, failover is near instant (sub-second in Windows Vista and later) and persists, so a small failure doesn't become a big one.
To compound things, thanks to response caching and the relatively good stability of typical DNS servers, this failure mode is rare enough that most admins have never encountered it and probably don’t even recognise the problem. Or worse, they’ll handwave it away saying things starting with “You can…” or “You should…”. Who’s this “You” person? It’s not me! It’s not most admins, none I’ve met are named You. None have changed this default in my experience. Not one. That would be a design decision requiring sign off, testing, rollout, etc…
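For anyone who does want to change that default, the knobs live in the glibc stub resolver's config; a minimal sketch (addresses and option values are purely illustrative):

    # /etc/resolv.conf -- glibc stub resolver
    nameserver 10.0.0.53                   # always tried first by default
    nameserver 10.0.1.53                   # only consulted after the first one times out
    options timeout:1 attempts:2 rotate    # shorter timeout, a retry, and round-robin across servers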
I’ve seen this take down corporations for a day.
Defaults matter.
A lot of the time it's cabling.
Nope, the other times it's CORS
Though at least with CORS, once you actually get the damn thing working, it keeps working.
Tell that to AWS East 1