2016-05-24

Notes from an opinionated talk about running IPv6 in production

A few years ago, at SCaLE, I attended an excellent talk by someone who operated several campus-wide internetworks, about their hard-won experience with IPv6.  They were very opinionated.  I loved it.  Here are some of my notes from that talk:


QoS is a bad word.
Control freaks love QoS.
They can debug it themselves.
People who are held to SLAs operating production networks have better things to waste their time on, and better ways to crash their switches.

"But I'm not running IPv6!"  That means you actually are, and are nor longer in control of your network.
"I will block IPv6!".  Say goodbye to all the grants that pay your salary.  And everyone's desktops and devices will just make tunnels anyway.

Say NAT one more time, I dare you.

If you think that NAT is protecting you, let me know who you are, so I can blackhole your address range and your AS.

Turning off v4 ICMP is just stupid.
There are lots of stupid people.

You cannot turn off icmp6.
There is no in-flight frag in v6; routers never fragment, only the sender can.
Thus path MTU discovery must work, and it runs on icmp6 Packet Too Big messages.
Thus icmp6 must be on.
Live with it.

dhcp6 is port 547, not 67 (and the client side is 546, not 68).
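
As a minimal illustration, a Python sketch of what that addressing looks like; the interface name is an assumption, and the empty payload means this does not speak real DHCPv6, it only shows the ports and the multicast group:

    # DHCPv6 runs over UDP 546 (client) / 547 (server), not 67/68 like v4,
    # and clients send to the well-known multicast group ff02::1:2.
    import socket

    ALL_DHCP_AGENTS = "ff02::1:2"      # all DHCPv6 relay agents and servers
    SERVER_PORT, CLIENT_PORT = 547, 546

    s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    s.bind(("", CLIENT_PORT))          # below 1024, so this needs privilege
    ifindex = socket.if_nametoindex("eth0")    # "eth0" is an assumption
    # link-local multicast needs a scope (interface index) in the 4-tuple
    s.sendto(b"", (ALL_DHCP_AGENTS, SERVER_PORT, 0, ifindex))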

2016-02-13

Regarding that article about gender bias in GitHub Pull Requests

Regarding "Gender Bias In Open Source: Pull Request Acceptance Of Women Vs. Men", or even worse, regarding all the uncritical and breathless articles by the BBC, Vice, HuffPost, and so forth:

First of all, anyone who names their project "DeveloperLiberationFront" and uses an icon of a raised fist in woodcut style has already predeclared their bias away from objective truth.

Second, the authors of the paper exhibit little knowledge of the large differences in workflow between different projects, no knowledge of all the different ways that PRs are used or all the different meanings of an "abandoned PR", and a broken definition of "project insider": in many projects, an "insider" has write access and may never use PRs at all.

Third, despite GitHub's growing influence, just grabbing tens of thousands of GH PRs is not in the slightest bit representative.

Fourth, their process for computing the gender of PR authors is laughably bad, for reasons that went on for 3 paragraphs before I edited down this text.

Fifth, how many of you have heard of "p-hacking"?  Or have ever actually computed a p value since that really annoying stats class in college?  Did you even notice that this paper both obviously did p-hacking, and then didn't even report the p values?
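
To make the p-hacking point concrete, here is a toy Python simulation with invented numbers, not their data: even when the two populations have exactly the same acceptance rate, slicing into enough subgroups makes a "significant" difference almost inevitable.

    # toy illustration of p-hacking: with no real effect at all, testing
    # enough subgroup splits makes a "significant" result very likely.
    import random

    random.seed(0)
    TRIALS, SUBGROUPS, N = 1000, 20, 500
    hits = 0
    for _ in range(TRIALS):
        found = False
        for _ in range(SUBGROUPS):
            # both groups drawn from the SAME 65% acceptance rate
            a = sum(random.random() < 0.65 for _ in range(N))
            b = sum(random.random() < 0.65 for _ in range(N))
            p1, p2, pool = a / N, b / N, (a + b) / (2 * N)
            se = (2 * pool * (1 - pool) / N) ** 0.5
            if se and abs(p1 - p2) / se > 1.96:   # two-sided p < 0.05
                found = True
                break
        hits += found
    print(f"{hits / TRIALS:.0%} of null experiments found a 'significant' subgroup")

Roughly two thirds of these pure-noise experiments "find" something at p < 0.05, which is exactly why you report how many groupings you tried and correct for them.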

Finally, allow me to present the following disruption to the breathless and self-reinforcing narrative:

"So, let’s review. A non-peer-reviewed paper shows that women get more requests accepted than men. In one subgroup, unblinding gender gives women a bigger advantage; in another subgroup, unblinding gender gives men a bigger advantage. When gender is unblinded, both men and women do worse; it’s unclear if there are statistically significant differences in this regard. Only one of the study’s subgroups showed lower acceptance for women than men, and the size of the difference was 63% vs. 64%, which may or may not be statistically significant. This may or may not be related to the fact, demonstrated in the study, that women propose bigger and less useful changes on average; no attempt was made to control for this. This tiny amount of discrimination against women seems to be mostly from other women, not from men."
// ScottAlexander

If this were a real paper, submitted for real peer review, a good peer review would be:

"1. Report gender-unblinding results for the entire population before you get into the insiders-vs.-outsiders dichotomy.
2. Give all numbers represented on graphs as actual numbers too.
3. Declare how many different subgroup groupings you tried, and do appropriate Bonferroni corrections.
4. Report the magnitude of the male drop vs. the female drop after gender-unblinding, test if they’re different, and report the test results.
5. Add the part about men being harder on men and vice versa, give numbers, and do significance tests.
6. Try to find an explanation for why both groups’ rates dropped with gender-unblinding. If you can’t, at least say so in the Discussion and propose some possibilities.
7. Fix the way you present “Women’s acceptance rates are 71.8% when they use gender neutral profiles, but drop to 62.5% when their gender is identifiable”, at the very least by adding the comparable numbers about the similar drop for men in the same sentence. Otherwise this will be the heading for every single news article about the study and nobody will acknowledge that the drop for men exists at all. This will happen anyway no matter what you do, but at least it won’t be your fault.
8. If possible, control for your finding that women’s changes are larger and less-needed and see how that affects results. If this sounds complicated, I bet you could find people here who are willing to help you.
9. Please release an anonymized version of the data."
// ScottAlexander
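
(For anyone who has not met it, the Bonferroni correction in point 3 is one line of arithmetic: to keep an overall false-positive rate of alpha across k subgroup tests, each individual test must clear alpha / k.  A throwaway Python illustration, with an invented k:)

    alpha, k = 0.05, 20    # 20 subgroup tests is an invented example
    print(alpha / k)       # 0.0025: each subgroup p value must beat this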

I am willing to bet money that doing real honest academic statistical analysis of their raw data will invalidate their implications and their claims.

2016-01-22

Why SSH keys don't have metadata

Another tech rant.  It was recently asked, in a forum that I read: "Why is it that SSH public keys don’t have an embedded expiration date, anyway? PKI certificates have them."

My response:

Because as soon as you start adding all sorts of metadata to a key, everyone will start adding all sorts of metadata to keys, with all sorts of obscure rules about how the metadata interacts with the environment and with various implementations to decide whether a key works or not.

And then the lawyers will show up and insist that you embed 30-page PDFs of Word docs of someone’s T&Cs and their contracts of adhesion and their “don't hold anyone with money responsible for anything” disclaimers into metadata (you think I joke; I do not, at all; this literally and regularly happens with “standards based” PKI certs).

And then your keys are going to be huge weirdly encoded binary blobs of shit that you don’t have good tools to manipulate. And you will need to keep special indexes of them, and “bundles” of them, in multiple conflicting filesystem paths and “key stores”.

Part of why SSH took off at all in the first place is because it doesn’t have this complex garbage wankery.  An SSH public key is a SINGLE LINE of printable 7-bit ASCII.  You can edit and clean up your ~/.ssh/authorized_keys file with a textmode text editor.
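
To see how little there is to one, here is a minimal Python sketch that pulls apart a bare key line (it skips the optional options prefix that some authorized_keys lines carry):

    # an OpenSSH public key line is: keytype, base64 blob, optional comment,
    # separated by whitespace.  The blob even repeats the key type inside
    # itself, as a length-prefixed string.
    import base64, struct

    def parse_pubkey_line(line):
        fields = line.strip().split(None, 2)
        key_type, b64 = fields[0], fields[1]
        comment = fields[2] if len(fields) > 2 else ""
        blob = base64.b64decode(b64)
        (n,) = struct.unpack(">I", blob[:4])
        assert blob[4:4 + n].decode() == key_type
        return key_type, blob, comment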

The lack of metadata in SSH is a feature, not a problem.

2016-01-20

This is how to do it, or waving my cane.

1. Design a data abstraction that solves a class of problems.

2. Design a good wire protocol for that abstraction.

3. Better yet, design 2 protocols: one server-to-server and one client-to-server. Federation is the only model that has ever scaled large enough.

4. Implement as simple a server as possible.  Do not try too hard to make it performant; just make it very easy to install and very easy to understand.  This is the protocol reference implementation.  (A toy sketch of such a server appears after this list.)

5. Implement an open source client library that completely covers the entire data model and the entire wire protocol.

6. Implement another open source client library, in a very different programming language.  If this is difficult, you have let your knowledge of your favorite language overconstrain the wire protocol.  Go back to step 2 and fix it.

7. Implement a command line client on one of those libraries. Again, it must completely cover the entire data model.

8. Implement an ok GUI app.

9. Implement a very high performance highly scalable server. If you are tempted to change the wire protocol to do this, you screwed up.

10. Now, and only now, can you implement a very nice, easy to use GUI.  At this point, and at this point only, do you bring in any "designers", "UX" people, or anyone who uses Photoshop as a working tool.
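
To make step 4 concrete, here is a toy Python sketch of the dumbest possible reference server for a hypothetical line-based key/value wire protocol ("SET k v" / "GET k"); the protocol and every name in it are invented for this example, and nothing about it is fast, only easy to read:

    # reference implementation: correctness and readability only, no speed
    import socketserver

    STORE = {}

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            for raw in self.rfile:                    # one request per line
                parts = raw.decode().strip().split(" ", 2)
                if parts[0] == "SET" and len(parts) == 3:
                    STORE[parts[1]] = parts[2]
                    self.wfile.write(b"OK\n")
                elif parts[0] == "GET" and len(parts) == 2:
                    val = STORE.get(parts[1])
                    reply = f"VALUE {val}\n" if val is not None else "MISSING\n"
                    self.wfile.write(reply.encode())
                else:
                    self.wfile.write(b"ERR\n")

    if __name__ == "__main__":
        with socketserver.ThreadingTCPServer(("", 4000), Handler) as srv:
            srv.serve_forever()

The point of something this dumb is that anyone can read it end to end, and any ambiguity in the wire protocol shows up immediately.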


Of course, for the past 15 years, everyone has been doing this backwards, with disastrous results.  It takes huge amounts of wasted CPU, and wasted money by the millions and billions, to make all the resulting garbage work at all.

2015-12-11

Idea: RedFish aggregators, and running them on OpenSwitch

Once upon a time, when you needed to "do stuff" to take care of a computer, you had to go there in person.  By "do stuff", I mean things like: turning it off and on, looking to see if the AC was working, whether the tape or disk motors were broken, whether any of the red warning lights were on, whether the UPS had tripped, and so forth.  But, for many obvious reasons, it was useful to be able to do all this kind of stuff from a distance.

This led to the creation of "IPMI", which was built into most computers designed to be used in racks and datacenters.  With IPMI, a team of sysadmins could remotely turn computers on and off, check temperature, fans, power, network carrier, and installed cards and devices, and read off model numbers, part numbers, and serial numbers.

IPMI is currently being improved/replaced by a thing called "RedFish".  RedFish does all the same sorts of things, but it is designed in a way that is called "RESTful", which means it works the same way that web applications work, which makes it a lot easier to write tools that speak it.  Another cool thing about RedFish is that it accidentally also looks like a complete database of a "computer-like thing", and does it in a way that "things" can be inside "things" and connected to other "things", all within how the protocol works.

And then I had an idea...

Write a web application that scans the local network looking for RedFish servers, and then itself acts as a RedFish server that aggregates all of those smaller RedFish servers.

You can even stack this: at a higher level, one of these "RedFish aggregators" discovers and integrates the lower-level ones, and so on up.  Eventually you would have a top-level one that gives you all the data, and all the control, over an entire datacenter, or an even larger set of datacenters.

It wouldn't even be that terribly hard to write a small demonstration implementation.  It would be a challenge to make it fast and efficient, and to properly handle caches and avoid accidental recursion loops, but it doesn't look like a really difficult one.
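
A first cut of the merge step might look like the following Python sketch.  The fixed host list, the disabled TLS verification, and the "/aggregate" URL prefix are all assumptions for illustration; real discovery would use SSDP or a network scan rather than a hardcoded list, and a real aggregator would also proxy the rewritten URLs back to the right box:

    # merge the Systems collections of several RedFish services into one view
    import requests

    HOSTS = ["10.0.0.11", "10.0.0.12"]    # hypothetical BMC addresses

    def aggregate_systems(hosts):
        members = []
        for host in hosts:
            base = f"https://{host}"
            root = requests.get(f"{base}/redfish/v1",
                                verify=False, timeout=5).json()
            systems = requests.get(base + root["Systems"]["@odata.id"],
                                   verify=False, timeout=5).json()
            for m in systems.get("Members", []):
                # rewrite each member URL so it is unique in the merged tree
                members.append(
                    {"@odata.id": f"/aggregate/{host}{m['@odata.id']}"})
        return {"@odata.type":
                    "#ComputerSystemCollection.ComputerSystemCollection",
                "Members": members,
                "Members@odata.count": len(members)}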


To use something like this for real, the logical place to put it would be in the network switches.  That used to be difficult, because production-level network switches have been very closed and proprietary.  However, that's changing.  There is a new open source project spinning up right now, called "OpenSwitch".  If I were to push this RedFish aggregator to the point of being useful in the real world, I would make it a module that runs on the reference OpenSwitch box.


How hard could it be?

2015-06-07

About media leakers

I wonder about media leakers.

I'm not talking about whistleblowers, who reveal coverups by governments and corporations that are keeping secrets of bad or illegal actions.

I'm talking about people who "confidentially source" to the media the details of business negotiations, media productions, and gossip of private heartache.  Things that are private and confidential for a reason, that will be revealed when they are properly baked, and whose early revelation does nobody any good, except maybe for a burst of clickstream traffic for the "news" source that "scooped" it.

I know a fair number of secrets.  Some of them are close friends' private heartaches, which are theirs to reveal, if ever.  And some of them are business negotiation secrets incidental to my job, and a few of them are part of my job to know.  I actually go out of my way to avoid learning things I don't need to know at my employer, just to firewall myself from even the appearance of impropriety.

Any of them, if I "confidentially sourced" them to the tech press, would do nothing but cost money that is not mine, for no honest gain to anybody; possibly prevent good things that I would like to happen from happening; and betray the principles I try to hold myself to.

So, why do other people do it?

2015-05-25

A temporary mistake

I don't care about business models; I care about applications.  And at true billion-user, trillion-device scale, the only scaling pattern that succeeds is user-visible federation.

Email, the DNS, the HTML/HTTP hyperlink, XMPP, and the blockchain have no rent-seeking gatekeeper business model, and do not require billion-dollar data centers.

The past 15-year drive to unitary, siloed apps with a rent-seeking gatekeeper has been a mistake.  It has diverted too much engineering effort toward just keeping them running instead of delivering user-desired features and value, and it has been driven by the corrupting need of VCs for their mythical billion-dollar exits, and by the telco-encouraged temporary exhaustion of the global address space, and thus a temporary breaking of the end-to-end principle.

This is temporary, unsustainable, and not scalable to 10 billion users.