2018-04-20

On names

(upcomment)
There is no standard central registry of names in the US. Your "legal name" is whatever you say it is, with the restriction that you may not change your name to commit fraud or try to avoid a debt. The flip side of that is you have very few avenues to force anyone else to encode or display your name in the manner you prefer. If you change your name or have a name that is awkward to deal with, the obligation is generally on you to work out a working relationship with everyone else who needs to know your name.

A "legal name change" is just a helpful service provided by your local government court that lets you publicly declare and record a new name, bound to your previous name, on standard papers that other government agencies and various private organizations MAY (but not MUST) pay attention to.

There are multiple issuing authorities for IDs. Public, private, commercial, corporate, educational, military, government, ... An authority to issue IDs reaches only as far as their legal mandate, and no farther, except by rough consensus and a need to make stuff work.

Each issuing authority gets to have their own regularization rules and database schema. They will pick what works for them, and have very little interest or budget in completely reworking their databases and processes to accommodate someone who is being a pain in their ass.

Whenever you intersect with government, just expect ascii7 smashing.
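
For anyone who has not watched it happen, here is a rough sketch of what "ascii7 smashing" usually amounts to. The function name, the fallback table, and the 30 character limit are all my own illustrative assumptions, not any agency's actual rules; every issuing authority has its own variant.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit substitute.
# This table is illustrative only, not any agency's official mapping.
FALLBACK = {"ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE", "ß": "ss", "ł": "l", "Ł": "L"}

def ascii7_smash(name: str, max_len: int = 30) -> str:
    """Reduce a name to uppercase 7-bit ASCII and truncate it (hypothetical rules)."""
    # Substitute the letters that normalization cannot decompose.
    replaced = "".join(FALLBACK.get(ch, ch) for ch in name)
    # Split accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Drop everything outside 7-bit ASCII, which removes the combining marks.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep only letters, spaces, and hyphens; punctuation such as apostrophes is lost.
    kept = "".join(ch for ch in ascii_only.upper() if ch.isalpha() or ch in " -")
    # Collapse runs of whitespace, then apply the length limit.
    return " ".join(kept.split())[:max_len]

print(ascii7_smash("José Núñez"))
print(ascii7_smash("Zoë O'Connor"))
```

Note that the process is lossy and one-way: two different names can smash to the same string, and nothing in the output tells you what was removed.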

US banking know-your-customer rules require ascii7 smashed names for account holders, both individuals and companies, to interface to government banking regulators and auditors.

The US interstate Driver License Compact imposes ascii7 smashing and length limits, for database and lookup compatibility reasons. When a cop looks at your DL (which you do have to give him if he's detained you while you are driving a car), it had better match what pops up on his terminal when he enters the license# and/or the plate#, or you may be about to have a bad day.

Things don't get a lot better when dealing with passports, from any country.

Passports impose ascii7 (with various accent composition hacks) and length limits, by treaty.

Different countries impose additional different rules and length limits, for reasons various and mostly stupid. Some countries try to appear to still permit accent marks, by trying to encode accents using various slightly different composition encodings layered on top of ascii7, and hoping that everyone uses the same encoding. This kind of sort of mostly works, except when it doesn't.

Passports from countries with widespread non-latin charsets will require an ascii7 smashed name, and often require a specific approved romanization algorithm, or an approved thesaurus, or both. Such a passport MAY have an additional field for the name in a local charset, but that is entirely at the option of the issuing authority.

What is printed on the passport photo page should match what's printed in the "machine readable zone", which should match what's on the RFID chip. "Should".
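
To make the "machine readable zone" concrete, here is a minimal sketch of how the name field on the first MRZ line of a passport is laid out: uppercase A-Z only, "<" as filler, surname and given names separated by "<<", padded to 39 characters. The function name is mine, the input is assumed to be already reduced to plain ASCII, and the truncation is naive; the real treaty rules for overlong names are more involved.

```python
def mrz_name_field(surname: str, given_names: str, width: int = 39) -> str:
    """Build a passport-style MRZ name field (simplified, hypothetical helper)."""
    def clean(part: str) -> str:
        # Hyphens and spaces both separate name components.
        words = part.upper().replace("-", " ").split()
        # Keep only A-Z; anything else (apostrophes, digits, accents) is dropped.
        words = ["".join(ch for ch in word if "A" <= ch <= "Z") for word in words]
        # Components within a part are joined by a single filler character.
        return "<".join(word for word in words if word)

    # Surname and given names are separated by a double filler.
    field = clean(surname) + "<<" + clean(given_names)
    # Naive truncation, then pad with filler to the fixed width.
    return field[:width].ljust(width, "<")

print(mrz_name_field("Eriksson", "Anna Maria"))
```

The fixed width is where long names get cut off, and the A-Z restriction is where accents, apostrophes, and hyphens vanish.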

Most countries require air, sea, and rail carriers to pre-transmit passenger manifests before arrival. The names on those manifests also get ascii7-smashed, and any accent composition stripped, and had better match what's printed in your passport, or you are going to have a rough time boarding at departure, or clearing passport control at arrival.

Internal and external passports in China used to also require a field encoding the name in the form of Chinese telegraphy numbers, to sidestep transliteration issues between the various Chinese languages. I don't know if they still do.

2017-10-02

On base knowledge surveys

When I was in college, one of my Research Assistant jobs was to do clerical and basic number crunching work for base knowledge surveys. It takes scrupulous and expensive controls to prevent roughly a third of the surveys from being randomly answered for the lulz. Even with the most scrupulous controls and careful interview technique, there is still roughly 5% noise.

In other words, any newspaper headline of a newspaper article that is a restatement of the abstract of some random paper from some random academic journal based on some survey, especially if it's a prepub paper or open access journal, that is of the form of "ONE THIRD OF LILLIPUTIANS ARE STUPID, ACCORDING TO SURVEY", is rank bullshit, and is anti-knowledge, as in anyone who reads it is less informed afterwards than before.
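
A back-of-envelope sketch of why the noise matters. It assumes a multiple-choice question with a fixed number of options, and that joke responders pick uniformly at random; both assumptions and the function name are mine, for illustration only.

```python
def observed_wrong_rate(true_wrong: float, random_fraction: float, options: int) -> float:
    """Fraction of wrong answers a survey reports, given random responders."""
    # A random responder picks a wrong option with probability (options - 1) / options.
    chance_wrong = (options - 1) / options
    # Honest responders are wrong at the true rate; the rest are wrong by chance.
    return (1 - random_fraction) * true_wrong + random_fraction * chance_wrong

# Everyone actually knows the answer, but a third answer at random, 4 options:
print(observed_wrong_rate(0.0, 1 / 3, 4))
# Same population with the best-case 5% noise floor:
print(observed_wrong_rate(0.0, 0.05, 4))
```

Under these assumptions, an uncontrolled survey of a population where everybody knows the answer still reports about a quarter of them getting it wrong, and even the well controlled version reports a few percent.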

Show me the survey questions, the interview technique, the responder selection process, the population size, the population demographics, the pre-survey stats oversight board approval, the post-survey stats oversight board signoff, and the raw data, and THEN we will talk.

(My lead researcher when I was an RA sat on several of those Stats Oversight Approval Boards. I got well schooled in several of the ways that a researcher could lie to themselves, knowingly and unknowingly.)

2017-04-17

A theory: Bank of Apple

I have a theory, about Apple.

Apple has a quarter of a trillion dollars.  In cash.

That is a ludicrous amount of money.  That is so much money, that it is too much money.  It is too much to deposit as passive cash, because a bank can no longer be a neutral unbiased 3rd party when there is an account that big.   Not even a large nation's central bank.  When you have that much cash, risks like counterparty risk, fiat currency problems, and government confiscation start becoming a significant amount of the risk profile.

The way that most very large companies waste very large amounts of cash is to buy other large companies.  This almost invariably is a terrible idea.  The buying company almost always overpays, especially when they start bidding against someone else.  And companies for sale at a discount are for sale at a discount for a reason.  The claimed "synergies" almost never are realized.  Costs are always higher than expected.  Big mergers and acquisitions almost always are a mechanism where the senior executives burn the investors' money in order to make said executives seem or feel more important.

Steve Jobs never felt the need to do M&A to seem or feel more important, so Apple only did acquisitions to obtain specific skilled teams or specific technologies.  And since his passing, Apple has generally continued this pattern.  (I think the acquisition of Beats was very non-Apple, and was a mistake, and probably is seen as an expensive mistake and expensive lesson by Apple's current leadership.)

But so and still, Apple has Too Much Cash.  What to do with it?

When you have a pile of cash that is so large that it in itself starts turning into a local economic distortion, there really is only one profitable thing to do with it:  Wrap a banking license around it, and open a bank.

Think about it.  Apple could run a bank, a very different kind of bank, with much lower risks of fraud and loss.  They already have secure cryptoprocessors... everywhere! They can use iOS devices as the secure terminals, both for customers and for merchants.  They can use ApplePay for retail transactions.  They can use what they know about you from your iOS devices and your AppleID for KYC.  They can push secure financial messages around via the iMessage framework.

With all this in place they could undercut all of the existing payment networks and still make, well, bank.

2016-05-24

notes from an opinionated talk about running IPv6 in production

A few years ago, I was at SCaLE, and attended an excellent talk by someone who operated several campus-wide internetworks, about their hard won experience with IPv6.  They were very opinionated.  I loved it.  Here are some of the notes from that talk:


QoS is a bad word.
Control freaks love QoS.
They can debug it themselves.
People who are held to SLAs operating production networks have better things to waste their time on, and better ways to crash their switches.

"But I'm not running IPv6!"  That means you actually are, and are no longer in control of your network.
"I will block IPv6!".  Say goodbye to all the grants that pay your salary.  And everyone's desktops and devices will just make tunnels anyway.

Say NAT one more time, I dare you.

If you think that NAT is protecting you, let me know who you are, so I can blackhole your address range and your IS.

Turning off v4 ICMP is just stupid.
There are lots of stupid people.

You cannot turn off icmp6.
There is no frag by routers in v6.
Thus mtu detect must be on.
Thus icmp6 must be on.
Live with it.
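
The chain of reasoning above, sketched as a filter policy. The type numbers are the standard ICMPv6 error and neighbor discovery message types and 1280 is the IPv6 minimum link MTU; the function names and the exact shape of the allow-list are my own illustration, not something shown in the talk.

```python
# ICMPv6 message types that a filter has to let through for IPv6 to work.
ESSENTIAL_ICMP6_TYPES = {
    1: "Destination Unreachable",
    2: "Packet Too Big",           # path MTU discovery depends on this one
    3: "Time Exceeded",
    4: "Parameter Problem",
    133: "Router Solicitation",
    134: "Router Advertisement",
    135: "Neighbor Solicitation",  # does the job ARP does in v4
    136: "Neighbor Advertisement",
}

IPV6_MINIMUM_MTU = 1280  # every IPv6 link must carry packets this large

def may_drop_icmp6(icmp6_type: int) -> bool:
    """True only if dropping this type does not break basic IPv6 operation."""
    return icmp6_type not in ESSENTIAL_ICMP6_TYPES

def updated_path_mtu(current_mtu: int, reported_mtu: int) -> int:
    """Sender reaction to Packet Too Big: shrink, but never below the minimum."""
    return max(IPV6_MINIMUM_MTU, min(current_mtu, reported_mtu))

print(may_drop_icmp6(2))
print(updated_path_mtu(1500, 1400))
```

If type 2 gets filtered, the sender never learns that its packets are too large, and connections across a smaller-MTU path hang instead of adapting.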

dhcp6 server is udp port 547, not 67

2016-02-13

Regarding that article about gender bias in GitHub Pull Requests

Regarding "Gender Bias In Open Source: Pull Request Acceptance Of Women Vs. Men", or even worse, regarding all the uncritical and breathless articles by the BBC, Vice, HuffPost, and so forth:

First of all, anyone who names their project "DeveloperLiberationFront" and uses an icon of a raised fist in woodcut style, has already predeclared their bias away from objective truth.

Second, the authors of the paper exhibit little knowledge about the large differences in workflow between different projects, and no knowledge about all the different ways that PRs are used and all the different meanings of an "abandoned PR", and also their definition of "project insider" is broken, as for many projects, an "insider" has write access, and may never use PRs at all.

Third, despite GitHub's growing influence, just grabbing tens of thousands of GH PRs is not in the slightest bit representative.

Fourth, their process for computing the gender of PR authors is laughably bad, for reasons that went on for 3 paragraphs before I edited down this text.

Fifth, how many of you have heard of "p-hacking"? Or have ever actually computed a p value since you took that really annoying stats class in college?  Did you even notice that this paper both obviously did p-hacking, and then didn't even report the p values?

Finally, allow me to present the following disruption to the breathless and self-reinforcing narrative:

"So, let’s review. A non-peer-reviewed paper shows that women get more requests accepted than men. In one subgroup, unblinding gender gives women a bigger advantage; in another subgroup, unblinding gender gives men a bigger advantage. When gender is unblinded, both men and women do worse; it’s unclear if there are statistically significant differences in this regard. Only one of the study’s subgroups showed lower acceptance for women than men, and the size of the difference was 63% vs. 64%, which may or may not be statistically significant. This may or may not be related to the fact, demonstrated in the study, that women propose bigger and less useful changes on average; no attempt was made to control for this. This tiny amount of discrimination against women seems to be mostly from other women, not from men."
// ScottAlexander

If this was a real paper, submitted for real peer review, a good peer review would be:

"1. Report gender-unblinding results for the entire population before you get into the insiders-vs.-outsiders dichotomy.
2. Give all numbers represented on graphs as actual numbers too.
3. Declare how many different subgroup groupings you tried, and do appropriate Bonferroni corrections.
4. Report the magnitude of the male drop vs. the female drop after gender-unblinding, test if they’re different, and report the test results.
5. Add the part about men being harder on men and vice versa, give numbers, and do significance tests.
6. Try to find an explanation for why both groups’ rates dropped with gender-unblinding. If you can’t, at least say so in the Discussion and propose some possibilities.
7. Fix the way you present “Women’s acceptance rates are 71.8% when they use gender neutral profiles, but drop to 62.5% when their gender is identifiable”, at the very least by adding the comparable numbers about the similar drop for men in the same sentence. Otherwise this will be the heading for every single news article about the study and nobody will acknowledge that the drop for men exists at all. This will happen anyway no matter what you do, but at least it won’t be your fault.
8. If possible, control for your finding that women’s changes are larger and less-needed and see how that affects results. If this sounds complicated, I bet you could find people here who are willing to help you.
9. Please release an anonymized version of the data."
// ScottAlexander
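
For anyone who skipped that stats class, a quick sketch of what the Bonferroni correction in item 3 is about. The function names are mine, and the family-wise formula assumes the subgroup tests are independent, which real subgroup slicing usually is not.

```python
def bonferroni_threshold(alpha: float, comparisons: int) -> float:
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / comparisons

def familywise_error_rate(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** comparisons

# Slice the data 20 different ways and test each slice at the usual 0.05:
print(familywise_error_rate(0.05, 20))
# The threshold each of those 20 tests should have been held to instead:
print(bonferroni_threshold(0.05, 20))
```

Try enough subgroup groupings and something will come out "significant" by chance alone, which is why the number of groupings tried has to be declared.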

I am willing to bet money that doing real honest academic statistical analysis of their raw data will invalidate their implications and their claims.

2016-01-22

Why SSH keys don't have metadata

And other tech rant. It was recently asked, in a forum that I read, the following: "Why is it that SSH public keys don’t have an embedded expiration date, anyway? PKI certificates have them."

My response:

Because as soon as you start adding all sorts of metadata to a key, then everyone will start adding all sorts of metadata to keys, with all sorts of obscure rules about how metadata interact with the environment and various implementations whether a key works or not.

And then the lawyers will show up and insist that you embed 30 page PDFs of Word docs of someone’s T&Cs and their contracts of adhesion and their “don't hold anyone with money responsible for anything” disclaimers into metadata (you think I joke, I do not at all, this literally regularly happens with “standards based” PKI certs).

And then your keys are going to be huge weirdly encoded binary blobs of shit that you don’t have good tools to manipulate. And you will need to keep special indexes of them, and “bundles” of them, in multiple conflicting filesystem paths and “key stores”.

Part of why SSH took off at all in the first place is because it doesn’t have this complex garbage wankery. An SSH public key is a SINGLE LINE, of printable ASCII7. You can edit and clean up your ~/.ssh/authorized_keys file with a textmode text editor.
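
To show how little there is to it, here is a minimal sketch that takes apart one public key line: key type, base64 blob, optional free-text comment. The function name is mine, and the sketch assumes a bare line with no leading options field, which real authorized_keys entries may have.

```python
import base64
import struct

def parse_public_key_line(line: str):
    """Split 'type base64 [comment]' and sanity-check the blob (simplified)."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected at least a key type and a base64 blob")
    key_type, encoded = parts[0], parts[1]
    # The comment is free text; it is not signed and carries no meaning.
    comment = parts[2] if len(parts) == 3 else ""
    blob = base64.b64decode(encoded, validate=True)
    # The blob starts with a 4-byte big-endian length, then the algorithm name.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_len].decode("ascii")
    if embedded_type != key_type:
        raise ValueError("key type does not match the encoded blob")
    return key_type, blob, comment

# Build a syntactically valid (all-zero, useless) ed25519-shaped line for the demo.
demo_blob = struct.pack(">I", 11) + b"ssh-ed25519" + struct.pack(">I", 32) + bytes(32)
demo_line = "ssh-ed25519 " + base64.b64encode(demo_blob).decode("ascii") + " me@laptop"
print(parse_public_key_line(demo_line)[0])
```

That is the entire format: no validity dates, no policy blobs, no chains, and the only "metadata" is a comment that nothing ever interprets.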

The lack of metadata in SSH is a feature, not a problem.

2016-01-20

This is how to do it, or waving my cane.

1. Design a data abstraction that solves a class of problems.

2. Design a good wire protocol for that abstraction.

3. Better yet, design 2 protocols: one server-to-server and one client-to-server. Federation is the only model that has ever scaled large enough.

4. Implement as simple a server as possible. Do not try too hard to make it performant, just very easy to install and very easy to understand. This is the protocol reference implementation.

5. Implement an open source client library, that completely covers the entire data model and the entire wire protocol.

6. Implement another open source client library, in a very different programming language. If this is difficult, you let your knowledge of your favorite language overconstrain the wire protocol. Go back to step 2 and fix it.

7. Implement a command line client on one of those libraries. Again, it must completely cover the entire data model.

8. Implement an ok GUI app.

9. Implement a very high performance highly scalable server. If you are tempted to change the wire protocol to do this, you screwed up.

10. Now, and only now, you can implement a very nice easy to use GUI. At this point, and at this point only, do you bring in any "designers", "UX" people, or anyone who uses Photoshop as a working tool.
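
As an illustration of what steps 2 and 6 are getting at, here is a sketch of a wire framing that is trivial to implement in any language: a 4-byte big-endian length followed by a UTF-8 JSON payload. The framing, the function names, and the example message are all hypothetical choices of mine, not a recommendation of a specific protocol.

```python
import json
import struct

def encode_frame(message: dict) -> bytes:
    """Serialize one message as a length-prefixed UTF-8 JSON frame."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def decode_frame(buffer: bytes):
    """Return (message, remaining bytes), or (None, buffer) if incomplete."""
    if len(buffer) < 4:
        return None, buffer
    (length,) = struct.unpack(">I", buffer[:4])
    if len(buffer) < 4 + length:
        return None, buffer
    message = json.loads(buffer[4:4 + length].decode("utf-8"))
    return message, buffer[4 + length:]

# Two frames back to back, as they would arrive on a stream socket.
stream = encode_frame({"op": "get", "key": "a"}) + encode_frame({"op": "put", "key": "b"})
first, rest = decode_frame(stream)
second, rest = decode_frame(rest)
print(first, second, rest)
```

Nothing in it depends on one language's object model or serializer, which is the property that makes the second client library in step 6 easy instead of painful.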


Of course, for the past 15 years, everyone has been doing this backwards, with disastrous results. It takes huge amounts of wasted CPU and wasted money by the millions and billions to make all the resulting garbage work at all.