2018-11-21

A gentle lecture, what a codec is

I recently wrote an offhand post about doing software archaeology into the FFmpeg project at my job, and one of my nephews, who is very smart but whose passions run down a different path than mine, responded with: "I have no idea what you're talking about but I'm sure it's great". So I wrote up a slightly longer lecture/essay as a comment reply. Then my friend Tim Lord responded with: "that description of codecs deserves to be someplace besides only in a comment on a Facebook post -- cogent, unpatronizing, good refresher". And so, here it is:

A codec is a piece of software that turns sound or video into a computer file, or back again. Every phone call you have ever made, every movie you have ever seen, and all the recorded music you have ever heard (except for actual film movies or phonographs or old cassette tapes) has been processed by an enCODer to record it, and a DECoder to play it back to you.
There are many many many different codecs. Early ones were designed around the limits that computers were not very fast or very powerful, so they did not do a very good job of using the fewest number of computer bits for the best possible audio or video. We have codecs today that are very very good at it, because our computers are fast and powerful enough.
There are many old codecs we can't easily stop using because they are built into systems that can't all be replaced at once, such as satellite receivers, telephone switches, and cellphone handsets.
Some codecs were created to play movies and sound for computer games, and so the codec software was built into that particular game, and used only for one or a few games, and then never used again.
Codecs are very difficult to design well, because they depend on how human brains, human eyes, and human ears work. To save space, a codec does not want to spend computer power saving or playing back the parts of music or video that your brain cannot hear or see. So codec designers have to study human perception, and have to test things out on human volunteers, which is slow and expensive.
Widely used codecs are often "standards". Corporations, governments, and universities will work together to carefully design a codec that then can be used in many places at once, so that lots of systems can all talk to each other.
One of those groups is called the "Moving Picture Experts Group", which works under an organization called the "International Organization for Standardization" (ISO). Their first standard, MPEG-1, defined three audio codecs of increasing complexity called "Audio Layer" I, II, and III, and the third was the one that squeezed sound down the best. Thus "ISO MPEG-1 Audio Layer III", or for short "MP3".
FFmpeg is an open source software project that was originally an implementation of one of the MPEG video codecs, but since then has become a project that tries to have an implementation of every possible codec. FFmpeg contains old codecs that are no longer used, old codecs that are still widely used, codecs that were used in old video games and never used again, new codecs that are now used a lot, and also lots of experimental codecs that people wrote to figure out what does and does not work in a codec.
Because FFmpeg has so many codecs in it, people now use it to "translate anything to anything", and they also use it to analyze and process audio and video information. My own employer uses FFmpeg in many places inside our company, to do lots of the things that our customers pay us for.
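
For the curious, here is roughly what "translate anything to anything" looks like in practice. This is a minimal sketch in Python that builds the simplest possible ffmpeg command line, where ffmpeg picks the output codec from the file extension. The file names are made up for illustration, and the helper function names are my own, not part of FFmpeg.

```python
import shutil
import subprocess

def transcode_command(source, target):
    """Build the simplest ffmpeg invocation: read `source`, write `target`.

    ffmpeg guesses the output format from the extension of `target`.
    """
    return ["ffmpeg", "-i", source, target]

def transcode(source, target):
    """Run the conversion, assuming the ffmpeg program is installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    subprocess.run(transcode_command(source, target), check=True)

# Hypothetical file names, shown only to illustrate the command that would run.
print(" ".join(transcode_command("talk.wav", "talk.mp3")))
```

Real-world use usually adds more options to pick a specific codec, bitrate, or size, but the basic shape stays the same.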

My nephew then had the question "So JPEG artifacts are because of an outdated codec?", to which I responded:
Every codec has artifacts if driven too hard, if it's told "no, compress it even harder, fewer bits". There are newer codecs that are better than JPEG that don't start having visible artifacts so soon, and have artifacts that are less distracting. But we can't change all the image viewing software everywhere, so we will have to live with JPEG for a long time.
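
If you want to see "fewer bits means more artifacts" for yourself, here is a toy sketch in Python. It is not how JPEG or MP3 actually work (real codecs are far cleverer about which details to throw away); it only rounds each sample of a pure tone to a small number of levels and measures how far the played-back sound drifts from the original. All the function names are made up for this illustration.

```python
import math

def encode(samples, bits):
    """Toy encoder: squeeze each sample in [-1.0, 1.0] into a `bits`-bit integer."""
    levels = (1 << bits) - 1
    return [round((s + 1.0) / 2.0 * levels) for s in samples]

def decode(codes, bits):
    """Toy decoder: turn the integers back into samples in [-1.0, 1.0]."""
    levels = (1 << bits) - 1
    return [c / levels * 2.0 - 1.0 for c in codes]

def worst_error(samples, bits):
    """Largest difference between the original and the decoded samples."""
    restored = decode(encode(samples, bits), bits)
    return max(abs(a - b) for a, b in zip(samples, restored))

# One cycle of a pure tone, 64 samples long.
tone = [math.sin(2 * math.pi * n / 64) for n in range(64)]

for bits in (8, 4, 2):
    print(bits, "bits per sample, worst error:", round(worst_error(tone, bits), 4))
```

The error grows every time the bit budget shrinks. That growing error is the artifact.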

2018-10-28

floating an idea: an APA

I have no idea why I'm considering doing such a crazy thing, but I'm considering starting and running an APA. How an APA works:
  1. You write or draw on up to two sheets of paper, double sided.
  2. You send it to me, either via postal mail or email PDF.
  3. I collate them all together into a stack.
  4. I photocopy and bind the stack, making N copies. Enough to send one in the postal mail out to each member.
  5. I mail a copy to everyone.
  6. You get it in the post. You read and enjoy it.
  7. You write or draw on up to two sheets of paper...
Some notes:
  • I'll be willing to cover the copy costs and postage, at least until it gets too expensive.
  • I'm willing to have members with non-US international addresses, again until it gets too expensive.
  • Other members will not see your postal address, unless you yourself decide to put it on one of your pages.
  • You can send your contribution in as a PDF, or as actual sheets.
  • Paper size has to be 8.5x11. Print will be BW photocopy.
  • I reserve the right to change the number of sheets members get to send in.
  • I reserve the right to cap the number of members.
  • I reserve the right to drop any contribution, and I reserve the right to evict any member.
  • The "code of conduct" is: don't piss me off, don't virtue signal, content needs to be kid-friendly and safe-for-work.
  • Topic is general, anything you want to write or draw. Fiction, non-fiction, gonzo journalism, bad poetry, good drawing, ...
So, if I decide to do this, do you want in? Reply here, or msg me, or email me.

2018-08-23

buried in sheets of colorful glowing glass

Anyone remember those scenes from ST:TNG that showed the captain doing paperwork at his desk? He was using a display terminal to read text and to do vtc, AND he was switching between half a dozen data tablets? Remember how conceptually silly that was?
My edc backpack contains a laptop, a tablet, two phones, and one and sometimes two eink reader tablets. My nightstand has two eink tablets on it, and charger points for the phones.
I just saw a coworker sitting at a table in this building talking on one phone, reading email on her laptop, consulting text on a tablet, and taking notes on a reMarkable tablet.
The conference room I'm in right now has all my edc gear, plus a yuuge 75inch vtc display on the wall, with attached steerable camera, and a glass tablet on the table for controlling the vtc. Plus a multipoint polycom rig on the table. And plus an Alexa for Business terminal on the table. Even the lightswitch isn't real, it is a multibutton scene controller. At least the clock on the wall is a physical moving-hands analog, but I know it gets set and synced remotely via some wiring coming out of the wall behind it. And the six 5'x8' white glass whiteboards lining the walls are not "smart". Yet.
We are within striking distance of ending the blizzard of paper that used to engulf work, but now we're being buried in sheets of colorful glowing glass.

2018-07-26

Thoughts on the Google Titan token

The tech press is so excited about Google's "Titan" hardware token, and the breathless statement that they have "never had an account takeover" since rolling it out internally. They are excited about the wrong things, and are being taken for a ride by G's marketing and PR departments.

It's only a FIDO U2F token. I've had one for almost 2 years now, and my current employer issued me one on my first day of work, over a year ago. Mandating 2FA across an enterprise is hardly a new thing.

The actual stories here are:
* Why did Google decide to cut out Yubico?
* Was it price?
* Was it not-invented-here?
* Did Google not trust Yubico to not backdoor the YubiKey tokens?
* Did Google want to put their own backdoor into the Titan tokens?
* Did Google license Yubico's manufacturing patents? (If they did not, it will be really hard to manufacture them cheaper.)

2018-04-20

On names

(upcomment)
There is no standard central registry of names in the US. Your "legal name" is whatever you say it is, with the restriction that you may not change your name to commit fraud or try to avoid a debt. The flip side of that is you have very few avenues to force anyone else to encode or display your name in the manner you prefer. If you change your name or have a name that is awkward to deal with, the obligation is generally on you to work out a working relationship with everyone else who needs to know your name.

A "legal name change" is just a helpful service provided by your local government court that lets you publicly declare and record a new name, bound to your previous name, on standard papers that other government agencies and various private organizations MAY (but not MUST) pay attention to.

There are multiple issuing authorities for IDs. Public, private, commercial, corporate, educational, military, government, ... An authority to issue IDs reaches only as far as their legal mandate, and no farther, except by rough consensus and a need to make stuff work.

Each issuing authority gets to have their own regularization rules and database schema. They will pick what works for them, and have very little interest or budget in completely reworking their databases and processes to accommodate someone who is being a pain in their ass.

Whenever you intersect with government, just expect ascii7 smashing.
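
To make "ascii7 smashing" concrete, here is a minimal sketch in Python of what a typical system might do to a name. It is an illustration under my own assumptions, not any agency's actual rules: it splits accented letters into base letter plus accent, throws away everything that is not 7-bit ASCII, uppercases, and optionally chops to a length limit. The function name and the `max_len` parameter are made up.

```python
import unicodedata

def ascii7_smash(name, max_len=None):
    """Reduce a name to uppercase 7-bit ASCII, roughly the way many databases do."""
    # Split accented letters into a base letter plus combining accent marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop everything that does not fit in 7-bit ASCII, accents included.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Uppercase and collapse runs of whitespace.
    smashed = " ".join(ascii_only.upper().split())
    return smashed[:max_len] if max_len else smashed

print(ascii7_smash("José Núñez"))
print(ascii7_smash("Søren"))
```

Note what happens to "Søren": the "ø" has no base-letter-plus-accent form, so this naive approach silently deletes it. Better systems use a transliteration table instead (turning "ø" into "OE", say), and two systems that disagree on the table will disagree on your name.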

US banking know-your-customer rules require ascii7 smashed names for account holders, both individuals and companies, to interface to government banking regulators and auditors.

The US interstate Driver License Compact imposes ascii7 smashing and length limits, for database and lookup compatibility reasons. When a cop looks at your DL (which you do have to give him if he's detained you while you are driving a car), it had better match what pops up on his terminal when he enters the license# and/or the plate#, or you may be about to have a bad day.

Things don't get a lot better when dealing with passports, from any country.

Passports impose ascii7 (with various accent composition hacks) and length limits, by treaty.

Different countries impose additional different rules and length limits, for reasons various and mostly stupid. Some countries try to appear to still permit accent marks, by trying to encode accents using various slightly different composition encodings layered on top of ascii7, and hoping that everyone uses the same encoding. This kind of sort of mostly works, except when it doesn't.

Passports from countries with widespread non-latin charsets will require an ascii7 smashed name, and often require a specific approved romanization algorithm, or an approved thesaurus, or both. Such a passport MAY have an additional field for the name in a local charset, but that is entirely at the option of the issuing authority.

What is printed on the passport photo page should match what's printed in the "machine readable zone", which should match what's on the rfid chip. "Should".
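
As a rough illustration of why the printed name and the machine readable zone can drift apart, here is a simplified Python sketch of how a name gets packed into the 39-character name field on the first line of a passport booklet's machine readable zone: surname, then "<<", then given names, with "<" standing in for spaces and hyphens and padding out the rest. The function names are mine, and the real rules for transliteration and for truncating long names are more involved than this.

```python
import unicodedata

NAME_FIELD_WIDTH = 39  # name field length on a passport booklet MRZ line

def _mrz_letters(text):
    """Fold text to A-Z words joined by the '<' filler character."""
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Hyphens are treated like spaces; anything else that is not A-Z is dropped.
    words = folded.upper().replace("-", " ").split()
    cleaned = ["".join(ch for ch in word if "A" <= ch <= "Z") for word in words]
    return "<".join(word for word in cleaned if word)

def mrz_name_field(surname, given_names):
    """Build 'SURNAME<<GIVEN<NAMES', padded or cut to the field width."""
    field = _mrz_letters(surname) + "<<" + _mrz_letters(given_names)
    # Naive truncation: a long name simply loses its tail in this sketch.
    return field[:NAME_FIELD_WIDTH].ljust(NAME_FIELD_WIDTH, "<")

print(mrz_name_field("Eriksson", "Anna Maria"))
print(mrz_name_field("Núñez-García", "José"))
```

Accents, hyphens, and apostrophes printed on the photo page have no way to survive this step, and a long enough name gets cut off.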

Most countries require air, sea, and rail carriers to pre-transmit passenger manifests before arrival. The names on those manifests also get ascii7-smashed, and any accent composition stripped, and had better match what's printed in your passport, or you are going to have a rough time boarding at departure, or clearing passport control at arrival.

Internal and external passports in China used to also require a field encoding the name in the form of Chinese telegraphy numbers, to sidestep transliteration issues between the various Chinese languages. I don't know if they still do.