2010-06-30

More Drizzle Syslog. Logging big and slow queries.

One extremely useful feature in MySQL is the "slow query log". In the stock distribution, it records every query that takes more than 1 second to execute, and every query that requires a full table scan. A useful part of the Percona patches extend this feature to measure sub-second slow queries, which became important as machines got faster.

When someone asks "Why is the database slow?", the first counter question to ask is "What's in the slow query log?"

One of the confusing parts of administering MySQL is that the "query log" and the "slow query log" are different things. They have different controlling options, and different log formats.

When I ripped out and replaced the logging system during the Drizzle fork, I took these two logging systems, which were implemented as completely separate code, and merged them together. There is now only the "query log", but it can be filtered in-process to only record queries that are "too slow", "too large", or "too expensive".

This filtering is controlled with three configuration options or server variables:

syslog-logging-threshold-slow
syslog-logging-threshold-big-resultset
syslog-logging-threshold-big-examined

These all have a default value of is zero.

The "-slow" variable is in units of milliseconds. If a query takes longer than that much time, it gets logged.

The "-big-resultset" variable means that any query that has a resultset with more rows than this value logged.

The "-big-examined" is a bit more complicated. It counts the number of rows that were retrieved from the underlying storage engines. Any query that read more than the value gets logged. This is useful for detecting table scans.

For example, if a table PARTS has 10,000 rows, and has a column SID that is not indexed, then the query
SELECT * FROM PARTS WHERE SID='0042';
than the number of rows examined will be 10,000, no matter how many rows match the query. But if that column is indexed, and that index has a high cardinality, than the number of rows examined will be much less. In the best case, it will be the same as the number of rows returned.


Another major annoyance with stock MySQL is that to enable the query log and the slow query log, you have to restart the server to enable them. This is a problem for large systems with uptime requirements.

In Drizzle, on the other hand, the variables syslog-errmsg-enable and syslog-logging-enable can be set while the server is running.

Also, the threshold variables can be changed while the server is running. So you can leave them at the default values of zero when first setting up an application, and get a full log. Then turn them up to high values, and only get logged events when something unusual happens, and then turn the thresholds back down to lower more sensitive values to investigate the problem, all without restarting the server.

2010-06-28

A Modest Suggestion for a Truly Digital Unicode Script

I have an idea for a useful Unicode script.

It consists of a single character, but will have a number of "accents" or "combining characters".

MANUS SYMBOL
MANUS COMBINING CHIRALITY DEXTER
MANUS COMBINING CHIRALITY SINISTER
MANUS COMBINING FACING PALMAR
MANUS COMBINING FACING DORSAL
MANUS COMBINING DIGIT THUMB
MANUS COMBINING DIGIT INDEX
MANUS COMBINING DIGIT INSULT
MANUS COMBINING DIGIT RING
MANUS COMBINING DIGIT SMALL

For each of the chirality and facing types of combining character, exactly one must be specified. For the digit type of combining combining, zero to five may be specified.

It would take 128 glyphs to display this font, less if the font system can handle reflections or even better, algorithmically generate the glyphs.

What this font represents should be obvious.

Setting up Drizzle and the Syslog for each other

The UNIX syslog is well-known way for applications to emit potentially interesting messages. A message is merely a string of characters with a some associated metadata. That metadata includes an "identity", a "facility", and a "priority". A daemon process running on the host reads a configuration file, and then collects and processes the messages as they are generated. It can discard them, append them to a text file, or send them them over the network to other syslog daemons.

The "identity" is just a short string of characters, and identifies the source of the events. Traditionally, it is the name of the program. The default syslog identity for the Drizzle server is "drizzled". It is set with the "syslog-ident" option, and can be read, but not changed, via a server variable of the same name.

The syslog "facility" and "priority" values are selected from a predefined list. If you read the UNIX man page syslog(3) or examine the contents of /usr/include/syslog.h, you will find the facilities and priorities defined for your system.

The default syslog facility for the Drizzle server is "local0". It is set with the "syslog-facility" option, and can be examined but not changed, via a server variable of the same name. The Drizzle server has two priority settings. The priority of the query log messages is set with "syslog-logging-priority", and defaults to "info". The priority of the error message messages is set with "syslog-errmsg-priority", and defaults to "warning". These can be read, but not changed, via server variables.

The details of configuring syslog are very specific to your operating system distribution, local operations doctrine, and personal whims. You should carefully read the documentation your distribution's syslog daemon, and understand how it is currently configured.

Create some new empty logfiles, /var/log/drizzled/query.log and /var/log/drizzled/error.log and then configure syslog to send local0.info messages to that query.log and local0.warn messages to that error.log. You may also want to reconfigure such those events do not go to the "catch all" destinations, such as the /var/log/messages and /var/log/syslog files. Also, remember to configure logrotate to manage the the new files.

On a modern Ubuntu system, you do all that like so:

Run the following commands
mkdir -p /var/log/drizzled
touch /var/log/drizzled/query.log
touch /var/log/drizzled/error.log
Create a file named /etc/rsyslog.d/90-drizzled.conf with the following contents:
local0.=info  -/var/log/drizzle/query.log
local0.=warning  /var/log/drizzle/error.log

Find the file named /etc/syslog.d/50-default.conf and change the following lines

Change the line
*.*;auth,authpriv.none -/var/log/syslog
to
*.*;local0.none,auth,authpriv.none -/var/log/syslog

Change the lines
*.=info;*.=notice;*.=warn;\
    auth,authpriv.none;\
    cron,daemon.none;\
    mail,news.none  -/var/log/messages
to
*.=info;*.=notice;*.=warn;\
    local0.none;\
    auth,authpriv.none;\
    cron,daemon.none;\
    mail,news.none  -/var/log/messages

Start drizzle the way you are used to, but add some new options to the command line or to the configuration file.

/sbin/drizzled ${other_options} \
    --syslog-logging-enable \
    --syslog-errmsg-enable

Open up some new terminal windows and run "tail -f" against the new log files, and then connect to the drizzle server you just started, and run some queries. You will see the queries show up in the query log, and if there are any errors, error messages show up in the error log.

2010-06-26

Drizzle plugin/syslog has been merged, exploring how to use it

My new syslog module has been merged with the Drizzle mainline.

After updating your local copy of the tree, you can see the sources in plugin/syslog, which have completely replaced plugin/logging_syslog.

To check to see if your build or distribution of Drizzle has this module available, run the the drizzled binary with the --help option, and look for lines referring to "syslog", like this:

  $PREFIX/sbin/drizzled --help | grep syslog

You should see something similar to this:

  --syslog-ident=name
  --syslog-facility=name 
  --syslog-logging-enable 
  --syslog-logging-priority=name 
  --syslog-logging-threshold-slow[=#] 
  --syslog-logging-threshold-big-resultset[=#] 
  --syslog-logging-threshold-big-examined[=#] 
  --syslog-errmsg-enable 
  --syslog-errmsg-priority=name 
  syslog-ident                      drizzled
  syslog-facility                   local0
  syslog-logging-enable             false
  syslog-logging-priority           info
  syslog-logging-threshold-slow     0
  syslog-logging-threshold-big-resultset 0
  syslog-logging-threshold-big-examined 0
  syslog-errmsg-enable              false
  syslog-errmsg-priority            warning

The lines beginning with the double dash are options, which can be specified on the drizzled command line, or in the drizze server configuration files. The lines without the double dash are server variables, which can be examined and sometimes set from a SQL prompt or other Drizzle client. Next to the server variables are their current value. Notice that both syslog-logging-enable and syslog-errmsg-enable default to false.

The next post will be a quick overview of the UNIX syslog, and how this module works with it.

2010-06-24

New syslog module for Drizzle DB

One of the long-time continuing complaints against using MySQL in an enterprise or large scale environment is it's logging system. The query log, slow query log, and error log all get written to the server's local filesystem, which is difficult and awkward to hook up to enterprise log collection and analysis systems. A recurring feature request is to be able to use the UNIX syslog instead.

One of the first things I did to Drizzle was tear out the existing logging code. This also removed a number of mutex locks, one of which was completely unnecessary, and one that got constantly hit even if logging was turned off.

In it's place I put two of the first Drizzle plugin interfaces, one for the query logging, which combined the features of the old MySQL query log and the slow query log, and one for error message reporting. I then wrote a plugin that took the query log, and sent it to the syslog.

At the last MySQL Conference, during the Drizzle Developer Day, Monty Taylor pointed out that we were missing something. There was no "send error messages to the syslog" plugin. Oops.

Now there is.

I've completely replaced the plugin/logging_syslog module with a new plugin/syslog module that contains three plugins. In addition to a rewritten "query log to syslog", there is "error messages to syslog", and also a new SQL function SYSLOG(facility, priority, text).

I've pushed the work up to launchpad, and I'm confident that after review and possibly some tweaks, it will be integrated with Drizzle.

Documentation and examples for using this new module are in upcoming posts.

2010-06-08

Centralized Social Networking vs One Social Web, and Cloud Computing

A few weeks ago, at the 10th Internet Identity Workshop in Mountain View, I watched an impressive demo of the reference implementation of the One Social Web design.

It's a fully decentralized full federated "Facebook". I was amazed. It can do everything that Facebook can, it has far better trust model than "Trust Mark Zucherberg's ethics", it scales without polling, it uses simple open network protocols, and the reference implementation is open source.

Since before and while the Diaspora people have been collecting money via Kickstarter and giving interviews to the public media, the OSW people have designed a good scalable network protocol based on proven components and designs, and have written running code.

Huge centralized systems like Facebook and Twitter have been driving the creation of some pretty neat technology, which they have been forced to use because of their gigantic size and load and growth. Engineering effort that use to be used to add value and features are now instead consumed to just stay alive and running. These huge systems tend to run in their own data centers because at that scale it makes sense, for the same reason that Coca-Cola runs their own water treatment plants.

But, if a system can federate into small pieces then it does not have to go to such heroic efforts as it grows. The implementations can have simpler designs, and more straightforward code that is easier to develop and maintain. They are "small pieces, loosely joined", and "Fast, Cheap, and Out of Control", to invoke some proven patterns. They can keep growing when the centralized systems fail. They can handle the load of ten billion people and a trillion devices.

I've preached all these points before. But now I come to the point that occurred to me today:

These federated systems work very well with hosting providers and with cloud computing systems. They require basic off-the-rack system administration skills, and can be designed to use basic cloud infrastructure APIs and components.

If you bring up a OWS node, you could run it on a junk machine under your desk. But when you start depending on having it be stable and managed, it makes since to let Rackspace Slicehost or Aamzon AWS EC2 run it for you.

Feh on "Expedited Border Crossing Schemes"

I just got spammed with an invitation to join "Flux", which appears to be yet another Expedited Border Crossing Scheme, this time between the US and the Netherlands.

I think that the proliferation of all these special pairwise expedited border crossing schemes is stupid. Each one has it's own special set of requirements, each one has it's own special treaty basis, each one has it's own fee, each one has its own special kiosk, each one has it's own database (of varying, unproven, and opaque security), and each one issues it own special card. And they all are exist only as workarounds to bypass the idiotic assinine border paranoia and xenophobia zeitgeist that infest the current system international border control and visas (of which my own country, the US, is one of the very worst offenders).

If we are going to have an international "trusted traveller" scheme, make it much more useful and general. That is, one that lets me use the TSA "registered traveller" inside the US, and also gives me a visa and expedited entry into all of NAFTA, the Commonwealth countries (UK,CA,AU,NZ), and the Schengen Area, and so forth. Also, it shouldn't have it's own special card, but instead can just be accessed via the barcode on my passport, or if the PTBs really insist, a biometric enhanced passport.

But for now, I will give Flux a pass, just like I've given SENTRI and NEXUS and TSART a pass.