transactor

NAME
SYNOPSIS
DESCRIPTION
OPTIONS
USAGE
CLIENT COMMUNICATION PROTOCOL
DEBUG FLAGS AND DEBUGMASK
NETWORKING
SECURITY
FILES
DIAGNOSTICS
NOTES
FAILURE RECOVERY
SEE ALSO
BUGS
COPYRIGHT
AUTHOR

NAME

transactor - a coordinator of distributed transactions

SYNOPSIS

transactor -c configfile [OPTION]...

DESCRIPTION

This manual page describes transactor, a general purpose transaction coordinator that implements a distributed consensus protocol.

Transactor can synchronize transactions between multiple redundant databases that run on different computers (nodes), possibly spread across the world and connected by the Internet.

Transactor does not offer database functionality itself. On top of a database, it can be used to implement a failsafe, redundant and distributed system that will continue to work even if some nodes experience failure.

The system will continue to work as long as a majority of nodes remains alive; a majority is more than half of all nodes, i.e. two out of three, or three out of five. After a network outage or hardware failure/replacement, a failed node can join the system through a recovery process.
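The majority rule above can be sketched as a small calculation. The function names are hypothetical and only illustrate the arithmetic; they are not part of transactor.

```python
def majority(total_nodes: int) -> int:
    """Smallest number of live nodes that is more than half of the cluster."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


def has_majority(alive_nodes: int, total_nodes: int) -> bool:
    """True if enough nodes are alive for the system to keep working."""
    return alive_nodes >= majority(total_nodes)
```

Note that an even-sized cluster tolerates no more failures than the next smaller odd-sized one: four nodes need three alive, just as five nodes do.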

There are no restrictions on the underlying database, since transactor does not interpret the transaction data in any way beyond safely storing it, communicating it to the peers and reaching a consensus on the next transaction. Therefore, while transactions are usually found in the realm of databases, transactor can be used in any situation where a distributed application must reach a consensus across all nodes.

Since transactor implements the basic service of transaction coordination, it needs to be used in conjunction with an application/database-specific server that operates the database.

OPTIONS

-c filename, --config filename

Use this configuration file. See the man page for transactor.conf(5). A configuration file must be specified.

-nd, --nodetach

Do not run as a daemon: remain attached to the terminal, and do not fork. This option will also set a few debug flags, unless the option -dm is specified. Debug information will be printed on stderr or to the debug log file specified with -d.

-d filename, --debug filename

Write debug information to this debug log file. The amount of debug information is determined by the debug flags set in the debug mask; if no debug mask is specified, all debug flags will be set.

-dm value, --debugmask value

Set the debug mask to value. The debug mask is an unsigned 32-bit number. Each bit that is set causes specific debug information to be emitted to the debug file (or to stderr, if -nd is specified). See the section DEBUG FLAGS AND DEBUGMASK below for a description of the bits. The value can be specified as a hex value, preceded by '0x', or as an octal value, preceded by '0'.
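As an illustration of the accepted notations, the following sketch parses a debug mask value the way C's strtoul does with base 0. The function name is hypothetical, and accepting plain decimal values is an assumption; this page only mentions the hex and octal forms.

```python
def parse_debugmask(value: str) -> int:
    """Parse a debug mask given as 0x-prefixed hex or 0-prefixed octal."""
    text = value.strip()
    if text.lower().startswith("0x"):
        mask = int(text[2:], 16)
    elif len(text) > 1 and text.startswith("0"):
        mask = int(text[1:], 8)
    else:
        # Assumption: a value without prefix is read as decimal.
        mask = int(text, 10)
    if not 0 <= mask <= 0xFFFFFFFF:
        raise ValueError("debug mask must fit in an unsigned 32-bit number")
    return mask
```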

--speedtest

Allow running a speed test. The admin interface (see the section USAGE below) must be used to turn on the speed test. This option should not be set for production use.

-h, --help

Print a list of command-line options.

USAGE

Application Server (Client)

An application utilizing transactor needs an application server that communicates with transactor, operates the database, and serves data/takes requests from the rest of the application. The application server acts as a client to transactor.

The application server should do the following:

Open the database and connect to transactor.

Subscribe to new transactions from transactor.

Whenever there is a request from the application to perform some operation on the database, encode the request in a chunk of binary data (a transaction specification) and submit (post) that data to transactor.

Whenever the transactor sends a transaction specification (tlog_tspec), decode it and perform the equivalent operations on the database.

If the transaction specification does not match the one posted on behalf of the local application, it has been submitted on a remote node. In that case the result of the database operation should usually be discarded, although this depends on the application.

If the transaction specification matches the specification posted, return the results of the database operation to the application.
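The last three steps can be sketched as follows. This is a minimal sketch, assuming the application server remembers the specification id (sid) of its own pending post and compares it with the sid of each incoming transaction; handle_transaction and apply_to_database are hypothetical names.

```python
from typing import Callable, Optional


def handle_transaction(
    sid: str,
    data: bytes,
    pending_sid: Optional[str],
    apply_to_database: Callable[[bytes], object],
) -> Optional[object]:
    """Apply one completed transaction and decide who gets the result."""
    # Every node applies every transaction, wherever it was posted.
    result = apply_to_database(data)
    if pending_sid is not None and sid == pending_sid:
        # Our own specification won: hand the result to the application.
        return result
    # Submitted on a remote node: the result is discarded here.
    return None
```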

Transactor will only consider one locally submitted transaction specification for any particular transaction.

If two or more local application servers compete for placing their transaction specifications in the next transaction, transactor will handle that situation correctly but not very efficiently. In particular, bandwidth may be wasted by distributing to all peer transactors a specification that is immediately replaced in the next transaction.

For best performance, only one local application server should be used, or the application servers should employ some kind of locking scheme to ensure only one is posting transactions at any point in time.

Transaction Specification

A transaction specification (tspec) is a chunk of binary data that has been prepared by the application server and is intended to become a transaction.

It is completely opaque to transactor; the contents of the chunk will not be interpreted in any way.

Specification Id: sid

When transactor receives such a transaction specification, it marks it with a unique stamp, a specification id (sid). The specification id is of the form node:number, where node is the 16-bit index of the node in the cluster, and number is a 64-bit number.

With this stamp, the tspec is distributed among the peers, and considered for the next transaction. The transactor client (i.e. the application server) is given the sid for further reference.
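A client that wants to inspect a sid could split it as sketched below. The names are hypothetical, and the sketch assumes that both parts are written as decimal numbers, which this page does not state.

```python
from typing import NamedTuple


class Sid(NamedTuple):
    node: int    # 16-bit index of the node in the cluster
    number: int  # 64-bit number


def parse_sid(text: str) -> Sid:
    """Split a specification id of the form node:number."""
    node_text, sep, number_text = text.partition(":")
    if not sep:
        raise ValueError("sid must have the form node:number")
    # Assumption: both parts are decimal.
    node = int(node_text, 10)
    number = int(number_text, 10)
    if not 0 <= node < 2**16:
        raise ValueError("node index does not fit in 16 bits")
    if not 0 <= number < 2**64:
        raise ValueError("number does not fit in 64 bits")
    return Sid(node, number)
```

For merely comparing two sids, treating them as opaque strings is sufficient.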

Transaction Id: tid, current tid, tlog_tid

All transactions in the system are numbered consecutively by a 64-bit number, starting with 1 for the first one. This number is called the transaction id (tid). With this number, a transaction can be looked up in the transaction log (tlog).

There are two important tid numbers at any time: the "current tid" is the next transaction to be completed by transactor; it has either started or will be started as soon as one of the transactor nodes in the cluster receives a transaction specification.

The "tlog_tid" is the last transaction in the local transaction log files; up to that number, all transactions can be retrieved by the application server.

If the local transactor is up-to-date, the current tid will be one above the tlog_tid. If the transactor is recovering, the difference between both numbers will be larger.
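The relation between the two numbers can be expressed as a simple check; the function names are hypothetical.

```python
def is_up_to_date(current_tid: int, tlog_tid: int) -> bool:
    """True if the local transaction log ends just before the current tid."""
    return current_tid == tlog_tid + 1


def recovery_backlog(current_tid: int, tlog_tid: int) -> int:
    """Number of completed transactions still missing from the local tlog."""
    return current_tid - tlog_tid - 1
```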

Transactions and the Consensus Protocol

Once a transaction specification has been submitted to any transactor node in the system, the cluster of transactors will start the consensus protocol to decide on the transaction specification to place in the current transaction. If multiple specifications have been submitted concurrently, one will be picked.

While some nodes may fail in that process, the overall system will proceed as long as a majority of all transactors (more than half of the total number) is alive. (If there is no majority of nodes participating, the process hangs.)

The final outcome of this process is a decision on the next transaction specification, which is known by all non-failed nodes in the cluster. Those nodes also have the specification available.

Each transactor stores the transaction in the tlog, and informs its application servers if so requested.

Transactor as a Server

Transactor runs as a server process and offers its service via a TCP network port. An application server using transactor is a client to the transactor service, connecting to this port and issuing commands.

Transactor binds to another port that is intended for administration purposes. It offers essentially the same functionality as the main service port.

See the section on the CLIENT COMMUNICATION PROTOCOL below for a description of those network ports.

Transactor reads those port specifications, as well as the list of its peers, from the configuration file (see man transactor.conf(5)).

It communicates with its peers over TCP and UDP. See the NETWORKING section below.

Transactor saves all completed transactions in a perpetual transaction log on the file system. It also keeps a message log to recover operations when the server has been shut down. Those are described in the FILES section below.

CLIENT COMMUNICATION PROTOCOL

After connecting to transactor's service port, a client (i.e. an application server) can issue commands.

Transactor also offers an admin port, to which an operator (a human) can connect with telnet, netcat, or socat.

On both interfaces, the same commands are available; the difference is that the admin interface will not be closed after syntactic errors.

If a keepalive timeout is set in the configuration file, transactor will send a ping\n message to the client after some time without communication, which the client must answer with a corresponding pong\n. Both messages consist of four characters, followed by a newline. The ping and pong messages are not commands, and are therefore not listed below.
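On the client side, the keepalive exchange amounts to the following sketch. It assumes the messages are the four characters "ping" and "pong", each followed by a newline; the function name is hypothetical.

```python
from typing import Optional


def keepalive_reply(line: bytes) -> Optional[bytes]:
    """Return the answer to a keepalive message, or None for other input."""
    if line == b"ping\n":
        return b"pong\n"
    # Anything else is a regular response and must be handled elsewhere.
    return None
```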

Command Syntax

Each command must be on a single line. Whitespace before and after the command is discarded. If a command has arguments, those are on the same line, separated by whitespace.

The only exception to this rule is run-length encoded data in the post command and in the tlog_tspec response.

Commands with syntax errors are answered by a line starting with error: and describing the error. On the regular service port, transactor will disconnect after the error to avoid confusing an application server that expects a different response. The admin interface will remain open.

Binary Data Encoding

In the post command, binary data can be passed to transactor in two possible encodings.

The first encoding is a C-style-encoded string where all non-printable or non-ASCII characters are encoded as \ooo (three octal digits), and the characters " and ' are encoded as \" and \'. The string is enclosed in double quotes (") on each side.

Note that encoding the data as a string does not mean it will be interpreted in any way by transactor; in particular, transactor does not know (and does not want to know) about character sets, Unicode, multibyte characters, NUL (0) characters and the like. Transactor treats the string as an opaque sequence of bytes.

The string cannot span multiple lines; the opening and closing double-quote must be on the same line as the command word.
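A client-side encoder for the first encoding might look like the sketch below. It assumes that the two quote characters are escaped with a backslash and that everything outside printable ASCII is written as three octal digits. The backslash itself is also written in octal (\134); that is a cautious choice of this sketch, not something this page specifies. The function name is hypothetical.

```python
def encode_cstring(data: bytes) -> str:
    """Encode opaque bytes as a double-quoted, C-style string on one line."""
    out = ['"']
    for byte in data:
        ch = chr(byte)
        if ch in ('"', "'"):
            out.append("\\" + ch)
        elif ch == "\\":
            # Assumption: octal is always safe for the backslash itself.
            out.append("\\134")
        elif 0x20 <= byte <= 0x7E:
            out.append(ch)
        else:
            # Non-printable or non-ASCII byte: three octal digits.
            out.append("\\%03o" % byte)
    out.append('"')
    return "".join(out)
```

Because newlines are encoded as \012, the result never spans more than one line.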

The second encoding is a run-length encoded binary transfer. The client first sends the length in bytes as a textual number, then a newline character, then all bytes of the binary data, and finally another newline character.

In the tlog_tspec response to the gettspec or the subscribe command, transactor sends binary data to the client. This data is always run-length encoded, i.e. using the second method mentioned above.
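The second encoding can be sketched as a pair of helper functions. The names are hypothetical, and the length is assumed to be written in decimal.

```python
from typing import Tuple


def encode_length_prefixed(data: bytes) -> bytes:
    """Length as text, newline, the raw bytes, and a final newline."""
    return str(len(data)).encode("ascii") + b"\n" + data + b"\n"


def decode_length_prefixed(buf: bytes) -> Tuple[bytes, bytes]:
    """Return (data, remaining bytes) from a buffer holding one block."""
    header, sep, rest = buf.partition(b"\n")
    if not sep:
        raise ValueError("length line is incomplete")
    size = int(header.decode("ascii"), 10)
    if len(rest) < size + 1:
        raise ValueError("data block is incomplete")
    if rest[size:size + 1] != b"\n":
        raise ValueError("data block is not terminated by a newline")
    return rest[:size], rest[size + 1:]
```

The receiver must count bytes rather than look for the terminating newline, since the data itself may contain newline characters.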

Client Commands and Responses: Admin Commands

The first group of commands is primarily intended for the admin interface. They should not be used by a regular client.

quit

exit

The quit and exit commands cause the server to close the connection to the client. The client need not use these commands; it can simply close the connection on its own.

terminate

The terminate command causes the server to shut down operations gracefully and exit. All allocated memory is freed before exiting. Transactor should be able to restart even if it has not been shut down gracefully (e.g. after being killed with kill -9).

help

Prints a short overview of all commands.

help command

Prints details about command.

start

The start command causes transactor to start serving clients on the regular interface, after that has been suspended by the stop command. This command is not implemented yet.

stop

The stop command causes transactor to stop serving clients on the regular interface. This command is not implemented yet.

reset_node nodename

The reset_node command causes transactor to discard all messages ever sent to or received from the node named nodename. The nodename argument must be the name of a peer in the configuration file, encoded as a C-style string. See section FAILURE RECOVERY for details. This command is not implemented yet.

info

Prints info about the current transactor status in a human-readable format that might also be suitable for automatic parsing.

dump

Prints a detailed debug dump of the internal data structures to the client.

dump filename

Prints a detailed debug dump of the internal data structures to file filename.

debugdump

Prints a detailed debug dump of the internal data structures to the debug-file. A debug dump can also be triggered by the signals SIGUSR1, SIGUSR2, and SIGHUP; for those signals, the debug dump will be written to the signal_dumpfile specified in the configuration file.

debugfile filename

Sets or changes the name of the debug file.

debugmask all

Sets all debug flags (prints all available debug messages).

debugmask none

Unsets all debug flags (prints no debug messages).

debugmask +flag

Sets the specified debug flag (turns on printing the corresponding debug messages to the debug log file). For possible values of flag, see the section DEBUG FLAGS AND DEBUGMASK below.

debugmask -flag

Unsets the specified debug flag (turns off printing the corresponding debug messages). For possible values of flag, see the section DEBUG FLAGS AND DEBUGMASK below.

debugmask

Show the debug mask.

debug on

Switch on debugging.

debug off

Switch off debugging and close the debugfile.

debug

Show whether debug is on or off.

speedtest size

Start an endless loop of random transactions of the specified size in bytes.

speedtest off

Stop the speed test.

Client Commands and Responses: Regular Commands

The second group of commands is intended for use by the regular client. They are required for the client to interact with and properly use the transactor.

subscribe tid

Subscribe to all transactions with transaction id from tid upwards.

After issuing this command, the client will be fed the binary data of all transactions, starting from tid up to the last transaction completed. As new transactions are being completed, the client will be fed those as well.

The client is also sent an initial status message, and will be sent further status messages for all status changes of the transactor. See the getstatus command for an explanation of the status response.

Subscriptions can be changed by the same subscribe command; a newer command supersedes an older one.

Responses (not necessarily instantaneous):

status n_peers iself majority_alive tlog_status tlogtid tid started

The status message is sent as an immediate response to the subscribe request. Until the subscription is turned off, a new status message will be sent whenever the majority_alive or tlog_status flags change their states.

See the getstatus command for an explanation of the status response.

tlog_tspec tid tid sid sid size size\ndata...\n

The data of the specification for transaction tid in the tlog.

This message is sent once for all transactions between the subscribed tid and the current tlog end tid. As the transactor makes progress and additional transactions are added to the transaction log, new messages will be sent to keep the client updated, until the subscription is turned off.

See the gettspec command for an explanation of the tlog_tspec response.
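A client could decode this response roughly as sketched below. The sketch assumes that the header line consists of the literal words tlog_tspec, tid, sid and size, each keyword followed by its value, and that it is followed by size bytes of data and a newline. The gettspec section is authoritative for the actual layout; the function name is hypothetical.

```python
from typing import Tuple


def parse_tlog_tspec(buf: bytes) -> Tuple[int, str, bytes, bytes]:
    """Return (tid, sid, data, remaining bytes) for one tlog_tspec message."""
    header, sep, rest = buf.partition(b"\n")
    if not sep:
        raise ValueError("header line is incomplete")
    words = header.decode("ascii").split()
    # Assumed layout: tlog_tspec tid <tid> sid <sid> size <size>
    if (len(words) != 7 or words[0] != "tlog_tspec"
            or words[1::2] != ["tid", "sid", "size"]):
        raise ValueError("not a tlog_tspec header")
    tid = int(words[2], 10)
    sid = words[4]
    size = int(words[6], 10)
    if len(rest) < size + 1:
        raise ValueError("data block is incomplete")
    if rest[size:size + 1] != b"\n":
        raise ValueError("data block is not terminated by a newline")
    return tid, sid, rest[:size], rest[size + 1:]
```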

subscribe off

Turn off the subscription.

post rqnum tid "data_string"

Post the data in the C-style quoted data_string. This form of the post command is simply newline-terminated. Except for the data encoding, it behaves like the run-length encoded form below.

post rqnum tid size\ndata...\n

Post the run-length encoded data of length size bytes. The data starts immediately after the \n that ends the post line, and is immediately followed by another \n character.

A positive 32-bit number rqnum can be chosen by the client; the number will be used in the response to this post command. The number is not otherwise used in transactor.

The post command causes transactor to try and place the transaction specification data in transaction tid.
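The run-length encoded form of the post command could be assembled as in the following sketch. It assumes that rqnum, tid and size are written as decimal numbers separated by single spaces. Since this page does not say whether rqnum is signed, the sketch limits it to the range that is valid either way. The function name is hypothetical.

```python
def build_post(rqnum: int, tid: int, data: bytes) -> bytes:
    """Build a post command carrying the data in run-length encoded form."""
    # Assumption: stay within the positive signed 32-bit range.
    if not 0 < rqnum <= 2**31 - 1:
        raise ValueError("rqnum must be a positive 32-bit number")
    if tid < 1:
        raise ValueError("transaction ids start with 1")
    header = "post %d %d %d\n" % (rqnum, tid, len(data))
    return header.encode("ascii") + data + b"\n"
```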

Responses:

post_response rqnum tid_old

It is too late. The transaction id tid given by the client is outdated; internally, the transactor is ahead and has already started or even completed this transaction.

The client should obtain the current transaction id with the getstatus command, unless it knows it through subscribe. If the current transaction id is the same as used by the client in the request, the transaction is already started and the client must wait until this transaction is completed, before trying to post again.

post_response rqnum tid_new

It is too early. The transaction id tid given by the client is ahead of the end of the local transaction log (see the FILES section for a description of the transaction log).

Note that the transaction log (tlog) may lag behind the current transaction id of the transactor, if transactor is recovering from a disconnected or failed phase. In this case, there is no possible tid for the client to use that is not either tid_old or tid_new. The client should wait until the tlog_tid as shown by the getstatus command is one behind the current transaction id of the transactor; then the transactor is up-to-date again.

post_response rqnum other_sid

The transactor currently holds another block of binary data in a transaction specification that is part of an ongoing transaction and cannot be overridden.

The client should retry posting when the current transaction is completed.

post_response rqnum tid_started

The tid given by the client is the current transaction id of transactor, and also just one ahead of the last tid in the transaction log (the tlog_tid), but the transaction is already started and it's too late.

The client should retry posting when the current transaction is completed.

post_response rqnum post_error

An unspecified error occurred. This should not happen, and the client could react by terminating.

post_response rqnum propose_error

An unspecified error occurred. This should not happen, and the client could react by terminating.

post_response rqnum posted sid tid

Success! (At least, for the moment). The request data was successfully posted and made into a transaction specification with the specification id sid. The transactor will try to place it in transaction tid. The specification id sid is unique to this particular transaction data and will not be used by another transactor in the same cluster.

From now on, the decisions are out of the hands of the client. It's too late for the client to withdraw the transaction specification, but transactor will not guarantee that this ends up as the next transaction. Another client on another computer could win the race.

Note that there is no notification of the client of the final outcome for transaction id tid as a specific response to this request. The client can either poll transactor with getstatus until tid is finished, then retrieve the result with gettspec and compare the sid to find out whether it has been successful, or it can subscribe and be notified of the completion automatically.
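The responses to post listed above can be handled with a small dispatcher. This is a sketch only: the class, function and action names are hypothetical, and the numbers are assumed to be decimal.

```python
from typing import NamedTuple, Optional


class PostResponse(NamedTuple):
    rqnum: int
    status: str
    sid: Optional[str] = None
    tid: Optional[int] = None


# Suggested client reaction for each status, following the text above.
ACTIONS = {
    "tid_old": "refresh_tid",
    "tid_new": "wait_for_tlog",
    "other_sid": "retry_after_current_transaction",
    "tid_started": "retry_after_current_transaction",
    "post_error": "terminate",
    "propose_error": "terminate",
    "posted": "await_outcome",
}


def parse_post_response(line: str) -> PostResponse:
    """Parse one post_response line into its fields."""
    words = line.split()
    if len(words) < 3 or words[0] != "post_response":
        raise ValueError("not a post_response line")
    rqnum, status = int(words[1], 10), words[2]
    if status not in ACTIONS:
        raise ValueError("unknown post_response status: " + status)
    if status == "posted":
        if len(words) != 5:
            raise ValueError("posted must be followed by sid and tid")
        return PostResponse(rqnum, status, words[3], int(words[4], 10))
    return PostResponse(rqnum, status)


def next_action(response: PostResponse) -> str:
    """Map a parsed response to the reaction suggested in this section."""
    return ACTIONS[response.status]
```

After a posted response, the client keeps the returned sid and compares it with the sid of transaction tid once that transaction is completed.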

repost tid sid

Try to place the previously posted data with specification id sid in transaction tid.

This is the same as the post command, but avoids retransmission of the binary transaction data from the client to transactor and, more importantly, redistribution of that data to the other transactors in the cluster.

In order for this to succeed, the transaction specification must not have become a transaction yet; so, the previous post must have been successful but another transaction specification must have been selected by the cluster of transactors for the previously used tid.

Also, no other client of the local transactor may have had a successful post in the meantime, since the cluster of transactors will only consider one particular transaction specification from each transactor at any single point in time.

If one of these conditions is violated, the repost will fail and the client must use a plain post command instead.

Responses:

repost_response tid sid tid_old