STF requires MySQL 5.x (but please see the section about Q4M before installing MySQL > 5.1).

All of the data except for the actual object content is stored in this MySQL database, so take care to tune it and give it ample storage.

Once you have a MySQL server running, create a new database and set it up using stf.sql (available at https://github.com/stf-storage/stf/blob/master/misc/stf.sql).
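The step above can be sketched as a small shell helper. This is only a sketch: the function name stf_load_schema and the MYSQL override variable are made up for illustration, the database name "stf" is just the one used in the default DSN, and it assumes the mysql client can already authenticate (for example via ~/.my.cnf).

```shell
# Command used to talk to MySQL. Override it to add credentials,
# e.g. MYSQL="mysql -u root -p" (hypothetical variable, not part of STF).
: "${MYSQL:=mysql}"

# stf_load_schema DBNAME SCHEMA_FILE
# Creates the database if it does not exist yet, then loads the schema into it.
stf_load_schema() {
    db=$1
    schema=$2
    [ -r "$schema" ] || { echo "cannot read $schema" >&2; return 1; }
    $MYSQL --execute="CREATE DATABASE IF NOT EXISTS $db" || return 1
    $MYSQL "$db" < "$schema"
}

# Example (run from the STF source directory):
#   stf_load_schema stf misc/stf.sql
```

The same helper should work for the queue database described later, given the matching schema file.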

Overview

STF uses a queue to communicate with the backend workers. You can currently choose between Q4M and TheSchwartz for this. For very high-load systems, you may want to choose Q4M (but I'm only saying that because I haven't used TheSchwartz in a real production STF deployment).

DO NOT FORGET to tell STF where your queue is: You must specify how to connect to your queue by either specifying it in environment variables or by editing your config file:

# 1) If you're using environment variables
# You need to do the same for workers, too!
env STF_QUEUE_DSN="dbi:mysql:dbname=stf_queue" \
    STF_QUEUE_USERNAME=foobar \
    STF_QUEUE_PASSWORD=password \
    plackup -a etc/dispatcher.psgi
# 2) OR, if you're editing the config file
'DB::Queue' => [
    "dbi:mysql:dbname=stf_queue",
    "foobar",
    "password",
    {
        AutoCommit => 1,
        AutoInactiveDestroy => 1,
        RaiseError => 1,
        mysql_enable_utf8 => 1,
    }
],

You only need to do one of the above.
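If you go the environment variable route, one way to keep the dispatcher and the workers in sync is to put the settings in a single file and source it from both start scripts. The file name stf-env.sh is hypothetical, and the values are the placeholder ones from the example above.

```shell
# stf-env.sh (hypothetical file name)
# Source this from both the dispatcher and the worker start scripts
# so that they always point at the same queue.
export STF_QUEUE_DSN="dbi:mysql:dbname=stf_queue"
export STF_QUEUE_USERNAME="foobar"
export STF_QUEUE_PASSWORD="password"

# Then, in each start script:
#   . /path/to/stf-env.sh && plackup -a etc/dispatcher.psgi
#   . /path/to/stf-env.sh && ./bin/stf-worker
```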

Q4M

Q4M (http://q4m.github.com) is a MySQL storage engine that allows you to use it as a fast, robust queue.

To install Q4M, you need to have MySQL 5.1 available (if you're installing Q4M from source, you will also need the MySQL source code).

The Q4M instance typically requires far fewer resources than the main backend MySQL DB, so unless your traffic is extremely large it can be hosted on the same machine as other services (see diagram in "Typical Setup").

Once you have a Q4M server running, create a new database and set it up using stf_queue.sql (available at https://github.com/stf-storage/stf/blob/master/misc/stf_queue.sql).

TheSchwartz

TheSchwartz (CPAN) is a Job Queue written by Six Apart Ltd., which uses MySQL as its backend. Unlike Q4M, TheSchwartz does not require you to install a custom storage engine, so if you can't install Q4M, this may be a good choice for you.

To install TheSchwartz, simply install it from CPAN:

cpanm TheSchwartz

Once you have installed TheSchwartz, create a new database and set it up using stf_schwartz.sql (available at https://github.com/stf-storage/stf/blob/master/misc/stf_schwartz.sql).

You also need to tell your workers and your dispatchers that you're using TheSchwartz as your backend by setting STF_QUEUE_TYPE to "Schwartz":

export STF_QUEUE_TYPE=Schwartz
plackup -a dispatcher.psgi

and

export STF_QUEUE_TYPE=Schwartz
./bin/stf-worker

See Environment Variables for details on STF_QUEUE_TYPE.

Overview

STF currently relies on reproxying via X-Reproxy-URL, so you need to set up an HTTP proxy server in front of the dispatcher. Any proxy that can handle X-Reproxy-URL should do. Currently we only use Apache (with mod_reproxy) in production; there is also a version that runs on dotCloud, stf-on-dotcloud, which uses nginx.

Apache

Apache does not come with X-Reproxy-URL support. You need to install mod_reproxy (which is in turn based on Kazuho Oku's original version).

Enable your proxy, and also set the ReproxyIgnoreHeader directive, because mod_reproxy doesn't support chunked encoding at the moment:

Reproxy on
ReproxyIgnoreHeader Accept-Ranges

Please see the sample config file for more details.

nginx

nginx doesn't support X-Reproxy-URL natively, but it's very easy to emulate it. Simply place a block like the following in your nginx config:

location = "/reproxy" {
    internal;
    resolver xxx.xxx.xxx.xxx; # set up a proper resolver
    set $reproxy $upstream_http_x_reproxy_url;
    proxy_pass $reproxy;
}

You also need to set the following environment variables for your dispatcher:

# enables X-Accel-Redirect header
STF_NGINX_STYLE_REPROXY: 1

# sets the internal redirect url. should match what you put
# in your nginx.conf. If you configured it to be "/reproxy",
# then you don't need to set it.
STF_NGINX_STYLE_REPROXY_ACCEL_REDIRECT_URL: /reproxy

Overview

STF uses memcached to boost performance. Typically you can install memcached on the same machines as the dispatchers.

Simply start as many instances as you can, and list the server:port values in the Memcached section of config.pl.
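As a rough sketch of the above, the helper below starts a few memcached instances on consecutive ports and prints the server:port values to paste into config.pl. The function name and the MEMCACHED override variable are hypothetical, and the host, port, and count are arbitrary; it relies only on memcached's -d (daemonize), -l (listen address), and -p (port) options.

```shell
# Command used to launch memcached (hypothetical override hook, handy for dry runs).
: "${MEMCACHED:=memcached}"

# start_memcached_instances HOST FIRST_PORT COUNT
# Starts COUNT daemonized instances on consecutive ports and prints
# one "host:port" line per instance.
start_memcached_instances() {
    host=$1
    first_port=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        port=$((first_port + i))
        $MEMCACHED -d -l "$host" -p "$port" || return 1
        echo "$host:$port"
        i=$((i + 1))
    done
}

# Example:
#   start_memcached_instances 127.0.0.1 11211 4
```

Note that memcached refuses to run as root unless you also pass -u with a user name, so run this as an unprivileged user or adjust the command accordingly.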

STF comes with a primitive admin interface. Boot it up by using the etc/admin.psgi PSGI application:

# Note: pick a real port. 9000 is just a random number I came up with right now
plackup -p 9000 -a etc/admin.psgi

Then you should be able to access the admin interface at http://127.0.0.1:9000/.

Workers do a bunch of things, including keeping an eye on each object's replicas, deleting objects and buckets, and other assorted goodies.

Workers can be set up pretty much anywhere they can access the queue (Q4M or TheSchwartz). In our diagram from "Typical Setup", we host them on the dispatcher machines, but it's your call.

The number of workers to activate also depends on your setup. Just make sure that you have enough workers to handle your queue so the queue does not overflow.

Overview

Storages are just dumb HTTP servers that understand basic CRUD via HTTP GET, PUT, DELETE. STF provides you with a simple PSGI app to handle this, but if you happen to find that this is not enough, you can choose to build your own equivalent. It's not that hard.

STF comes with a simple storage app you can easily deploy:

# Note: pick a real port. 8888 is just a random number I came up with right now
export STF_STORAGE_ROOT=/path/to/storage
plackup -a $STF_HOME/etc/storage.psgi -p 8888

You need to tell the STF dispatcher how to access this storage by inserting a row in the "storage" table:

INSERT INTO storage (id, uri, mode) VALUES ( 1, 'http://127.0.0.1:8888', 1 );

Alternatively, you can manage this information from the admin interface.
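To get a feel for the GET/PUT/DELETE interface a storage exposes, you can poke at it by hand with curl. The wrappers below are a minimal sketch: the function names, the STORAGE_URI and CURL variables, and the object path in the example are all made up for illustration, and the paths STF itself uses internally will differ.

```shell
# Base URI of the storage server started above (hypothetical variable name).
: "${STORAGE_URI:=http://127.0.0.1:8888}"
# Command used for HTTP requests; set CURL=echo for a dry run.
: "${CURL:=curl}"

# storage_put LOCAL_FILE OBJECT_PATH : upload a file with HTTP PUT
storage_put() {
    $CURL -sS -f -X PUT --data-binary "@$1" "$STORAGE_URI/$2"
}

# storage_get OBJECT_PATH : fetch the object with HTTP GET
storage_get() {
    $CURL -sS -f "$STORAGE_URI/$1"
}

# storage_delete OBJECT_PATH : remove the object with HTTP DELETE
storage_delete() {
    $CURL -sS -f -X DELETE "$STORAGE_URI/$1"
}

# Example:
#   storage_put ./hello.txt test/hello.txt
#   storage_get test/hello.txt
#   storage_delete test/hello.txt
```

If you build your own storage server, it should answer these three request types the same way the bundled PSGI app does.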

Overview

The STF dispatcher needs to be configured so that it knows what kind of environment it's running in. All configuration ultimately lives in a configuration file, but certain switches that you might want to toggle on/off easily can be specified via environment variables (though you can control how these are reflected if you provide your own config script; see below).

config.pl

The configuration is stored in a regular Perl script file. By default etc/config.pl is used. The default config.pl provided with STF should be enough for most purposes, but you can always customize it however you like -- it's just a Perl script!

If you provide your own config script, be sure to check which environment variables you're using -- the environment variables described in this document assume that you're using the default config.pl file.

If you're using the default config.pl, you should only need to change a few values:

DB::Master

'DB::Master' => [
    $dsn,
    $username,
    $password,
    \%options,
]

This value should be filled with a list of parameters that tells us how to connect to the STF database. Normally you only need to change $dsn, $username, and $password. See etc/config.pl for a sample. If you want to customize further, please read the documentation for DBI.pm.
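If you'd rather leave config.pl alone, the STF_MYSQL_DSN, STF_MYSQL_USERNAME, and STF_MYSQL_PASSWORD variables described in this document carry the same three values, assuming you're using the default config.pl. The sketch below points STF at a database on another machine; the host name, user name, and password are placeholders.

```shell
# The DSN contains semicolons, so it must be quoted in the shell.
# db1.example.com, "stf", and "change-me" are placeholder values.
export STF_MYSQL_DSN="dbi:mysql:dbname=stf;host=db1.example.com;port=3306"
export STF_MYSQL_USERNAME="stf"
export STF_MYSQL_PASSWORD="change-me"
```

Remember that workers need the same settings as the dispatcher.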

DB::Queue

This value should be filled with a list of parameters that tells us how to connect to the STF queue database (Q4M or TheSchwartz). See the DB::Master section above.

Memcached

'Memcached' => {
    servers => \@list_of_servers,
    ... other ...
}

This specifies the Memcached servers used to cache your data. The values in this section are passed directly to Cache::Memcached::Fast. Please see its documentation for details.

Firing it up

Here's how you might start the dispatcher, along with some environment variables that you need to set up.

# any integer value. must be unique for each dispatcher
# you don't need to set this if you edited your config.pl to include this value
# in Dispatcher.host_id field.
export STF_HOST_ID=101

# specify if you're using nginx as your proxy (default: 0)
# export STF_NGINX_STYLE_REPROXY=1
# specify if you're using a url other than /reproxy (default: /reproxy)
# export STF_NGINX_STYLE_REPROXY_ACCEL_REDIRECT_URL=/path/to/reproxy

# specify if you're using TheSchwartz as your queue (default: Q4M)
# export STF_QUEUE_TYPE=Schwartz

# specify the path to your stf root dir. (default: current directory)
# export STF_HOME=/path/to/stf

# set to true if you're NOT running behind an X-Reproxy-URL capable
# reverse proxy. ONLY USE THIS FOR TESTING
# export USE_PLACK_REPROXY=1

plackup -a /path/to/dispatcher.psgi

Environment Variables

| Name | Applies to | Default Value | Description |
|---|---|---|---|
| STF_HOME | All | Current directory | Path to the STF installation. |
| STF_DEBUG | All | false | Set to 1 if you would like to see debug output from STF. |
| STF_TIMER | All | false | Set to 1 if you would like to see timers measuring STF performance. |
| STF_HOST_ID | Dispatcher | (null) | The UNIQUE host id for each dispatcher. This is used to generate a unique ID for buckets and objects, so you MUST provide a unique integer value for each of your dispatchers. This variable is required if you don't specify a value in Dispatcher.host_id in your config.pl. |
| USE_PLACK_REPROXY | Dispatcher | false | When set to a true value, automatically loads Plack::Middleware::Reproxy::Furl to do the reproxying for you. This is required if you're not running behind an X-Reproxy-URL capable reverse proxy. DO NOT USE in production environments. |
| STF_NGINX_STYLE_REPROXY | Dispatcher | false | When set to a true value, includes the X-Accel-Redirect HTTP header in the dispatcher response for GET requests. |
| STF_NGINX_STYLE_REPROXY_ACCEL_REDIRECT_URL | Dispatcher | /reproxy | The URL that is configured to process the reproxying. |
| STF_MYSQL_DSN | Dispatcher/Worker | dbi:mysql:dbname=stf | DBI-style connect string for your MySQL (main) database. |
| STF_MYSQL_USERNAME | Dispatcher/Worker | root | DBI-style username for your MySQL (main) database (may be omitted if specified in DSN). |
| STF_MYSQL_PASSWORD | Dispatcher/Worker | root | DBI-style password for your MySQL (main) database (may be omitted if specified in DSN). |
| STF_QUEUE_TYPE | Dispatcher/Worker | Q4M | The type of job queue to use. By default Q4M is used. Specify "Schwartz" to use TheSchwartz. See also: STF_QUEUE_DSN, STF_QUEUE_USERNAME, STF_QUEUE_PASSWORD. |
| STF_QUEUE_DSN | Dispatcher/Worker | dbi:mysql:dbname=stf_queue | DBI-style connect string for your queue database. |
| STF_QUEUE_USERNAME | Dispatcher/Worker | root | DBI-style username for your queue database (may be omitted if specified in DSN). |
| STF_QUEUE_PASSWORD | Dispatcher/Worker | root | DBI-style password for your queue database (may be omitted if specified in DSN). |
| STF_STORAGE_ROOT | Storage | (null) | The root directory where files should be stored. Required. |