Introducing FileStruct (for Python)

FileStruct is a lightweight and fast file-cache / file-server designed for web-applications.  It solves the problems of “where do I save all of those uploads” that has been encountered time and time again.  FileStruct uses the local filesystem, but in a sensible way (keeping permissions sane), and with the ability to secure it to a reasonable level.

https://github.com/appcove/FileStruct/

Here is a simple example of taking an image upload, resizing, and saving it:

with client.TempDir() as TempDir:
   open(TempDir.FilePath('upload.jpg'), 'wb').write(mydata)
   TempDir.ResizeImage('upload.jpg', 'resize.jpg', '100x100')
   hash1 = TempDir.Save('upload.jpg')
   hash2 = TempDir.Save('resize.jpg')

Design Goals

Immutable Files

FileStruct is designed to work with files represented by the SHA-1 hash of their contents. This means that all files in FileStruct are immutable.

High Performance

FileStruct is designed as a local repository of file data accessable (read/write) by an application or web application. All operations are local I/O operations and therefore, very fast.

Where possible, streaming hash functions are used to prevent iterating over a file twice.

Direct serving from Nginx

FileStruct is designed so that Nginx can serve files directly from it’s Data directory using an X-Accel-Redirect header. For more information on this Nginx configuration directive, see http://wiki.nginx.org/XSendfile

Assuming that nginx runs under nginx user and file database is owned by the fileserver group, nginx needs to be in thefileserver group to serve files:

# usermod -a -G fileserver nginx

Secure

FileStruct is designed to be as secure as your hosting configuration. Where possible, a dedicated user should be allocated to read/write to FileStruct, and the database directory restricted to this user.

Simple

FileStruct is designed to be incredibly simple to use.

File Manipulaion

FileStruct is designed to simplify common operations on files, especially uploaded files. Image resizing for thumbnails is supported.

Temporary File Management

FileStruct is designed to simplify the use of Temp Files in an application. The API supports creation of a temporary directory, placing files in it, Ingesting files into FileStruct, and deleting the directory when completed (or retaining it in the event of an error)

Garbage Collection

FileStruct is designed to retain files until garbage collection is performed. Garbage collection consists of telling FileStruct what files you are interested in keeping, and having it move the remaining files to the trash.

Backup and Sync with Rsync

FileStruct is designed to work seamlessly with rsync for backups and restores.

Atomic operations

At the point a file is inserted or removed from FileStruct, it is a filesystem move operation. This means that under no circumstances will a file exist in FileStruct that has contents that do not match the name of the file.

No MetaData

FileStruct is not designed to store MetaData. It is designed to store file content. There may be several “files” which refer to the same content. empty.logempty.txt, and empty.ini may all refer to the empty fileData/da/39/da39a3ee5e6b4b0d3255bfef95601890afd80709. However, this file will be retained as long as any aspect of the application still uses it.

Automatic De-Duplication

Because file content is stored in files with the hash of the content, automatic file-level de-duplication occurs. When a file is pushed to FileStruct that already exists, there is no need to write it again.

This carries the distinct benifit of being able to use the same FileStruct database across multiple projects if desired, because the content of file Data/da/39/da39a3ee5e6b4b0d3255bfef95601890afd80709 is always the same, regardless of the application that placed it there.

Note: In the event that multiple instances or applications use the same database, the garbage collection routine MUST take all references to a given hash into account, across all applications that use the database. Otherwise, it would be easy to delete data that should be retained.

nginx + apache + mod_wsgi + python: how to make dynamic pages expire

When writing dynamic web applications, we use nginx as a front-end web server and apache+mod_wsgi as an application server.

It is the job of nginx to:

  1. Handle SSL, and domain-level rewriting/redirects
  2. Handle static content (.jpeg, .png, .css, .js, .txt, .ico, .pdf, etc….)
  3. Handle dynamic downloads through X-Accel-Redirect
  4. Proxy other requests to apache
  5. Set the proper cache-control and expires headers on content

Ever run into the situation where you click log out, and then click the back button, and are still able to see the pages!  That is bad.   They are dynamic pages anyway, and should not be cached.

However, images, etc… SHOULD be cached. It is important that any references to images have a way to invalidate the cache. We append a number as a query string:

/path/to/script.js?192012129

This number is updated from time to time (via Python variable) when we need to invalidate the cache.

Anyway, here are some helpful nginx configuration directives.

# Send static requests directly back to the client
location ~ \.(gif|jpg|png|ico|xml|html|css|js|txt|pdf)$
{
    root  /path/to/document/root;
    expires max;
}

# Send the rest to apache
location /
{
    add_header Cache-Control 'no-cache, no-store, max-age=0, must-revalidate';
    add_header Expires 'Thu, 01 Jan 1970 00:00:01 GMT';
    proxy_pass http://127.0.0.1:8123;
}

Home School Software now supports Images!

Today I got to a great milestone with the homeschool software I discussed here.  It now has excellent image support.

For every activity, you can attach an unlimited number of images.  They are stored in the database with all of the other data, so it’s easy to backup.  Also, I am using ImageMagick to resize them into various preview sizes for quick speeds while working with them.

I used uploadify to power the image uploader and ImageMagick to resize them into preview sizes.

Here is a screen shot of the list page:

And a screen shot of the details page: