FileStruct is a lightweight, fast file cache / file server designed for web applications. It solves the recurring problem of “where do I save all of those uploads?” FileStruct uses the local filesystem, but in a sensible way (keeping permissions sane), and it can be secured to a reasonable level.
https://github.com/appcove/FileStruct/
Here is a simple example of taking an image upload, resizing, and saving it:
    with client.TempDir() as TempDir:
        open(TempDir.FilePath('upload.jpg'), 'wb').write(mydata)
        TempDir.ResizeImage('upload.jpg', 'resize.jpg', '100x100')
        hash1 = TempDir.Save('upload.jpg')
        hash2 = TempDir.Save('resize.jpg')
Design Goals
Immutable Files
FileStruct is designed to work with files represented by the SHA-1 hash of their contents. This means that all files in FileStruct are immutable.
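As a rough illustration of content-based naming, the sketch below derives a file's identity from its bytes using only the standard library. The helper names `content_hash` and `relative_path` are hypothetical, and the `Data/xx/yy/<hash>` layout is inferred from the example path shown later in this document rather than taken from FileStruct's source.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-1 hex digest that identifies this content."""
    return hashlib.sha1(data).hexdigest()


def relative_path(hexhash: str) -> str:
    """Map a hash to Data/<first 2 chars>/<next 2 chars>/<full hash>.

    Assumption: this layout is inferred from the README's example path.
    """
    return f"Data/{hexhash[0:2]}/{hexhash[2:4]}/{hexhash}"


# The empty file always hashes to the same value, so it always lands
# at the same path, no matter who stores it.
print(relative_path(content_hash(b"")))
```

Because the name is a function of the content, changing the content necessarily produces a different name, which is why stored files can be treated as immutable.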
High Performance
FileStruct is designed as a local repository of file data accessible (read/write) by an application or web application. All operations are local I/O operations and are therefore very fast.
Where possible, streaming hash functions are used to prevent iterating over a file twice.
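The idea of hashing while streaming can be sketched as follows: each chunk is fed to the digest and written to the destination in the same loop, so the data is read only once. The function name `copy_and_hash` and the chunk size are assumptions made for illustration; this is not FileStruct's actual implementation.

```python
import hashlib
import io


def copy_and_hash(src, dst, chunk_size=65536):
    """Copy src to dst chunk by chunk, hashing each chunk as it passes."""
    digest = hashlib.sha1()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # hash the chunk...
        dst.write(chunk)      # ...and write it in the same pass
    return digest.hexdigest()


# Usage with in-memory streams; real code would pass open file objects.
source = io.BytesIO(b"hello world")
target = io.BytesIO()
file_hash = copy_and_hash(source, target)
print(file_hash, len(target.getvalue()))
```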
Direct serving from Nginx
FileStruct is designed so that Nginx can serve files directly from its Data directory using an X-Accel-Redirect header. For more information on this Nginx feature, see http://wiki.nginx.org/XSendfile
Assuming that nginx runs under the nginx user and the file database is owned by the fileserver group, nginx needs to be in the fileserver group to serve files:
# usermod -a -G fileserver nginx
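On the application side, handing a file off to Nginx might look like the minimal WSGI sketch below. The application validates the requested hash and returns an empty body with an X-Accel-Redirect header; Nginx then serves the file itself. The `/protected/` prefix is a hypothetical internal location that would have to be mapped to the Data directory in the Nginx configuration, and the `xx/yy/<hash>` layout is inferred from the example path in this document.

```python
HEX_DIGITS = set("0123456789abcdef")


def serve_by_hash(environ, start_response):
    """WSGI app: the last URL segment is treated as a SHA-1 hex hash."""
    hexhash = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]

    # Reject anything that is not exactly 40 lowercase hex characters,
    # so the value can never escape the storage directory.
    if len(hexhash) != 40 or not set(hexhash) <= HEX_DIGITS:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    # Assumption: Nginx has an internal location /protected/ pointing
    # at the Data directory.
    internal_uri = f"/protected/{hexhash[0:2]}/{hexhash[2:4]}/{hexhash}"
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("X-Accel-Redirect", internal_uri),
    ])
    return [b""]
```

The application never reads the file; it only decides whether the request is allowed and tells Nginx where to find the content.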
Secure
FileStruct is designed to be as secure as your hosting configuration. Where possible, a dedicated user should be allocated to read/write to FileStruct, and the database directory restricted to this user.
Simple
FileStruct is designed to be incredibly simple to use.
File Manipulation
FileStruct is designed to simplify common operations on files, especially uploaded files. Image resizing for thumbnails is supported.
Temporary File Management
FileStruct is designed to simplify the use of temporary files in an application. The API supports creating a temporary directory, placing files in it, ingesting files into FileStruct, and deleting the directory when completed (or retaining it in the event of an error).
Garbage Collection
FileStruct is designed to retain files until garbage collection is performed. Garbage collection consists of telling FileStruct what files you are interested in keeping, and having it move the remaining files to the trash.
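The keep-or-trash idea can be sketched with the standard library alone. In this illustration the caller supplies the set of hashes to keep, and everything else is moved (not deleted) to a trash directory. The function name `collect_garbage` and the directory arguments are hypothetical, and the trash directory is assumed to live outside the data directory.

```python
import os
import shutil


def collect_garbage(data_dir, trash_dir, keep):
    """Move every file under data_dir whose name is not in keep to trash_dir.

    Returns the sorted list of hashes that were moved.
    Assumption: trash_dir is not located inside data_dir.
    """
    os.makedirs(trash_dir, exist_ok=True)
    moved = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name not in keep:
                shutil.move(os.path.join(root, name),
                            os.path.join(trash_dir, name))
                moved.append(name)
    return sorted(moved)
```

Moving files to a trash directory rather than deleting them leaves a window in which a mistaken collection can still be undone.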
Backup and Sync with Rsync
FileStruct is designed to work seamlessly with rsync for backups and restores.
Atomic operations
At the point a file is inserted into or removed from FileStruct, it is a filesystem move operation. This means that under no circumstances will a file exist in FileStruct whose contents do not match its name.
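A common way to get this guarantee, shown here as a sketch rather than as FileStruct's own code, is to write the content to a staging file first and then move it to its final, hash-derived name in a single step. The function name `atomic_insert` and the `staging_dir` argument are hypothetical; the staging directory is assumed to be on the same filesystem as the data directory, since a rename is only atomic within one filesystem.

```python
import hashlib
import os
import tempfile


def atomic_insert(data_dir, staging_dir, content):
    """Write content to a staging file, then move it to its final name."""
    hexhash = hashlib.sha1(content).hexdigest()
    final_dir = os.path.join(data_dir, hexhash[0:2], hexhash[2:4])
    os.makedirs(final_dir, exist_ok=True)
    os.makedirs(staging_dir, exist_ok=True)

    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(content)
        # The file only appears under its hash name once it is complete.
        os.replace(tmp_path, os.path.join(final_dir, hexhash))
    except BaseException:
        # Clean up the partial staging file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return hexhash
```

A reader looking at the data directory therefore sees either no file or the complete file, never a partially written one.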
No MetaData
FileStruct is not designed to store metadata. It is designed to store file content. There may be several “files” which refer to the same content: empty.log, empty.txt, and empty.ini may all refer to the empty file Data/da/39/da39a3ee5e6b4b0d3255bfef95601890afd80709. This file will be retained as long as any part of the application still uses it.
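Since names and other metadata live in the application, one way to picture the split is a small name-to-hash table on the application side. The sketch below is purely illustrative: the `names` dictionary and the `still_referenced` helper are hypothetical stand-ins for whatever database table an application would actually use.

```python
import hashlib

EMPTY_HASH = hashlib.sha1(b"").hexdigest()

# Hypothetical application-side metadata: three names, one stored content.
names = {
    "empty.log": EMPTY_HASH,
    "empty.txt": EMPTY_HASH,
    "empty.ini": EMPTY_HASH,
}


def still_referenced(table, hexhash):
    """Return True while at least one name still points at this hash."""
    return hexhash in table.values()


del names["empty.log"]
print(still_referenced(names, EMPTY_HASH))  # two names still refer to it
```

Only when the last name is removed does the hash drop out of the set of content the application wants to keep.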
Automatic De-Duplication
Because file content is stored in files with the hash of the content, automatic file-level de-duplication occurs. When a file is pushed to FileStruct that already exists, there is no need to write it again.
This carries the distinct benefit of being able to use the same FileStruct database across multiple projects if desired, because the content of file Data/da/39/da39a3ee5e6b4b0d3255bfef95601890afd80709 is always the same, regardless of the application that placed it there.
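File-level de-duplication falls out of the naming scheme, as the following sketch suggests: if the target path for a hash already exists, the content is already stored and the write can be skipped. The function name `put` and its `(hash, written)` return value are assumptions for illustration, as is the `xx/yy/<hash>` layout inferred from the example path.

```python
import hashlib
import os
import tempfile


def put(data_dir, content):
    """Store content under its SHA-1 name; return (hash, written)."""
    hexhash = hashlib.sha1(content).hexdigest()
    target_dir = os.path.join(data_dir, hexhash[0:2], hexhash[2:4])
    target = os.path.join(target_dir, hexhash)

    if os.path.exists(target):
        # Same hash means same content: nothing to write.
        return hexhash, False

    os.makedirs(target_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    with os.fdopen(fd, "wb") as handle:
        handle.write(content)
    os.replace(tmp_path, target)
    return hexhash, True
```

Two applications that store identical bytes end up pointing at one file on disk, and the second store costs only a hash computation and an existence check.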
Note: In the event that multiple instances or applications use the same database, the garbage collection routine MUST take all references to a given hash into account, across all applications that use the database. Otherwise, it would be easy to delete data that should be retained.
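In practice this means the set of hashes to keep has to be the union of every application's references before any collection runs. A minimal sketch, with hypothetical application names and a hypothetical helper `combined_keep_set`:

```python
def combined_keep_set(*reference_sets):
    """Union the hashes referenced by every application sharing the database."""
    keep = set()
    for references in reference_sets:
        keep.update(references)
    return keep


# Hypothetical reference sets gathered from two applications.
blog_refs = {"da39a3ee5e6b4b0d3255bfef95601890afd80709"}
shop_refs = {"2aae6c35c94fcfb415dbe95f408b9ce91ee846ed"}

keep = combined_keep_set(blog_refs, shop_refs)
print(len(keep))  # both applications' hashes are retained
```

Running garbage collection with only one application's set would trash content that the other application still depends on.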