If you’re a regular reader, you might have noticed that I’ve been on a quest for the perfect backup program, and ended up writing my own encryption layer over bup.
While writing encbup, I wasn’t very satisfied with having to download the entire huge archive just to restore a file, and still wished that I could use EncFS together with rdiff-backup to have true remote-mountable, encrypted, deduplicated, versioned backups.
Trying obnam again (spoiler: it’s still pretty slow), I noticed that it included a mount command. Looking at it, I discovered fuse-python and fusepy, and realized that writing a FUSE filesystem in Python is pretty much trivial.
The astute observer will have already realized where I’m going with this: I decided to write an encrypted filesystem layer in Python! This layer would be very similar to EncFS, with a few crucial differences:
- It would work in reverse mode by default, accepting regular files and exposing an encrypted directory. This way, any backup program will see (and back up) encrypted directories, without requiring any extra storage whatsoever.
- It would also accept a configuration file with a list of directories and would expose them under the mountpoint. That way, all the backup script would need to do would be to back the mountpoint up, and various disparate directories would get backed up at once.
- It would be geared towards backups, rather than encrypted storage, and would be very fun to write.
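The second point boils down to a path mapping, which can be sketched roughly as follows. This is only an illustration of the idea, not code from the actual project: the `exports` dictionary stands in for the configuration file, and both it and the `resolve` function are hypothetical names with made-up directories.

```python
import os

# Hypothetical stand-in for the configuration file: each key is the name a
# directory gets under the mountpoint, each value is the real directory.
exports = {
    "documents": "/home/user/Documents",
    "photos": "/srv/media/photos",
}


def resolve(virtual_path, exports):
    """Map a path seen under the mountpoint to a path on the real disk.

    Returns None for the mountpoint root itself, which has no single
    backing directory (its listing would just be the keys of `exports`).
    Raises KeyError if the first path component is not an exported name.
    """
    parts = virtual_path.strip("/").split("/", 1)
    name = parts[0]
    if name == "":
        return None
    root = exports[name]
    rest = parts[1] if len(parts) > 1 else ""
    return os.path.join(root, rest) if rest else root


print(resolve("/documents/taxes/2013.pdf", exports))
```

A backup program pointed at the mountpoint would then see `documents` and `photos` side by side, even though they live in unrelated places on disk.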
A sample FUSE filesystem
The first step towards writing this script is to write a pure pass-through filesystem. This would merely accept one directory and expose it under the mountpoint, ensuring that all changes in that mountpoint would be mirrored to the source.
fusepy requires you to write a class with various OS-level filesystem methods defined. You define the ones your filesystem supports, and leave the others undefined, but I needed to define them all, since mine is a pass-through filesystem that should act like the original one as faithfully as possible.
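To give an idea of how little is needed when you only define a few methods, here is a minimal sketch of a read-only filesystem that exposes a single file. The class name `HelloFS`, the file name and its contents are all made up for illustration. The fallback definitions in the `except` branch are only there so the class can be tried out without fusepy installed; they are my own stand-ins, not part of fusepy.

```python
import errno
import stat

try:
    # fusepy; importing it also requires the FUSE library to be present.
    from fuse import FuseOSError, Operations
except (ImportError, OSError):
    # Stand-ins (an assumption for this sketch) so the class can be
    # exercised directly when fusepy is not available.
    Operations = object

    class FuseOSError(OSError):
        def __init__(self, code):
            super().__init__(code, "FUSE error")


class HelloFS(Operations):
    """Hypothetical read-only filesystem exposing one file."""

    CONTENT = b"Hello from FUSE!\n"

    def getattr(self, path, fh=None):
        # Describe the root directory and the single file; anything
        # else does not exist.
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
        if path == "/hello.txt":
            return {"st_mode": stat.S_IFREG | 0o444, "st_nlink": 1,
                    "st_size": len(self.CONTENT)}
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return [".", "..", "hello.txt"]

    def read(self, path, length, offset, fh):
        # Return the requested slice of the in-memory contents.
        return self.CONTENT[offset:offset + length]


fs = HelloFS()
print(fs.readdir("/", None))
```

The methods can be called directly like this for a quick check, which is handy before mounting anything. Everything not defined here is left to whatever the base class does by default.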
It was pretty easy and fun to write this, as most methods are just thin wrappers around the os module (and, indeed, you can just assign them directly if you like, e.g. open = os.open, but my module needs some path expansion). Unfortunately, fuse-python had a bug (as far as I can tell) that prevented it from passing file handles back to the filesystem when opening and reading files, so my script couldn’t tell which file handle an app wanted to read from or write to, causing failures. fusepy worked very well, and required minimal changes. It’s also a single file, so you can just bundle it with your project.
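The path expansion is the reason the methods can’t simply be aliases for their os counterparts. Here is a small standalone sketch of it, written as a plain function (`full_path` and the example directories are names picked for this illustration). It shows why the leading slash has to be stripped before joining.

```python
import os


def full_path(root, partial):
    """Translate a path as the filesystem receives it into one under `root`."""
    # Paths arrive relative to the mountpoint but with a leading "/".
    # os.path.join() throws away everything before an absolute component,
    # so the slash has to go before joining.
    if partial.startswith("/"):
        partial = partial[1:]
    return os.path.join(root, partial)


# Without stripping, the source directory is silently dropped:
print(os.path.join("/data/src", "/notes.txt"))  # /notes.txt
# With the helper, the path lands inside the source directory:
print(full_path("/data/src", "/notes.txt"))     # /data/src/notes.txt
```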
I would like to give you the code here, in case you want to implement your own filesystem. This provides a great starting point, as you can just paste the class into your project and override the methods you need, ignoring the rest.
Here is the actual listing (licensed under the BSD license):
#!/usr/bin/env python

from __future__ import with_statement

import os
import sys
import errno

from fuse import FUSE, FuseOSError, Operations


class Passthrough(Operations):
    def __init__(self, root):
        self.root = root

    # Helpers
    # =======

    def _full_path(self, partial):
        # Strip the leading slash so os.path.join() keeps self.root.
        if partial.startswith("/"):
            partial = partial[1:]
        path = os.path.join(self.root, partial)
        return path

    # Filesystem methods
    # ==================

    def access(self, path, mode):
        full_path = self._full_path(path)
        if not os.access(full_path, mode):
            raise FuseOSError(errno.EACCES)

    def chmod(self, path, mode):
        full_path = self._full_path(path)
        return os.chmod(full_path, mode)

    def chown(self, path, uid, gid):
        full_path = self._full_path(path)
        return os.chown(full_path, uid, gid)

    def getattr(self, path, fh=None):
        full_path = self._full_path(path)
        st = os.lstat(full_path)
        return dict((key, getattr(st, key)) for key in (
            'st_atime', 'st_ctime', 'st_gid', 'st_mode', 'st_mtime',
            'st_nlink', 'st_size', 'st_uid'))

    def readdir(self, path, fh):
        full_path = self._full_path(path)

        dirents = ['.', '..']
        if os.path.isdir(full_path):
            dirents.extend(os.listdir(full_path))
        for r in dirents:
            yield r

    def readlink(self, path):
        pathname = os.readlink(self._full_path(path))
        if pathname.startswith("/"):
            # Path name is absolute, sanitize it.
            return os.path.relpath(pathname, self.root)
        else:
            return pathname

    def mknod(self, path, mode, dev):
        return os.mknod(self._full_path(path), mode, dev)

    def rmdir(self, path):
        full_path = self._full_path(path)
        return os.rmdir(full_path)

    def mkdir(self, path, mode):
        return os.mkdir(self._full_path(path), mode)

    def statfs(self, path):
        full_path = self._full_path(path)
        stv = os.statvfs(full_path)
        return dict((key, getattr(stv, key)) for key in (
            'f_bavail', 'f_bfree', 'f_blocks', 'f_bsize', 'f_favail',
            'f_ffree', 'f_files', 'f_flag', 'f_frsize', 'f_namemax'))

    def unlink(self, path):
        return os.unlink(self._full_path(path))

    def symlink(self, name, target):
        # name is the new link (inside the mountpoint); target is what it
        # points to and is stored as given.
        return os.symlink(target, self._full_path(name))

    def rename(self, old, new):
        return os.rename(self._full_path(old), self._full_path(new))

    def link(self, target, name):
        # target is the new hard link; name is the existing file.
        return os.link(self._full_path(name), self._full_path(target))

    def utimens(self, path, times=None):
        return os.utime(self._full_path(path), times)

    # File methods
    # ============

    def open(self, path, flags):
        full_path = self._full_path(path)
        return os.open(full_path, flags)

    def create(self, path, mode, fi=None):
        full_path = self._full_path(path)
        return os.open(full_path, os.O_WRONLY | os.O_CREAT, mode)

    def read(self, path, length, offset, fh):
        os.lseek(fh, offset, os.SEEK_SET)
        return os.read(fh, length)

    def write(self, path, buf, offset, fh):
        os.lseek(fh, offset, os.SEEK_SET)
        return os.write(fh, buf)

    def truncate(self, path, length, fh=None):
        full_path = self._full_path(path)
        with open(full_path, 'r+') as f:
            f.truncate(length)

    def flush(self, path, fh):
        return os.fsync(fh)

    def release(self, path, fh):
        return os.close(fh)

    def fsync(self, path, fdatasync, fh):
        return self.flush(path, fh)


def main(mountpoint, root):
    FUSE(Passthrough(root), mountpoint, nothreads=True, foreground=True)


if __name__ == '__main__':
    # Usage: python myfuse.py /your/dir /mnt/point
    main(sys.argv[2], sys.argv[1])
If you want to run it as a script, just install fusepy, put the script into a file (e.g. myfuse.py) and run python myfuse.py /your/dir /mnt/point. You will see all files in /mnt/point and be able to manipulate them exactly as if they were in the original filesystem.
You can find the latest version of the above code in this GitHub repo:
NOTE: Felix Fontein was kind enough to report a bug in the sample code that made it fail when used from multiple threads. The nothreads=True parameter ensures that won’t happen.
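One plausible reason for this kind of failure (my assumption, the details of the report aren’t given here) is that read and write each do a seek followed by a separate read or write, so two threads sharing a file handle can interleave between the two calls. If you ever want to drop nothreads=True, one option is to use os.pread and os.pwrite, which take the offset as an argument and leave the file position alone. They are available on Unix from Python 3.3 onwards. The helper names `read_at` and `write_at` below are made up for this sketch.

```python
import os
import tempfile


def read_at(fh, length, offset):
    """Read `length` bytes at `offset` without moving the file position."""
    return os.pread(fh, length, offset)


def write_at(fh, buf, offset):
    """Write `buf` at `offset` without moving the file position."""
    return os.pwrite(fh, buf, offset)


# Small demonstration on a temporary file.
fd, name = tempfile.mkstemp()
try:
    write_at(fd, b"hello world", 0)
    write_at(fd, b"W", 6)       # overwrite a single byte in place
    data = read_at(fd, 5, 6)    # read back five bytes from offset 6
    print(data)
finally:
    os.close(fd)
    os.remove(name)
```

The other common approach is to keep the seek-then-read pattern and wrap each pair of calls in a threading.Lock, which is simpler to retrofit but serializes all reads and writes.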
Overall, I didn’t expect writing a filesystem to be so easy. What remains now is to write the encryption/decryption functionality into the script, as well as some helper methods. I aim to have this script be a full EncFS alternative, except it will be much more extensible (as it’s written in Python), and it will include extra features aimed at backing up files.
If you want to follow the development of this script, please subscribe to my mailing list below, or follow me on Twitter. As always, feedback is very welcome (the comments section below is ideal).