Just enough Weave
Note: I am keeping this code around for historical purposes, but it has not worked since Weave 1.0 RC2. I created this because Mozilla’s public sync servers were initially quite unreliable, but they have remedied the situation and performance problems are a thing of the past. I also learned the inner workings of Weave/Firefox Sync in the process, and am satisfied as to the security of the system. Since I no longer use Firefox myself, I do not expect to ever revive this project. Feel free to take it over, otherwise you are best served by using Mozilla’s cloud.
Like most of my readers, I use multiple computers: my Mac Pro at home, my MacBook Air when on the road, three desktop PCs at work, a number of virtual machines, and so on. I have Firefox installed on all of them. The Mozilla Weave extension lets me sync bookmarks, passwords and more between them. Weave encrypts this data before uploading it to the server, but I do not like to rely on third-party web services for mission-critical functions. For instance, the Mozilla server hosting my account was down last Monday because of the surge of traffic from people returning to work and performing a full sync against 0.5. Through Weave 0.5, I ran my own instance of the Mozilla public Weave server version 0.3. Unfortunately, Weave 0.6 requires server version 0.5, so I had to upgrade.
The open-source Weave server is implemented in PHP. It no longer requires Apache compiled with mod_dav as early versions did (I prefer to run nginx), but it is still a fairly gnarly piece of code that is anything but plug-and-play. Somehow I had managed to get version 0.3 running on my home server, but no amount of blundering around got me to a usable state with 0.5. I decided to implement a minimalist Weave server in Python instead. That seemed less painful than continuing to struggle with the Mozilla spaghetti code, which confusingly has what appears to be the same functionality implemented in three different places. Famous last words…
Three days of hacking later, I managed to get it working. 200 or so lines of Python code replaced approximately 12,000 lines of PHP. Of course, I am not trying to reproduce an entire public cloud infrastructure like Mozilla’s, just enough for my own needs, using the “simplest thing that works” principle. Interestingly, the Mozilla code includes a vestigial Python reference implementation of a Weave server for testing purposes. It does not seem to have been working for a while, though. I used it as a starting point but ended up rewriting almost everything. Here are the simplifying hypotheses:
- My Weave server is meant for a single user (my wife prefers Safari).
- It does not implement authentication, logging or SSL encryption. It is meant to be used behind an nginx (or Apache) reverse proxy that performs these functions.
- It has no configuration file. There are just three variables to set at the top of the source file.
- It does not implement the full server protocol, just the parts that are actually used by the extension today.
- More controversially, it does not even implement persistence, keeping all data in RAM instead. Python running on Solaris is very reliable, and the expected uptime of the server is likely months on end. If the server fails, the Firefoxes will just have to perform a full sync and reconciliation. Fortunately, that has been much improved in Weave 0.6, so the cost is minimal. This could even be construed as a security feature, since there is no data on disk to be misplaced. It would take catastrophically losing all my browsers simultaneously to risk data loss. Short of California falling into the ocean, that’s not going to happen, and if it does, I probably have more pressing concerns…
The code could be extended fairly easily to lift these hypotheses, e.g. adding persistence or multiple user support using SQLite, PostgreSQL or MySQL.
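The sketch below illustrates the persistence route. It is a minimal SQLite-backed store that could take the place of the in-memory collections dictionary. It is written for Python 3, while the server listing targets Python 2. The class name SqliteStore, its methods and the table layout are all hypothetical and are not part of the server.

```python
import json
import sqlite3


class SqliteStore:
    """Hypothetical persistent replacement for WeaveApp.collections.

    Each row holds one Weave record as a JSON string, keyed by
    (collection, id), mirroring the nested dict kept in memory.
    """

    def __init__(self, path=':memory:'):
        self.db = sqlite3.connect(path)
        self.db.execute(
            'CREATE TABLE IF NOT EXISTS records ('
            ' col TEXT NOT NULL, id TEXT NOT NULL, modified REAL,'
            ' payload TEXT NOT NULL, PRIMARY KEY (col, id))')

    def put(self, col, key, record):
        # INSERT OR REPLACE overwrites an existing record, like a dict assignment
        with self.db:  # commits the transaction on success
            self.db.execute(
                'INSERT OR REPLACE INTO records VALUES (?, ?, ?, ?)',
                (col, key, record.get('modified', 0),
                 json.dumps(record, sort_keys=True)))

    def get(self, col, key):
        row = self.db.execute(
            'SELECT payload FROM records WHERE col = ? AND id = ?',
            (col, key)).fetchone()
        if row is None:
            raise KeyError((col, key))
        return json.loads(row[0])

    def delete(self, col, key):
        with self.db:
            cur = self.db.execute(
                'DELETE FROM records WHERE col = ? AND id = ?', (col, key))
        if cur.rowcount == 0:
            raise KeyError((col, key))

    def collection_timestamps(self):
        # newest 'modified' value per collection, comparable to info/collections
        rows = self.db.execute(
            'SELECT col, MAX(modified) FROM records GROUP BY col')
        return dict(rows.fetchall())
```

Each record is one row keyed by collection and id. INSERT OR REPLACE therefore gives the same overwrite behavior as assigning into the dictionary. There is one deliberate difference from the server: here, collection timestamps are derived from the newest stored record, whereas ts_col records the time of every write.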
Here is the server itself, weave_server.py:
```python
#!/usr/local/bin/python
"""
Based on tools/scripts/weave_server.py from
http://hg.mozilla.org/labs/weave/

Do the Simplest Thing That Can Work: just enough to get by with Weave 0.6
- SSL, authentication and logging are done by nginx or another reverse proxy
- no persistence: in case of process failure, do a full resync
- only one user. If you need more, create multiple instances on different
  ports and use rewrite rules to route traffic to the right one
"""
import sys, time, logging, socket, urlparse, httplib, pprint
try:
    import simplejson as json
except ImportError:
    import json
import wsgiref.simple_server

URL_BASE = 'https://your.server.name/'
#BIND_IP = ''
BIND_IP = '127.0.0.1'
DEFAULT_PORT = 8000

class HttpResponse:
    def __init__(self, code, content='', content_type='text/plain'):
        self.status = '%s %s' % (code, httplib.responses.get(code, ''))
        self.headers = [('Content-type', content_type),
                        ('X-Weave-Timestamp', str(timestamp()))]
        # an empty body is replaced by the status line, e.g. "200 OK"
        self.content = content or self.status

def JsonResponse(value):
    """value must already be a JSON-encoded string"""
    return HttpResponse(httplib.OK, value, content_type='application/json')

class HttpRequest:
    def __init__(self, environ):
        self.environ = environ
        content_length = environ.get('CONTENT_LENGTH')
        if content_length:
            stream = environ['wsgi.input']
            self.contents = stream.read(int(content_length))
        else:
            self.contents = ''

def timestamp():
    # Weave rounds to 2 digits and so must we, otherwise rounding errors will
    # influence the "newer" and "older" modifiers
    return round(time.time(), 2)

class WeaveApp():
    """WSGI app for the Weave server"""
    def __init__(self):
        # {collection: {id: JSON string}}, plus a 'timestamps' pseudo-collection
        self.collections = {}

    def url_base(self):
        """XXX should derive this automagically from self.request.environ"""
        return URL_BASE

    def ts_col(self, col):
        """Record the last modification time of a collection"""
        self.collections.setdefault('timestamps', {})[col] = str(timestamp())

    def parse_url(self, path):
        """Split /<version>/<user>/<command>/<args> into (command, args).
        The user name is ignored, as there is only one user."""
        if not path.startswith('/0.5/') and not path.startswith('/1.0/'):
            return
        command, args = path.split('/', 4)[3:]
        return command, args

    def opts_test(self, opts):
        """Return a predicate on a record's 'modified' timestamp that
        implements the 'older' and 'newer' query parameters"""
        if 'older' in opts:
            return float(opts['older'][0]).__ge__
        elif 'newer' in opts:
            return float(opts['newer'][0]).__le__
        else:
            return lambda x: True

    # HTTP method handlers

    def _handle_PUT(self, path, environ):
        command, args = self.parse_url(path)
        col, key = args.split('/', 1)
        assert command == 'storage'
        val = self.request.contents
        if val.startswith('{'):
            # stamp JSON records with the server-side modification time
            val = json.loads(val)
            val['modified'] = timestamp()
            val = json.dumps(val, sort_keys=True)
        self.collections.setdefault(col, {})[key] = val
        self.ts_col(col)
        return HttpResponse(httplib.OK)

    def _handle_POST(self, path, environ):
        # batch upload: the body is a JSON list of records, each with an 'id'
        status = httplib.NOT_FOUND
        try:
            if path.startswith('/0.5/') or path.startswith('/1.0/'):
                command, args = self.parse_url(path)
                col = args.split('/')[0]
                vals = json.loads(self.request.contents)
                for val in vals:
                    val['modified'] = timestamp()
                    self.collections.setdefault(col, {})[val['id']] = json.dumps(val)
                self.ts_col(col)
                status = httplib.OK
        except Exception:
            # a malformed upload is answered with 404 and logged
            logging.exception('POST %s failed' % path)
        return HttpResponse(status)

    def _handle_DELETE(self, path, environ):
        assert path.startswith('/0.5/') or path.startswith('/1.0/')
        response = HttpResponse(httplib.OK)
        if path.endswith('/storage/0'):
            # wipe everything
            self.collections.clear()
        elif path.startswith('/0.5/') or path.startswith('/1.0/'):
            command, args = self.parse_url(path)
            if '/' in args:
                col, key = args.split('/', 1)
            else:
                col, key = args, None
            if not key:
                # delete the records of a collection matching older/newer
                opts = urlparse.parse_qs(environ['QUERY_STRING'])
                test = self.opts_test(opts)
                col = self.collections.setdefault(col, {})
                # keys() returns a list in Python 2, so deleting in the loop is safe
                for key in col.keys():
                    if test(json.loads(col[key]).get('modified', 0)):
                        logging.info('DELETE %s key %s' % (path, key))
                        del col[key]
            else:
                try:
                    del self.collections[col][key]
                except KeyError:
                    return HttpResponse(httplib.NOT_FOUND)
        return response

    def _handle_GET(self, path, environ):
        if path.startswith('/0.5/') or path.startswith('/1.0/'):
            command, args = self.parse_url(path)
            return self.handle_storage(command, args, path, environ)
        elif path.startswith('/1/'):
            # answer /1/ requests with this server's base URL
            return HttpResponse(httplib.OK, self.url_base())
        elif path.startswith('/state'):
            # debugging aid: dump the in-memory state
            return HttpResponse(httplib.OK, pprint.pformat(self.collections))
        else:
            return HttpResponse(httplib.NOT_FOUND)

    def handle_storage(self, command, args, path, environ):
        if command == 'info':
            if args == 'collections':
                # last modification time of each collection
                return JsonResponse(json.dumps(self.collections.get('timestamps', {})))
        if command == 'storage':
            if '/' in args:
                col, key = args.split('/', 1)
            else:
                col, key = args, None
            try:
                if not key: # list output requested
                    opts = urlparse.parse_qs(environ['QUERY_STRING'])
                    test = self.opts_test(opts)
                    result = []
                    for val in self.collections.setdefault(col, {}).itervalues():
                        val = json.loads(val)
                        if test(val.get('modified', 0)):
                            result.append(val)
                    result = sorted(result,
                                    key=lambda val: (val.get('sortindex'),
                                                     val.get('modified')),
                                    reverse=True)
                    if 'limit' in opts:
                        result = result[:int(opts['limit'][0])]
                    logging.info('result set len = %d' % len(result))
                    if 'application/newlines' in environ.get('HTTP_ACCEPT', ''):
                        # one JSON record per line
                        value = '\n'.join(json.dumps(val) for val in result)
                        return HttpResponse(httplib.OK, value,
                                            content_type='application/text')
                    else:
                        return JsonResponse(json.dumps(result))
                else:
                    return JsonResponse(self.collections.setdefault(col, {})[key])
            except KeyError:
                if not key: raise
                return HttpResponse(httplib.NOT_FOUND, '"record not found"',
                                    content_type='application/json')
        # unknown command or info type
        return HttpResponse(httplib.NOT_FOUND)

    def __process_handler(self, handler):
        path = self.request.environ['PATH_INFO']
        response = handler(path, self.request.environ)
        return response

    def __call__(self, environ, start_response):
        """Main WSGI application method"""
        self.request = HttpRequest(environ)
        method = '_handle_%s' % environ['REQUEST_METHOD']
        # See if we have a method called '_handle_METHOD', where
        # METHOD is the name of the HTTP method to call. If we do,
        # then call it.
        if hasattr(self, method):
            handler = getattr(self, method)
            response = self.__process_handler(handler)
        else:
            response = HttpResponse(httplib.METHOD_NOT_ALLOWED,
                                    'Method %s is not yet implemented.'
                                    % environ['REQUEST_METHOD'])
        start_response(response.status, response.headers)
        return [response.content]

class NoLogging(wsgiref.simple_server.WSGIRequestHandler):
    """Request handler that does not log every request to stderr"""
    def log_request(self, *args):
        pass

if __name__ == '__main__':
    socket.setdefaulttimeout(300)
    if '-v' in sys.argv:
        logging.basicConfig(level=logging.DEBUG)
        handler_class = wsgiref.simple_server.WSGIRequestHandler
    else:
        logging.basicConfig(level=logging.ERROR)
        handler_class = NoLogging
    logging.info('Serving on port %d.' % DEFAULT_PORT)
    app = WeaveApp()
    httpd = wsgiref.simple_server.make_server(BIND_IP, DEFAULT_PORT, app,
                                              handler_class=handler_class)
    httpd.serve_forever()
```
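The list branch of handle_storage is the densest part of the listing. The following Python 3 sketch reproduces its logic outside the server so that it can be tried in isolation:

- The newer and older parameters become an inclusive comparison on each record's modified timestamp.
- Matching records are sorted by (sortindex, modified) in descending order.
- limit truncates the result.

The function name filter_records and the sample records are invented for illustration. Records are passed as already-decoded dictionaries rather than the JSON strings the server stores.

```python
from urllib.parse import parse_qs


def opts_test(opts):
    """Same inclusive comparisons as WeaveApp.opts_test."""
    if 'older' in opts:
        return float(opts['older'][0]).__ge__
    elif 'newer' in opts:
        return float(opts['newer'][0]).__le__
    return lambda modified: True


def filter_records(records, query_string):
    """Hypothetical helper mirroring the list branch of handle_storage."""
    opts = parse_qs(query_string)
    test = opts_test(opts)
    result = [rec for rec in records if test(rec.get('modified', 0))]

    def sort_key(rec):
        # Python 2 sorts None before any number; -inf reproduces that in Python 3
        index = rec.get('sortindex')
        return (float('-inf') if index is None else index, rec.get('modified', 0))

    result.sort(key=sort_key, reverse=True)
    if 'limit' in opts:
        result = result[:int(opts['limit'][0])]
    return result


records = [
    {'id': 'a', 'modified': 100.00, 'sortindex': 5},
    {'id': 'b', 'modified': 200.00, 'sortindex': 1},
    {'id': 'c', 'modified': 300.00},
]
print([r['id'] for r in filter_records(records, 'newer=150')])  # ['b', 'c']
print([r['id'] for r in filter_records(records, 'limit=1')])    # ['a']
```

Both comparisons are inclusive, so a record whose modified value equals the newer or older bound is included in the result.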
Here is the relevant fragment from my nginx configuration file:
```nginx
# Mozilla Weave
location /0.5 {
    auth_basic "Weave";
    auth_basic_user_file /home/majid/web/conf/htpasswd.weave;
    proxy_pass http://localhost:8000;
    proxy_set_header Host $http_host;
}
location /1.0 {
    auth_basic "Weave";
    auth_basic_user_file /home/majid/web/conf/htpasswd.weave;
    proxy_pass http://localhost:8000;
    proxy_set_header Host $http_host;
}
location /1/ {
    auth_basic "Weave";
    auth_basic_user_file /home/majid/web/conf/htpasswd.weave;
    proxy_pass http://localhost:8000;
    proxy_set_header Host $http_host;
}
```
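To check the proxy setup without Firefox, you can build a request by hand. The Python 3 sketch below uses only the standard library. It constructs, but does not send, a Basic-authenticated request in the /<version>/<user>/storage/<collection>/<id> shape that parse_url splits. The host, user name, password and record contents are placeholders, and the helper name weave_request is hypothetical.

```python
import base64
import json
import urllib.request

BASE = 'https://your.server.name'  # placeholder, same host as URL_BASE in the server
USER, PASSWORD = 'me', 'secret'    # placeholder credentials from htpasswd.weave


def weave_request(method, collection, record_id=None, body=None, query=''):
    """Build (but do not send) a request shaped the way the server parses it."""
    path = '/1.0/%s/storage/%s' % (USER, collection)
    if record_id is not None:
        path += '/' + record_id
    url = BASE + path + ('?' + query if query else '')
    data = json.dumps(body).encode('utf-8') if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    token = base64.b64encode(('%s:%s' % (USER, PASSWORD)).encode('utf-8'))
    # nginx checks this header against auth_basic_user_file before proxying
    req.add_header('Authorization', 'Basic ' + token.decode('ascii'))
    if data is not None:
        req.add_header('Content-Type', 'application/json')
    return req


put = weave_request('PUT', 'bookmarks', 'abc', {'id': 'abc', 'payload': 'placeholder'})
print(put.get_method(), put.full_url)
# PUT https://your.server.name/1.0/me/storage/bookmarks/abc
```

Passing such a request to urllib.request.urlopen exercises the whole chain. nginx checks the credentials against htpasswd.weave, then forwards the request to the Python process on port 8000, which ignores the user name segment.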
This code is hereby released into the public domain. You are welcome to use it as you wish. Just keep in mind that since it is reverse-engineered, it may well break with future releases of the Weave extension, or if Mozilla changes the server protocol.
Update (2009-10-03):
I implemented some minor changes for compatibility with Weave 0.7. The diff with the previous version is as follows:
```diff
--- weave_server.py~ Thu Sep 3 17:46:44 2009
+++ weave_server.py Sat Oct 3 02:59:19 2009
@@ -65,8 +65,7 @@
         command, args = path.split('/', 4)[3:]
         return command, args
 
-    def opts_test(self, environ):
-        opts = urlparse.parse_qs(environ['QUERY_STRING'])
+    def opts_test(self, opts):
         if 'older' in opts:
             return float(opts['older'][0]).__ge__
         elif 'newer' in opts:
@@ -92,7 +91,7 @@
     def _handle_POST(self, path, environ):
         try:
             status = httplib.NOT_FOUND
-            if path.startswith('/0.5/') and path.endswith('/'):
+            if path.startswith('/0.5/'):
                 command, args = self.parse_url(path)
                 col = args.split('/')[0]
                 vals = json.loads(self.request.contents)
@@ -113,7 +112,8 @@
             command, args = self.parse_url(path)
             col, key = args.split('/', 1)
             if not key:
-                test = self.opts_test(environ)
+                opts = urlparse.parse_qs(environ['QUERY_STRING'])
+                test = self.opts_test(opts)
                 col = self.collections.setdefault(col, {})
                 for key in col.keys():
                     if test(json.loads(col[key]).get('modified', 0)):
@@ -142,10 +142,14 @@
             if args == 'collections':
                 return JsonResponse(json.dumps(self.collections.get('timestamps', {})))
         if command == 'storage':
-            col, key = args.split('/')
+            if '/' in args:
+                col, key = args.split('/')
+            else:
+                col, key = args, None
             try:
                 if not key: # list output requested
-                    test = self.opts_test(environ)
+                    opts = urlparse.parse_qs(environ['QUERY_STRING'])
+                    test = self.opts_test(opts)
                     result = []
                     for val in self.collections.setdefault(col, {}).itervalues():
                         val = json.loads(val)
@@ -155,6 +159,8 @@
                                     key=lambda val: (val.get('sortindex'),
                                                      val.get('modified')),
                                     reverse=True)
+                    if 'limit' in opts:
+                        result = result[:int(opts['limit'][0])]
                     logging.info('result set len = %d' % len(result))
                     if 'application/newlines' in environ.get('HTTP_ACCEPT', ''):
                         value = '\n'.join(json.dumps(val) for val in result)
```
Update (2009-11-17):
Weave 1.0b1 uses 1.0 as the protocol version string instead of 0.5 but is otherwise unchanged. I updated the script and nginx configuration accordingly.
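Because the only change in 1.0b1 is the version prefix, the path handling can be written to accept either prefix. Here is a Python 3 sketch of such a parser. The names split_weave_path and SUPPORTED_VERSIONS, and the tuple it returns, are illustrative only. Like parse_url, it ignores the user name segment. It also treats a collection URL without a record id the same whether or not it has a trailing slash, the case the 0.7 update had to handle separately.

```python
SUPPORTED_VERSIONS = ('0.5', '1.0')  # protocol prefixes the server answers to


def split_weave_path(path):
    """Return (command, collection, record_id), or None for foreign paths.

    '/1.0/<user>/storage/<col>/<id>' -> ('storage', col, id)
    '/1.0/<user>/storage/<col>'      -> ('storage', col, None)
    '/1.0/<user>/info/collections'   -> ('info', 'collections', None)
    """
    parts = path.split('/', 5)
    # parts[0] is the empty string before the leading slash
    if len(parts) < 5 or parts[0] != '' or parts[1] not in SUPPORTED_VERSIONS:
        return None
    command, rest = parts[3], parts[4:]
    collection = rest[0]
    # a trailing slash leaves an empty id, which means "whole collection"
    record_id = rest[1] if len(rest) > 1 and rest[1] else None
    return command, collection, record_id
```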