Monday, 11 April 2011

async Redis await BookSleeve

UPDATE

BookSleeve has now been succeeded by StackExchange.Redis, for lots of reasons. The API and intent are similar, but the changes were significant enough that we had to reboot. All further development will be in StackExchange.Redis, not BookSleeve.

ORIGINAL CONTENT

At Stack Exchange, performance is a feature we work hard at. Crazy hard. Whether that means sponsoring load-balancer features to reduce system impact, or trying to out-do the ORM folks on their own turf.

One of the many tools in our performance toolkit is Redis; a highly performant key-value store that we use in various ways:

  • as our second-level cache
  • for various tracking and counter data that we really don’t want to bother SQL Server with
  • for our pub/sub channels
  • for various other things that don’t need to go direct to SQL Server

It is really fast; we were using the redis-sharp bindings, and they served us well. I owe redis-sharp much thanks, and my intent here is not to critique it at all – but rather to highlight that in some environments you might need that extra turn of the wheel. First, some context:

  • Redis itself is single-threaded, but supports multiple connections
  • the Stack Exchange sites work in a multi-tenancy configuration, and in the case of Redis we partition (mainly) into Redis databases
  • to reduce overheads (both handshakes and OS resources such as sockets) we re-use our Redis connection(s)
  • but since redis-sharp is not thread-safe we need to synchronize access to the connection
  • and since redis-sharp is synchronous we need to block while we get each response
  • and since we are split over Redis databases we might also first have to block while we select database
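To make that cost concrete, here is a minimal sketch of the pattern those constraints force; the interface and names are hypothetical, for illustration only, and are not redis-sharp’s actual API:

```csharp
// Hypothetical synchronous client, for illustration only.
// The point is the serialization: every caller queues behind the lock,
// and each pays one or two full round-trips while holding it.
public interface ISyncRedisClient
{
    void Select(int db);      // blocks until the +OK reply arrives
    long Incr(string key);    // blocks until the integer reply arrives
}

public class SharedConnection
{
    private readonly ISyncRedisClient client;
    private readonly object syncLock = new object();

    public SharedConnection(ISyncRedisClient client) { this.client = client; }

    public long Increment(int db, string key)
    {
        lock (syncLock) // one caller at a time; everyone else blocks here
        {
            client.Select(db);       // first round-trip: switch database
            return client.Incr(key); // second round-trip: the actual work
        }
    }
}
```

At LAN latencies, each call inside the lock is ~0.3ms of time during which every other caller is parked.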

Now, LAN latency is low; most estimates put it at around 0.3ms per call – but this adds up, especially if you might be blocking other callers behind you. And even more so given that you might not even care what the response is (yes, I know we could offload that somewhere so that it doesn’t impact the current request, but we would still end up adding blocking for requests that do care).

Enter BookSleeve

Seriously, what now? What on earth is BookSleeve?

As a result of the above, we decided to write a bespoke Redis client with specific goals around solving these problems. Essentially it is a wrapper around Redis dictionary storage; and what do you call a wrapper around a dictionary? A book-sleeve. Yeah, I didn’t get it at first, but naming stuff is hard.

And we’re giving it away (under the Apache License 2.0)! Stack Exchange is happy to release our efforts here as open source, which is groovy.

So; what are the goals?

  • to operate as a fully-functional Redis client (obviously)
  • to be thread-safe and non-blocking
  • to support implicit database switching to help with multi-tenancy scenarios
  • to be on par with redis-sharp in like-for-like scenarios (i.e. a complete request/response cycle)
  • to allow absolute minimum cost fire-and-forget usage (for when you don’t care what the reply is, and errors will be handled separately)
  • to allow use as a “future” – i.e. request some data from Redis, start some other work while it is on the wire, and merge in the Redis reply when available
  • to allow use with callbacks for when you need the reply, but not necessarily as part of the current request
  • to allow C# 5 continuation usage (aka async/await)
  • to allow fully pipelined usage – i.e. issue 200 requests before we’ve even got the first response
  • to allow fully multiplexed usage – i.e. it must handle meshing the responses from different callers on different threads and on different databases but on the same connection back to the originator

(actually, Stack Exchange didn’t strictly need the C# 5 scenario; I added that while moving it to open-source, but it is an excellent fit)
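As a sketch of the fire-and-forget and pipelined goals, assuming the Task-returning API style shown in the comments below (the Increment method name here is illustrative):

```csharp
// conn is a BookSleeve connection; Increment is assumed to return
// Task<long>, in the same style as the GetString call shown in the
// comments below.

// fire-and-forget: issue the command and simply ignore the Task
conn.Increment(0, "page-views");

// pipelined: issue 200 requests before reading any reply
var pending = new List<Task<long>>();
for (int i = 0; i < 200; i++)
    pending.Add(conn.Increment(0, "counter:" + i));

Task.WaitAll(pending.ToArray()); // block once, at the end, for all of them
```

Because nothing waits per-command, the 200 requests share (at most) a couple of round-trips of latency rather than 200.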

Where are we? And where can I try it?

It exists; it works; it even passes some of the tests! And it is fast. It still needs some tidying, some documentation, and more tests, but I offer you BookSleeve:

http://code.google.com/p/booksleeve/

The API is very basic and should be instantly familiar to anyone who has used Redis; and documentation will be added.

In truth, the version I’m open-sourcing is more like the offspring of the version we’re currently using in production – you tend to learn a lot the first time through. But as soon as we can validate it, Stack Exchange will be using BookSleeve too.

So how about some numbers

These are based on my dev machine, running redis on the same machine, so I also include estimates using the 0.3ms latency per request as mentioned above.

In each test we are doing 5000 INCR commands (purely as an arbitrary test); spread over 5 databases, in a round-robin in batches of 10 per db – i.e. 10 on db #0, 10 on db #1, … 10 on db #4 – so that is an additional 500 SELECT commands too.

redis-sharp:

  • to completion 430ms
  • (not meaningful to measure fire-and-forget)
  • to completion assuming 0.3ms LAN latency: 2080ms (430ms measured, plus 5,500 round-trips × 0.3ms = 1,650ms)

BookSleeve

  • to completion 391ms
  • 2ms fire-and-forget
  • to completion assuming 0.3ms LAN latency: 391ms

The last two are the key; in particular, the time we aren’t spending waiting on LAN latency is otherwise-blocking time handed back to other callers (web servers tend to have more than one thing happening…); and the fire-and-forget performance allows us to do a lot of operations without blocking the current caller.

As a bonus, we have added the ability to do genuinely parallel work on a single caller – by starting a Redis request first, doing the other work (TSQL, typically), and then asking for the Redis result. And let’s face it: while TSQL is versatile, Redis is so fast that it would be quite unusual for the Redis reply not to already be there by the time you get around to looking.
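That pattern looks something like the following; GetString is the Task-returning read shown in the comments below, while LoadOrdersFromSql is a stand-in for whatever TSQL work the request needs:

```csharp
// start the Redis request; it goes onto the wire immediately
var pending = conn.GetString(0, "user:1234:display-name");

// meanwhile, do the SQL work we needed anyway (hypothetical stand-in)
var orders = LoadOrdersFromSql(1234);

// by the time we ask, the Redis reply is usually already back;
// .Result only blocks in the rare case that it isn't
var displayName = pending.Result;
```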

Wait – did you say C# 5?

Yep; because the API is task-based, it can be used in any of 3 ways without needing separate APIs: synchronously (just Wait on the Task), via callbacks (ContinueWith), or with C# 5 continuations (await).

As an example of the last:

[code sample shown as an image in the original post: async Redis usage]
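A reconstruction of that sample (the original post showed it as an image), using the GetString signature that appears in the comments below:

```csharp
// the containing method must be marked async in order to use await
private async Task ShowValue(RedisConnection conn, int db)
{
    var result = conn.GetString(db, "some-key"); // request is now in flight
    // ...other work could happen here while the reply is on the wire...
    Console.WriteLine(await result); // resumes here when the reply arrives
}
```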

IMPORTANT: in the above “await” does not mean “block until this is done” – it means “yield back to the caller here, and run the rest as a callback when the answer is available” – or for a better definition see Eric Lippert’s blog series.

And did I mention…

…that a high performance binary-based dictionary store works well when coupled with a high performance binary serializer? ;p
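The pairing is just a MemoryStream round-trip; a sketch using protobuf-net’s Serializer API (the helper names here are mine) to get the byte[] that redis stores natively:

```csharp
using System.IO;
using ProtoBuf; // protobuf-net

static class RedisBlobs
{
    // object -> byte[], ready to hand to redis
    public static byte[] ToBytes<T>(T value)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, value); // protobuf-net serialize
            return ms.ToArray();
        }
    }

    // byte[] from redis -> object
    public static T FromBytes<T>(byte[] blob)
    {
        using (var ms = new MemoryStream(blob))
        {
            return Serializer.Deserialize<T>(ms);
        }
    }
}
```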

40 comments:

Livingston said...

So how did I totally miss that you work at Stack Exchange? Do you still live in the UK, or did you migrate to the states?

Marc Gravell said...

It was never a secret: http://blog.stackoverflow.com/2010/06/welcome-stack-overflow-valued-associates-00006-and-00007/

But yes, I'm still UK based.

Bryan said...

Good stuff!

I'd love to know more about how you decide between data to store in Redis vs stuff "that we really don’t want to bother SQL Server about."

Fodder for a future blog post?

N. Harebottle III said...

Ditto to Bryan, I'm curious about this too.

Marc Gravell said...

@Bryan @N.HarebottleIII - that is tricky to qualify; a mix of measurement and "gut"...

heriyanto binduni said...

i just downloaded booksleeve today and when trying this code:

var result = conn.GetString(db, "some-key");
Console.WriteLine(await result);


i got some error:

The 'await' operator can only be used in a method or lambda marked with the 'async' modifier

can you help? tq

heriyanto binduni said...

i've tried like below too:

var result = await conn.GetString(db, "some-key");
Console.WriteLine(result);


but still got the same error (T_T)

Marc Gravell said...

With the async CTP you must mark the containing method as async. For example:

private async void Foo() {
    var foo = {...}
    var bar = await foo;
}

this is just how C# 5 works

Anonymous said...

You can use the Visual Studio Async CTP with SP1 using the new release from 12th April 2011.

http://blogs.msdn.com/b/visualstudio/archive/2011/04/13/async-ctp-refresh.aspx

Marc Gravell said...

@anon - thanks - I meant to update that; I'm using the async CTP refresh here

Mikhail Mikheev said...

Hi Marc,

I've been playing with Booksleeve 0.9 and analyzing its source code for a couple of hours and didn't actually find support for pipelining.

However at http://code.google.com/p/booksleeve/ you wrote 'By offering pipelined, asynchronous, multiplexed and thread-safe access to redis, BookSleeve enables efficient redis access even for the busiest applications.'

Have I missed something or there is no support for pipelining?

Marc Gravell said...

@Mikhail er.... ***everything*** is pipelined. Every. Single. Command.

By "pipelining" here, I mean as per the redis definition: http://redis.io/topics/pipelining - i.e. not waiting for a response after every command. The client we were replacing did exactly that: wait each time to verify the result.

With BookSleeve, the responses are handled *entirely* async (exposing the familiar Task API to allow you to handle results as you choose).

Mikhail Mikheev said...

Marc, thanks for the answer! I see now.

What I actually looked for is batching rather than pipelining. What I care about is sending groups of small requests to Redis. My use case is that a few small requests (to manipulate sorted sets and finally select values) are sent as a transaction.

Keeping in mind that we have an MTU for ethernet, I expected it would be better to pack those small requests into a single TCP segment (as they are more likely to fit the MTU) and send them as a single package, which would get us lower latency than sending the requests one by one.

Do you have any experience in this area? Maybe you tried a prototype with support for batching before implementing BookSleeve, or tested such an approach? If so, could you please say a few words about it, as I don't know yet (haven't tried) whether it makes sense to have batching support – maybe the network driver already does batching (i.e. buffering) for us well enough?

And one more little question: I found that BookSleeve uses a separate thread to send TCP requests asynchronously (i.e. what you wrote about pipelining in the answer to my previous post). So why did you choose this approach? Did connection-per-thread (which is popular when dealing with sockets) work badly in your case? If so, at what load did connection-per-thread become a problem? (I also raised this question on google groups, you could check it if you are interested: http://groups.google.com/group/redis-db/browse_thread/thread/185760fc61256920)

Mikhail Mikheev said...

By the way, what about transactions? I didn't actually find the MULTI and EXEC commands exposed by BookSleeve.

Will they work in general with the single-connection approach that BookSleeve uses? AFAIK Redis just queues commands after MULTI is called and then executes them when EXEC is received. As far as I can judge, it queues all the commands received on a specific TCP connection – so if we use a single TCP connection and somehow managed to send MULTI and EXEC commands through BookSleeve, would the commands between MULTI and EXEC get mixed up on the Redis side if they were sent from different threads on the BookSleeve side?

Mikhail Mikheev said...

Hi Marc,

Sorry for being annoying. Hope this is my final question.

How do you deal with connection drop?

As you recommend in the comments for the RedisConnection class, I use a single instance of the connection. But I didn't find any reconnection logic in BookSleeve (for when the connection is closed by the Redis server, or for some other reason like a network outage). I have some thoughts on how to reconnect in such situations, but I'm interested in the way you deal with that in production.

Tons of thanks in advance! And sorry for such long questions, but at least they will be useful for other users who start with BookSleeve, to understand more quickly how to use it properly.

PS. BTW BookSleeve is the best of the C# clients now in existence, as it is the only one that does network calls asynchronously. That is very important in high-load scenarios. Here are all the clients that exist at the moment, enumerated from worst to best from my point of view: redis-sharp, TeamDev Redis Client, ServiceStack.Redis, Sider, BookSleeve.

Marc Gravell said...

We deal with this by wrapping the redis connection in an abstraction layer that automatically tries to reconnect *by spawning a new client* periodically. You should also be able to handle task-level exceptions for anything that couldn't be sent, although most of the time we don't need that.

As it happens, our abstraction layer also provides a local in-memory cache that we use to reduce throughput to redis. We maintain that local cache by using pub/sub to watch for keys being invalidated.

Spikyz said...

What's the recommended way to delete a value in the cache store? Use Expire with 0 seconds?

Marc Gravell said...

@Spikyz I'm on mobile, so I can't remember the specifics, but there is a "Delete" or "Remove" method (can't remember which). In the updated API it is under .Keys

Joel said...

I noticed references to 'localhost' in the sample code, which makes me assume you have a Windows-targeted redis install. Is there a particular port that you recommend?

Marc Gravell said...

@Joel I use this one for local dev work: https://github.com/dmajkic/redis/downloads

Our production redis is Linux, though

Chris said...

Thanks for the article and effort Marc! Does StackExchange use this client in production today?

Marc Gravell said...

Yes, we do. All day, every day.

jackfoxy said...

What does your Redis hardware and OS stack look like? Which version of Redis are you running?

Marc Gravell said...

@jackfoxy we have 2 linux nodes configured as master/slave, sat behind haproxy with some code to co-ordinate failover if needed. We run entirely in memory (VM disabled). We're a bit behind version-wise – 2.0.2 – but currently testing a 2.4 switchover.

Marc Gravell said...

Oh, and we use a db per SE site, using Booksleeve as a multiplexer allowing each .NET app-domain to use a single redis connection to run any site (we use multi-tenancy).

As a result, everything ultimately hits a single "master" node. It works fine; CPU is really low.

jackfoxy said...

Thanks! We're an MS-only shop now, so this will be our first foray into Linux. How much memory are you running on the Linux nodes?

Marc Gravell said...

@jackfoxy currently 16GB, but we're probably going to add more when we do the upgrade - simply because if we need to shut them down *anyway*, and memory is insanely cheap, etc... 2.4 is actually *more* efficient (memory etc), so the upgrade will grab us more headroom too. But we're using quite a large % of that 16GB currently, so we just want to make that problem go away for a while. Like, the next 5 years. How much memory you need depends *entirely* on your setup, though, and I don't suggest your needs are the same - you could need more, could need less. It depends on your data sizes, your expirations, whether you enable virtual memory inside redis, etc.

Rasiel said...

Marc,

Would you mind elaborating on "some code to co-ordinate failover if needed"?

Thanks,
Raciel

Anonymous said...

There is a race-condition between Remove and the Increments.

Marc Gravell said...

The remove and the increments... can you be more specific? Operations are queued (unless you intentionally prioritise them), so will be sent (and replied to) in order. Can you give an illustration of the scenario you mean?

Max Solender said...

Hi -

I have started using this library over the last few days. I was using ServiceStack.Redis but ran into many problems with the connection pooling functionality.

I am seeing an issue and I would like to know if this is common.

Everything runs great for a few minutes, and then the connection becomes idle and gets closed. The code in "ReadReplyHeader" receives zero bytes and shuts down the connection. Is this the expected behavior?

For a long-running server process, should I watch for the connection being closed (I see there is an event on the connection object) and then re-open it the next time I need it? I'm wondering if there is some issue, or if this is the intended behavior?

Thanks!

Marc Gravell said...

Not sure what is killing the connection there... note that if redis *advertises* that it has a connection timeout policy, then booksleeve will set up a heartbeat to ensure it stays alive. I could perhaps tweak it to add a heartbeat either way?

Naveen said...

Marc,

I am trying to use redis for pub/sub mechanism.

I have been playing with booksleeve and it works well except that

(a) Publish: takes string as value to be transmitted
(b) Subscription: returns byte[]

I have been doing byte[]/string conversions etc.

I would like to use protobuf.net objects to communicate between publisher and subscriber.

Any ideas/suggestions (even if it is not in the scope of what you do for booksleeve) to make it more friendly with protobuf.net serializable objects?

Your suggestion will be greatly valued.

Thank you very much in advance.

Marc Gravell said...

@Naveen publish should already have an overload that takes byte[]. If it doesn't then that is an oversight, but I'd be surprised if I missed that.

Naveen said...

I appreciate your prompt response, Marc. Let me check publish before you do.

Any suggestions on pub/sub using protobuf.net objects

Naveen said...

You are absolutely right; I overlooked it. Publish does have a byte[] input. My liking for string was so strong that I missed the byte[] overload. My apologies.

Marc Gravell said...

Re protobuf-net - well, protobuf-net is ideal for turning something into a byte[], usually via MemoryStream. We use this **extensively**

Naveen said...

Thank you very much Marc. I appreciate your comments.

mfelicio said...

Hello Marc,

Thank you for your effort in creating a very nice API for Redis.

I have one question though. Is there a reason for not doing asynchronous I/O when sending bytes over the NetworkStream ?

Wouldn't it be possible to use BeginWrite/EndWrite and BeginRead/EndRead instead of the synchronous Write and Read methods?

This would do non-blocking IO and should be much more efficient.

What is your opinion in this?

Best regards,
Manuel Felício

Marc Gravell said...

In many ways I can't disagree - the current implementation (sync write, async read) is simply: some code that worked well enough. But yes, it could be async on both. I'll add it to my list of things to consider. Indeed, I'm certainly not unfamiliar with the async write API.