Data Types: String



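Before looking at the String data type, we first introduce a new object, redisObject, which acts as the bridge between the user-facing data types and the underlying data structures.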

typedef struct redisObject {
    unsigned type:4;        // user-facing data type (String/List/Hash/Set/ZSet, etc.), 4 bits
    unsigned encoding:4;    // encoding used for that data type (SDS/ziplist/intset/hashtable/skiplist, etc.), 4 bits
    unsigned lru:LRU_BITS;  // LRU time of the redisObject; LRU_BITS is 24 bits
    int refcount;           // reference count of the redisObject, 4 bytes
    void *ptr;              // pointer to the actual data, 8 bytes on a 64-bit system
} robj;
// 16 bytes in total
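
The 16-byte total can be checked with a small stand-in struct. This is only a sketch: `demoObject`, `demo_expected_size` and `demo_actual_size` are made-up names for illustration, not Redis code, and the result of 16 assumes a typical 64-bit build where `int` is 4 bytes and pointers are 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

#define LRU_BITS 24

/* Hypothetical stand-in with the same field layout as redisObject. */
typedef struct demoObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;
    int refcount;
    void *ptr;
} demoObject;

/* Size obtained by adding up the fields as the comments above do:
 * (4 + 4 + 24) bits = 4 bytes of bit fields, plus refcount, plus ptr. */
size_t demo_expected_size(void) {
    return (4 + 4 + LRU_BITS) / 8 + sizeof(int) + sizeof(void *);
}

/* Size the compiler actually uses, padding included. */
size_t demo_actual_size(void) {
    return sizeof(demoObject);
}
```

If the two functions return the same number, the three bit fields were packed into a single 4-byte unit and no padding was added, which is what makes the fixed per-object overhead exactly 16 bytes on such a build.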

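Advantages:

  • It gives every data type a uniform representation: everything is an object.

  • A single data type can map to different underlying implementations to save memory, while the layers above see one object type and the implementation details stay hidden.

  • It supports object sharing with reference counting: a shared object is stored once and can be used many times, which saves memory.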

External use cases for String
  • Caching: session tokens, verification codes, serialized objects

  • Simple leaderboards and counters

  • Distributed locks

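Internal use cases for String
  • Inside Redis, apart from string literals, the vast majority of string values that can be modified are represented by the string type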

Common String commands

[Image: common String commands]

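Shared String objects

When a value is an integer in the range 0 to 9999 (that is, 0 <= value < OBJ_SHARED_INTEGERS, where OBJ_SHARED_INTEGERS is 10000), Redis uses its pre-created shared String object directly instead of allocating a new one (commonly used command strings have shared objects as well). This both improves efficiency and saves memory, somewhat like pre-warming hot data. As the code below shows, sharing is skipped for values in the keyspace when maxmemory is set and the eviction policy needs per-key LRU/LFU information.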

//server.c: the default shared objects are created when the server starts
void createSharedObjects(void) {
    //...
    for (j = 0; j < OBJ_SHARED_INTEGERS; j++) {
        shared.integers[j] =
            makeObjectShared(createObject(OBJ_STRING,(void*)(long)j));
        shared.integers[j]->encoding = OBJ_ENCODING_INT;
    }
}   



/* Create a string object from a long long value. When possible returns a
 * shared integer object, or at least an integer encoded one.
 *
 * If valueobj is non zero, the function avoids returning a shared
 * integer, because the object is going to be used as value in the Redis key
 * space (for instance when the INCR command is used), so we want LFU/LRU
 * values specific for each key. */
robj *createStringObjectFromLongLongWithOptions(long long value, int valueobj) {
    robj *o;

    if (server.maxmemory == 0 ||
        !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS))
    {
        /* If the maxmemory policy permits, we can still return shared integers
         * even if valueobj is true. */
        valueobj = 0;
    }

    //within the range covered by the shared objects, a shared object is used
    if (value >= 0 && value < OBJ_SHARED_INTEGERS && valueobj == 0) {
        incrRefCount(shared.integers[value]);
        o = shared.integers[value];
    } else {
        if (value >= LONG_MIN && value <= LONG_MAX) {
            o = createObject(OBJ_STRING, NULL);
            o->encoding = OBJ_ENCODING_INT;
            o->ptr = (void*)((long)value);
        } else {
            o = createObject(OBJ_STRING,sdsfromlonglong(value));
        }
    }
    return o;
}
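
The idea behind the shared integer pool can be sketched in a few lines. This is a simplified model under stated assumptions, not the Redis implementation: `demoInt`, `demo_create_shared`, `demo_int_object` and `demo_release` are hypothetical names, the object holds only a counter and a value, and plain reference counting replaces the special handling Redis applies to shared objects.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_SHARED_INTEGERS 10000   /* mirrors OBJ_SHARED_INTEGERS */

/* Hypothetical, stripped-down integer object. */
typedef struct demoInt {
    int refcount;
    long value;
} demoInt;

static demoInt *demo_shared[DEMO_SHARED_INTEGERS];

/* Pre-create one object per small integer, once at startup. */
void demo_create_shared(void) {
    for (long j = 0; j < DEMO_SHARED_INTEGERS; j++) {
        demo_shared[j] = malloc(sizeof(demoInt));
        assert(demo_shared[j] != NULL);
        demo_shared[j]->refcount = 1;   /* the pool itself holds one reference */
        demo_shared[j]->value = j;
    }
}

/* Return the shared object for 0 <= value < DEMO_SHARED_INTEGERS,
 * otherwise allocate a private object. */
demoInt *demo_int_object(long value) {
    if (value >= 0 && value < DEMO_SHARED_INTEGERS) {
        demo_shared[value]->refcount++;
        return demo_shared[value];
    }
    demoInt *o = malloc(sizeof(*o));
    assert(o != NULL);
    o->refcount = 1;
    o->value = value;
    return o;
}

/* Drop one reference and free the object when none remain. */
void demo_release(demoInt *o) {
    if (--o->refcount == 0) free(o);
}
```

After `demo_create_shared()`, two calls to `demo_int_object(9999)` return the same pointer, while two calls to `demo_int_object(10000)` return two separate allocations. That boundary is why the shared range ends at 9999 rather than 10000.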

The three encodings of String
/* Try to encode a string object in order to save space */
robj *tryObjectEncoding(robj *o) {
    long value;
    sds s = o->ptr;
    size_t len;

    /* Make sure this is a string object, the only type we encode
     * in this function. Other types use encoded memory efficient
     * representations but are handled by the commands implementing
     * the type. */
    serverAssertWithInfo(NULL,o,o->type == OBJ_STRING);

    /* We try some specialized encoding only for objects that are
     * RAW or EMBSTR encoded, in other words objects that are still
     * in represented by an actually array of chars. */
    if (!sdsEncodedObject(o)) return o;

    /* It's not safe to encode shared objects: shared objects can be shared
     * everywhere in the "object space" of Redis and may end in places where
     * they are not handled. We handle them only as values in the keyspace. */
     if (o->refcount > 1) return o;

    /* Check if we can represent this string as a long integer.
     * Note that we are sure that a string larger than 20 chars is not
     * representable as a 32 nor 64 bit integer. */
    len = sdslen(s);
    if (len <= 20 && string2l(s,len,&value)) {
        /* This object is encodable as a long. Try to use a shared object.
         * Note that we avoid using shared integers when maxmemory is used
         * because every object needs to have a private LRU field for the LRU
         * algorithm to work well. */
        if ((server.maxmemory == 0 ||
            !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) &&
            value >= 0 &&
            value < OBJ_SHARED_INTEGERS)
        {
            decrRefCount(o);
            incrRefCount(shared.integers[value]);
            return shared.integers[value];
        } else {
            if (o->encoding == OBJ_ENCODING_RAW) {
                sdsfree(o->ptr);
                o->encoding = OBJ_ENCODING_INT;
                o->ptr = (void*) value;
                return o;
            } else if (o->encoding == OBJ_ENCODING_EMBSTR) {
                decrRefCount(o);
                return createStringObjectFromLongLongForValue(value);
            }
        }
    }

    /* If the string is small and is still RAW encoded,
     * try the EMBSTR encoding which is more efficient.
     * In this representation the object and the SDS string are allocated
     * in the same chunk of memory to save space and cache misses. */
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) {
        robj *emb;

        if (o->encoding == OBJ_ENCODING_EMBSTR) return o;
        emb = createEmbeddedStringObject(s,sdslen(s));
        decrRefCount(o);
        return emb;
    }

    /* We can't encode the object...
     *
     * Do the last try, and at least optimize the SDS string inside
     * the string object to require little space, in case there
     * is more than 10% of free space at the end of the SDS string.
     *
     * We do that only for relatively large strings as this branch
     * is only entered if the length of the string is greater than
     * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. */
    trimStringObjectIfNeeded(o);

    /* Return the original object. */
    return o;
}
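
| Encoding | Conversion rule |
|---|---|
| int | Length <= 20 and the value is an integer within the range of a long (on a 64-bit system, from -2^63 to 2^63-1 = 9223372036854775807) |
| embstr | Length <= 44 |
| raw | Everything else |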
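
The rules in the table can be condensed into one small decision function. The following is a sketch only: `demo_classify` and `demo_is_canonical_long` are hypothetical names, the input is a NUL-terminated C string (Redis strings are binary-safe, which this ignores), and the canonical-form check is an approximation of what `string2l` accepts, built on `strtol` plus a round trip through `snprintf`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_EMBSTR_LIMIT 44   /* mirrors OBJ_ENCODING_EMBSTR_SIZE_LIMIT */

typedef enum { DEMO_ENC_INT, DEMO_ENC_EMBSTR, DEMO_ENC_RAW } demoEncoding;

/* Returns 1 if s is the canonical decimal form of a value that fits in a
 * long ("123", "-5"), and 0 otherwise ("007", "+1", " 1", "1.0", ""). */
int demo_is_canonical_long(const char *s) {
    char *end;
    char buf[32];

    errno = 0;
    long v = strtol(s, &end, 10);
    /* Reject overflow, empty input, and trailing non-digit characters. */
    if (errno != 0 || end == s || *end != '\0') return 0;
    /* Printing the value back must reproduce the input exactly. */
    snprintf(buf, sizeof(buf), "%ld", v);
    return strcmp(buf, s) == 0;
}

/* Apply the three rules in order: int, then embstr, then raw. */
demoEncoding demo_classify(const char *s) {
    size_t len = strlen(s);
    if (len <= 20 && demo_is_canonical_long(s)) return DEMO_ENC_INT;
    if (len <= DEMO_EMBSTR_LIMIT) return DEMO_ENC_EMBSTR;
    return DEMO_ENC_RAW;
}
```

On a real server the chosen encoding can be inspected with the `OBJECT ENCODING key` command, which is a convenient way to compare this sketch against actual behavior.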


Notes

As usual, let's work out the cost (16 is the fixed overhead of the redisObject):

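
| Encoding | Cost (bytes) | Notes |
|---|---|---|
| int | 16 | Redis performs a single memory allocation (none at all when a shared integer is used); the value is stored directly in ptr. |
| embstr | 16+3+1+len | Redis performs a single memory allocation: the redisObject and the SDS sit next to each other in one contiguous chunk. The boundary between embstr and raw is 44 = 64-20, that is, 64 minus 16 (redisObject), 3 (SDS header) and 1 (terminating '\0'). |
| raw | 16+sdsheader+len+1 | Redis performs two memory allocations: the redisObject and the SDS are allocated separately. |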
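
The arithmetic in the table can be written out as code. This is a rough sketch with hypothetical helper names (`demo_sds_header_size`, `demo_embstr_bytes`, `demo_raw_bytes`); it assumes a 64-bit build, assumes the header sizes of sdshdr8/16/32/64 are 3, 5, 9 and 17 bytes, and ignores any rounding done by the memory allocator.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ROBJ_SIZE 16   /* sizeof(robj) on a 64-bit build */

/* SDS header size for a string of the given length. Assumes len >= 32,
 * so the compact sdshdr5 header is not considered. */
size_t demo_sds_header_size(unsigned long long len) {
    if (len < (1ULL << 8))  return 3;   /* sdshdr8:  len(1) + alloc(1) + flags(1) */
    if (len < (1ULL << 16)) return 5;   /* sdshdr16: 2 + 2 + 1 */
    if (len < (1ULL << 32)) return 9;   /* sdshdr32: 4 + 4 + 1 */
    return 17;                          /* sdshdr64: 8 + 8 + 1 */
}

/* embstr: robj + sdshdr8 + payload + terminating '\0', in one allocation. */
size_t demo_embstr_bytes(size_t len) {
    return DEMO_ROBJ_SIZE + 3 + len + 1;
}

/* raw: robj in one allocation, SDS header + payload + '\0' in another. */
size_t demo_raw_bytes(size_t len) {
    return DEMO_ROBJ_SIZE + demo_sds_header_size(len) + len + 1;
}
```

With these assumptions `demo_embstr_bytes(44)` comes out at exactly 64, which matches the 44 = 64-20 boundary in the table: a 44-byte string is the longest one whose object and SDS still fit together in a 64-byte chunk.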

References

蒋德钧,《Redis核心技术与实战》,极客时间 (Geek Time)

黄建宏,《Redis设计与实现》

钱文品,《Redis深度历险:核心原理与应用实践》