CMU15445 Lecture 22 Introduction to Distributed Databases

PARALLEL VS. DISTRIBUTED

PARALLEL (e.g., an Oracle database appliance)

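Nodes (machines) are physically close to each other, e.g., in the same machine room
Data moves between nodes over a high-speed local area network (LAN)
Communication cost is small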
DISTRIBUTED
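Nodes of the database cluster can be far apart
Nodes are connected over a general network rather than a dedicated LAN
Data transfer latency is high, so communication cost cannot be ignored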

Distributed DBMSs

Fault tolerance: a failure of a single node should not bring down the whole system.

System Architectures

A system architecture describes how memory and disks are laid out across a distributed system and how the CPUs access them. The architecture determines how CPUs retrieve and store data in the database.

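Shared everything hardly counts as a distributed database: it is essentially a single-node setup, typically what you use when testing a cluster on your own computer.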

Shared Memory

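Memory and disk are shared by all CPUs, and the database instances are aware of one another. This architecture is almost never used, because memory shared across machines in this way hardly deserves to be called memory.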

Shared Disk

All of the (massive) data is stored on a single shared disk
Storage nodes (disks) are decoupled from compute nodes (CPUs)

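The in-memory caches on the compute nodes must be kept in sync: if one node modifies the database, the other nodes must be notified so that they read the new contents instead of a stale cached copy.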
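The cache-synchronization problem can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`SharedDisk`, `ComputeNode`, and the broadcast-invalidate rule are hypothetical, not taken from the lecture): a writer updates the shared disk and tells every other node to drop its cached copy, so that node's next read goes back to storage.

```python
# Hypothetical names, for illustration only: a dict stands in for the shared disk,
# and each compute node keeps a private in-memory cache (its buffer pool).

class SharedDisk:
    def __init__(self):
        self.pages = {}

class ComputeNode:
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.cache = {}
        self.peers = []

    def read(self, page_id):
        # Serve from the local cache if possible, otherwise fetch from the shared disk.
        if page_id not in self.cache:
            self.cache[page_id] = self.disk.pages.get(page_id)
        return self.cache[page_id]

    def write(self, page_id, value):
        # Write through to the shared disk, update our own cache,
        # then tell every peer to drop its now-stale copy (assumed broadcast invalidation).
        self.disk.pages[page_id] = value
        self.cache[page_id] = value
        for peer in self.peers:
            peer.invalidate(page_id)

    def invalidate(self, page_id):
        self.cache.pop(page_id, None)


disk = SharedDisk()
a, b = ComputeNode("A", disk), ComputeNode("B", disk)
a.peers, b.peers = [b], [a]

disk.pages["p1"] = "v1"
print(b.read("p1"))   # v1 (now cached on B)
a.write("p1", "v2")   # A updates; B's cached copy is invalidated
print(b.read("p1"))   # v2 (re-fetched from the shared disk)
```

Without the `invalidate` call, B would keep returning `v1` from its cache after A's write, which is exactly the inconsistency described above.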

Shared Nothing

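Each node has its own CPU, memory, and disk, and nodes communicate only over the network, much like hosts on the internet.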

If the data to be modified is local to the node, the update is fast
For fault tolerance, nodes also keep replicas of each other's data

Design Issues (what to consider when designing a distributed database)

How does the application find data?

How should queries be executed on distributed data? Should the query be pushed to where the data is located?

Send the query to every node, then merge the results.

Or should the data be pooled into a common location to execute the query?

Collect the data in one place first, then run the query on the pooled data.
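The two strategies can be compared with a toy SUM over hypothetical partitions. The sketch assumes in-process lists stand in for the nodes and counts how many values would have to cross the network in each case; both give the same answer, but pushing the query ships far less data.

```python
# Hypothetical data: each "node" holds a partition of an orders table as (customer, amount) rows.
partitions = {
    "node1": [("alice", 10), ("bob", 5)],
    "node2": [("alice", 7), ("carol", 3)],
    "node3": [("bob", 2)],
}

def push_query_to_data(partitions):
    # Each node computes a partial SUM locally; only one number per node
    # is sent back, and the coordinator adds the partial results.
    partials = [sum(amount for _, amount in rows) for rows in partitions.values()]
    return sum(partials), len(partials)  # (result, values shipped)

def pull_data_to_query(partitions):
    # Every row is shipped to one place first, then the query runs on the pooled data.
    pooled = [row for rows in partitions.values() for row in rows]
    return sum(amount for _, amount in pooled), len(pooled)  # (result, values shipped)

print(push_query_to_data(partitions))  # (27, 3)
print(pull_data_to_query(partitions))  # (27, 5)
```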

How does the DBMS ensure correctness? (How are the nodes of a distributed database kept consistent with each other?)

How do the nodes in the cluster interact?

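Homogeneous nodes: every node plays the same role (it can perform the same set of tasks) but stores different data. This makes partitioning simpler and also makes provisioning and failover easier.
Heterogeneous nodes: nodes have different responsibilities (each node is assigned specific tasks).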

Partitioning Schemes

NAÏVE TABLE PARTITIONING

HORIZONTAL PARTITIONING

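Question: does "logically partition" mean splitting up the disk? (See LOGICAL PARTITIONING below.)
Lookups should use the partition key whenever possible, so the query can be routed to the single node that holds the data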

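When the cluster is scaled out, the data must be rehashed: with hash(key) mod n, changing the number of nodes n reassigns most keys to different nodes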
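A minimal sketch of hash partitioning, assuming `hash(key) mod n` routing (the helper names `stable_hash` and `node_for` are hypothetical): a lookup by partition key maps to exactly one node, but changing the node count reassigns most keys.

```python
import hashlib

def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process for str, so use MD5 for a stable value.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def node_for(key: str, num_nodes: int) -> int:
    # Route a tuple by its partition key: hash(key) mod number of nodes.
    return stable_hash(key) % num_nodes

keys = [f"user{i}" for i in range(10_000)]
before = {k: node_for(k, 4) for k in keys}
after = {k: node_for(k, 5) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys move when growing from 4 to 5 nodes")
```

Going from 4 to 5 nodes, a key stays put only when its hash leaves the same remainder mod 4 and mod 5, i.e., when the hash mod 20 is 0 to 3, so roughly 80% of the keys move.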

Consistent hashing

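Solves the rehashing problem when the database is scaled out: nodes and keys are hashed onto the same ring, each key belongs to the first node clockwise from it, and adding or removing a node only moves the keys on that node's arc.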

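When replicas are needed, each key is also stored on the next nodes clockwise on the ring (e.g., with a replication factor of 3, a key lives on three consecutive nodes).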
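A consistent-hashing ring with replication can be sketched as follows. Assumptions: MD5 ring positions, one position per node (no virtual nodes), and the convention that a key's owner is the first node clockwise from the key while the next nodes clockwise hold its replicas; `ConsistentHashRing` and `nodes_for` are hypothetical names.

```python
import bisect
import hashlib

def stable_hash(key: str) -> int:
    # Deterministic position on the ring (Python's str hash() is randomized per process).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    # Hypothetical class for illustration: one ring position per node, no virtual nodes.
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = []  # sorted list of (position, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        bisect.insort(self.ring, (stable_hash(node), node))

    def nodes_for(self, key):
        # The first node clockwise from the key's position owns it;
        # the next (rf - 1) nodes clockwise hold its replicas.
        if not self.ring:
            return []
        start = bisect.bisect_right(self.ring, (stable_hash(key), ""))
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

ring = ConsistentHashRing(["A", "B", "C", "D"])
keys = [f"user{i}" for i in range(10_000)]
before = {k: ring.nodes_for(k)[0] for k in keys}
ring.add_node("E")
after = {k: ring.nodes_for(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Every key that changed owner now belongs to the new node E.
print(all(after[k] == "E" for k in moved))  # True
print(ring.nodes_for("user42"))  # owner plus two replicas
```

Adding a node only takes over the arc between its predecessor and itself, so the keys that move are exactly the ones the new node now owns; everything else stays where it was.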

LOGICAL PARTITIONING

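With a shared disk, partitioning is only logical: each node is responsible for a subset of the data, but all of it physically lives on the shared storage.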

PHYSICAL PARTITIONING

Distributed Concurrency Control

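There are two ways to coordinate distributed transactions: centralized and decentralized.
The middleware approach differs from a centralized coordinator in that the application does not have to tell the middleware which nodes are involved. With a centralized coordinator, the application server must name the specific partitions (nodes) when it requests locks. The middleware only needs the query: it works out by itself which nodes hold the relevant data, i.e., it also acts as a router.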
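The router role of the middleware can be sketched like this, assuming hash routing on a key; `Middleware`, `Partition`, and `partitions_for` are hypothetical names. The application passes only keys, and the middleware decides which partitions (and therefore which locks) are involved, whereas a centralized coordinator would expect the application to supply those partition ids itself.

```python
import hashlib

def stable_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Partition:
    def __init__(self, pid):
        self.pid = pid
        self.rows = {}

class Middleware:
    # Hypothetical router for illustration: the application only passes keys,
    # never partition/node ids.
    def __init__(self, num_partitions):
        self.partitions = [Partition(i) for i in range(num_partitions)]

    def route(self, key):
        # Pick the partition that holds this key (hash partitioning assumed).
        return self.partitions[stable_hash(key) % len(self.partitions)]

    def partitions_for(self, keys):
        # Work out which partitions (and therefore which locks) a request touches.
        return sorted({self.route(k).pid for k in keys})

    def put(self, key, value):
        self.route(key).rows[key] = value

    def get(self, key):
        return self.route(key).rows.get(key)

mw = Middleware(num_partitions=4)
mw.put("alice", 10)
print(mw.get("alice"))                      # 10
print(mw.partitions_for(["alice", "bob"]))  # partition ids chosen by the middleware
```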

With decentralized coordination, deadlocks are harder to handle.
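One reason deadlocks are harder without a central coordinator is that each node sees only its own waits-for edges. In the hypothetical example below, neither node's local waits-for graph contains a cycle, but their union does. The sketch assumes the edges can be gathered in one place and checked with a depth-first search, which is exactly what a decentralized design does not get for free.

```python
def has_cycle(edges):
    # edges: dict mapping a txn to the set of txns it waits for.
    # Depth-first search with three colors; a GRAY node reached again means a cycle.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GRAY
        for v in edges.get(u, ()):
            c = color.get(v, WHITE)
            if c == GRAY:
                return True
            if c == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(edges))

# Hypothetical waits-for edges observed locally on each node.
node1 = {"T1": {"T2"}}  # on node 1, T1 waits for a lock held by T2
node2 = {"T2": {"T1"}}  # on node 2, T2 waits for a lock held by T1

print(has_cycle(node1), has_cycle(node2))  # False False

# Only after merging both local graphs does the deadlock become visible.
merged = {}
for local in (node1, node2):
    for txn, waits in local.items():
        merged.setdefault(txn, set()).update(waits)
print(has_cycle(merged))  # True
```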

Centralized

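With centralized transaction coordination, every client goes through the single coordinator, so when many clients send requests concurrently it becomes a bottleneck and cannot keep up.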

Decentralized
