面臨的問(wèn)題: select * from a, b where a.user_id =b.user_id
在分庫(kù)分表的情況下,如何決定一個(gè)正確的JDBC DataSource,一個(gè)正確的Table Name
數(shù)據(jù)庫(kù)模式
本文檔中提供了兩個(gè)數(shù)據(jù)源db0
和db1
,每個(gè)數(shù)據(jù)源之中包含了兩組表t_order_0
和t_order_1
,t_order_item_0
和t_order_item_1
。這兩組表的建表語(yǔ)句為:
CREATE TABLE IF NOT EXISTS `t_order_x` ( `order_id` INT NOT NULL, `user_id` INT NOT NULL, PRIMARY KEY (`order_id`) ); CREATE TABLE IF NOT EXISTS `t_order_item_x` ( `item_id` INT NOT NULL, `order_id` INT NOT NULL, `user_id` INT NOT NULL, PRIMARY KEY (`item_id`) );
邏輯表與實(shí)際表映射關(guān)系
均勻分布
數(shù)據(jù)表在每個(gè)數(shù)據(jù)源內(nèi)呈現(xiàn)均勻分布的態(tài)勢(shì)
db0 ├── t_order_0 └── t_order_1 db1 ├── t_order_0 └── t_order_1
表規(guī)則可以使用默認(rèn)的配置
TableRule orderTableRule = TableRule.builder("t_order").actualTables(Arrays.asList("t_order_0", "t_order_1")).dataSourceRule(dataSourceRule).build();
自定義分布
數(shù)據(jù)表呈現(xiàn)有特定規(guī)則的分布
db0 ├── t_order_0 └── t_order_1 db1 ├── t_order_2 ├── t_order_3 └── t_order_4
表規(guī)則可以指定每張表在數(shù)據(jù)源中的分布情況
TableRule orderTableRule = TableRule.builder("t_order").actualTables(Arrays.asList("db0.t_order_0", "db0.t_order_1", "db1.t_order_2", "db1.t_order_3", "db1.t_order_4")).dataSourceRule(dataSourceRule).build();
本教程采用的數(shù)據(jù)分布例子
db0 ├── t_order_0 user_id為偶數(shù) order_id為偶數(shù) ├── t_order_1 user_id為偶數(shù) order_id為奇數(shù) ├── t_order_item_0 user_id為偶數(shù) order_id為偶數(shù) └── t_order_item_1 user_id為偶數(shù) order_id為奇數(shù) db1 ├── t_order_0 user_id為奇數(shù) order_id為偶數(shù) ├── t_order_1 user_id為奇數(shù) order_id為奇數(shù) ├── t_order_item_0 user_id為奇數(shù) order_id為偶數(shù) └── t_order_item_1 user_id為奇數(shù) order_id為奇數(shù)
邏輯表與實(shí)際表
配置分庫(kù)分表的目的是將原有一張表的數(shù)據(jù)分散到不同庫(kù)不同表中,且不改變?cè)?code style="box-sizing: border-box; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14.4px; padding: 2px 4px; color: #c7254e; border-radius: 4px; background-color: #f9f2f4;">SQL語(yǔ)句的情況下來(lái)使用這一張表。那么從一張表到多張的映射關(guān)系需要使用邏輯表與實(shí)際表這兩種概念。下面通過(guò)一個(gè)例子來(lái)解釋一下。假設(shè)在使用PreparedStatement
訪問(wèn)數(shù)據(jù)庫(kù),SQL
如下:
select * from t_order where user_id = ? and order_id = ?;
當(dāng)user_id=0
且order=0
時(shí),Sharding-JDBC
將會(huì)將SQL
語(yǔ)句轉(zhuǎn)換為如下形式:
select * from db0.t_order_0 where user_id = ? and order_id = ?;
其中原始SQL
中的t_order
就是 邏輯表,而轉(zhuǎn)換后的db0.t_order_0
就是 實(shí)際表
規(guī)則配置
以上分庫(kù)分表的形式Sharding-JDBC
是通過(guò)規(guī)則配置來(lái)進(jìn)行的描述的,下面講通過(guò)幾個(gè)小節(jié)來(lái)描述規(guī)則的詳細(xì)配置:
ShardingRule shardingRule = ShardingRule.builder() .dataSourceRule(dataSourceRule) .tableRules(Arrays.asList(orderTableRule, orderItemTableRule)) .databaseShardingStrategy(new DatabaseShardingStrategy("user_id", new ModuloDatabaseShardingAlgorithm())) .tableShardingStrategy(new TableShardingStrategy("order_id", new ModuloTableShardingAlgorithm()))) .build();
數(shù)據(jù)源配置
首先我們來(lái)構(gòu)造DataSourceRule
對(duì)象,它是來(lái)描述數(shù)據(jù)源的分布規(guī)則的。
DataSourceRule dataSourceRule = new DataSourceRule(dataSourceMap);
這里構(gòu)造器需要一個(gè)入?yún)ⅲ簲?shù)據(jù)源名稱(chēng)與真實(shí)數(shù)據(jù)源之間的映射關(guān)系,這個(gè)關(guān)系的構(gòu)造方法如下
Map<String, DataSource> dataSourceMap = new HashMap<>(2); dataSourceMap.put("ds_0", createDataSource("ds_0")); dataSourceMap.put("ds_1", createDataSource("ds_1"));
真實(shí)的數(shù)據(jù)源可以使用任意一種數(shù)據(jù)庫(kù)連接池,這里使用DBCP來(lái)舉例
private static DataSource createDataSource(final String dataSourceName) { BasicDataSource result = new BasicDataSource(); result.setDriverClassName(com.mysql.jdbc.Driver.class.getName()); result.setUrl(String.format("jdbc:mysql://localhost:3306/%s", dataSourceName)); result.setUsername("root"); result.setPassword(""); return result; }
策略配置
數(shù)據(jù)源策略與表策略
Sharding-JDBC認(rèn)為對(duì)于分片策略存有兩種維度 - 數(shù)據(jù)源分片策略
DatabaseShardingStrategy
:數(shù)據(jù)被分配的目標(biāo)數(shù)據(jù)源 - 表分片策略TableShardingStrategy
:數(shù)據(jù)被分配的目標(biāo)表,該目標(biāo)表存在與該數(shù)據(jù)的目標(biāo)數(shù)據(jù)源內(nèi)。故表分片策略是依賴(lài)與數(shù)據(jù)源分片策略的結(jié)果的 這里注意的是兩種策略的API完全相同,以下針對(duì)策略API的講解將適用于這兩種策略
全局默認(rèn)策略與特定表策略
策略是作用在特定的表規(guī)則上的,數(shù)據(jù)源策略與表策略與特定表相關(guān)
TableRule orderTableRule = TableRule.builder("t_order") .actualTables(Arrays.asList("t_order_0", "t_order_1") .dataSourceRule(dataSourceRule) .databaseShardingStrategy(new DatabaseShardingStrategy("user_id", new ModuloDatabaseShardingAlgorithm())) .tableShardingStrategy(new TableShardingStrategy("order_id", new ModuloTableShardingAlgorithm()))) .build();
如果分片規(guī)則中的所有表或大部分表的分片策略相同,可以使用默認(rèn)策略來(lái)簡(jiǎn)化配置。以下兩種配置是等價(jià)的:
//使用了默認(rèn)策略配置 TableRule orderTableRule = TableRule.builder("t_order") .actualTables(Arrays.asList("t_order_0", "t_order_1") .dataSourceRule(dataSourceRule) .build(); TableRule orderItemTableRule = TableRule.builder("t_order_item") .actualTables(Arrays.asList("t_order_item_0", "t_order_item_1") .dataSourceRule(dataSourceRule) .build(); ShardingRule shardingRule = ShardingRule.builder() .dataSourceRule(dataSourceRule) .tableRules(Arrays.asList(orderTableRule, orderItemTableRule)) .databaseShardingStrategy(new DatabaseShardingStrategy("user_id", new ModuloDatabaseShardingAlgorithm())) .tableShardingStrategy(new TableShardingStrategy("order_id", new ModuloTableShardingAlgorithm()))) .build();
//未使用默認(rèn)策略配置 TableRule orderTableRule = TableRule.builder("t_order") .actualTables(Arrays.asList("t_order_0", "t_order_1") .dataSourceRule(dataSourceRule) .build(); TableRule orderItemTableRule = TableRule.builder("t_order_item") .actualTables(Arrays.asList("t_order_item_0", "t_order_item_1") .dataSourceRule(dataSourceRule) .databaseShardingStrategy(new DatabaseShardingStrategy("user_id", new ModuloDatabaseShardingAlgorithm())) .tableShardingStrategy(new TableShardingStrategy("order_id", new ModuloTableShardingAlgorithm()))) .build(); ShardingRule shardingRule = ShardingRule.builder() .dataSourceRule(dataSourceRule) .tableRules(Arrays.asList(orderTableRule, orderItemTableRule)) .databaseShardingStrategy(new DatabaseShardingStrategy("user_id", new ModuloDatabaseShardingAlgorithm())) .tableShardingStrategy(new TableShardingStrategy("order_id", new ModuloTableShardingAlgorithm()))) .build();
分片鍵
分片鍵是分片策略的第一個(gè)參數(shù)。分片鍵表示的是SQL語(yǔ)句中WHERE中的條件列。分片鍵可以配置多個(gè)
- 單分片策略
new TableShardingStrategy("order_id", new SingleKeyShardingAlgorithm()))
- 多分片策略
new TableShardingStrategy(Arrays.asList("order_id", "order_type", "order_date"), new MultiKeyShardingAlgorithm()))
分片算法
分片算法接口類(lèi)圖關(guān)系如下:
綁定表
綁定表代表一組表,這組表的邏輯表與實(shí)際表之間的映射關(guān)系是相同的。比如t_order
與t_order_item
就是這樣一組綁定表關(guān)系,它們的分庫(kù)與分表策略是完全相同的,那么可以使用它們的表規(guī)則將它們配置成綁定表
new BindingTableRule(Arrays.asList(orderTableRule, orderItemTableRule))
那么在進(jìn)行SQL路由時(shí),如果SQL為
SELECT i.* FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id WHERE o.user_id=? AND o.order_id=?
其中t_order
在FROM的最左側(cè),Sharding-JDBC將會(huì)以它作為整個(gè)綁定表的主表。所有路由計(jì)算將會(huì)只使用主表的策略,那么t_order_item
表的分片計(jì)算將會(huì)使用t_order
的條件。故綁定表之間的分區(qū)鍵要完全相同。
分片算法詳解
單分片鍵算法與多分片鍵算法
這兩種算法從名字上就可以知道前者是針對(duì)只有一個(gè)分片鍵,后者是針對(duì)有多個(gè)分片鍵的。單分片鍵算法是多分片鍵算法的一種簡(jiǎn)便形式,所以完全可以使用多分片算法去替代單分片鍵算法。下面兩種形式是等價(jià)的
new TableShardingStrategy("order_id", new SingleKeyShardingAlgorithm())) new TableShardingStrategy(Arrays.asList("order_id"), new MultiKeyShardingAlgorithm()))
同時(shí)在算法內(nèi)部,doSharding
等方法的shardingValue
入?yún)⒏鶕?jù)使用算法類(lèi)型不同而不同 單分片鍵算法,方法簽名
public String doEqualSharding(final Collection<String> dataSourceNames, final ShardingValue<Integer> shardingValue)
多分片鍵算法,方法簽名
public Collection<String> doSharding(final Collection<String> availableTargetNames, final Collection<ShardingValue<?>> shardingValues)
分片鍵算法類(lèi)型
根據(jù)數(shù)據(jù)源策略與表策略、單分片與多分片,這兩種組合,一共產(chǎn)生了4種可供實(shí)現(xiàn)的分片算法的接口
- 單分片鍵數(shù)據(jù)源分片算法
SingleKeyDatabaseShardingAlgorithm
- 單分片表分片算法
SingleKeyTableShardingAlgorithm
- 多分片鍵數(shù)據(jù)源分片算法
MultipleKeyDatabaseShardingAlgorithm
- 多分片表分片算法
MultipleKeyTableShardingAlgorithm
單分片鍵算法
單分片鍵算法需要實(shí)現(xiàn)三個(gè)方法,下面以”單分片鍵數(shù)據(jù)源分片算法“舉例
@Override public String doEqualSharding(final Collection<String> availableTargetNames, final ShardingValue<Integer> shardingValue) @Override public Collection<String> doInSharding(final Collection<String> availableTargetNames, final ShardingValue<Integer> shardingValue) @Override public Collection<String> doBetweenSharding(final Collection<String> availableTargetNames, final ShardingValue<Integer> shardingValue)
這三種算法作用如下 - doEqualSharding
在WHERE使用=
作為條件分片鍵。算法中使用shardingValue.getValue()
獲取等=
后的值 - doInSharding
在WHERE使用IN
作為條件分片鍵。算法中使用shardingValue.getValues()
獲取IN
后的值 - doBetweenSharding
在WHERE使用BETWEEN
作為條件分片鍵。算法中使用shardingValue.getValueRange()
獲取BETWEEN
后的值
下面是一個(gè)余2的算法的例子,當(dāng)分片鍵的值除以2余數(shù)就是實(shí)際表的結(jié)尾。注意注釋中提供了一些算法生成SQL的結(jié)果,參數(shù)tableNames
集合中有兩個(gè)參數(shù)t_order_0
和t_order_1
public final class ModuloTableShardingAlgorithm implements SingleKeyTableShardingAlgorithm<Integer> { /** * select * from t_order from t_order where order_id = 11 * └── SELECT * FROM t_order_1 WHERE order_id = 11 * select * from t_order from t_order where order_id = 44 * └── SELECT * FROM t_order_0 WHERE order_id = 44 */ public String doEqualSharding(final Collection<String> tableNames, final ShardingValue<Integer> shardingValue) { for (String each : tableNames) { if (each.endsWith(shardingValue.getValue() % 2 + "")) { return each; } } throw new IllegalArgumentException(); } /** * select * from t_order from t_order where order_id in (11,44) * ├── SELECT * FROM t_order_0 WHERE order_id IN (11,44) * └── SELECT * FROM t_order_1 WHERE order_id IN (11,44) * select * from t_order from t_order where order_id in (11,13,15) * └── SELECT * FROM t_order_1 WHERE order_id IN (11,13,15) * select * from t_order from t_order where order_id in (22,24,26) * └──SELECT * FROM t_order_0 WHERE order_id IN (22,24,26) */ public Collection<String> doInSharding(final Collection<String> tableNames, final ShardingValue<Integer> shardingValue) { Collection<String> result = new LinkedHashSet<>(tableNames.size()); for (Integer value : shardingValue.getValues()) { for (String tableName : tableNames) { if (tableName.endsWith(value % 2 + "")) { result.add(tableName); } } } return result; } /** * select * from t_order from t_order where order_id between 10 and 20 * ├── SELECT * FROM t_order_0 WHERE order_id BETWEEN 10 AND 20 * └── SELECT * FROM t_order_1 WHERE order_id BETWEEN 10 AND 20 */ public Collection<String> doBetweenSharding(final Collection<String> tableNames, final ShardingValue<Integer> shardingValue) { Collection<String> result = new LinkedHashSet<>(tableNames.size()); Range<Integer> range = (Range<Integer>) shardingValue.getValueRange(); for (Integer i = range.lowerEndpoint(); i <= range.upperEndpoint(); i++) { for (String each : tableNames) { if (each.endsWith(i % 2 + "")) { result.add(each); } } } return result; } }
多分片鍵算法
多分片鍵試用于使用場(chǎng)景比較復(fù)雜,為了能提供更高的靈活性,故只提供實(shí)現(xiàn)一個(gè)方法。
@Override public Collection<String> doSharding(final Collection<String> availableTargetNames, final Collection<ShardingValue<?>> shardingValues)
算法實(shí)現(xiàn)的時(shí)候根據(jù)shardingValue.getType()
來(lái)獲取條件是=
,IN
或者BETWEEN
。然后根據(jù)業(yè)務(wù)進(jìn)行靈活的實(shí)現(xiàn)。
如果表的數(shù)據(jù)分布如下
db0 ├── t_order_00 user_id以a偶數(shù) order_id為偶數(shù) ├── t_order_01 user_id以a偶數(shù) order_id為奇數(shù) ├── t_order_10 user_id以b奇數(shù) order_id為偶數(shù) └── t_order_11 user_id以b奇數(shù) order_id為奇數(shù)
算法實(shí)現(xiàn)如下:
public final class MultipleKeysModuloTableShardingAlgorithm implements MultipleKeysTableShardingAlgorithm { @Override public Collection<String> doSharding(final Collection<String> availableTargetNames, final Collection<ShardingValue<?>> shardingValues) { Set<Integer> orderIdValueSet = getShardingValue(shardingValues, "order_id"); Set<Integer> userIdValueSet = getShardingValue(shardingValues, "user_id"); List<String> result = new ArrayList<>(); /* userIdValueSet[10,11] + orderIdValueSet[101,102] => valueResult[[10,101],[10,102],[11,101],[11,102]] */ Set<List<Integer>> valueResult = Sets.cartesianProduct(userIdValueSet, orderIdValueSet); for (List<Integer> value : valueResult) { String suffix = Joiner.on("").join(value.get(0) % 2, value.get(1) % 2); for (String tableName : availableTargetNames) { if (tableName.endsWith(suffix)) { result.add(tableName); } } } return result; } private Set<Integer> getShardingValue(final Collection<ShardingValue<?>> shardingValues, final String shardingKey) { Set<Integer> valueSet = new HashSet<>(); ShardingValue<Integer> shardingValue = null; for (ShardingValue<?> each : shardingValues) { if (each.getColumnName().equals(shardingKey)) { shardingValue = (ShardingValue<Integer>) each; break; } } if (null == shardingValue) { return valueSet; } switch (shardingValue.getType()) { case SINGLE: valueSet.add(shardingValue.getValue()); break; case LIST: valueSet.addAll(shardingValue.getValues()); break; case RANGE: for (Integer i = shardingValue.getValueRange().lowerEndpoint(); i <= shardingValue.getValueRange().upperEndpoint(); i++) { valueSet.add(i); } break; default: throw new UnsupportedOperationException(); } return valueSet; } }
構(gòu)造ShardingDataSource
完成規(guī)則配置后,我們可以通過(guò)ShardingDataSourceFactory
工廠得到ShardingDataSource
DataSource dataSource = new ShardingDataSourceFactory.createDataSource(shardingRule);
使用ShardingDataSource
通過(guò)一個(gè)例子來(lái)看看如何使用該數(shù)據(jù)源
String sql = "SELECT i.* FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id WHERE o.user_id=? AND o.order_id=?"; try ( Connection conn = dataSource.getConnection(); PreparedStatement preparedStatement = conn.prepareStatement(sql); ) { preparedStatement.setInt(1, 10); preparedStatement.setInt(2, 1001); ResultSet rs = preparedStatement.executeQuery(); while (rs.next()) { System.out.println(rs.getInt(1)); System.out.println(rs.getInt(2)); System.out.println(rs.getInt(3)); } rs.close(); }
該數(shù)據(jù)源與普通數(shù)據(jù)源完全相同,你可以通過(guò)上例的API形式來(lái)使用,也可以將其配置在Spring,Hibernate等框架中使用。
如果希望不依賴(lài)于表中的列傳入分片鍵值,參考:基于暗示(Hint)的分片鍵值注冊(cè)方法