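OceanBase has supported columnar storage since version 4.3.0. Depending on workload requirements, users can create columnstore tables, rowstore tables, or hybrid row-column tables. Whichever table type is chosen, the tenant's replicas use the same storage format in every zone. For details, see the official documentation: https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001429675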

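To isolate TP and AP resources strictly at the physical level, OceanBase 4.3.3.0 introduced a new deployment mode: a separate zone can be added to an existing cluster to hold columnstore replicas (C replicas for short). In versions 4.3.3.0 and 4.3.3.1, however, the columnstore replica feature is classified as experimental and is not recommended for production use.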

For a description of the replica types, see the official documentation:

https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001431874

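+--------------+-----------------+------------+----------------------------------------+-------------------------------------+
| Replica type | Election voting | Log voting | SSTable / clog / MemTable              | Replica type conversion             |
+--------------+-----------------+------------+----------------------------------------+-------------------------------------+
| F            | Yes             | Yes        | Present; major SSTable is row-store    | Can be converted to an R replica    |
| R            | No              | No         | Present; major SSTable is row-store    | Can be converted to an F replica    |
| C            | No              | No         | Present; major SSTable is column-store | Cannot be converted to another type |
+--------------+-----------------+------------+----------------------------------------+-------------------------------------+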
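
The replica-type table can also be read as a small lookup structure. The following is a minimal Python sketch that only transcribes the table; the names REPLICA_TYPES and can_convert are illustrative and not part of any OceanBase API.

```python
# Replica properties transcribed from the replica-type table.
# The dictionary layout and function name are assumptions for illustration.
REPLICA_TYPES = {
    "F": {"election_vote": True, "log_vote": True,
          "major_sstable": "row", "convertible_to": {"R"}},
    "R": {"election_vote": False, "log_vote": False,
          "major_sstable": "row", "convertible_to": {"F"}},
    "C": {"election_vote": False, "log_vote": False,
          "major_sstable": "column", "convertible_to": set()},
}


def can_convert(src: str, dst: str) -> bool:
    """Return True if the table lists dst as a conversion target for src."""
    return dst.upper() in REPLICA_TYPES[src.upper()]["convertible_to"]


print(can_convert("F", "R"))  # True
print(can_convert("R", "C"))  # False
```

A check like this could guard a deployment script against attempting an unsupported conversion to or from a C replica.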

Environment before creating the columnstore replica

# Cluster topology
MySQL [oceanbase]> select * from dba_ob_servers order by zone;
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| SVR_IP         | SVR_PORT | ID | ZONE  | SQL_PORT | WITH_ROOTSERVER | STATUS | START_SERVICE_TIME         | STOP_TIME | BLOCK_MIGRATE_IN_TIME | CREATE_TIME                | MODIFY_TIME                | BUILD_VERSION                                                                             | LAST_OFFLINE_TIME |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| 11.xxx.xxx.191 |    12882 |  1 | zone1 |    12881 | YES             | ACTIVE | 2024-11-04 10:27:09.942001 | NULL      | NULL                  | 2024-10-22 20:07:13.974171 | 2024-11-04 10:27:22.872264 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.191 |    22882 |  2 | zone2 |    22881 | NO              | ACTIVE | 2024-11-04 10:28:31.472704 | NULL      | NULL                  | 2024-10-22 20:07:13.986746 | 2024-11-04 10:28:31.882765 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.192 |    32882 |  3 | zone3 |    32881 | NO              | ACTIVE | 2024-11-04 10:29:29.111769 | NULL      | NULL                  | 2024-10-22 20:07:13.995302 | 2024-11-04 10:29:30.161822 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
3 rows in set (0.01 sec)


# Simulate an existing tenant
create resource unit u1 min_cpu=3,max_cpu=3,memory_size='4g',log_disk_size='12g',max_iops=10000;

create resource pool p1_1 unit='u1',zone_list=('zone1'),unit_num=1;
create resource pool p1_2 unit='u1',zone_list=('zone2'),unit_num=1;
create resource pool p1_3 unit='u1',zone_list=('zone3'),unit_num=1;

create tenant test1 resource_pool_list=('p1_1','p1_2','p1_3'),
primary_zone='zone1,zone2,zone3',locality='F@zone1, F@zone2, F@zone3',
charset=utf8mb4,collate=utf8mb4_bin
set ob_tcp_invited_nodes='%';

mysql -h127.0.0.1  -P12881 -uroot@test1 -p -A
alter user root identified by 'xxx';

Adding zone4 for the columnstore replica

See the obd cluster scale-out guide: https://www.oceanbase.com/docs/community-obd-cn-1000000001477803

oceanbase-ce:
  servers:
    - name: server4
      ip: 11.xxx.xxx.192
  server4:
    zone: zone4
    obshell_port: 45881
    mysql_port: 42881
    rpc_port: 42882
    local_ip: 11.xxx.xxx.192
    home_path: /home/heshun.lxd/observer4
    data_dir: /obdata/data/data4
    redo_dir: /obdata/log/log4
obd cluster scale_out ob433 -c ob433_scale_out_zone4.yaml -v

Cluster topology after scale-out

MySQL [oceanbase]> select * from dba_ob_servers order by zone;
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| SVR_IP         | SVR_PORT | ID | ZONE  | SQL_PORT | WITH_ROOTSERVER | STATUS | START_SERVICE_TIME         | STOP_TIME | BLOCK_MIGRATE_IN_TIME | CREATE_TIME                | MODIFY_TIME                | BUILD_VERSION                                                                             | LAST_OFFLINE_TIME |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| 11.xxx.xxx.191 |    12882 |  1 | zone1 |    12881 | YES             | ACTIVE | 2024-11-04 10:27:09.942001 | NULL      | NULL                  | 2024-10-22 20:07:13.974171 | 2024-11-04 10:27:22.872264 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.191 |    22882 |  2 | zone2 |    22881 | NO              | ACTIVE | 2024-11-04 10:28:31.472704 | NULL      | NULL                  | 2024-10-22 20:07:13.986746 | 2024-11-04 10:28:31.882765 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.192 |    32882 |  3 | zone3 |    32881 | NO              | ACTIVE | 2024-11-04 10:29:29.111769 | NULL      | NULL                  | 2024-10-22 20:07:13.995302 | 2024-11-04 10:29:30.161822 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
| 11.xxx.xxx.192 |    42882 |  4 | zone4 |    42881 | NO              | ACTIVE | 2024-11-04 11:48:24.538274 | NULL      | NULL                  | 2024-11-04 11:09:44.030541 | 2024-11-04 11:48:26.306543 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL              |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
4 rows in set (0.00 sec)

Adding a columnstore replica to an existing tenant

1. Tenant replica distribution before the change

MySQL [oceanbase]>  select tenant_id,tenant_name,primary_zone,locality  from dba_ob_tenants where tenant_type='user';
+-----------+-------------+-------------------+---------------------------------------------+
| tenant_id | tenant_name | primary_zone      | locality                                    |
+-----------+-------------+-------------------+---------------------------------------------+
|      1010 | test1       | zone1,zone2,zone3 | FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3 |
+-----------+-------------+-------------------+---------------------------------------------+
1 row in set (0.03 sec)

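2. Before adding the replica, check whether the tenant already has a resource pool in the target zone, and record the names of the tenant's current resource pools in each zone.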

MySQL [oceanbase]> select * from dba_ob_resource_pools where tenant_id=(select tenant_id from dba_ob_tenants where tenant_name='test1');
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
| RESOURCE_POOL_ID | NAME | TENANT_ID | CREATE_TIME                | MODIFY_TIME                | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
|             1008 | p1_1 |      1010 | 2024-11-04 11:01:36.377693 | 2024-11-04 11:02:00.918615 |          1 |           1004 | zone1     | FULL         |
|             1009 | p1_2 |      1010 | 2024-11-04 11:01:36.395700 | 2024-11-04 11:02:01.221993 |          1 |           1004 | zone2     | FULL         |
|             1010 | p1_3 |      1010 | 2024-11-04 11:01:36.410597 | 2024-11-04 11:02:01.224139 |          1 |           1004 | zone3     | FULL         |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
3 rows in set (0.02 sec)

3. Identify the unit used by each resource pool by matching dba_ob_unit_configs against the unit_config_id column of dba_ob_resource_pools.

MySQL [oceanbase]> select * from dba_ob_unit_configs;
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
| UNIT_CONFIG_ID | NAME            | CREATE_TIME                | MODIFY_TIME                | MAX_CPU | MIN_CPU | MEMORY_SIZE | LOG_DISK_SIZE | DATA_DISK_SIZE | MAX_IOPS            | MIN_IOPS            | IOPS_WEIGHT | MAX_NET_BANDWIDTH   | NET_BANDWIDTH_WEIGHT |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
|              1 | sys_unit_config | 2024-10-22 20:07:12.701353 | 2024-10-22 20:07:12.701353 |       2 |       2 |  2147483648 |    3221225472 |           NULL | 9223372036854775807 | 9223372036854775807 |           2 | 9223372036854775807 |                    2 |
|           1004 | u1              | 2024-11-04 11:01:30.256177 | 2024-11-04 11:01:30.256177 |       3 |       3 |  4294967296 |   12884901888 |           NULL |               10000 |               10000 |           0 | 9223372036854775807 |                    3 |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
2 rows in set (0.01 sec)

4. Create a resource pool for tenant test1 in zone4

create resource pool p1_4 unit='u1' ,unit_num=1,zone_list=('zone4');

5. Modify the resource_pool_list of tenant test1

alter tenant test1 resource_pool_list=('p1_1','p1_2','p1_3','p1_4');

6. Modify the locality of tenant test1

alter tenant test1 locality='f@zone1,f@zone2,f@zone3,c@zone4';

7. Check the result of the locality change for tenant test1

select * from dba_ob_tenant_jobs  
where job_type='alter_tenant_locality' 
and tenant_id=(select tenant_id from dba_ob_tenants where tenant_name='test1')
order by start_time desc limit 1 \G
*************************** 1. row ***************************
     JOB_ID: 2
   JOB_TYPE: ALTER_TENANT_LOCALITY
 JOB_STATUS: SUCCESS
RESULT_CODE: 0
   PROGRESS: 100
 START_TIME: 2024-11-04 12:01:55.851907
MODIFY_TIME: 2024-11-04 12:02:26.819124
  TENANT_ID: 1010
   SQL_TEXT: alter tenant test1 locality='f@zone1,f@zone2,f@zone3,c@zone4'
 EXTRA_INFO: FROM: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3', TO: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3, COLUMNSTORE{1}@zone4'
  RS_SVR_IP: 11.xxx.xxx.191
RS_SVR_PORT: 12882
1 row in set (0.02 sec)
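
Steps 4 to 6 follow a fixed pattern, so they can be generated from a few inputs. The sketch below only builds the SQL text shown in those steps and does not connect to a cluster; the function name and its parameters are hypothetical, and the names passed in are assumed to be trusted identifiers because no escaping is performed.

```python
def add_columnstore_replica_sql(tenant, unit, new_pool, new_zone, pools, locality):
    """Build the statements for steps 4-6; nothing is executed here.

    pools:    names of the tenant's existing resource pools (from step 2)
    locality: existing locality entries such as 'F@zone1'
    """
    pool_list = ",".join(f"'{p}'" for p in [*pools, new_pool])
    new_locality = ",".join([*locality, f"C@{new_zone}"])
    return [
        # Step 4: resource pool in the new zone, reusing the tenant's unit config.
        f"create resource pool {new_pool} unit='{unit}', unit_num=1, "
        f"zone_list=('{new_zone}');",
        # Step 5: attach the new pool to the tenant.
        f"alter tenant {tenant} resource_pool_list=({pool_list});",
        # Step 6: extend the locality with a C replica in the new zone.
        f"alter tenant {tenant} locality='{new_locality}';",
    ]


for stmt in add_columnstore_replica_sql(
    "test1", "u1", "p1_4", "zone4",
    pools=["p1_1", "p1_2", "p1_3"],
    locality=["F@zone1", "F@zone2", "F@zone3"],
):
    print(stmt)
```

The statements still need to be run in order from the sys tenant, and the job status query from step 7 remains the way to confirm that the locality change finished.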

Creating a columnstore replica together with a new tenant

create resource unit u2 min_cpu=3,max_cpu=3,memory_size='4g',log_disk_size='12g',max_iops=10000;

create resource pool p2_1 unit='u2',zone_list=('zone1'),unit_num=1;
create resource pool p2_2 unit='u2',zone_list=('zone2'),unit_num=1;
create resource pool p2_3 unit='u2',zone_list=('zone3'),unit_num=1;
create resource pool p2_4 unit='u2',zone_list=('zone4'),unit_num=1;

create tenant test2 
resource_pool_list=('p2_1','p2_2','p2_3','p2_4'),
primary_zone='zone1,zone2,zone3;zone4',
locality='F@zone1, F@zone2, F@zone3, C@zone4',
charset=utf8mb4,collate=utf8mb4_bin
set ob_tcp_invited_nodes='%';

mysql -h127.0.0.1  -P12881 -uroot@test2 -p -A
alter user root identified by 'xxx';

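Configuring obproxy

Log in to the relevant obproxy as root@proxysys.

Dedicated obproxy

Create a separate obproxy for the columnstore replica, log in, and apply the following configuration: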

alter proxyconfig set obproxy_read_consistency='1';
alter proxyconfig set init_sql = 'set @@ob_route_policy="COLUMN_STORE_ONLY";';
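
Shared obproxy

If no dedicated machine resources are available for the columnstore replica, an existing obproxy environment has to be reused. In that case, use obproxy multi-level configuration; for details, see the official documentation: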

https://www.oceanbase.com/docs/common-odp-doc-cn-1000000001409917

replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test1', 'obproxy_read_consistency', 1, 'LEVEL_TENANT');
replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test1', 'init_sql', 'set @@ob_route_policy="COLUMN_STORE_ONLY";', 'LEVEL_TENANT');

replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test2', 'obproxy_read_consistency', 1, 'LEVEL_TENANT');
replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test2', 'init_sql', 'set @@ob_route_policy="COLUMN_STORE_ONLY";', 'LEVEL_TENANT');
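
Because the tenant-level rows differ only in the tenant name, they can be generated rather than typed by hand. This is a minimal sketch that reproduces the statement text above; the function name is hypothetical, it does not talk to obproxy, and it assumes trusted cluster and tenant names because nothing is escaped.

```python
# Settings taken from the statements above, stored as ready-made SQL literals.
TENANT_SETTINGS = [
    ("obproxy_read_consistency", "1"),
    ("init_sql", "'set @@ob_route_policy=\"COLUMN_STORE_ONLY\";'"),
]


def tenant_proxy_config_sql(cluster, tenants):
    """Return one 'replace into proxy_config' statement per tenant and setting."""
    statements = []
    for tenant in tenants:
        for name, value in TENANT_SETTINGS:
            statements.append(
                "replace into proxy_config(cluster_name, tenant_name, name, value, "
                f"config_level) values ('{cluster}', '{tenant}', '{name}', {value}, "
                "'LEVEL_TENANT');"
            )
    return statements


for stmt in tenant_proxy_config_sql("obcluster", ["test1", "test2"]):
    print(stmt)
```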

Testing access to the columnstore replica

Log in through the obproxy configured above and run the test.

# sys tenant
MySQL [oceanbase]> select zone,tenant_id,name,value,default_value from gv$ob_parameters where tenant_id=1010 and name='default_table_store_format';
+-------+-----------+----------------------------+-------+---------------+
| zone  | tenant_id | name                       | value | default_value |
+-------+-----------+----------------------------+-------+---------------+
| zone1 |      1010 | default_table_store_format | row   | row           |
| zone4 |      1010 | default_table_store_format | row   | row           |
| zone3 |      1010 | default_table_store_format | row   | row           |
| zone2 |      1010 | default_table_store_format | row   | row           |
+-------+-----------+----------------------------+-------+---------------+
4 rows in set (0.03 sec)

# test1 tenant
MySQL [test]> show create table t1 \G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) DEFAULT NULL
) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_bin ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 3 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0
 partition by hash(id)
(partition `p0`,
partition `p1`,
partition `p2`)
1 row in set (0.01 sec)


MySQL [test]> explain select * from t1;
+----------------------------------------------------------------------+
| Query Plan                                                           |
+----------------------------------------------------------------------+
| ================================================================     |
| |ID|OPERATOR                    |NAME    |EST.ROWS|EST.TIME(us)|     |
| ----------------------------------------------------------------     |
| |0 |PX COORDINATOR              |        |1       |7           |     |
| |1 |└─EXCHANGE OUT DISTR        |:EX10000|1       |7           |     |
| |2 |  └─PX PARTITION ITERATOR   |        |1       |7           |     |
| |3 |    └─COLUMN TABLE FULL SCAN|t1      |1       |7           |     |
| ================================================================     |
| Outputs & filters:                                                   |
| -------------------------------------                                |
|   0 - output([INTERNAL_FUNCTION(t1.id)]), filter(nil), rowset=16     |
|   1 - output([INTERNAL_FUNCTION(t1.id)]), filter(nil), rowset=16     |
|       dop=1                                                          |
|   2 - output([t1.id]), filter(nil), rowset=16                        |
|       force partition granule                                        |
|   3 - output([t1.id]), filter(nil), rowset=16                        |
|       access([t1.id]), partitions(p[0-2])                            |
|       is_index_back=false, is_global_index=false,                    |
|       range_key([t1.__pk_increment]), range(MIN ; MAX)always true    |
+----------------------------------------------------------------------+
19 rows in set (0.01 sec)
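  • The table definition has no with column group clause and default_table_store_format is still the default row, yet the execution plan shows COLUMN TABLE FULL SCAN, which indicates that the columnstore range scan was used.
  • The test table t1 lives in tenant test1, whose topology is 3F-1C, that is, 4 replicas. Even so, replica_num is 3 in the output of both show create table and show create tenant, because it counts only the full-featured replicas.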
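
The relationship between the locality string and replica_num can be illustrated by counting replicas per type. The sketch below is a Python illustration under one assumption: the locality uses the TYPE{n}@zone form seen in the EXTRA_INFO output earlier, one entry per zone. The function name is hypothetical and other locality syntaxes are not handled.

```python
import re
from collections import Counter


def count_replicas(locality: str) -> Counter:
    """Count replicas per type in a string like 'FULL{1}@zone1, COLUMNSTORE{1}@zone4'."""
    counts = Counter()
    for entry in locality.split(","):
        m = re.fullmatch(r"\s*([A-Za-z_]+)(?:\{(\d+)\})?@(\w+)\s*", entry)
        if m is None:
            raise ValueError(f"unrecognized locality entry: {entry!r}")
        # A missing {n} suffix is treated as a single replica.
        counts[m.group(1).upper()] += int(m.group(2) or 1)
    return counts


locality = "FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3, COLUMNSTORE{1}@zone4"
counts = count_replicas(locality)
print(counts["FULL"], counts["COLUMNSTORE"])  # 3 1
```

Here the FULL count matches the replica_num of 3 reported for t1, while the single COLUMNSTORE replica is the fourth copy that replica_num leaves out.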

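Notes

1. observer must be version 4.3.3.0 or later.

2. ocp must be version 4.3.3 or later (ocp 4.3.3 had not been released at the time of writing).

3. obd must be version 2.10.1-1 or later.

4. obproxy must be version 4.3.2 or later.

5. Deploying 2 or more columnstore replicas is not recommended.

6. Full-featured and read-only replicas cannot be converted to columnstore replicas, and columnstore replicas cannot be converted to full-featured or read-only replicas.

7. Physical restore does not support restoring columnstore replicas.

8. If the primary database has no columnstore replica deployed, deploying one on the standby database is not recommended either.

9. A columnstore table is a table whose partition Leader and Follower schemas are both in columnar format; queries against it can be strong reads.

A columnstore replica keeps the partition Leader and Follower schemas in row format while the read-only Learner replica is in columnar format, and OLAP queries against it can only be weak reads.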
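
The version requirements in notes 1 to 4 can be checked mechanically before a deployment. This is a rough Python sketch, not an official tool: the MIN_VERSIONS table is copied from the notes, the function names are hypothetical, and it assumes plain numeric versions such as 4.3.3.1 or 2.10.1-1 (full build strings like the BUILD_VERSION column are not parsed).

```python
import re

# Minimum versions copied from notes 1-4.
MIN_VERSIONS = {
    "observer": "4.3.3.0",
    "ocp": "4.3.3",
    "obd": "2.10.1-1",
    "obproxy": "4.3.2",
}


def version_key(version: str) -> tuple:
    """Turn '4.3.3.1' or '2.10.1-1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in re.split(r"[.\-]", version))


def meets_minimum(component: str, version: str) -> bool:
    """Compare a version with the documented minimum, padding with zeros."""
    have = version_key(version)
    need = version_key(MIN_VERSIONS[component])
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print(meets_minimum("observer", "4.3.3.1"))  # True
print(meets_minimum("obproxy", "4.3.1"))     # False
```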

For further details, see the official documentation:

Columnstore replicas

https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001428590
