Hive stores its metadata in MySQL, so install MySQL first.
MySQL
Install mysql
yum -y install mysql-server mysql
Start mysql
/etc/init.d/mysqld start
chkconfig mysqld on
Set the password of the mysql root user
mysqladmin -u root password 123456
Log in to mysql as root
mysql -uroot -p123456
Set mysql privileges
use mysql;
-- grant root full privileges for local and remote connections
GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED BY '123456' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'127.0.0.1' IDENTIFIED BY '123456' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;
update user set password=password('123456') where user='root';
-- clean up: drop non-root accounts, passwordless root entries, the test database, and the anonymous user
delete from user where not (user='root');
delete from user where user='root' and password='';
drop database test;
DROP USER ''@'%';
flush privileges;
Hive
Download: http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.15.1.tar.gz
Extracting the tarball and setting the environment variables are not covered here; let's go straight to the configuration.
Copy the mysql jar
Hive needs MySQL, so first copy the JDBC driver jar (mysql-connector-java-5.1.45-bin.jar, download it yourself) into Hive's lib directory.
Configuration files
hive-env.sh
cp hive-env.sh.template hive-env.sh
vi hive-env.sh
# set HADOOP_HOME
HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.15.1
hive-site.xml
vi hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<!-- MySQL account and password used to access the metastore database -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
</configuration>
Start
Simply run hive.
hive -hiveconf hive.root.logger=DEBUG,console    # print more verbose logs
Metadata tables
The metadata lives in MySQL tables (in this setup, in the hive database). There are 32 tables in total; only a few of them are covered here.
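A quick way to confirm the metastore schema was created is to list the tables in that database (a minimal check, assuming the hive database name configured in hive-site.xml above):
mysql> use hive;
mysql> show tables;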
VERSION
Hive version information:
mysql> select * from VERSION;
+--------+-----------------+-------------------+----------------------------------------+
| VER_ID | SCHEMA_VERSION | SCHEMA_VERSION_V2 | VERSION_COMMENT |
+--------+-----------------+-------------------+----------------------------------------+
| 1 | 1.1.0-cdh5.15.1 | NULL | Set by MetaStore hadoop@192.168.22.147 |
+--------+-----------------+-------------------+----------------------------------------+
DBS
Information about all databases in Hive:
mysql> select * from DBS;
+-------+-----------------------+-------------------------------------------------------------+-----------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+-------------------------------------------------------------+-----------+------------+------------+
| 1 | Default Hive database | hdfs://192.168.22.147:9000/user/hive/warehouse | default | public | ROLE |
| 6 | NULL | hdfs://192.168.22.147:9000/user/hive/warehouse/danner_db.db | danner_db | hadoop | USER |
+-------+-----------------------+-------------------------------------------------------------+-----------+------------+------------+
The user hadoop created the danner_db database; its data is stored on HDFS under /user/hive/warehouse/danner_db.db.
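To find where a particular database keeps its data, DBS can be queried directly (a small sketch reusing the danner_db database from the output above):
mysql> select NAME, DB_LOCATION_URI, OWNER_NAME from DBS where NAME = 'danner_db';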
DATABASE_PARAMS
Stores database-level parameters, which are set at CREATE DATABASE time.
Metastore column | Description | Example data |
---|---|---|
DB_ID | Database ID | 2 |
PARAM_KEY | Parameter name | createdby |
PARAM_VALUE | Parameter value | danner |
Note: the DBS and DATABASE_PARAMS tables are joined on the DB_ID column.
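As an illustration (a hedged sketch, not taken from the original setup), the DBPROPERTIES clause of CREATE DATABASE is what ends up in this table, and the DB_ID join reads it back:
-- in Hive: attach a property when creating a database
CREATE DATABASE IF NOT EXISTS danner_db WITH DBPROPERTIES ('createdby'='danner');
-- in MySQL: read the property back through the DB_ID join
SELECT d.NAME, p.PARAM_KEY, p.PARAM_VALUE FROM DBS d JOIN DATABASE_PARAMS p ON d.DB_ID = p.DB_ID;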
TBLS
Stores basic information about Hive tables, views, and index tables:
mysql> select * from TBLS;
+--------+-------------+-------+------------------+--------+-----------+-------+----------------+----------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+--------+-----------+-------+----------------+----------------+--------------------+--------------------+
| 1 | 1568797726 | 6 | 0 | hadoop | 0 | 1 | access_wide | EXTERNAL_TABLE | NULL | NULL |
| 6 | 1568987510 | 6 | 0 | hadoop | 0 | 16 | domain_traffic | MANAGED_TABLE | NULL | NULL |
| 11 | 1569026323 | 1 | 0 | hadoop | 0 | 21 | uid_pid | MANAGED_TABLE | NULL | NULL |
| 16 | 1569029401 | 6 | 0 | hadoop | 0 | 26 | uid_pid | MANAGED_TABLE | NULL | NULL |
| 23 | 1569419500 | 6 | 0 | hadoop | 0 | 35 | platform_stat | EXTERNAL_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+--------+-----------+-------+----------------+----------------+--------------------+--------------------+
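TBLS references DBS through DB_ID, so each table can be listed together with the database it belongs to (a minimal sketch using the columns shown above):
mysql> SELECT d.NAME AS db_name, t.TBL_NAME, t.TBL_TYPE FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID;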
TABLE_PARAMS
Stores the properties of tables and views:
mysql> select * from TABLE_PARAMS;
+--------+-----------------------+-------------+
| TBL_ID | PARAM_KEY | PARAM_VALUE |
+--------+-----------------------+-------------+
| 1 | EXTERNAL | TRUE |
| 1 | transient_lastDdlTime | 1568797726 |
| 6 | COLUMN_STATS_ACCURATE | true |
| 6 | numFiles | 1 |
| 6 | numRows | 0 |
| 6 | rawDataSize | 0 |
| 6 | totalSize | 162 |
| 6 | transient_lastDdlTime | 1568989802 |
| 11 | COLUMN_STATS_ACCURATE | true |
| 11 | numFiles | 1 |
| 11 | numRows | 0 |
| 11 | rawDataSize | 0 |
| 11 | totalSize | 121 |
| 11 | transient_lastDdlTime | 1569027477 |
| 16 | COLUMN_STATS_ACCURATE | true |
| 16 | numFiles | 1 |
| 16 | totalSize | 121 |
| 16 | transient_lastDdlTime | 1569029465 |
| 23 | EXTERNAL | TRUE |
| 23 | transient_lastDdlTime | 1569419500 |
+--------+-----------------------+-------------+
20 rows in set (0.00 sec)
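TABLE_PARAMS keys on TBL_ID, so per-table statistics such as totalSize can be pulled out by joining against TBLS (a sketch based on the columns shown above):
mysql> SELECT t.TBL_NAME, p.PARAM_VALUE AS totalSize FROM TBLS t JOIN TABLE_PARAMS p ON t.TBL_ID = p.TBL_ID WHERE p.PARAM_KEY = 'totalSize';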
TBL_PRIVS
Stores table/view authorization information.
Metastore column | Description | Example data |
---|---|---|
TBL_GRANT_ID | Grant ID | 1 |
CREATE_TIME | Time the grant was made | 1436320455 |
GRANT_OPTION | Whether the grantee may grant the privilege to others | 0 |
GRANTOR | User who issued the grant | hadoop |
GRANTOR_TYPE | Grantor type | USER |
PRINCIPAL_NAME | Grantee | username |
PRINCIPAL_TYPE | Grantee type | USER |
TBL_PRIV | Privilege | Select, Alter |
TBL_ID | Table ID, references TBL_ID in TBLS | 22 |
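For context (a hedged example, assuming Hive's legacy authorization is enabled), a row appears here after a table-level grant such as:
-- in Hive
GRANT SELECT ON TABLE access_wide TO USER username;
-- then inspect it in MySQL
mysql> SELECT PRINCIPAL_NAME, TBL_PRIV, TBL_ID FROM TBL_PRIVS;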
Other related tables
- Storage related (see the query sketch after this list)
  - SDS: basic file-storage information for a table, such as INPUT_FORMAT, OUTPUT_FORMAT, and whether the data is compressed
  - SD_PARAMS: storage properties, set when a table is created
  - SERDES: the SerDe class used for serialization
  - SERDE_PARAMS: SerDe properties and format information, e.g. row and column delimiters
- Column related
  - COLUMNS_V2: the column definitions of each table
- Partition related
  - PARTITIONS: basic information about table partitions
  - PARTITION_KEYS: the partition key columns
  - PARTITION_KEY_VALS: the values of the partition keys
  - PARTITION_PARAMS: partition properties
- Less commonly used
  - DB_PRIVS: database authorization information, populated when a database is granted via a GRANT statement
  - IDXS: metadata for Hive indexes
  - INDEX_PARAMS: index properties
  - TAB_COL_STATS: column statistics for tables, recorded after running an ANALYZE statement
  - TBL_COL_PRIVS: column-level authorization on tables
  - PART_PRIVS: partition authorization information
  - PART_COL_STATS: statistics for partition columns
  - PART_COL_PRIVS: permissions on partition columns
  - FUNCS: user-registered functions
  - FUNC_RU: resources of user-registered functions
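As a small illustration of the storage-related tables (a sketch assuming the standard metastore schema, where TBLS.SD_ID references SDS):
mysql> SELECT t.TBL_NAME, s.LOCATION, s.INPUT_FORMAT FROM TBLS t JOIN SDS s ON t.SD_ID = s.SD_ID;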
References
[Hive学习之路 (三)Hive元数据信息对应MySQL数据库表](https://www.cnblogs.com/qingyunzong/p/8710356.html)