Hive Installation

1 Installation Notes

Before installing Hive you need a working Hadoop cluster; if you don't have one, install it first.

Before starting Hive, start the Hadoop cluster.

Solutions to some common errors are collected at the end of this article.

1.1 Software Used

Software | Version | Download
hadoop | hadoop-2.7.1 | http://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
java | jdk-8u211-linux-x64 | https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
hive | hive-1.2.1 | https://archive.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
mysql-connector-java | mysql-connector-java-5.1.45.jar | installed from the command line
postgresql-jdbc4 | postgresql-jdbc4.jar | installed from the command line

1.2 Nodes

Role | IP | Hostname
Master node | *.*.105.64 | Master
Slave node 1 | *.*.104.78 | Slave01
Slave node 2 | *.*.104.224 | Slave02

1.3 Notes

Note: in this article, Hive, MySQL, and PostgreSQL are installed only on the Master node. In a real production environment, adjust this to your situation and install them on all servers if necessary.

By default Hive keeps its metadata in an embedded Derby database. This is the simplest storage option: with Derby, running Hive creates a derby log file and a metastore_db directory in the current working directory. Derby allows only one session to connect at a time, which makes it suitable only for simple testing, not for production. To support multiple concurrent sessions you need a standalone metastore database, such as MySQL or PostgreSQL, both of which Hive supports well.

This article walks through installing and configuring all three metastore options in turn: Derby, PostgreSQL, and MySQL.

2 Installation

2.1 Download and Unpack

$ wget https://archive.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
$ tar -zxvf apache-hive-1.2.1-bin.tar.gz
$ sudo mv apache-hive-1.2.1-bin /usr/local/hive-1.2.1
$ sudo chown -R hadoop:hadoop /usr/local/hive-1.2.1

2.2 Edit the Configuration Files

The files to edit are under /usr/local/hive-1.2.1/conf. Three of them need changes: hive-site.xml, hive-env.sh, and hive-log4j.properties.

First copy each .template file, then edit the copy.

$ cd /usr/local/hive-1.2.1/conf
$ cp hive-default.xml.template hive-site.xml
$ cp hive-env.sh.template hive-env.sh
$ cp hive-log4j.properties.template hive-log4j.properties

2.2.1 Edit hive-site.xml

Add the following at the top of hive-site.xml (just inside the <configuration> element). Configuration entries later in the file reference the system:java.io.tmpdir and system:user.name variables, so they have to be defined before they are used.

<property>
  <name>system:java.io.tmpdir</name>
  <value>/usr/local/hive-1.2.1/iotmp</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>
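
Why these matter: hive-default.xml.template, which we copied to hive-site.xml, contains entries that expand these variables, along the lines of the following (paraphrased from the template):

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>${system:java.io.tmpdir}/${system:user.name}</value>
</property>

If the variables are undefined, Hive fails at startup with a java.net.URISyntaxException; see Problem 2 at the end of this article.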

2.2.2 Edit hive-env.sh

Append the following at the end (adjust the HADOOP_HOME path to wherever your Hadoop is installed):

export HADOOP_HOME=/usr/local/bigdata/hadoop-2.7.1
export HIVE_CONF_DIR=/usr/local/hive-1.2.1/conf

2.2.3 hive-log4j.properties

The default logging configuration is fine for now; nothing needs to change here.

2.3 Configure Environment Variables

Add the following to ~/.bashrc, then run source ~/.bashrc to make it take effect.

export HIVE_HOME=/usr/local/hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin
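
A quick sanity check that the shell now finds Hive:

$ source ~/.bashrc
$ which hive        # should print /usr/local/hive-1.2.1/bin/hive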

2.4 Create Warehouse Directories for Hive

hive-site.xml contains the following settings:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
</property>

So we need Hadoop to create the /user/hive/warehouse and /tmp/hive directories in HDFS.

Note: start the Hadoop cluster first.

$ hadoop fs -mkdir -p /user/hive/warehouse
$ hadoop fs -mkdir -p /tmp/hive
$ hadoop fs -chmod g+w /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp/hive
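
To confirm the directories were created:

$ hadoop fs -ls /user/hive
$ hadoop fs -ls /tmp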

3 Connecting Hive to Derby

3.1 hive-site.xml (Derby)

For Derby, the only change needed is javax.jdo.option.ConnectionURL, which specifies where metastore_db is stored.
The modified property:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/usr/local/hive-1.2.1/metastore/metastore_db;create=true</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
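
Derby creates metastore_db itself because of create=true; to be safe, make sure the parent directory exists and is writable by the user running Hive:

$ mkdir -p /usr/local/hive-1.2.1/metastore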

3.2 Start Hive

Initialize the metastore schema:

$ schematool -initSchema -dbType derby

A successful initialization produces output like the following:

$ schematool -initSchema -dbType derby
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 1.2.0
Initialization script hive-schema-1.2.0.derby.sql
Initialization script completed
schemaTool completed

Start Hive:

$ hive

If it starts correctly you will see something like:

$ hive
Logging initialized using configuration in file:/usr/local/hive-1.2.1/conf/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.993 seconds, Fetched: 1 row(s)

Create a test table:

create table t1(
id int
,name string
,hobby array<string>
,add map<String,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
;
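
To see the three delimiters at work, here is a minimal smoke test; the sample file path and row contents are invented for illustration:

$ echo '1,xiaoming,reading-coding,hometown:beijing' > /tmp/t1.txt

hive> load data local inpath '/tmp/t1.txt' into table t1;
hive> select * from t1;

The fields split on ',', the hobby array on '-', and the add map on ':', so the row should come back as 1, xiaoming, ["reading","coding"], {"hometown":"beijing"}.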

That completes the Hive setup with Derby as the metastore.

The next sections cover connecting Hive to PostgreSQL and to MySQL.

4 Installing PostgreSQL

4.1 Install

Run:

$ sudo apt install postgresql postgresql-contrib

After installation there is a default administrative user named postgres, with no password.

4.2 Start PostgreSQL

$ sudo systemctl enable postgresql
$ sudo systemctl start postgresql
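
To confirm the server is up and accepting connections:

$ sudo systemctl status postgresql
$ pg_isready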

4.3 Log In

hadoop@Master:~$ sudo -i -u postgres
postgres@Master:~$ psql
psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
Type "help" for help.

postgres=# help
You are using psql, the command-line interface to PostgreSQL.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
postgres=#

5 Connecting Hive to PostgreSQL

5.1 Install the PostgreSQL JDBC Driver

$ sudo apt-get install libpostgresql-jdbc-java
$ ln -s /usr/share/java/postgresql-jdbc4.jar /usr/local/hive-1.2.1/lib
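
Confirm the symlink landed where Hive will look for it:

$ ls -l /usr/local/hive-1.2.1/lib/postgresql-jdbc4.jar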

5.2 Edit pg_hba.conf

Edit /etc/postgresql/10/main/pg_hba.conf as follows, switching the METHOD column to trust for local connections (fine for a test environment; keep peer/md5 in production):

# Database administrative login by Unix domain socket
#local   all             postgres                                peer
local    all             postgres                                trust

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
#local   all             all                                     peer
local    all             all                                     trust
# IPv4 local connections:
#host    all             all             127.0.0.1/32            md5
host     all             all             127.0.0.1/32            trust
# IPv6 local connections:
#host    all             all             ::1/128                 md5
host     all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     all                                     peer
local    replication     all                                     trust
host     replication     all             127.0.0.1/32            trust
host     replication     all             ::1/128                 trust
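
pg_hba.conf is only read when the server starts or reloads, so apply the change:

$ sudo systemctl reload postgresql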

5.3 Create the Database and User in PostgreSQL

First create a user named hiveuser with password 123456.

Then create a database named metastore:

$ sudo -u postgres psql 

postgres=# CREATE USER hiveuser WITH PASSWORD '123456';
postgres=# CREATE DATABASE metastore;
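
On a stock PostgreSQL 10 install the public schema is writable by any user, so the above usually suffices; to be safe, you can also make hiveuser the owner of the database so schematool can definitely create its tables:

postgres=# ALTER DATABASE metastore OWNER TO hiveuser;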

Test that the new user can log in to the new database:

$ psql -h localhost -U hiveuser -d metastore

A successful login means the configuration is complete:

hadoop@Master:~$ psql -h localhost -U hiveuser -d metastore
Password for user hiveuser:
psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

metastore=>

5.4 Edit hive-site.xml (PostgreSQL)

We now edit the same hive-site.xml that was configured for Derby.
First make sure the two properties from section 2.2.1 are present at the top:

<property>
  <name>system:java.io.tmpdir</name>
  <value>/usr/local/hive-1.2.1/iotmp</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>

Then set the following properties:

name | value | description
javax.jdo.option.ConnectionURL | jdbc:postgresql://localhost/metastore | the database created above
javax.jdo.option.ConnectionDriverName | org.postgresql.Driver | JDBC driver class
javax.jdo.option.ConnectionUserName | hiveuser | the user created above
javax.jdo.option.ConnectionPassword | 123456 | that user's password

In full:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost/metastore</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>Username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>

5.5 Start Hive

First run schematool to initialize the schema:

$ schematool -dbType postgres -initSchema
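
If initialization succeeds, the metastore database now contains Hive's schema tables; you can spot-check them with:

$ psql -h localhost -U hiveuser -d metastore -c '\dt'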

Then run $ hive to start Hive.

Create a table to test it:

hadoop@Master:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-1.2.1/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/usr/local/bigdata/hive-1.2.1/conf/hive-log4j2.properties Async: true
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/bigdata/hadoop-2.7.7/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
> show databases;
OK
default
Time taken: 12.294 seconds, Fetched: 1 row(s)
hive> create table t1(
> id int
> ,name string
> ,hobby array<string>
> ,add map<String,string>
> )
> row format delimited
> fields terminated by ','
> collection items terminated by '-'
> map keys terminated by ':'
> ;
OK
Time taken: 1.239 seconds
hive>

6 Installing MySQL

6.1 Install

$ sudo apt install mysql-server

6.2 Set the MySQL root Password

If the root password has not been set yet, set one now; here it is set to hadoop.

Then check that you can log in:

$ mysql -u root -p
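
One caveat: on Ubuntu 18.04, MySQL's root account often authenticates via the auth_socket plugin rather than a password, in which case mysql -u root -p is refused. A common remedy (assuming MySQL 5.7) is to switch root to password authentication:

$ sudo mysql
mysql> ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'hadoop';
mysql> FLUSH PRIVILEGES;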

7 Connecting Hive to MySQL

7.1 Create a Database for Hive in MySQL

This database will hold Hive's metadata.

It corresponds to mysql://localhost:3306/metastore in the Hive configuration file hive-site.xml.

# Create the database and user
mysql> create database if not exists metastore;
mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY '123456';

# Grant the user full privileges on the metastore database
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hiveuser'@'localhost';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'localhost';

# Flush the privilege tables
mysql> FLUSH PRIVILEGES;
mysql> quit;
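
Verify that the new user can log in and see the database:

$ mysql -u hiveuser -p metastore
mysql> show databases;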

7.2 Install the MySQL JDBC Driver

$ sudo apt-get install libmysql-java
$ ln -s /usr/share/java/mysql-connector-java-5.1.45.jar /usr/local/hive-1.2.1/lib

Alternatively, download the jar directly and move it into Hive's lib directory:

$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.48/mysql-connector-java-5.1.48.jar
$ mv mysql-connector-java-5.1.48.jar /usr/local/hive-1.2.1/lib

7.3 Edit hive-site.xml (MySQL)

Set the following properties:

name | value | description
javax.jdo.option.ConnectionURL | jdbc:mysql://localhost:3306/metastore?useSSL=true | the database created above
javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver | JDBC driver class
javax.jdo.option.ConnectionUserName | hiveuser | the user created above
javax.jdo.option.ConnectionPassword | 123456 | that user's password

In full:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore?useSSL=true</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>Username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>

7.4 Start Hive

Initialize first:

$ schematool -dbType mysql -initSchema

Then, as before, run:

$ hive

7.5 Distribute Hive to Slave01 and Slave02

$ scp -r /usr/local/hive-1.2.1 Slave01:/usr/local/
$ scp -r /usr/local/hive-1.2.1 Slave02:/usr/local/

Configure the environment variables on each slave just as on the Master.
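
Note that the GRANT statements in section 7.1 only cover 'hiveuser'@'localhost', so Hive clients on the slaves cannot reach MySQL on the Master directly. One common arrangement, sketched here as an assumption rather than taken from the original setup, is to run the metastore service on the Master and point the slaves at it (9083 is the default metastore port):

# On the Master, start the metastore service in the background:
$ hive --service metastore &

Then, in hive-site.xml on each slave:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://Master:9083</value>
</property>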

8 Troubleshooting

Problem 1

Initializing the Derby schema fails with an error saying hive-exec-*.jar is missing:

hadoop@Master:~$ schematool -initSchema -dbType derby
Missing Hive Execution Jar: /usr/local/biddata/hive-1.2.1/lib/hive-exec-*.jar
Solution:

Check whether hive-exec-*.jar really is missing from that directory. If it is missing, download it and put it there; it is available at https://mvnrepository.com/artifact/org.apache.hive/hive-exec/1.2.1
If it is present, the environment variables must be wrong: check that HIVE_HOME and $HIVE_HOME/bin are set correctly. (Note the misspelled biddata in the path printed above; a typo like that is exactly this kind of mistake.)
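
A quick way to tell which case you are in:

$ echo $HIVE_HOME
$ ls $HIVE_HOME/lib/hive-exec-*.jar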

Problem 2

Solution

Both of the errors below are fixed the same way: add the following at the top of hive-site.xml:

<property>
  <name>system:java.io.tmpdir</name>
  <value>/usr/local/hive-1.2.1/iotmp</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>

First error:

Logging initialized using configuration in file:/usr/local/hive-1.2.1/conf/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D

Alternatively, hard-code the value with sed:

sed -i 's#${system:java.io.tmpdir}#/usr/local/hive-1.2.1/iotmp#g' conf/hive-site.xml

Second error:

Failed with exception java.io.IOException:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:user.name%7D
Time taken: 0.01 seconds

Alternatively, hard-code the value with sed:

sed -i 's#${system:user.name}#hive#g' conf/hive-site.xml

Problem 3

Running $ schematool -dbType postgres -initSchema fails:

hadoop@Master:~$ schematool -dbType postgres -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-1.2.1/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:postgresql://localhost/pymetastore
Metastore Connection Driver : org.postgresql.Driver
Metastore connection User: hive
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.postgres.sql
Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

The same failure can also show up as:

Error: ERROR: relation "txns" already exists (state=42P07,code=0)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

I tried for a long time to find the cause of this one. Some posts blame the Hive version, but switching between versions such as hive-1.2.1 and hive-1.2.2 made no difference.
What finally fixed it was recreating the user and database. The "relation ... already exists" errors mean schematool is trying to create tables left behind by an earlier, partial initialization, so giving it a freshly created database clears the conflict.
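
A sketch of that reset, using the names from section 5.3:

$ sudo -u postgres psql
postgres=# DROP DATABASE metastore;
postgres=# CREATE DATABASE metastore;
postgres=# ALTER DATABASE metastore OWNER TO hiveuser;
postgres=# \q

Then rerun schematool -dbType postgres -initSchema once.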

Problem 4

Initializing the MySQL metastore schema fails with:

Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
Solution:

Note that when MySQL stores the metadata, connecting as root may not have sufficient privileges and can fail; use the dedicated hiveuser account instead. Also, schematool -dbType mysql -initSchema needs to be run only once.

