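Tez Engine
Concept
- Tez is an execution engine for Hive that outperforms MapReduce (MR). Tez can merge multiple dependent jobs into a single job, so intermediate results are written to HDFS only once and fewer intermediate stages are needed, which substantially improves job performance.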
Installing the Tez engine
- Copy the Tez packages to the cluster and extract the minimal tarball
mkdir /opt/module/tez
tar -zxvf /opt/software/tez-0.10.1-SNAPSHOT-minimal.tar.gz -C /opt/module/tez
- Upload the Tez dependencies to HDFS
hadoop fs -mkdir /tez
hadoop fs -put /opt/software/tez-0.10.1-SNAPSHOT.tar.gz /tez
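The uploaded archive has to end up at the path that tez.lib.uris points to later on. A minimal sketch of how that could be checked, assuming the paths used in these notes; hdfs_target is a hypothetical helper, not part of Hadoop or Tez:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the HDFS path an archive lands at after
# `hadoop fs -put <local_file> <hdfs_dir>`.
hdfs_target() {
  local local_file="$1" hdfs_dir="${2%/}"
  printf '%s/%s\n' "$hdfs_dir" "$(basename "$local_file")"
}

target="$(hdfs_target /opt/software/tez-0.10.1-SNAPSHOT.tar.gz /tez)"
echo "expected HDFS path: $target"

# Confirm the upload only when the hadoop CLI is available.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -test -e "$target" && echo "found $target" || echo "missing $target"
fi
```

The printed path should match the part of tez.lib.uris that follows ${fs.defaultFS}.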
- Create tez-site.xml
vim $HADOOP_HOME/etc/hadoop/tez-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/tez/tez-0.10.1-SNAPSHOT.tar.gz</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
<property>
<name>tez.am.resource.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>tez.am.resource.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>tez.container.max.java.heap.fraction</name>
<value>0.4</value>
</property>
<property>
<name>tez.task.resource.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>tez.task.resource.cpu.vcores</name>
<value>1</value>
</property>
</configuration>
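As a rough illustration of how the memory settings above interact: when no explicit -Xmx option is configured, Tez is understood to size the JVM heap as the container memory multiplied by tez.container.max.java.heap.fraction. A minimal sketch of that arithmetic, assuming the result is rounded down; heap_mb is a hypothetical helper and not part of Tez:

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the JVM heap (in MB) for a container.
# Assumes heap = container memory * heap fraction, rounded down.
heap_mb() {
  local container_mb="$1" fraction="$2"
  awk -v m="$container_mb" -v f="$fraction" 'BEGIN { printf "%d\n", int(m * f) }'
}

# Values taken from the tez-site.xml above: 1024 MB containers, fraction 0.4.
heap_mb 1024 0.4
```

With the values above, each 1024 MB container leaves roughly 409 MB for the Java heap, so raising the container size or the fraction is the first thing to try if tasks run out of heap.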
- Modify the Hadoop environment
vim $HADOOP_HOME/etc/hadoop/shellprofile.d/tez.sh
- Add the Tez JAR locations to the Hadoop classpath
hadoop_add_profile tez
function _tez_hadoop_classpath
{
hadoop_add_classpath "$HADOOP_HOME/etc/hadoop" after
hadoop_add_classpath "/opt/module/tez/*" after
hadoop_add_classpath "/opt/module/tez/lib/*" after
}
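Once the profile script is in place, one way to confirm it took effect is to look for the Tez directories in the output of `hadoop classpath`. A minimal sketch, assuming the install path /opt/module/tez from these notes; tez_classpath_entries is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the entries of a colon-separated classpath
# that mention the Tez install directory.
tez_classpath_entries() {
  local classpath="$1" tez_home="${2:-/opt/module/tez}"
  printf '%s\n' "$classpath" | tr ':' '\n' | grep -F -- "$tez_home" || true
}

# Only query the real classpath when the hadoop CLI is available.
if command -v hadoop >/dev/null 2>&1; then
  tez_classpath_entries "$(hadoop classpath)"
fi
```

If nothing is printed on a node where hadoop is installed, the shellprofile.d script was probably not picked up.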
- Change Hive's execution engine
vim $HIVE_HOME/conf/hive-site.xml
- Add the following properties
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
<property>
<name>hive.tez.container.size</name>
<value>1024</value>
</property>
- Resolve the logging JAR conflict
rm /opt/module/tez/lib/slf4j-log4j12-1.7.10.jar
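The version in the JAR file name can differ between Tez builds, so it may be safer to list the matching files before deleting anything. A minimal sketch, assuming the lib directory used in these notes; find_slf4j_bindings is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list SLF4J log4j binding jars directly under a
# directory without hard-coding the version number.
find_slf4j_bindings() {
  local lib_dir="$1"
  [ -d "$lib_dir" ] || return 0
  find "$lib_dir" -maxdepth 1 -type f -name 'slf4j-log4j12-*.jar' | sort
}

# Print candidates for review; remove them manually once confirmed.
find_slf4j_bindings /opt/module/tez/lib
```

The helper only prints paths and deletes nothing, so it is safe to run repeatedly.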
Installing the Tez UI
- Download Tomcat
- Tomcat download page: https://tomcat.apache.org/download-90.cgi
- Extract Tomcat
tar -zxvf apache-tomcat-9.0.76.tar.gz -C /opt/module/
- Create a tez-ui directory under Tomcat's webapps directory
mkdir /opt/module/apache-tomcat-9.0.76/webapps/tez-ui
- Locate tez-ui-0.10.1.war in the Tez directory, copy it into the tez-ui directory under Tomcat, and extract it there
jar -xvf tez-ui-0.10.1.war
- Edit config/configs.env in the extracted directory and set the addresses of the YARN timeline server and ResourceManager
ENV = {
  hosts: {
    /*
     * Timeline Server Address:
     * By default TEZ UI looks for timeline server at http://localhost:8188, uncomment and change
     * the following value for pointing to a different address.
     */
    //timeline: "http://localhost:8188",
    timeline: "http://hadoop102:8188",

    /*
     * Resource Manager Address:
     * By default RM REST APIs are expected to be at http://localhost:8088, uncomment and change
     * the following value to point to a different address.
     */
    //rm: "http://localhost:8088",
    rm: "http://hadoop103:8088",

    /*
     * Resource Manager Web Proxy Address:
     * Optional - By default, value configured as RM host will be taken as proxy address
     * Use this configuration when RM web proxy is configured at a different address than RM.
     */
    //rmProxy: "http://localhost:8088",
  },
- Configure yarn-site.xml for the timeline server
- Add the following properties to yarn-site.xml (remember to change the host names to match your cluster), distribute the file to all nodes, then restart YARN
<!-- conf timeline server -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.hostname</name>
  <value>hadoop102</value>
</property>
<property>
  <name>yarn.timeline-service.http-cross-origin.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.generic-application-history.enabled</name>
  <value>true</value>
</property>
<property>
  <description>Address for the Timeline server to start the RPC server.</description>
  <name>yarn.timeline-service.address</name>
  <value>hadoop102:10201</value>
</property>
<property>
  <description>The http address of the Timeline service web application.</description>
  <name>yarn.timeline-service.webapp.address</name>
  <value>hadoop102:8188</value>
</property>
<property>
  <description>The https address of the Timeline service web application.</description>
  <name>yarn.timeline-service.webapp.https.address</name>
  <value>hadoop102:2191</value>
</property>
<property>
  <name>yarn.timeline-service.handler-thread-count</name>
  <value>24</value>
</property>
- Start the timeline server
yarn --daemon start timelineserver
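After the daemon starts, the Timeline Server v1 REST root can be probed to confirm it is listening. A minimal sketch, assuming the host hadoop102 and port 8188 configured in these notes; timeline_url is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Timeline Server v1 REST root URL.
timeline_url() {
  local host="${1:-hadoop102}" port="${2:-8188}"
  printf 'http://%s:%s/ws/v1/timeline/\n' "$host" "$port"
}

# Probe the endpoint only when curl is installed; a short timeout keeps
# the check from hanging if the host is unreachable.
if command -v curl >/dev/null 2>&1; then
  curl -s --max-time 5 "$(timeline_url)" || echo "timeline server not reachable"
fi
```

A JSON response from this URL suggests the server is up; the Tez UI reads from the same address through the timeline entry in configs.env.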
- Create or update tez-site.xml in $HADOOP_HOME/etc/hadoop
- In $HADOOP_HOME/etc/hadoop, create tez-site.xml (or update the existing one) with the following configuration and distribute it to all nodes. The value of tez.tez-ui.history-url.base must match the tez-ui path under Tomcat
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez/tez-0.10.1-SNAPSHOT.tar.gz</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.am.resource.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>tez.am.resource.cpu.vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>tez.container.max.java.heap.fraction</name>
    <value>0.4</value>
  </property>
  <property>
    <name>tez.task.resource.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>tez.task.resource.cpu.vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
  </property>
  <property>
    <name>tez.tez-ui.history-url.base</name>
    <value>http://hadoop102:8080/tez-ui/</value>
  </property>
</configuration>
- Add the following properties to hive-site.xml; without them, the All Queries view in the Tez UI shows no data
<property>
  <name>hive.exec.pre.hooks</name>
  <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>
- Start Tomcat and open the Tez UI
- Go to Tomcat's bin directory and start Tomcat
sh startup.sh
- Tez UI address (the Tomcat address plus the /tez-ui suffix): http://hadoop102:8080/tez-ui