Tez引擎

概念

  • Tez 是一个Hive 的运行引擎,性能优于 MR。Tez 可以将多个有依赖的作业转换为一个作业,这样只需写一次HDFS,且中间节点较少,从而大大提升作业的计算性能。

安装Tez引擎

  1. 将tez 安装包拷贝到集群,并解压tar 包
mkdir /opt/module/tez
tar -zxvf /opt/software/tez-0.10.1-SNAPSHOT-minimal.tar.gz -C /opt/module/tez 
  1. 上传tez 依赖到HDFS
hadoop fs -mkdir /tez
hadoop fs -put /opt/software/tez-0.10.1-SNAPSHOT.tar.gz /tez 
  1. 新建tez-site.xml
vim $HADOOP_HOME/etc/hadoop/tez-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez/tez-0.10.1-SNAPSHOT.tar.gz</value>
</property>
<property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
</property>
<property>
    <name>tez.am.resource.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>tez.am.resource.cpu.vcores</name>
    <value>1</value>
</property>
<property>
    <name>tez.container.max.java.heap.fraction</name>
    <value>0.4</value>
</property>
<property>
    <name>tez.task.resource.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>tez.task.resource.cpu.vcores</name>
    <value>1</value>
</property>
</configuration>
  1. 修改Hadoop 环境变量
vim $HADOOP_HOME/etc/hadoop/shellprofile.d/tez.sh 
  • 添加Tez 的 Jar 包相关信息
hadoop_add_profile tez 
function _tez_hadoop_classpath 
{
     hadoop_add_classpath "$HADOOP_HOME/etc/hadoop" after 
      hadoop_add_classpath "/opt/module/tez/*" after 
       hadoop_add_classpath "/opt/module/tez/lib/*" after 
}
  1. 修改Hive 的计算引擎
vim $HIVE_HOME/conf/hive-site.xml
  • 添加
<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>
<property>
    <name>hive.tez.container.size</name>
    <value>1024</value>
</property>
  1. 解决日志Jar 包冲突
rm /opt/module/tez/lib/slf4j-log4j12-1.7.10.jar 

安装TezUi

  1. 下载tomcat

    • tomcat官网:https://tomcat.apache.org/download-90.cgi
  2. 解压tomcat

    tar -zxvf apache-tomcat-9.0.76.tar.gz -C /opt/module/
    
  3. 在 webapp下创建 tez-ui 目录

    mkdir /opt/module//apache-tomcat-9.0.76/webapps/tez-ui
    
  4. 在 tez 目录下找到 tez-ui-0.9.2.war,放到tomcat下

    jar -xvf tez-ui-0.10.1.war
    
  5. 修改解压后的config/configs.env 文件,配置读取yarn的timeline的端口

    ENV = {
      hosts: {
        /*
         * Timeline Server Address:
         * By default TEZ UI looks for timeline server at http://localhost:8188, uncomment and change
         * the following value for pointing to a different address.
         */
        //timeline: "http://localhost:8188",
        timeline: "http://hadoop102:8188",
        /*
         * Resource Manager Address:
         * By default RM REST APIs are expected to be at http://localhost:8088, uncomment and change
         * the following value to point to a different address.
         */
        //rm: "http://localhost:8088",
        rm: "http://hadoop103:8088",
        /*
         * Resource Manager Web Proxy Address:
         * Optional - By default, value configured as RM host will be taken as proxy address
         * Use this configuration when RM web proxy is configured at a different address than RM.
         */
        //rmProxy: "http://localhost:8088",
      },
    
  6. 配置yarn-site文件,并启动timelineserver

    • 在 yarn-site.xml 中添加如下参数(记得修改主机名),分发节点后重启yarn
    <!-- conf timeline server -->
        <property>
            <name>yarn.timeline-service.enabled</name>
            <value>true</value>
       </property>
       <property>
            <name>yarn.timeline-service.hostname</name>
            <value>tmaster</value>
       </property>
       <property>
            <name>yarn.timeline-service.http-cross-origin.enabled</name>
            <value>true</value>
       </property>
       <property>
            <name> yarn.resourcemanager.system-metrics-publisher.enabled</name>
            <value>true</value>
       </property>
       <property>
            <name>yarn.timeline-service.generic-application-history.enabled</name>
            <value>true</value>
       </property>
       <property>
            <description>Address for the Timeline server to start the RPC server.</description>
            <name>yarn.timeline-service.address</name>
            <value>hadoop102:10201</value>
       </property>
       <property>
            <description>The http address of the Timeline service web application.</description>
            <name>yarn.timeline-service.webapp.address</name>
            <value>hadoop102:8188</value>
       </property>
       <property>
            <description>The https address of the Timeline service web application.</description>
            <name>yarn.timeline-service.webapp.https.address</name>
            <value>hadoop102:2191</value>
       </property>
       <property>
            <name>yarn.timeline-service.handler-thread-count</name>
            <value>24</value>
       </property>
    
    
  7. 启动 timelineserver

    yarn --daemon start timelineserver
    
  8. 在 $HADOOP_HOME/etc/hadoop 目录下,创建 tez-site.xml

    • 在 $HADOOP_HOME/etc/hadoop 目录下,创建 tez-site.xml,添加如下配置(记得分发节点),tez-ui地址对应tomcat下的tez-ui路径
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/tez/tez-0.10.1-SNAPSHOT.tar.gz</value>
    </property>
    <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.am.resource.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>tez.am.resource.cpu.vcores</name>
        <value>1</value>
    </property>
    <property>
        <name>tez.container.max.java.heap.fraction</name>
        <value>0.4</value>
    </property>
    <property>
        <name>tez.task.resource.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>tez.task.resource.cpu.vcores</name>
        <value>1</value>
    </property>
    
    <property>
            <name>tez.history.logging.service.class</name>
            <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
        </property>
        <property>
            <name>tez.tez-ui.history-url.base</name>
            <value>http://hadoop102:8080/tez-ui/</value>
        </property>
    
    </configuration>
    
  9. 在hive-site.xml文件下添加如下参数,不配置的话,tez-ui中的 All Queries 不会显示数据

    <property>
    	<name>hive.exec.pre.hooks</name>
    	<value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
    </property>
    
    <property>
    	<name>hive.exec.post.hooks</name>
    	<value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
    </property>
    
    <property>
    	<name>hive.exec.failure.hooks</name>
    	<value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
    </property>
    
    
  10. 启动 tomcat ,访问 tez-ui 界面

    • 进入tomcat/bin目录下,启动tomcat
    sh startup.sh
    
  11. tez-ui界面地址(tomcat地址加/tez-ui后缀):http://hadoop102:8080/tez-ui