Jobs submitted to the CapacityScheduler's default queue get stuck in the ACCEPTED state

Four virtual machines: Host0, Host1, Host2, Host3
Host0: ResourceManager (RM)
Host1, Host2, Host3: NodeManagers (NM)

I want to use node labels together with capacity queues to implement the layout shown in the figure below:

[image: 2.PNG — intended queue-to-label layout]

However, jobs submitted to the capacity default queue get stuck in the ACCEPTED state; jobs submitted to the other queues run fine.

I assigned a node label to each of the three NM nodes (Host1, Host2, Host3):

[image: 1.PNG — node-to-label assignments]
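
For reference, labels like these are normally created on the RM and then attached to nodes with yarn rmadmin. A sketch of the commands — the exact syntax varies across Hadoop 2.x releases, and the host-to-label mapping here is assumed from the figure:

  # register the labels with the RM
  yarn rmadmin -addToClusterNodeLabels "area0,area1,area2"
  # attach one label per NM host (2.7+ "host=label" syntax; older releases use "host,label")
  yarn rmadmin -replaceLabelsOnNode "Host1=area0 Host2=area1 Host3=area2"

The RM web UI's Node Labels page can be used to verify that the assignment took effect.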

 
There are four capacity queues: default, area0, area1, area2.
Queue area0 may only run on hosts labeled area0.
Queue area1 may only run on hosts labeled area1.
Queue area2 may only run on hosts labeled area2.
Queue default may run on hosts labeled area0, area1, or area2.

On the RM (Host0) I configured /etc/hadoop/conf/capacity-scheduler.xml as follows:
 
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,area0,area1,area2</value>
    <description>
      The queues at this level (root is the root queue).
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>25</value>
    <description>Default queue target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area0.capacity</name>
    <value>25</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area1.capacity</name>
    <value>25</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area2.capacity</name>
    <value>25</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area0.maximum-capacity</name>
    <value>100</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area1.maximum-capacity</name>
    <value>100</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area2.maximum-capacity</name>
    <value>100</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
    <value>area0,area1,area2</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area0.accessible-node-labels</name>
    <value>area0</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area1.accessible-node-labels</name>
    <value>area1</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area2.accessible-node-labels</name>
    <value>area2</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.area0.capacity</name>
    <value>33</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.area1.capacity</name>
    <value>33</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.area2.capacity</name>
    <value>34</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area0.accessible-node-labels.area0.capacity</name>
    <value>100</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area1.accessible-node-labels.area1.capacity</name>
    <value>100</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area2.accessible-node-labels.area2.capacity</name>
    <value>100</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default-node-label-expression</name>
    <value> ,area0,area1,area2</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area0.default-node-label-expression</name>
    <value>area0</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area1.default-node-label-expression</name>
    <value>area1</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.area2.default-node-label-expression</name>
    <value>area2</value>
  </property>
</configuration>
 
Instructor, could you help me find where things went wrong?

wangxiaolei


How are your mapred-site.xml and yarn-site.xml configured?

张伟


mapred-site.xml:

<property><name>mapred.job.tracker</name><value>Host0:8021</value></property>
<property><name>mapreduce.job.split.metainfo.maxsize</name><value>10000000</value></property>
<property><name>mapred.local.dir</name><value>/home/mapred_local</value></property>
<property><name>mapreduce.job.counters.max</name><value>120</value></property>
<property><name>mapreduce.output.fileoutputformat.compress</name><value>false</value></property>
<property><name>mapreduce.output.fileoutputformat.compress.type</name><value>BLOCK</value></property>
<property><name>mapreduce.output.fileoutputformat.compress.codec</name><value>org.apache.hadoop.io.compress.DefaultCodec</value></property>
<property><name>mapreduce.map.output.compress.codec</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
<property><name>mapreduce.map.output.compress</name><value>true</value></property>
<property><name>zlib.compress.level</name><value>DEFAULT_COMPRESSION</value></property>
<property><name>mapreduce.task.io.sort.factor</name><value>64</value></property>
<property><name>mapreduce.map.sort.spill.percent</name><value>0.8</value></property>
<property><name>mapreduce.reduce.shuffle.parallelcopies</name><value>10</value></property>
<property><name>mapreduce.task.timeout</name><value>600000</value></property>
<property><name>mapreduce.client.submit.file.replication</name><value>3</value></property>
<property><name>mapreduce.job.reduces</name><value>1</value></property>
<property><name>mapreduce.task.io.sort.mb</name><value>30</value></property>
<property><name>mapreduce.map.speculative</name><value>false</value></property>
<property><name>mapreduce.reduce.speculative</name><value>false</value></property>
<property><name>mapreduce.job.reduce.slowstart.completedmaps</name><value>0.8</value></property>
<property><name>mapreduce.jobhistory.address</name><value>Host0:10020</value></property>
<property><name>mapreduce.jobhistory.webapp.address</name><value>Host0:19888</value></property>
<property><name>mapreduce.framework.name</name><value>yarn</value></property>
<property><name>yarn.app.mapreduce.am.staging-dir</name><value>/user</value></property>
<property><name>yarn.app.mapreduce.am.resource.mb</name><value>500</value></property>
<property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>1</value></property>
<property><name>mapreduce.job.ubertask.enabled</name><value>false</value></property>
<property><name>yarn.app.mapreduce.am.command-opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx300m</value></property>
<property><name>mapreduce.map.java.opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx300m</value></property>
<property><name>mapreduce.reduce.java.opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx300m</value></property>
<property><name>mapreduce.map.memory.mb</name><value>500</value></property>
<property><name>mapreduce.map.cpu.vcores</name><value>1</value></property>
<property><name>mapreduce.reduce.memory.mb</name><value>500</value></property>
<property><name>mapreduce.reduce.cpu.vcores</name><value>1</value></property>
<property><name>mapreduce.application.classpath</name><value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH</value></property>
<property><name>mapreduce.admin.user.env</name><value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH</value></property>
<property><name>mapreduce.shuffle.max.connections</name><value>80</value></property>

张伟


yarn-site.xml:

<configuration>
<property><name>yarn.acl.enable</name><value>true</value></property>
<property><name>yarn.resourcemanager.hostname</name><value>Host0</value></property>
<property><name>yarn.admin.acl</name><value>*</value></property>
<property><name>yarn.resourcemanager.address</name><value>${yarn.resourcemanager.hostname}:8032</value></property>
<property><name>yarn.resourcemanager.admin.address</name><value>${yarn.resourcemanager.hostname}:8033</value></property>
<property><name>yarn.resourcemanager.scheduler.address</name><value>${yarn.resourcemanager.hostname}:8030</value></property>
<property><name>yarn.resourcemanager.resource-tracker.address</name><value>${yarn.resourcemanager.hostname}:8031</value></property>
<property><name>yarn.resourcemanager.webapp.address</name><value>${yarn.resourcemanager.hostname}:8088</value></property>
<property><name>yarn.resourcemanager.client.thread-count</name><value>50</value></property>
<property><name>yarn.resourcemanager.scheduler.client.thread-count</name><value>50</value></property>
<property><name>yarn.resourcemanager.admin.client.thread-count</name><value>1</value></property>
<property><name>yarn.scheduler.minimum-allocation-mb</name><value>50</value></property>
<property><name>yarn.scheduler.increment-allocation-mb</name><value>50</value></property>
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>8192</value></property>
<property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value></property>
<property><name>yarn.scheduler.increment-allocation-vcores</name><value>1</value></property>
<property><name>yarn.scheduler.maximum-allocation-vcores</name><value>2</value></property>
<property><name>yarn.resourcemanager.amliveliness-monitor.interval-ms</name><value>1000</value></property>
<property><name>yarn.am.liveness-monitor.expiry-interval-ms</name><value>600000</value></property>
<property><name>yarn.resourcemanager.am.max-retries</name><value>1</value></property>
<property><name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name><value>600000</value></property>
<property><name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name><value>1000</value></property>
<property><name>yarn.nm.liveness-monitor.expiry-interval-ms</name><value>600000</value></property>
<property><name>yarn.resourcemanager.resource-tracker.client.thread-count</name><value>50</value></property>
<property><name>yarn.application.classpath</name><value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value></property>
<property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value></property>
<property><name>yarn.resourcemanager.max-completed-applications</name><value>100</value></property>
<property><name>yarn.nodemanager.resource.memory-mb</name><value>3072</value></property>
<property><name>yarn.nodemanager.resource.cpu-vcores</name><value>2</value></property>
<property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
<property><name>yarn.nodemanager.local-dirs</name><value>file:///home/yarn/local</value></property>
<property><name>yarn.nodemanager.log-dirs</name><value>file:///var/log/hadoop-yarn</value></property>
<property><name>yarn.nodemanager.remote-app-log-dir</name><value>hdfs://${yarn.resourcemanager.hostname}:8020/tmp/yarn-log</value></property>
<property><name>yarn.app.mapreduce.am.staging-dir</name><value>/user</value></property>
<property><name>yarn.log.server.url</name><value>http://Host0:19888/history/logs</value></property>
<property><name>yarn.log-aggregation.retain-seconds</name><value>864000</value></property>
<property><name>yarn.log-aggregation-enable</name><value>true</value></property>
<property><name>yarn.nodemanager.log-aggregation.compression-type</name><value>gz</value></property>
<!--
<property><name>yarn.resourcemanager.nodes.include-path</name><value>/etc/hadoop/include_datanode</value></property>
<property><name>yarn.resourcemanager.nodes.exclude-path</name><value>/etc/hadoop/exclude_datanode</value></property>
-->
<property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value><description>Whether virtual memory limits will be enforced for containers</description></property>
<property><name>yarn.nodemanager.vmem-pmem-ratio</name><value>4</value><description>Ratio between virtual memory to physical memory when setting memory limits for containers</description></property>
<property><name>yarn.node-labels.enabled</name><value>true</value></property>
<property><name>yarn.nodemanager.address</name><value>0.0.0.0:45454</value></property>
<property><name>yarn.node-labels.manager-class</name><value>org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager</value></property>
<property><name>yarn.node-labels.fs-store.root-dir</name><value>hdfs://Host0:8020/yarn/node-labels</value><description>Where node-label data is stored on HDFS</description></property>
</configuration>

wangxiaolei


<property><name>yarn.nodemanager.resource.memory-mb</name><value>3072</value></property>
<property><name>yarn.nodemanager.resource.cpu-vcores</name><value>2</value></property>

Can each machine's physical memory and CPU actually provide these configured values? Also check mapreduce.map.memory.mb and the related settings.
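
(For a quick sanity check, one could compare the advertised capacity against the hardware, assuming shell access to each NM:

  free -m    # physical memory; should comfortably cover yarn.nodemanager.resource.memory-mb = 3072
  nproc      # CPU cores; should cover yarn.nodemanager.resource.cpu-vcores = 2

Per the posted mapred-site.xml, one MR job needs 500 MB + 1 vcore for the AM plus 500 MB + 1 vcore per running task, so a 3072 MB / 2 vcore NM can host the AM and at most one task at a time.)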

wangxiaolei


How can I access the ResourceManager web UI?
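
(From the yarn-site.xml posted above, yarn.resourcemanager.webapp.address resolves to Host0:8088, so the UI should be at http://Host0:8088 — assuming Host0 resolves from the browser's machine and the port is reachable.)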

fish - Hadooper


Something about your configuration of default keeps striking me as odd. For example, yarn.scheduler.capacity.root.default-node-label-expression: why isn't it yarn.scheduler.capacity.root.default.default-node-label-expression? How about going through all of it to check whether some of the other property names are also not quite right?
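
A minimal sketch of what this suggestion would look like in capacity-scheduler.xml; the property name follows the yarn.scheduler.capacity.<queue-path>.default-node-label-expression pattern, and the single-label value is illustrative, not a verified fix:

<property>
  <!-- assumed correction: scope the property to the default queue via its queue path -->
  <name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
  <value>area0</value>
</property>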

wangxiaolei


It was definitely working before; now running wordcount fails with:

2016-09-13 01:29:32,028 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Host2:50010:DataXceiver error processing WRITE_BLOCK operation  src: /211.68.36.127:55535 dst: /211.68.36.129:50010
java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/211.68.36.129:50010 remote=/211.68.36.127:55535]. 59999 millis timeout left.

wangxiaolei


What command did you run against each of the four queues (default, area0, area1, area2)?
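
(For context, a queue-targeted test submission typically looks something like the following — the jar path is an assumption:

  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount -Dmapreduce.job.queuename=area0 /input /output

with the queue name swapped for each of the four queues.)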

wangxiaolei


I swapped /etc/hadoop/conf/capacity-scheduler.xml for the training camp's version and backed up the original. Now the cluster can't even run the pi example. Please take another look at the environment.

fish - Hadooper


Please read the definition of yarn.scheduler.capacity.<queue-path>.default-node-label-expression again, carefully.

fish - Hadooper


Yes. yarn.scheduler.capacity.<queue-path>.default-node-label-expression defines which label an application submitted to the queue will use when it does not specify one itself.

The value you configured is " ,area0,area1,area2". First try setting it to a single label (area0, say). You said earlier that submissions to area0 worked, so restore the configuration file you were using before.

Also, this is the code that parses a node-label-expression:

[image: 1473749908057.png — node-label-expression parsing code]

Multiple labels are separated by &&.
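
(If a job should be pinned to a label regardless of the queue default, later Hadoop 2.x releases expose job-level properties for this — verify they exist in your version before relying on them:

  hadoop jar hadoop-mapreduce-examples-*.jar pi \
    -Dmapreduce.job.queuename=area0 \
    -Dmapreduce.job.node-label-expression=area0 \
    10 100
)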
