Lesson 3: Spark Internals and Source Code Reading

Please post your questions about Lesson 3, Spark Internals and Source Code Reading, as replies in this thread.
1. Reply in this thread with the questions you want to ask. (If a question has already been asked, upvote it to show your interest instead of asking it again.)
2. The teacher will answer the questions in the last 30 minutes of the live class.
3. Reference answers will be compiled after class and posted as replies to each question.
Please fill in the after-class survey!
http://wj.qq.com/s/844204/6c93
The WeChat QR code is below:
(QR code image: 1476776361806.png)

 

@CrazyChao - Life is more than the struggles right in front of you; there are also poetry and faraway fields! ^.^

Upvoted by: crazyant 17090115420 SuperManBack Tomguluson

16/10/18 04:30:59 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
	at cn.chinahadoop.SparkPi$.main(SparkPi.scala:58)
	at cn.chinahadoop.SparkPi.main(SparkPi.scala)
This is from running the Spark bundled SparkPi example in IDEA. I already set "local" in Program arguments, but I get the error above. When I instead set setMaster("spark://192.168.141.128:7077") in the code, I get:
 WARN AppClient$ClientEndpoint: Failed to connect to master 192.168.141.128:7077
java.io.IOException: Failed to connect to /192.168.141.128:7077
What is causing these errors? Also note that I have already configured spark.master in spark-defaults.conf!

@CrazyChao - Life is more than the struggles right in front of you; there are also poetry and faraway fields! ^.^

Upvoted by: crazyant 17090115420

Spark fails to load the Hadoop native library. What causes this? My machine is 64-bit. When Hadoop itself could not load the native library at startup, the reason was that the bundled native library is 32-bit, so after building the Hadoop source I replaced the hadoop-3.0.0 native library with a 64-bit build. The workaround I used for the Spark case:
vim /etc/profile
export LD_LIBRARY_PATH=/data/software/hadoop-3.0/lib/native/:$LD_LIBRARY_PATH
source /etc/profile

@CrazyChao - Life is more than the struggles right in front of you; there are also poetry and faraway fields! ^.^

Upvoted by: crazyant

Running the command below fails. What is going wrong?
mvn archetype:generate \
  -DarchetypeGroupId=org.scala-tools.archetypes \
  -DarchetypeArtifactId=scala-archetype-simple \
  -DarchetypeVersion=1.1 \
  -DarmoteRepositories=http://scala-tools.org/repo-releases \
  -DarchetypeCatalog=internal \
  -DinteractiveMode=false \
  -Dversion=1.0-SNAPSHOT \
  -DgroupId=org.training.spark \
  -DartifactId=wordcount

[ERROR] No plugin found for prefix 'archetype' in the current project and in the plugin groups [org.apache.maven.plugins, org.codehaus.mojo] available from the repositories [local (/root/.m2/repository), central (https://repo.maven.apache.org/maven2)] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/NoPluginFoundForPrefixException 

 mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T08:41:47-08:00)
Maven home: /home/myc/Downloads/apache-maven-3.3.9
Java version: 1.8.0_101, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-431.el6.x86_64", arch: "amd64", family: "unix"


luckyzkk

Upvoted by: Tomguluson

Doesn't the shuffle phase generate any tasks?

jerry138133

Upvoted by: lcg_1023

In spark-2.0.1-bin-hadoop-2.7 I cannot seem to find the lib directory or the assembly jar. Has this changed? How do I generate that jar?

风雨之间

Upvoted by:

Does Spark's accumulator use a distributed lock? Does it hurt parallelism? Also, what fraction of the total allocated resources is a reasonable size for broadcast data?

lwcxks

Upvoted by:

Teacher Dong, could you explain the components of the data-mining layer (data warehouse, OLAP, business intelligence) in the big-data technology stack diagram from Lesson 1? Why do they all belong to the data-mining layer? Please introduce data warehouses, OLAP, and business intelligence separately. Thanks!

lwcxks

Upvoted by:

Teacher Dong, could you explain how to use Spark's shared variables, namely accumulators and broadcast variables? Thanks.
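
For reference, a minimal sketch of both kinds of shared variable, written against the Spark 1.x API used in the course (sc.accumulator; Spark 2.x renames this to sc.longAccumulator). The app name, data, and counts are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("SharedVarsSketch").setMaster("local[*]"))

// Broadcast variable: a read-only value shipped once per executor instead of once per task
val stopWords = sc.broadcast(Set("the", "a", "of"))

// Accumulator: tasks can only add to it; only the driver reads the final value
val skipped = sc.accumulator(0L, "skipped words")

val words = sc.parallelize(Seq("the", "spark", "of", "shuffle"))
val kept = words.filter { w =>
  val keep = !stopWords.value.contains(w)
  if (!keep) skipped += 1L   // note: updates made inside transformations may be re-applied if a task is retried
  keep
}
println(kept.count())    // 2; running the action is what triggers the accumulator updates
println(skipped.value)   // read on the driver after the action has run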

ylltw01

Upvoted by:

Hello Teacher Xicheng:
In YARN mode, the number of executors defaults to 2 and can be set with the --num-executors parameter. For example, in YARN mode, --executor-cores 2 --num-executors 2 starts two executors with 2 cores each.
In standalone mode, executors are assigned according to the scheduling policy (spread out, which is the default, or consolidate). Is only one executor started per worker node? For example, in standalone mode --total-executor-cores 4 is spread evenly across the available workers, so with 2 workers one executor is started on each worker with 2 cores each. Is that how the allocation works?
Also, what is a good number of executors in production? Is it related to the number of nodes (NodeManagers under YARN, workers under standalone)? Or is it better to start one executor per node and give that executor multiple cores?
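
For reference, the submit-time flags discussed above correspond to Spark configuration keys, so they can also be set programmatically. A minimal sketch; the values are placeholders, not tuning recommendations:

import org.apache.spark.{SparkConf, SparkContext}

// Roughly equivalent to: spark-submit --num-executors 2 --executor-cores 2 --executor-memory 2g
val conf = new SparkConf()
  .setAppName("ExecutorSizingSketch")
  .set("spark.executor.instances", "2")  // --num-executors (YARN)
  .set("spark.executor.cores", "2")      // --executor-cores
  .set("spark.executor.memory", "2g")    // --executor-memory
  .set("spark.cores.max", "4")           // --total-executor-cores (standalone)
val sc = new SparkContext(conf)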

周榆杰

Upvoted by:

After adding Federation to the cluster, fsck only works for the delete operation; other operations such as move and blocks fail. http://192.168.8.17:50070/fsck?ugi=hdfs&path=<path>&move=1

freshcandy

Upvoted by:

For running Spark on YARN, does Spark need to be started at all? Is it usable directly once HADOOP_CONF_DIR is configured? The Spark-on-YARN setup guides I found online all install Scala and Hadoop, configure Spark's slaves file, and then start Spark, but shouldn't the workers be allocated automatically by YARN?

freshcandy

Upvoted by:

I packaged the teacher's wordcount program on Windows, uploaded it to the cluster, and ran it in yarn-client mode. It fails with:

Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1476777306177
         final status: FAILED
         tracking URL: http://namenode:8088/cluster/a ... _0007
         user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1472108691286_0007 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)

Checking YARN's logs for this application only shows the following, which tells me nothing useful:

16/10/18 15:53:24 INFO client.RMProxy: Connecting to ResourceManager at namenode/12.0.0.10:8032
/tmp/logs/root/logs/application_1472108691286_0007 does not exist.

What is causing this failure?

excelchart

Upvoted by:

In last lesson's slides, for the pi computation:

val count = sc.parallelize(1 to n, slices).map { i =>
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)

I don't understand why the map here uses {} instead of (). Could the teacher explain? It seems it no longer compiles if I change {} to ()!
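
As a small stand-alone illustration of the syntax point (plain Scala, independent of Spark): parentheses pass a single expression as the argument, while a body with several statements has to be a block, which is what the braces provide; the block can also be nested inside the parentheses.

val nums = 1 to 10

// Single-expression lambda: parentheses are fine
val squares = nums.map(i => i * i)

// Multi-statement body: it must be a block, so braces are used
val scores = nums.map { i =>
  val x = i * 2 - 1
  x * x
}

// Equivalent form: a block nested inside the parentheses
val scores2 = nums.map(i => {
  val x = i * 2 - 1
  x * x
})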

SuperManBack

Upvoted by:

Teacher Dong, a question: when configuring Spark SQL I found that setting the number of instances too high or too low are both bad. How should it be configured, i.e. how do I choose instances, driver-memory, and executor-memory?

ioridong

Upvoted by:

I ran the teacher's simplespark-master (wordcount) example in IDEA on Linux and got the error below. What is the problem?

masterUrl:local[1], inputPath: /home/hadoop/data/input/readme.txt, outputPath: /home/hadoop/data/output
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/10/18 18:10:10 INFO SparkContext: Running Spark version 1.6.2
16/10/18 18:10:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/18 18:10:20 WARN Utils: Your hostname, ubuntu-1 resolves to a loopback address: 127.0.1.1; using 192.168.106.129 instead (on interface ens33)
16/10/18 18:10:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/10/18 18:10:20 INFO SecurityManager: Changing view acls to: hadoop
16/10/18 18:10:20 INFO SecurityManager: Changing modify acls to: hadoop
16/10/18 18:10:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/10/18 18:10:26 INFO Utils: Successfully started service 'sparkDriver' on port 43912.
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
    at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
    at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
    at akka.actor.RootActorPath.$div(ActorPath.scala:185)
    at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:465)
    at akka.remote.RemoteActorRefProvider.<init>(RemoteActorRefProvider.scala:124)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
    at scala.util.Try$.apply(Try.scala:192)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at scala.util.Success.flatMap(Try.scala:231)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
    at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:585)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:578)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2024)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2015)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
    at org.training.examples.WordCount$.main(WordCount.scala:22)
    at org.training.examples.WordCount.main(WordCount.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)

Process finished with exit code 1

SuperManBack

Upvoted by:

Teacher Dong, another question that has bothered me for a long time: with Hue, MapReduce jobs can stream their logs to the front end in real time, but after switching to Spark the front end no longer shows the logs. Could you give some guidance?

李海磊

Upvoted by:

Teacher Dong, yesterday I launched a Spark job on the Hadoop cluster using the jar I built, and both runs failed. Checking the logs, yarn-hadoop-nodemanager-slave1.out shows the limits below. Do these need to be configured manually in limits.conf? How should they be configured, and what values should the parameters be set to? Both of my slave nodes are virtual machines, and I also hit this when setting up a Hadoop cluster in Ubuntu VMs on the company's compute cloud. Does this only happen in VMs, and not on physical machines?

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 14802
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

andyzhang

Upvoted by:

How do lineage and checkpointing work together in Spark?
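
A minimal sketch of how the two fit together, with a made-up checkpoint directory: lineage is recorded automatically for every transformation, while checkpoint() writes the RDD's data to reliable storage and truncates that lineage so recovery no longer recomputes from the original source.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("CheckpointSketch").setMaster("local[*]"))
sc.setCheckpointDir("/tmp/spark-checkpoints")      // hypothetical path; HDFS in a real cluster

val base = sc.parallelize(1 to 1000)
val derived = base.map(_ * 2).filter(_ % 3 == 0)   // recoverable via lineage alone

derived.cache()        // commonly cached first so the checkpoint job can reuse the computed data
derived.checkpoint()   // marks the RDD; the data is written when the next action runs
derived.count()        // triggers the job and the checkpoint write

println(derived.toDebugString)   // the lineage now ends at the checkpointed data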

andyzhang

Upvoted by:

Why doesn't RDD fault tolerance use a replication mechanism?
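
For context, replication is available as an opt-in storage level; lineage-based recomputation is simply the default because it avoids copying every intermediate dataset. A brief sketch of both choices, assuming an existing SparkContext named sc:

import org.apache.spark.storage.StorageLevel

val data = sc.parallelize(1 to 1000).map(_ * 2)

// Default fault tolerance: nothing extra is stored; a lost partition is recomputed from lineage.
data.count()

// Opt-in replication: the "_2" storage levels keep two copies of each cached block,
// trading memory/disk for faster recovery when a node is lost.
data.persist(StorageLevel.MEMORY_AND_DISK_2)
data.count()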

xuyifei

Upvoted by:

Teacher Dong, a question: I read about 500 MB of data from Redis and map it into an RDD, and processing it runs out of memory (OOM). Is it being handled by a single task? How should I solve this?
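
A minimal sketch of the usual workaround, with the Redis-reading part replaced by a placeholder collection (how the data actually arrives depends on the client library): give the RDD an explicit number of partitions, or repartition it, so the 500 MB is split across many tasks instead of one.

// `records` stands in for whatever was fetched from Redis on the driver (hypothetical data)
val records: Seq[String] = Seq.fill(1000)("value")

// Ask for an explicit number of partitions up front (assumes an existing SparkContext `sc`)...
val rdd = sc.parallelize(records, numSlices = 64)

// ...or widen an RDD that arrived with too few partitions
val widened = rdd.repartition(64)

println(widened.getNumPartitions)   // each task now handles a much smaller slice of the data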

zilong230905

Upvoted by:

Could you go into more detail on the "building the physical execution plan: stage division" part? Also, what is the relationship between CoGroupedRDD and UnionRDD?

kaiseu

Upvoted by:

What are the pros and cons of yarn-cluster versus yarn-client mode? How should one generally choose between them?

tsingfu

Upvoted by:

clean(func) in Spark Core feels hard to understand. Could you explain it when you have time?
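
For context, clean(func) runs Spark's ClosureCleaner over a user function before it is serialized and shipped to executors. A small illustration of the kind of problem it deals with (class and field names are made up): a closure that reads a field implicitly drags the whole enclosing object along, and that object must then be serializable.

import org.apache.spark.rdd.RDD

class Multiplier(factor: Int) {            // hypothetical class; note it is NOT Serializable
  def scaleBad(rdd: RDD[Int]): RDD[Int] =
    rdd.map(x => x * factor)               // `factor` is really `this.factor`, so the closure
                                           // captures `this` and serialization can fail

  def scaleGood(rdd: RDD[Int]): RDD[Int] = {
    val f = factor                         // copy the field into a local val first
    rdd.map(x => x * f)                    // the closure now only captures an Int
  }
}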

dark

Upvoted by:

Hello Teacher Dong, I came across the following piece of code in a project:
 /**
     * Deduplicate by application device count
     *
     * @param channelRdd
     * @return
     */
    private JavaRDD<ChannelLogInfo> processDeviceRdd(final JavaRDD<ChannelLogInfo> channelRdd) {
        return channelRdd.mapToPair(new PairFunction<ChannelLogInfo, String, ChannelLogInfo>() {
            @Override
            public Tuple2<String, ChannelLogInfo> call(final ChannelLogInfo log) throws Exception {
                // Updated logic: use the package name as part of the unique key: pkg_name#device_id#channel_id
                String logKey = Joiner.on("#").join(log.pkg_name(),log.device_id(),log.channel_id());
                return new Tuple2<>(logKey, log);
            }
        }).reduceByKey(new Function2<ChannelLogInfo, ChannelLogInfo, ChannelLogInfo>() {
            @Override
            public ChannelLogInfo call(final ChannelLogInfo info1, final ChannelLogInfo info2)
                throws Exception {
                return info1.timestamp() > info2.timestamp() ? info1 : info2;
            }
        }).values().groupBy(new Function<ChannelLogInfo, String>() {
            @Override
            public String call(final ChannelLogInfo info) throws Exception {
                
                // Updated logic: use the package name as part of the unique key: pkg_name#device_id#channel_id
                String logKey = Joiner.on("#").join(info.pkg_name(),info.device_id(),info.channel_id());
                return logKey;
            }
        }).flatMap(new FlatMapFunction<Tuple2<String, Iterable<ChannelLogInfo>>, ChannelLogInfo>() {
            @Override
            public Iterable<ChannelLogInfo> call(final Tuple2<String, Iterable<ChannelLogInfo>> entry)
                throws Exception {
                return entry._2();
            }
        });
    }
This code deduplicates log records, as the method comment says. Reading it, I see that deduplication is already done once the reduceByKey finishes. So what is the purpose of the values, groupBy, and flatMap calls after it? I don't understand them. Are they related to shuffle performance? Please advise, Teacher Dong!
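
A minimal Scala sketch of the same observation (the case class is a hypothetical stand-in for ChannelLogInfo): after reduceByKey there is already exactly one record per key, so regrouping by the same key and flattening the groups returns the same records.

import org.apache.spark.{SparkConf, SparkContext}

case class Log(pkgName: String, deviceId: String, channelId: String, timestamp: Long)

val sc = new SparkContext(new SparkConf().setAppName("DedupSketch").setMaster("local[*]"))
val logs = sc.parallelize(Seq(
  Log("app1", "dev1", "ch1", 100L),
  Log("app1", "dev1", "ch1", 200L),   // duplicate key; the newer timestamp should win
  Log("app2", "dev2", "ch1", 150L)
))

val deduped = logs
  .map(l => (s"${l.pkgName}#${l.deviceId}#${l.channelId}", l))
  .reduceByKey((a, b) => if (a.timestamp > b.timestamp) a else b)  // keep the latest per key
  .values

deduped.collect().foreach(println)   // already one record per key at this point

// The extra .groupBy(sameKey).flatMap(_._2) in the Java version regroups by a key that is
// already unique, so it adds another shuffle without changing the result.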

dark

Upvoted by:

Hello Teacher Dong, I was working on the second question of the "simple movie audience system" exercise from Lesson 2: which 10 movies do males in the 18-24 age bracket like most? Code snippet below:
 /**
     * Step 3: map-side join RDDs
     */

    val topKmovies = ratingsRdd.map(_.split("::")).map{ x =>
      (x(0), x(1))
    }.filter { x =>
      broadcastUserSet.value.contains(x._1)
    }.map{ x=>
      (x._2, 1)
    }.reduceByKey(_ + _).map{ x =>
      (x._2, x._1)
    }.sortByKey(false).map{ x=>
      (x._2, x._1)
    }.take(10)

    /**
     * Transform movie IDs to movie names
     */
    val movieID2Name = moviesRdd.map(_.split("::")).map { x =>
      (x(0), x(1))
    }.collect().toMap

    topKmovies.map(x => (movieID2Name.getOrElse(x._1, null), x._2)).foreach(println)
The last line is an operation on an array and a Map. My questions: 1. Does this operation run on the driver or on an executor? 2. If the data volume is large, does it put pressure on the driver or on the executors?
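
For context, take(10) and collect() both return plain local collections, so that last line runs entirely on the driver. A hedged sketch of the pattern usually used when the lookup map or the result is too large for the driver, reusing the names from the snippet above (topKmoviesRdd is hypothetical, standing for the top-K result kept as an RDD instead of calling take(10)):

// Broadcast the id -> name map and do the renaming on the executors
val movieID2NameBc = sc.broadcast(movieID2Name)
val named = topKmoviesRdd.map { case (id, cnt) =>
  (movieID2NameBc.value.getOrElse(id, "unknown"), cnt)
}
named.take(10).foreach(println)   // only the small final result reaches the driver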

郭亮

Upvoted by:

Hello teacher. The values in this RDD are wrong. I have tried many times: the RDD has the right number of records, but every record contains the value of the last row. Parsing the OrcStruct further also gives wrong values, because the RDD values upstream are already wrong. Is there a way to parse ORC files like this without going through Spark SQL? The correct values are below, and Spark SQL also returns them correctly.

hive> select * from orctest;
OK
sadfghjsfdajhkfds       sad
sfdahfdashjfdas sfd
asdfasdfsdaf    asd
123123dsfdasfdsa        123
Time taken: 0.269 seconds, Fetched: 4 row(s)

In spark-shell the RDD values are wrong:

scala> import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.io.NullWritable
scala> import org.apache.hadoop.hive.ql.io.orc.OrcStruct
import org.apache.hadoop.hive.ql.io.orc.OrcStruct
scala> val glorc=sc.hadoopFile[NullWritable,OrcStruct,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat]("/hive/warehouse/orctest")
glorc: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.NullWritable, org.apache.hadoop.hive.ql.io.orc.OrcStruct)] = /hive/warehouse/orctest HadoopRDD[0] at hadoopFile at <console>:26
scala> glorc.collect
res0: Array[(org.apache.hadoop.io.NullWritable, org.apache.hadoop.hive.ql.io.orc.OrcStruct)] = Array(((null),{123123dsfdasfdsa, 123}), ((null),{123123dsfdasfdsa, 123}), ((null),{123123dsfdasfdsa, 123}), ((null),{123123dsfdasfdsa, 123}))
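
This looks like the usual Hadoop record-reuse behavior rather than something that requires Spark SQL: hadoopFile hands every record back in the same reused key/value objects, so collecting the raw (NullWritable, OrcStruct) pairs leaves many references to one object that ends up holding the last row. A hedged sketch of the common workaround, which copies the data out (here simply via toString) inside a map before collecting:

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.hive.ql.io.orc.{OrcInputFormat, OrcStruct}

val glorc = sc.hadoopFile[NullWritable, OrcStruct, OrcInputFormat]("/hive/warehouse/orctest")

// Materialize each record into an immutable value while the reused OrcStruct still holds
// that row's data; only then collect to the driver.
val rows = glorc.map { case (_, struct) => struct.toString }
rows.collect().foreach(println)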

笑着走下去

Upvoted by:

Hello Teacher Dong: I cloned your project (git clone https://github.com/XichengDong/simplespark) locally and imported it into IDEA, where it runs fine. But after packaging it and running the jar, I get class not found. Details:

Environment:
OS: CentOS 6.5, JDK: 1.7.0_79
Apache Maven: 3.3.9
Apache Hadoop: 2.6.0
Spark: 1.6.1
Scala: 2.10.4
IDEA: idea-IC-162.2228.15
Project structure: see the attachment (src/main/scala/org.training.examples.WordCount.scala)

The normal packaging command, mvn clean package, also completes normally:

[INFO] META-INF/services/tachyon.underfs.UnderFileSystemFactory already added, skipping
[INFO] META-INF/MANIFEST.MF already added, skipping
[INFO] META-INF/services/tachyon.underfs.UnderFileSystemFactory already added, skipping
[INFO] META-INF/MANIFEST.MF already added, skipping
[INFO] META-INF/MANIFEST.MF already added, skipping
[INFO] META-INF/MANIFEST.MF already added, skipping
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 41.516 s
[INFO] Finished at: 2016-10-20T00:25:29+08:00
[INFO] Final Memory: 19M/113M
[INFO] ------------------------------------------------------------------------

Then I wrote a run script, run_wordcount.sh:

cat run_wordcount.sh
export HADOOP_CONF_DIR=/opt/hadoop_eco/hadoop-2.6.0/etc/hadoop
rm -rf /tmp/out1
SPARK_HOME=/opt/hadoop_eco/spark-1.6.1
$SPARK_HOME/bin/spark-submit \
  --master spark://192.168.37.31:7077 \
  --class org.trainging.examples.WordCount \
  /opt/git/simplespark/target/examples-1.0-SNAPSHOT-jar-with-dependencies.jar \
  spark://192.168.37.31:7077 /opt/hadoop_eco/hadoop-2.6.0/README.txt /tmp/out1

Running the script with sh run_wordcount.sh gives:

java.lang.ClassNotFoundException: org.trainging.examples.WordCount
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I tried a few other jars and hit the same problem.

dark

Upvoted by:

Hello Teacher Dong: while working on the first question of the "simple movie audience system" exercise from Lesson 2, I got the same result using the broadcast approach as with join. Could you briefly compare the two implementations? In what scenarios is join more appropriate, and when is broadcast the better choice?
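
For reference, a minimal sketch of the two shapes being compared; the data is made up, and the usual rule of thumb is that broadcasting suits a small lookup side, while a shuffle join is needed when both sides are large:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("JoinVsBroadcast").setMaster("local[*]"))

// Hypothetical data: a large (userId, movieId) RDD and a small set of target users
val ratings = sc.parallelize(Seq(("u1", "m1"), ("u2", "m2"), ("u1", "m3")))
val targetUsers = sc.parallelize(Seq(("u1", 1)))

// Option 1: shuffle join -- both sides are repartitioned by key across the cluster
val joined = ratings.join(targetUsers).map { case (user, (movie, _)) => (user, movie) }

// Option 2: broadcast the small side -- the large RDD is only filtered locally, no shuffle
val userSet = sc.broadcast(targetUsers.keys.collect().toSet)
val filtered = ratings.filter { case (user, _) => userSet.value.contains(user) }

joined.collect().foreach(println)
filtered.collect().foreach(println)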

luckyzkk

Upvoted by:

Teacher: running spark-shell --master yarn-client in both single-node YARN mode and HA distributed mode reports an error about exceeding virtual memory. How should I change the configuration? @Dong

16/10/26 11:51:36 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1477451989735_0001 failed 2 times due to AM Container for appattempt_1477451989735_0001_000002 exited with  exitCode: -103
For more detailed output, check application tracking page:http://kk-HP-Compaq-Pro-6380-M ... 1Then, click on links to logs of each attempt.
Diagnostics: Container [pid=11207,containerID=container_1477451989735_0001_02_000001] is running beyond virtual memory limits. Current usage: 201.6 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1477451989735_0001_02_000001 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 11207 11205 11207 11207 (bash) 0 0 17059840 358 /bin/bash -c /usr/lib/jvm/jdk1.8.0_74/bin/java -server -Xmx512m -Djava.io.tmpdir=/opt/hadoop-2.7.2/tmp/nm-local-dir/usercache/kk/appcache/application_1477451989735_0001/container_1477451989735_0001_02_000001/tmp -Dspark.yarn.app.container.log.dir=/opt/hadoop-2.7.2/logs/userlogs/application_1477451989735_0001/container_1477451989735_0001_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg '10.249.12.32:33384' --executor-memory 1024m --executor-cores 1 --properties-file /opt/hadoop-2.7.2/tmp/nm-local-dir/usercache/kk/appcache/application_1477451989735_0001/container_1477451989735_0001_02_000001/__spark_conf__/__spark_conf__.properties 1> /opt/hadoop-2.7.2/logs/userlogs/application_1477451989735_0001/container_1477451989735_0001_02_000001/stdout 2> /opt/hadoop-2.7.2/logs/userlogs/application_1477451989735_0001/container_1477451989735_0001_02_000001/stderr
    |- 11211 11207 11207 11207 (java) 544 23 2314432512 51253 /usr/lib/jvm/jdk1.8.0_74/bin/java -server -Xmx512m -Djava.io.tmpdir=/opt/hadoop-2.7.2/tmp/nm-local-dir/usercache/kk/appcache/application_1477451989735_0001/container_1477451989735_0001_02_000001/tmp -Dspark.yarn.app.container.log.dir=/opt/hadoop-2.7.2/logs/userlogs/application_1477451989735_0001/container_1477451989735_0001_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 10.249.12.32:33384 --executor-memory 1024m --executor-cores 1 --properties-file /opt/hadoop-2.7.2/tmp/nm-local-dir/usercache/kk/appcache/application_1477451989735_0001/container_1477451989735_0001_02_000001/__spark_conf__/__spark_conf__.properties
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1477453864186
     final status: FAILED
     tracking URL: http://kk-HP-Compaq-Pro-6380-M ... _0001
     user: kk
16/10/26 11:51:36 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1477451989735_0001
16/10/26 11:51:36 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)     at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)     at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)     at $line3.$read$$iwC$$iwC.<init>(<console>:15)     at $line3.$read$$iwC.<init>(<console>:24)     at $line3.$read.<init>(<console>:26)     at $line3.$read$.<init>(<console>:30)     at $line3.$read$.<clinit>(<console>)     at $line3.$eval$.<init>(<console>:7)     at $line3.$eval$.<clinit>(<console>)     at $line3.$eval.$print(<console>)     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)     at java.lang.reflect.Method.invoke(Method.java:498)     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)     at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)     at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)     at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)     at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)     at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)     at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)     at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)     at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)     at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)     at org.apache.spark.repl.Main$.main(Main.scala:31)     at org.apache.spark.repl.Main.main(Main.scala)     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)     at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)     at java.lang.reflect.Method.invoke(Method.java:498)     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)  
