Lesson 2: Spark Programming and Enterprise Application Case Studies

Please post your questions about Lesson 2 (Spark Programming and Enterprise Application Case Studies) as replies in this thread.
1. Reply to this thread with the question you want to ask. (If your question has already been asked, upvote it to show interest instead of posting it again.)
2. The instructor will answer the submitted questions in the last 30 minutes of the live session.
3. Reference answers will be compiled after class and posted as replies to each question.
Lesson 2 post-class survey:
https://wj.qq.com/s/1259269/fe17
or scan the QR code to fill it in
[QR code image omitted]

 

jhg22

Upvotes from: 李思宇isx, 张文山4tw

A problem from our Spark production environment: in a real-time pipeline (Kafka + Spark Streaming + Redis), tasks show up as skipped every minute, as in the screenshots below:
[Spark UI screenshots omitted]
I have not been able to figure out what causes this — could you take a look, Teacher Dong? Data is flowing in continuously the whole time.

tl_oni

Upvotes from: 探照灯儿

When building my development environment with IntelliJ IDEA, using spark-2.10, Scala 2.11.8, and JDK 1.8, I get the error "not found: type SparkConf", as shown in the screenshot below. What causes this and how do I fix it? Environment: the Spark master is on IP 116 and the three workers are on 101, 102, and 103; my IntelliJ IDEA development machine is on 114.
[IntelliJ error screenshot omitted]

jhg22

Upvotes from: tl_oni

Import the package: import org.apache.spark._
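For reference, a minimal sketch (Spark 2.x, Scala) showing the import that makes the SparkConf type resolve; the app name and input path are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCountSketch")
    val sc = new SparkContext(conf)
    val counts = sc.textFile("hdfs:///tmp/input")   // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.take(10).foreach(println)
    sc.stop()
  }
}

If the import is already present, "not found: type SparkConf" usually means the spark-core dependency (built for your Scala version) is missing from the project's classpath.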

王云鹏

Upvotes from: Lotus丶

We are currently on Spark 1.6. If we upgrade to Spark 2.1, is it just a matter of reconfiguring? Is the existing code backward compatible? Thanks.

kendu

Upvotes from: Lotus丶

In Spark on YARN cluster mode, can Spark applications be written with the Python API and submitted to the Spark on YARN cluster? Thank you!

jhg22

Upvotes from:

Does Spark write data to disk during computation? Under what conditions is data written to disk?
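As background, a minimal sketch (assuming an existing SparkContext sc, as in spark-shell): shuffle output is always written to local disk, while a cached RDD spills to disk only if its storage level permits it:

import org.apache.spark.storage.StorageLevel

val lines = sc.textFile("hdfs:///tmp/input")              // hypothetical path
val cached = lines.persist(StorageLevel.MEMORY_AND_DISK)  // spills to local disk when memory runs out
println(cached.count())
cached.unpersist()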

那小子真帅

Upvotes from:

The error in the log is: ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM. What are the possible causes of this?

会飞的象

Upvotes from:

In Lesson 1 there was a SQL statement that the instructor said required 4 MapReduce jobs. How can you tell how many MapReduce jobs a given SQL statement will use?

waistcoat100

Upvotes from:

How heavy is the ResourceManager's workload? In an actual deployment, should the ResourceManager be placed on a dedicated machine, or can it share a machine with the NameNode?

逍遥feng

Upvotes from:

Can sortByKey set the number of reducers? If so, how? And is the number of pre-split HBase regions the same as the number of reducers in the Spark job?
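On the first point, a small sketch (assuming an existing sc): sortByKey accepts an optional numPartitions argument, which sets the number of reduce-side tasks and output partitions of the sort:

val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
val sorted = pairs.sortByKey(ascending = true, numPartitions = 4)
println(sorted.getNumPartitions)   // prints 4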

michaelhadoop

Upvotes from:

Is a container a process? In what form does the YarnAllocator hand allocated containers over to executors? In what form are tasks shipped to executors? And what does Kryo's buffer.max refer to?
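On the last point, a minimal config sketch: spark.kryoserializer.buffer.max caps the size of the buffer Kryo may grow to when serializing a single object, and raising it is the usual fix for Kryo "buffer overflow" errors (the values below are placeholders):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("KryoBufferSketch")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max", "256m")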

run_psw

Upvotes from:

I have tested this: Spark now supports multiple SparkContexts, as long as multiple.sparkcontext is set to true, and it runs fine. Could you talk about what kind of use case this is meant to address?
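For context, a sketch under the assumption that the flag referred to is spark.driver.allowMultipleContexts; it only disables the safety check, and multiple contexts in one JVM are not an officially supported pattern:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("MultiContextSketch")
  .set("spark.driver.allowMultipleContexts", "true")
val sc1 = new SparkContext(conf)
val sc2 = new SparkContext(conf)   // would throw an exception without the flag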

张文山4tw

Upvotes from:

A question about data collection: in a log file -> Flume -> Kafka -> Spark Streaming pipeline, collecting logs with tail -F, I have run into two problems: 1. when the log file rolls over (e.g. it reaches a fixed size limit) and switches to a new file, Flume loses data; 2. when the source produces data too quickly, data is also lost. Teacher Dong, is there a solution for these?

会飞的象

Upvotes from:

A question for you: must the files Spark processes be splittable by line? In other words, must the content be processable line by line, with each line handled independently?

BobLee

Upvotes from:

1) When caching, how is it handled if the data exceeds available memory? 2) What happens if the number of cores configured exceeds the number of physical CPU cores?

heming621

Upvotes from:

(1) If a file is 200 MB and is stored as two blocks, are the blocks split evenly into 100 MB and 100 MB, or into 128 MB and 72 MB? (2) If a 200 MB file is split into two partitions, is that 100 MB + 100 MB, or 128 MB + 72 MB? If it is the latter, does that cause a load-balancing problem?

jhg22

Upvotes from:

Could you explain how to write log4j logs from within a Spark program?
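A minimal sketch (not a reference answer) using the log4j API bundled with Spark; driver-side messages go to the driver log, and messages inside closures go to each executor's stderr, visible in the YARN/Spark UI logs:

import org.apache.log4j.Logger
import org.apache.spark.SparkContext

object LoggingSketch {
  @transient lazy val log = Logger.getLogger(getClass.getName)

  def run(sc: SparkContext): Unit = {
    log.info("driver-side message")                 // written to the driver log
    sc.parallelize(1 to 4).foreachPartition { _ =>
      // obtain the logger inside the closure so nothing non-serializable is captured
      Logger.getLogger("LoggingSketch").warn("executor-side message")
    }
  }
}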

frmark

Upvotes from:

Teacher Dong, which takes precedence, settings made in SparkConf or settings passed to spark-submit? Also, how long does an RDD stay cached — does it stay in memory forever unless it is released?
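For reference: values set programmatically on SparkConf take precedence over spark-submit flags, which in turn take precedence over spark-defaults.conf. On the caching part, a small sketch (assuming an existing sc): a cached RDD stays in the block manager until unpersist() is called, it is evicted under memory pressure, or the application ends:

val data = sc.textFile("hdfs:///tmp/input").cache()   // hypothetical path
println(data.count())                                  // first action materializes the cache
data.unpersist(blocking = true)                        // explicitly releases the cached blocks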

迷路剑客

Upvotes from:

We use Java exclusively at work right now. Can I skip Scala and learn the various Spark APIs directly in Java?

贰怪兽lyn

Upvotes from:

About the pipe call to an external script mentioned just now: does the script need to exist in the same directory on every executor? Is there a way to ship the script with spark-submit?
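A sketch of the usual approach (the script name is hypothetical): ship the script with the job, e.g. spark-submit --files wc_filter.sh, so every executor receives a local copy, then resolve its local path with SparkFiles before piping:

import org.apache.spark.SparkFiles

val piped = sc.parallelize(Seq("a b", "c d e"))
  .pipe(SparkFiles.get("wc_filter.sh"))   // local path of the distributed copy on each executor
piped.collect().foreach(println)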

探照灯儿

Upvotes from:

Can IntelliJ on Windows run Spark programs locally inside the IDE?
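Local mode generally works for this; a minimal sketch with the master pointed at in-process threads (on Windows, Hadoop's local file APIs may additionally need HADOOP_HOME/winutils.exe):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("LocalIdeSketch")
  .setMaster("local[*]")   // run inside the IDE process, no cluster required
val sc = new SparkContext(conf)
println(sc.parallelize(1 to 100).sum())
sc.stop()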

930523

Upvotes from:

In yarn-client mode, does sc.textFile("file:///data") read a local file on the driver or on the executors?

贰怪兽lyn

Upvotes from:

If spark-submit --jars is used to submit n jars with --master yarn-cluster, does every executor also need the same n jars in the same local directory?

苗苗树

Upvotes from:

When launching an application with spark-submit, num-executors=100 with executor-cores=1 and num-executors=50 with executor-cores=2 give the same total number of executor cores. What is the difference between the two?

frmark

Upvotes from:

One more question: after an RDD is repartitioned, the data stored on HDFS does not change accordingly, right? In that case, is the RDD shuffled between nodes?

frmark

Upvotes from:

Why does YARN mode require spark.yarn.jar to be specified? Also, is this jar something we build ourselves? If it includes our own application code, then we would not need to worry about environment compatibility or jar conflicts — is that understanding correct?

掂吾掂

Upvotes from:

Does Spark on YARN cluster mode place any memory requirements on the servers? Does every server need more than 8 GB of RAM?

掂吾掂

Upvotes from:

Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. — I get this when starting against the YARN cluster. What causes it?

tsyx163

Upvotes from:

Teacher Dong, after starting Spark with
./spark-shell --master yarn
I ran the following:

val userrdd = sc.textFile("/opt/data/movie/users.dat")

When I then call userrdd.count, the error below appears. What is going on?
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.5.1/nm-local-dir/usercache/root/filecache/28/__spark_libs__2285737196423591043.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/software/hadoop-2.5.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/04/16 18:54:54 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: 24407@master
17/04/16 18:54:54 INFO util.SignalUtils: Registered signal handler for TERM
17/04/16 18:54:54 INFO util.SignalUtils: Registered signal handler for HUP
17/04/16 18:54:54 INFO util.SignalUtils: Registered signal handler for INT
17/04/16 18:54:55 INFO spark.SecurityManager: Changing view acls to: root
17/04/16 18:54:55 INFO spark.SecurityManager: Changing modify acls to: root
17/04/16 18:54:55 INFO spark.SecurityManager: Changing view acls groups to: 
17/04/16 18:54:55 INFO spark.SecurityManager: Changing modify acls groups to: 
17/04/16 18:54:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
17/04/16 18:54:56 INFO client.TransportClientFactory: Successfully created connection to /10.135.111.231:48282 after 104 ms (0 ms spent in bootstraps)
17/04/16 18:54:56 INFO spark.SecurityManager: Changing view acls to: root
17/04/16 18:54:56 INFO spark.SecurityManager: Changing modify acls to: root
17/04/16 18:54:56 INFO spark.SecurityManager: Changing view acls groups to: 
17/04/16 18:54:56 INFO spark.SecurityManager: Changing modify acls groups to: 
17/04/16 18:54:56 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
17/04/16 18:54:56 INFO client.TransportClientFactory: Successfully created connection to /10.135.111.231:48282 after 2 ms (0 ms spent in bootstraps)
17/04/16 18:54:56 INFO storage.DiskBlockManager: Created local directory at /opt/hadoop-2.5.1/nm-local-dir/usercache/root/appcache/application_1492328023585_0010/blockmgr-55d5a33c-e918-4dd8-af78-38c7d035c60b
17/04/16 18:54:56 INFO memory.MemoryStore: MemoryStore started with capacity 413.9 MB
17/04/16 18:54:57 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.135.111.231:48282
17/04/16 18:54:57 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
17/04/16 18:54:57 INFO executor.Executor: Starting executor ID 1 on host master
17/04/16 18:54:57 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43664.
17/04/16 18:54:57 INFO netty.NettyBlockTransferService: Server created on master:43664
17/04/16 18:54:57 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/04/16 18:54:57 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(1, master, 43664, None)
17/04/16 18:54:57 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(1, master, 43664, None)
17/04/16 18:54:57 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(1, master, 43664, None)
17/04/16 18:54:57 INFO executor.Executor: Using REPL class URI: spark://10.135.111.231:48282/classes
17/04/16 18:56:57 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
17/04/16 18:56:57 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/04/16 18:56:58 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 1
17/04/16 18:56:58 INFO client.TransportClientFactory: Successfully created connection to /10.135.111.231:35165 after 2 ms (0 ms spent in bootstraps)
17/04/16 18:56:58 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1889.0 B, free 413.9 MB)
17/04/16 18:56:58 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 179 ms
17/04/16 18:56:58 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 413.9 MB)
17/04/16 18:56:58 INFO rdd.HadoopRDD: Input split: hdfs://master:9000/opt/data/movie/users.dat:0+67184
17/04/16 18:56:58 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
17/04/16 18:56:58 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.4 KB, free 413.9 MB)
17/04/16 18:56:58 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 11 ms
17/04/16 18:56:58 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 296.9 KB, free 413.6 MB)
17/04/16 18:56:59 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
17/04/16 18:56:59 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
17/04/16 18:56:59 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
17/04/16 18:56:59 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
17/04/16 18:56:59 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
17/04/16 18:56:59 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
	at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(Native Method)
	at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:59)
	at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:301)
	at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:231)
	at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
	at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
	at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
	at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
	at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:208)
	at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
	at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:48)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:266)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1760)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
(The same java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums stack trace is then thrown for every retry of both tasks — attempts 0.1/1.1 through 0.3/1.3, TIDs 1 through 7; the repeated log output is omitted here.)

水晶紫爱睡觉

Upvotes from:

Hello, Teacher Dong. When writing data into HBase with Spark, how should garbled Chinese characters be handled? Some of the Chinese comes out garbled and some does not, while the original input data is not garbled at all. How do you deal with this kind of problem when doing data analysis with Spark?
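One common cause (an assumption, not a confirmed diagnosis of this case): sc.textFile always decodes lines as UTF-8, so input that is actually GBK-encoded comes out garbled. A sketch that reads the raw bytes and decodes them with the real charset ("GBK" and the path are assumptions):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

val lines = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///tmp/gbk_input", 2)
  .map { case (_, text) => new String(text.getBytes, 0, text.getLength, "GBK") }  // decode with the source charset
lines.take(5).foreach(println)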

逸秋枫叶

Upvotes from:

When using Spark SQL for interactive queries, how can hundreds of GB of data be computed and returned in the shortest possible time? What storage medium should the data be kept on?
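A sketch under the assumption of Spark 2.x: keeping hot tables as cached in-memory columnar data (backed by Parquet on HDFS or another fast store) is a common way to get interactive Spark SQL latency on data sets of a few hundred GB; the path, table name, and query are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("SqlCacheSketch").getOrCreate()
spark.read.parquet("hdfs:///warehouse/events").createOrReplaceTempView("events")
spark.catalog.cacheTable("events")   // columnar in-memory cache reused across queries
spark.sql("SELECT count(*) FROM events WHERE dt = '2017-04-16'").show()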
