About HDFS Present Capacity continually decreasing after running Spark jobs

I have set up a Hadoop cluster and a Spark cluster, each with one master node and two slave nodes, and I submit Spark jobs as a jar with --master yarn --deploy-mode cluster --driver-memory 1G --executor-memory 1G --executor-cores 1 (the full submit command is sketched after the report below).
After running a few jobs, however, the HDFS Present Capacity keeps shrinking! As shown below, it started at just over 8 GB, but after running 10 jobs it dropped to only 2.73 GB. Eventually Spark jobs fail with errors because HDFS runs out of space. Which configuration do I need to change to fix this?
Configured Capacity: 15003025408 (13.97 GB)
Present Capacity: 2934223006 (2.73 GB)
DFS Remaining: 1289218117 (1.20 GB)
DFS Used: 1645004889 (1.53 GB)
DFS Used%: 56.06%
Under replicated blocks: 29
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.100.223:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 7501512704 (6.99 GB)
DFS Used: 1159952778 (1.08 GB)
Non DFS Used: 6197431932 (5.77 GB)
DFS Remaining: 144127994 (137.45 MB)
DFS Used%: 15.46%
DFS Remaining%: 1.92%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 11
Last contact: Tue Jan 16 14:00:10 CST 2018


Name: 192.168.100.225:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 7501512704 (6.99 GB)
DFS Used: 485052111 (462.58 MB)
Non DFS Used: 5871370470 (5.47 GB)
DFS Remaining: 1145090123 (1.07 GB)
DFS Used%: 6.47%
DFS Remaining%: 15.26%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Tue Jan 16 14:00:11 CST 2018
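
For reference, the submit command described above would look roughly like this; the main class and jar path are placeholders, not taken from the original post:

# Sketch of the submit command described in the question; the --class value
# and the jar path are hypothetical placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1G \
  --executor-memory 1G \
  --executor-cores 1 \
  --class com.example.MyApp \
  /path/to/my-app.jar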

jane3von

Upvoted by: fish

Add the following configuration to yarn-site.xml:

<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:/var/log/hadoop/tmp/nmlocaldirs/</value>
</property>
<property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:/var/log/hadoop/tmp/nmlogdirs/</value>
</property>
<property>
    <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
    <value>512</value>
</property>
<property>
    <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
    <value>1800000</value>
</property>

When Hadoop runs jobs it produces temporary data, which ends up under the directory file:/var/log/hadoop/tmp/nmlogdirs/. If these properties are not set, Hadoop keeps that data in its default folders under the default path, and the cache is only cleaned once it grows past 10 GB. Since my HDFS was configured with very little space, after running jobs for a while I got "insufficient HDFS space" errors and the tasks were killed. In addition, since I run jobs in spark on yarn mode, I added

export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=86400"

to spark-env.sh in Spark's conf folder; after a Spark job finishes, this cleans up the temporary jar data that was uploaded to /user/hadoop/.sparkStaging for the run.
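
To confirm that it is the NodeManager local directories (counted as Non DFS Used) rather than HDFS blocks that are eating the space, a quick check along these lines can help; the paths are the ones configured in the yarn-site.xml above:

# Size of the NodeManager local/log directories on each slave node
du -sh /var/log/hadoop/tmp/nmlocaldirs/ /var/log/hadoop/tmp/nmlogdirs/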

jane3von

Upvoted by:

After restarting DFS, Present Capacity recovered to its original full value! But surely the Hadoop cluster cannot be restarted every day?

Configured Capacity: 15003025408 (13.97 GB)
Present Capacity: 8663085056 (8.07 GB)
DFS Remaining: 7183654912 (6.69 GB)
DFS Used: 1479430144 (1.38 GB)
DFS Used%: 17.08%
Under replicated blocks: 29
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.100.223:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 7501512704 (6.99 GB)
DFS Used: 999641088 (953.33 MB)
Non DFS Used: 3170349056 (2.95 GB)
DFS Remaining: 3331522560 (3.10 GB)
DFS Used%: 13.33%
DFS Remaining%: 44.41%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Jan 16 14:31:29 CST 2018

Name: 192.168.100.225:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 7501512704 (6.99 GB)
DFS Used: 479789056 (457.56 MB)
Non DFS Used: 3169591296 (2.95 GB)
DFS Remaining: 3852132352 (3.59 GB)
DFS Used%: 6.40%
DFS Remaining%: 51.35%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Jan 16 14:31:31 CST 2018
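
For reference, restarting only HDFS on a small cluster like this is normally done from the stock sbin scripts on the master node (assuming a standard Hadoop layout with HADOOP_HOME set):

# Stop and restart the HDFS daemons (NameNode + DataNodes); YARN is untouched
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh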

fish - Hadooper

Upvoted by:

Use hdfs du to check which directories are taking up a lot of HDFS space. Is your Spark application persisting RDDs to HDFS?
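
A concrete form of that check might look like the following; /user/hadoop is taken from the earlier answer, the rest are just common places to look:

# See which top-level HDFS directories hold the most data
hdfs dfs -du -h /

# Drill into the user directory, where .sparkStaging files usually live
hdfs dfs -du -h /user/hadoop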
