Hadoop 2.6.0 | ZooKeeper 3.4.6 | HBase 0.98.13 Cluster Setup

In this post I will walk through, step by step, setting up a Hadoop 2.6.0 | ZooKeeper 3.4.6 | HBase 0.98.13 cluster starting from bare Aliyun ECS instances.

The cluster layout is as follows (public IP, private IP, hostname, roles):

120.24.83.53   10.169.132.145  kks1  namenode hmaster zookeeper ResourceManager

120.24.50.76   10.45.162.55    kks2  datanode regionserver zookeeper NodeManager

120.24.50.27   10.45.162.0     kks3  datanode regionserver zookeeper NodeManager SecondaryNameNode

120.24.51.109  10.45.165.59    kks4  datanode regionserver NodeManager
1: Change the hostname

vim /etc/sysconfig/network
Change: HOSTNAME=kks1
sudo hostname kks1

Reboot for the change to take full effect. Repeat on every node with its own hostname.
2: Mount the data disk

fdisk -l
fdisk /dev/xvdb
Enter, in order: "n", "p", "1", press Enter twice, then "wq"
fdisk -l

mkfs.ext3 /dev/xvdb1
echo '/dev/xvdb1 /mnt ext3 defaults 0 0' >> /etc/fstab
mount -a
df -h
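Before building anything on the new disk, it is worth confirming that the fstab entry and the live mount are both in place. A minimal check, assuming the /dev/xvdb1 device and /mnt mount point from the steps above:

```shell
# Verify the fstab entry and the live mount for the data disk.
# Assumes /dev/xvdb1 and /mnt from the steps above.
dev=/dev/xvdb1
mnt=/mnt

grep "^$dev" /etc/fstab || echo "missing fstab entry for $dev"
mount | grep "$dev on $mnt" || echo "$dev is not mounted at $mnt"
```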


3: Edit /etc/hosts

vim /etc/hosts
10.169.132.145 kks1
10.45.162.55 kks2
10.45.162.0 kks3
10.45.165.59 kks4
4: Passwordless SSH login:

A is the local machine (the one that controls the others);
B is a remote machine (the one being controlled), with IP 172.24.253.2 for this example;
both A and B run Linux.

On A, run:
# ssh-keygen -t rsa (press Enter three times: this generates a key pair locally with no passphrase)
# ssh root@172.24.253.2 "mkdir .ssh;chmod 0700 .ssh" (password required; note: .ssh must be mode 700)
# scp ~/.ssh/id_rsa.pub root@172.24.253.2:.ssh/id_rsa.pub (password required)

ssh root@kks1 "mkdir .ssh;chmod 0700 .ssh"
scp ~/.ssh/id_rsa.pub root@kks1:.ssh/id_rsa.pub
On B, run:
# touch /root/.ssh/authorized_keys (skip this if the file already exists)
# chmod 600 ~/.ssh/authorized_keys (note: authorized_keys must be mode 600; it stores the client public keys, and a different file name can be configured in the server's /etc/ssh/sshd_config)
# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys (append id_rsa.pub to authorized_keys; do not use >, which would wipe the existing keys and lock out everyone already using them)
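Where ssh-copy-id is available, the scp/cat steps above collapse into one command per node. A sketch of pushing kks1's key to the rest of this cluster:

```shell
# Push the local public key into each node's authorized_keys;
# ssh-copy-id also sets sane permissions on the remote .ssh dir.
# You will be asked for each node's root password once.
nodes="kks2 kks3 kks4"
for host in $nodes; do
  ssh-copy-id "root@$host" || echo "could not reach $host"
done
```

Afterwards, `ssh root@kks2` from kks1 should log in without a password.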
5: Install JDK 1.7:

yum install java-1.7.0-openjdk-devel.x86_64 -y
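The JAVA_HOME path used in hadoop-env.sh later depends on the exact OpenJDK build yum installed. Rather than guessing the versioned directory name under /usr/lib/jvm, it can be derived from the java binary on the PATH (a sketch):

```shell
# Derive JAVA_HOME by resolving the java symlink chain and then
# stripping the trailing (jre/)bin/java path components.
resolve_java_home() {
  dir=${1%/jre/bin/java}   # OpenJDK layout: $JAVA_HOME/jre/bin/java
  dir=${dir%/bin/java}     # plain layout:   $JAVA_HOME/bin/java
  echo "$dir"
}

if command -v java >/dev/null 2>&1; then
  echo "JAVA_HOME=$(resolve_java_home "$(readlink -f "$(command -v java)")")"
fi
```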
6: Install Maven (only needed on the kks1 node):

wget http://ftp.tsukuba.wide.ad.jp/ ... ar.gz
If Maven was unpacked into the home directory, run:
echo export PATH='$PATH':/home/maven/bin >> /etc/profile
After setting the environment variable, run: source /etc/profile
Then run mvn --version; if Maven prints its version information, the setup succeeded.
7: Download Hadoop:

wget http://apache.cs.utah.edu/hado ... ar.gz
8: Edit hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
export HADOOP_PID_DIR=/var/hadoop/pids
Note 1: setting HADOOP_PID_DIR is strongly recommended; otherwise commands such as sbin/stop-dfs.sh can stop working later (the default PID files live under /tmp, which may get cleaned up out from under the daemons).

Note 2: create every directory the configs refer to before starting the cluster.
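Following note 2, the directories referenced by this article's configs can be created on every node in one pass. A sketch (paths taken from the config files in this article; the node list matches /etc/hosts):

```shell
# Create, on all four nodes, the directories the configs point at:
# the PID dir (hadoop-env.sh/yarn-env.sh), the NameNode/DataNode dirs
# (hdfs-site.xml), and the YARN local dir (yarn-site.xml).
dirs="/var/hadoop/pids /home/hadoop/hdfs/name /home/hadoop/hdfs/data /home/hadoop/yarn/local"
for host in kks1 kks2 kks3 kks4; do
  ssh -o ConnectTimeout=5 "root@$host" "mkdir -p $dirs" || echo "could not reach $host"
done
```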

9: core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://kks1:8020</value>
</property>
</configuration>
10: hdfs-site.xml

<configuration>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hdfs/name</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hdfs/data</value>
</property>

<property>
<name>dfs.namenode.secondary.http-address</name>
<value>kks3:9001</value>
</property>

</configuration>
11: yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>kks1</value>
</property>

<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>

<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>

<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>

<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>

<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>

<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/home/hadoop/etc/hadoop/fairscheduler.xml</value>
</property>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hadoop/yarn/local</value>
</property>

<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>30720</value>
</property>

<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
</property>

<property>
<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
12: mapred-site.xml

<configuration>

<!-- MR YARN Application properties -->

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>kks2:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>kks2:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>

</configuration>
13: slaves

kks2
kks3
kks4
14: fairscheduler.xml

<?xml version="1.0"?>
<allocations>

<queue name="infrastructure">
<minResources>102400 mb, 50 vcores </minResources>
<maxResources>153600 mb, 100 vcores </maxResources>
<maxRunningApps>200</maxRunningApps>
<minSharePreemptionTimeout>300</minSharePreemptionTimeout>
<weight>1.0</weight>
<aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
</queue>

<queue name="tool">
<minResources>102400 mb, 30 vcores</minResources>
<maxResources>153600 mb, 50 vcores</maxResources>
</queue>

<queue name="sentiment">
<minResources>102400 mb, 30 vcores</minResources>
<maxResources>153600 mb, 50 vcores</maxResources>
</queue>

</allocations>
15: yarn-env.sh

export YARN_PID_DIR=/var/hadoop/pids

16: Copy to the remaining nodes

scp -r hadoop root@kks2:/home
scp -r hadoop root@kks3:/home
scp -r hadoop root@kks4:/home
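The three scp commands above can also be written as a loop over the slave nodes (the list matches the slaves file; this assumes the Hadoop tree lives at /home/hadoop, as the configs suggest):

```shell
# Copy the configured Hadoop tree from kks1 to every slave node.
slaves="kks2 kks3 kks4"
for host in $slaves; do
  scp -r /home/hadoop "root@$host:/home" || echo "copy to $host failed"
done
```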
17: Start the cluster

Note: run all of the following from the Hadoop install directory.

On kks1, format HDFS and start the NameNode:
bin/hdfs namenode -format

sbin/hadoop-daemon.sh start namenode

Still on kks1, start all DataNodes:
sbin/hadoop-daemons.sh start datanode

Start YARN:
sbin/start-yarn.sh

Hadoop is now up. Use the jps command to check the JVM processes on each node.

From now on, the cluster can be stopped and started with:

sbin/stop-dfs.sh

sbin/start-dfs.sh

sbin/stop-yarn.sh

sbin/start-yarn.sh
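To see at a glance which daemon landed on which node, jps can be run across the whole cluster in one pass (a sketch; relies on the passwordless SSH set up in step 4):

```shell
# Expect NameNode/ResourceManager on kks1 and
# DataNode/NodeManager on kks2-kks4.
hosts="kks1 kks2 kks3 kks4"
for host in $hosts; do
  echo "== $host =="
  ssh -o ConnectTimeout=5 "root@$host" jps || echo "could not reach $host"
done
```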
18: Install ZooKeeper:

wget http://mirrors.cnnic.cn/apache ... ar.gz

tar -zxvf zookeeper-3.4.6.tar.gz
cd zookeeper-3.4.6
mkdir data
mkdir datalog
19: Create a myid file in the data directory

Its content is 1 on kks1, 2 on kks2, and 3 on kks3.
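Writing the three myid files by hand is easy to get wrong; the hostname-to-id mapping above can be encoded in a small helper (a sketch; the data directory matches zoo.cfg in the next step):

```shell
# Map a ZooKeeper node's hostname to its server id (kks1->1, kks2->2, kks3->3).
zk_id() {
  case "$1" in
    kks1) echo 1 ;;
    kks2) echo 2 ;;
    kks3) echo 3 ;;
    *)    echo "not a zookeeper node: $1" >&2; return 1 ;;
  esac
}

# On each ensemble node, write the id into the data dir, e.g.:
#   zk_id "$(hostname)" > /home/zookeeper/data/myid
echo "kks2 -> $(zk_id kks2)"
```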

20: Create a zoo.cfg file in the conf directory

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/zookeeper/data
dataLogDir=/home/zookeeper/datalog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/do ... nance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=kks1:2888:3888
server.2=kks2:2888:3888
server.3=kks3:2888:3888
21: Start ZooKeeper on every node

bin/zkServer.sh start
bin/zkServer.sh status   # shows whether this node is the leader or a follower
bin/zkServer.sh stop     # use this to stop it later
ZooKeeper is now installed.
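Besides zkServer.sh status, ZooKeeper 3.4 answers "four-letter" commands on the client port, which makes a quick whole-ensemble health check possible (a sketch using nc):

```shell
# "ruok" should answer "imok"; "stat" reports whether the node
# is currently the leader or a follower.
ensemble="kks1 kks2 kks3"
for host in $ensemble; do
  echo "== $host =="
  echo ruok | nc "$host" 2181 || echo "unreachable"
  echo stat | nc "$host" 2181 | grep Mode || true
done
```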

22: Install HBase

wget http://mirrors.koehn.com/apach ... ar.gz
23: hbase-env.sh

vim hbase-env.sh
export HBASE_MANAGES_ZK=false
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
export HBASE_PID_DIR=/var/hadoop/pids
24: hbase-site.xml

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://kks1:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>kks1,kks2,kks3</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hbase/data</value>
</property>
</configuration>
25: regionservers

kks2
kks3
kks4
26: Copy to the other nodes (every host in the regionservers file needs the tree)

scp -r hbase root@kks2:/home
scp -r hbase root@kks3:/home
scp -r hbase root@kks4:/home
27: Start the cluster

bin/start-hbase.sh

(To stop it later: bin/stop-hbase.sh)
The HBase cluster setup is now complete.
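Once start-hbase.sh returns, the deployment can be smoke-tested from the HBase shell on kks1 (a sketch; smoke_test is a throwaway table name):

```shell
# Run from the HBase install directory; "status" should report
# 3 live region servers (kks2-kks4), and the create/put/scan round
# trip exercises HDFS and ZooKeeper end to end.
table=smoke_test
if [ -x bin/hbase ]; then
  bin/hbase shell <<EOF
status
create '$table', 'cf'
put '$table', 'row1', 'cf:a', 'value1'
scan '$table'
disable '$table'
drop '$table'
EOF
fi
```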

The HBase programming environment, however, is still far from complete. See the follow-up post:

[button href=http://www.bcmeng.com/hbasemr color=red]Setting Up a MapReduce Programming Environment for Hadoop 2.6.0 | HBase 0.98.13[/button]
