Configuring Hadoop on Ubuntu

Steps for configuring Hadoop on Ubuntu
1. Make sure SSH is installed and running:
Install it with:
sudo apt-get install openssh-server
Check whether it is running:
ps -e | grep ssh
If you see sshd, the SSH server is already running.
If not, start it like this:
sudo /etc/init.d/ssh start
The SSH server configuration file is /etc/ssh/sshd_config. There you can set the SSH service port; the default is 22, and you can change it to another port number such as 222 (see the excerpt below).
Then restart the SSH service:
sudo /etc/init.d/ssh stop
sudo /etc/init.d/ssh start
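For instance, to switch SSH to port 222 as mentioned above, edit the Port line in /etc/ssh/sshd_config before restarting (a minimal excerpt; 222 is just the example value):
# /etc/ssh/sshd_config (excerpt)
# change the listening port from the default 22
Port 222
After changing the port, connect with ssh -p 222 <host> instead of plain ssh.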
2. Edit /etc/hosts:
For a single machine, setting the IP address to 127.0.0.1 is enough. The name after each IP address is that machine's hostname (use the hostname command to see your own machine's name).
Example:
192.168.18.62 DC-001
192.168.18.63 DC-002
127.0.0.1 czxttkl
3. Set up SSH. On the command line, run:
ssh-keygen -t rsa
This generates two files in ~/.ssh/: id_rsa.pub and id_rsa.
Then run:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Next, copy authorized_keys into the ~/.ssh directory of every machine (e.g. with scp, as sketched below); the master can then log in to all of them over SSH without a password.
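A minimal way to distribute the key, assuming the hostnames from the /etc/hosts example above and that ~/.ssh already exists on each remote machine:
# copy the merged key file to each node (username/hostnames are illustrative)
scp ~/.ssh/authorized_keys czxttkl@DC-001:~/.ssh/
scp ~/.ssh/authorized_keys czxttkl@DC-002:~/.ssh/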
Example authorized_keys file:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDA20Rn8qNN76R5eyeAjC6eRGjzBs095/VkPMYNqW+JqyfNI3hQ97j+ivyapzCG7HnVjGzCBDVJzfAx2twteHjwWfUzysdI38ee0aRrfesEpe/t11HeeyO3GUj7iDJUzKA4j55fIJwrMQOE6mBzxt2e331Zh7awNDXZSE7xGzsk0wWW6yBQUzoW0g2tWyL3Ipp2ZnNawvUU0rNrw7m4cqwuvLXrSEkOZ3IKAGFwXgT+PUZWSRE81YC8IRINDgQFu8ktg6/n5KHl9bVbWNFe7tIedUAErXN0xt5H/yyAyAWBHtxCqRnnsd4G8NM4GmxvPmuQ7nmYvNM3BleO1/2TICez czxttkl@czxttkl
The part before the @ is the login username; the part after it is the hostname.
Verify that SSH is set up correctly:
On the command line, run ssh czxttkl. On first login the following prompt appears; just answer yes:
The authenticity of host 'czxttkl (127.0.0.1)' can't be established.
ECDSA key fingerprint is dd:55:3f:79:81:0b:39:89:7b:30:c7:60:99:a7:ef:4e.
Are you sure you want to continue connecting (yes/no)? yes
Output after a successful login:
Welcome to Ubuntu 12.10 (GNU/Linux 3.5.0-17-generic x86_64)
* Documentation:  https://help.ubuntu.com/
New release '13.04' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Tue May  7 09:31:58 2013 from ubuntu.ubuntu-domain
4. Make sure the firewall is off and JDK 1.6 is installed.
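A quick way to verify both on Ubuntu (a sketch; ufw is Ubuntu's default firewall front end, and your JDK may live elsewhere):
# should report "Status: inactive"; if not, run: sudo ufw disable
sudo ufw status
# should print a 1.6.x version string
java -version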
5. Unpack the Hadoop archive to some location. Change into the conf folder inside the unpacked Hadoop directory and edit the configuration files there.
core-site.xml: add the fs.default.name property:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hadoop-env.sh: set the JAVA_HOME path:
# The java implementation to use.  Required.
export JAVA_HOME=/usr/local/java/jdk1.6.0_38
hdfs-site.xml: add the following properties:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2047</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/czxttkl/hadoop/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/czxttkl/hadoop/hdfs/data</value>
    <final>true</final>
  </property>
</configuration>
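It does no harm to create the dfs.name.dir and dfs.data.dir directories up front; a sketch using the paths from the config above (adjust them to your own home directory):
# create the NameNode and DataNode storage directories referenced above
mkdir -p /home/czxttkl/hadoop/hdfs/name /home/czxttkl/hadoop/hdfs/data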
mapred-site.xml: add the mapred.job.tracker property:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
6. Add the bin folder inside the unpacked Hadoop directory to your PATH, for example as below.
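One way to do this permanently, assuming (hypothetically) that Hadoop was unpacked to /home/czxttkl/hadoop:
# append this line to ~/.bashrc, then run: source ~/.bashrc
export PATH=$PATH:/home/czxttkl/hadoop/bin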
7. Format the NameNode:
hadoop namenode -format
8. Run start-all.sh in the Hadoop bin folder to start Hadoop. More on these scripts:
* start-all.sh: starts all Hadoop daemons (NameNode, DataNode, JobTracker, and TaskTracker)
* stop-all.sh: stops all Hadoop daemons
* start-mapred.sh: starts the Map/Reduce daemons (JobTracker and TaskTracker)
* stop-mapred.sh: stops the Map/Reduce daemons
* start-dfs.sh: starts the HDFS daemons (NameNode and DataNode)
* stop-dfs.sh: stops the HDFS daemons
9. Simple tests
Run the jps command to check the running processes; the output should look like this:
4350 Jps
3965 SecondaryNameNode
4055 JobTracker
4295 TaskTracker
3501 NameNode
Create a directory in HDFS:
hadoop dfs -mkdir testdir
Copy a file into HDFS:
hadoop dfs -put /SOME/WHERE/large.zip testfile.zip
List the existing files in HDFS:
hadoop dfs -ls /
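To round out the test, copy the file back out of HDFS and clean up (a sketch reusing the names from the commands above):
# copy the file back from HDFS to the local disk
hadoop dfs -get testfile.zip /tmp/testfile.zip
# remove the test directory from HDFS
hadoop dfs -rmr testdir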
