Writing a WordCount program in shell

Mapper:
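#!/bin/sh
# Mapper: emit "<word> 1" for every whitespace-separated word on stdin.
set -f    # disable globbing so a word such as "*" is not expanded to file names
while read -r LINE || [ -n "$LINE" ]; do    # also handle a last line with no newline
    for word in $LINE; do
        printf '%s 1\n' "$word"
    done
done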


Reducer:

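#!/bin/sh
# Reducer: input is sorted "<word> 1" lines; output is "<word>\t<count>".
count=0
started=0
word=""
while read -r newword rest || [ -n "$newword" ]; do
    [ -z "$newword" ] && continue          # skip empty lines
    if [ "$word" != "$newword" ]; then
        # a new word starts: print the previous one
        if [ "$started" -ne 0 ]; then
            printf '%s\t%s\n' "$word" "$count"
        fi
        word=$newword
        count=1
        started=1
    else
        count=$((count + 1))
    fi
done
# print the last word, which the loop itself never reaches
if [ "$started" -ne 0 ]; then
    printf '%s\t%s\n' "$word" "$count"
fi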
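The reducer's job is to count runs of identical words in sorted input, which is exactly what `uniq -c` does. As a hedged sketch for checking the scripts locally, the function below (the name `ref_wordcount` and the sample text are illustrative, not part of the original post) builds the same `word<TAB>count` output from standard POSIX tools, assuming words are separated by spaces, tabs or newlines as in the mapper:

```shell
#!/bin/sh
# Reference word count built only from standard tools, for comparing
# against "mapper.sh | sort | reducer.sh" on the same input.
# Assumption: words are separated by spaces, tabs or newlines.
ref_wordcount() {
  tr -s ' \t' '\n\n' |            # one word per line, runs of blanks squeezed
    grep -v '^$' |                # drop the empty line left by leading blanks
    sort |
    uniq -c |                     # "<count> <word>" for each run of equal words
    awk '{ printf "%s\t%s\n", $2, $1 }'   # reorder to "word<TAB>count"
}

printf 'hello world\nhello hadoop\n' | ref_wordcount
```

Its output can be compared with that of `cat test | sh mapper.sh | sort | sh reducer.sh` on the same file; piping both results through `sort` before comparing avoids spurious differences in line order.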


Local test: cat test | sh mapper.sh | sort | sh reducer.sh

But when I run it with hadoop jar, it fails with the following error:
13/12/08 00:24:07 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201312072336_0008_m_000000
13/12/08 00:24:07 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
