Hadoop single node cluster (1)

우와신난다 2008. 8. 19. 18:25

1. Prerequisites

Java 1.5.0 or higher.

Hadoop requires a working Java 1.5.x installation. However, Java 1.6.x is recommended for running Hadoop.

I covered Java installation in a separate post: http://alloe.tistory.com/entry/How-to-Install-java
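To confirm the prerequisite on your own machine, here is a minimal sketch that reads the version reported by `java -version` and compares it with the 1.5 minimum. The function name `java_version_ok` is made up for this example, and the parsing assumes the version is the quoted string on the first line of the output.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the version string in $1
# (for example "1.6.0_07") is 1.5 or newer.
java_version_ok() {
    major=$(printf '%s' "$1" | cut -d. -f1)
    minor=$(printf '%s' "$1" | cut -d. -f2)
    case "$major$minor" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 5 ]; }
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr; the version is the quoted part of line 1.
    ver=$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)
    if java_version_ok "$ver"; then
        echo "Java $ver is new enough for Hadoop"
    else
        echo "Java $ver is older than 1.5 (or could not be parsed)"
    fi
else
    echo "java was not found on PATH"
fi
```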

2. Adding a dedicated Hadoop system user

(as root)
#> groupadd hadoop    <enter>
#> adduser --ingroup hadoop hadoop    <enter>
#> passwd hadoop   <enter>
Enter the new password when prompted.
#> su - hadoop    <enter>
(now running as the hadoop user)

This will add the user hadoop and the group hadoop to your local machine.
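A quick way to confirm that the account was created as intended is to ask `id` for the user's groups. This is only a sketch; the helper name `user_in_group` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when user $1 exists and belongs to group $2.
user_in_group() {
    # id -Gn prints the user's group names separated by spaces.
    id -Gn "$1" 2>/dev/null | tr ' ' '\n' | grep -Fqx "$2"
}

if user_in_group hadoop hadoop; then
    echo "user hadoop is in group hadoop"
else
    echo "user hadoop is missing or not in group hadoop"
fi
```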

3. Configuring SSH
Hadoop requires SSH access to manage its nodes.

I covered SSH auto login in a separate post: http://alloe.tistory.com/entry/ssh-auto-login

4. Hadoop installation
Download Hadoop from one of the Apache download mirrors and extract the contents of the Hadoop package to a location of your choice. I picked /home/hadoop. Make sure to change the owner of all the files to the hadoop user and group.

$> cd /home/hadoop
$> tar xfzv hadoop-0.17.1.tar.gz
$> mv hadoop-0.17.1 hadoop
$> chown -R hadoop:hadoop hadoop

5. Configuration
The goal is a single-node setup of Hadoop.

- Open conf/hadoop-env.sh.
- Go to the last line and add the following:

export HADOOP_HOME=/home/hadoop/hadoop
export JAVA_HOME=/usr/local/java
export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
<save & exit>
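Before moving on, it can help to check that the JAVA_HOME value really points at a Java installation, since a wrong path only shows up later when the daemons fail to start. A minimal sketch follows; the helper name `valid_java_home` is made up for this example, and /usr/local/java is simply the path used in the settings above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when directory $1 contains an executable bin/java.
valid_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# /usr/local/java is the JAVA_HOME value exported in hadoop-env.sh.
if valid_java_home /usr/local/java; then
    echo "JAVA_HOME looks usable"
else
    echo "no executable found at /usr/local/java/bin/java"
fi
```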

6. Single-node hadoop-site.xml settings

The <description> element of each property is optional.

- hadoop.tmp.dir
  <description>A base for other temporary directories.</description>
- fs.default.name
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  URI's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The URI's authority is used to
  determine the host, port, etc. for a filesystem.</description>
- mapred.job.tracker
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
- dfs.replication
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
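For reference, a complete file for these settings might look like the sketch below. The four property names are the ones the descriptions belong to in Hadoop's default configuration, but every value here is my assumption for a single-node setup: the temporary directory /home/hadoop/tmp and the ports 54310 and 54311 are arbitrary choices, not something Hadoop requires. The script writes the file into the current directory so you can review it first.

```shell
#!/bin/sh
# Writes a sample hadoop-site.xml into the current directory.
# Every <value> below is an assumed single-node setting; adjust as needed,
# then copy the file into the conf directory of your Hadoop installation.
cat > hadoop-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- assumed path; any directory writable by the hadoop user works -->
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <!-- assumed host and port for the name node -->
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <!-- assumed host and port for the job tracker -->
    <value>localhost:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <!-- one copy per block, since there is only one data node -->
    <value>1</value>
  </property>
</configuration>
EOF
echo "wrote $(pwd)/hadoop-site.xml"
```

A replication factor of 1 is used because a single-node cluster has no second data node to hold another copy of a block.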

7. Formatting the name node
To format the filesystem, run the command:
$> /home/hadoop/hadoop/bin/hadoop namenode -format    <enter>

If it succeeds, you will see output like the following:

 07/09/21 12:00:25 INFO dfs.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/
STARTUP_MSG:   args = [-format]
07/09/21 12:00:25 INFO dfs.Storage: Storage directory [...] has been successfully formatted.
07/09/21 12:00:25 INFO dfs.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/

8. Starting single-node cluster
Run the command:

$> /home/hadoop/hadoop/bin/start-all.sh

If it succeeds, you will see output like the following:

starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-<hostname>.out
localhost: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-<hostname>.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-<hostname>.out
starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-<hostname>.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-<hostname>.out

Run jps to check which Hadoop daemons are running:

19811 TaskTracker
19674 SecondaryNameNode
19735 JobTracker
19497 NameNode
20879 TaskTracker$Child
21810 Jps
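To make this check repeatable, the jps output can be compared with the five daemons that start-all.sh reports starting. The sketch below does that; the function name `missing_daemons` is invented for this example, and the daemon list is my reading of the start-all.sh output above.

```shell
#!/bin/sh
# Hypothetical helper: reads jps output on stdin and prints the name of each
# expected Hadoop daemon that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Column 2 of jps output is the class name; -x requires a whole-line
        # match, so SecondaryNameNode is not mistaken for NameNode.
        printf '%s\n' "$jps_output" | awk '{print $2}' | grep -Fqx "$daemon" \
            || echo "$daemon"
    done
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
else
    echo "jps was not found on PATH"
fi
```

Run against the listing above, this prints DataNode, because that listing has no DataNode line even though start-all.sh reported starting one. If you see the same thing, the datanode log file named in the start-all.sh output is a reasonable place to look.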