Created Hadoop Benchmark (markdown)

Chris Lu 2020-07-14 11:01:29 -07:00
parent 5a35188501
commit 1e6f89ef0b

Hadoop-Benchmark.md Normal file

@ -0,0 +1,48 @@
## Setup Hadoop Benchmark
Here are my steps. First, download the Hadoop 2.10.0 binary release, untar it, and `cd` into the Hadoop directory.
```
wget http://apache.mirrors.hoobly.com/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
tar xvf hadoop-2.10.0.tar.gz
cd hadoop-2.10.0
```
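Modify the file `./etc/hadoop/core-site.xml` so that Hadoop loads the SeaweedFS file system and uses it as the default. The `fs.defaultFS` value assumes a SeaweedFS filer is listening on `localhost:8888`: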
```
<configuration>
<property>
<name>fs.seaweedfs.impl</name>
<value>seaweed.hdfs.SeaweedFileSystem</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>seaweedfs://localhost:8888</value>
</property>
</configuration>
```
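If the filer runs somewhere other than `localhost:8888`, the same file can be generated from a small script. This is only a sketch: the function name `write_core_site` and the scratch path `/tmp/core-site.xml` are made up for illustration, and the property names and values are copied from the configuration above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or SeaweedFS): write a core-site.xml
# that points Hadoop at a SeaweedFS filer.
# $1 = output file, $2 = filer host, $3 = filer port
write_core_site() {
    cat > "$1" <<EOF
<configuration>
<property>
<name>fs.seaweedfs.impl</name>
<value>seaweed.hdfs.SeaweedFileSystem</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>seaweedfs://$2:$3</value>
</property>
</configuration>
EOF
}

# Write to a scratch file first, then count the lines that carry the filer URL.
write_core_site /tmp/core-site.xml localhost 8888
grep -c '<value>seaweedfs://localhost:8888</value>' /tmp/core-site.xml
```

After reviewing the scratch file, copy it over `./etc/hadoop/core-site.xml`. Running `bin/hdfs getconf -confKey fs.defaultFS` from the Hadoop directory should then print the `seaweedfs://` URL.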
Then download the SeaweedFS Hadoop client jar into Hadoop's common lib directory:
```
cd share/hadoop/common/lib/
wget https://oss.sonatype.org/service/local/repositories/releases/content/com/github/chrislusf/seaweedfs-hadoop2-client/1.3.2/seaweedfs-hadoop2-client-1.3.2.jar
# return to the hadoop-2.10.0 directory for the next step
cd ../../../..
```
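A failed download can leave a missing or empty jar, which only shows up later as a class-not-found error. A quick check, run from the Hadoop root directory, might look like the sketch below. The function name `has_client_jar` is hypothetical, and only the jar name pattern comes from the download step above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a non-empty seaweedfs-hadoop2-client
# jar exists in the given lib directory.
# $1 = lib directory, e.g. share/hadoop/common/lib
has_client_jar() {
    for jar in "$1"/seaweedfs-hadoop2-client-*.jar; do
        # An unmatched glob stays literal, so -s also covers "no such file".
        [ -s "$jar" ] && return 0
    done
    return 1
}

if has_client_jar share/hadoop/common/lib; then
    echo "client jar found"
else
    echo "client jar missing or empty"
fi
```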
From the Hadoop root directory, start the TestDFSIO write test:
```
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.10.0-tests.jar TestDFSIO -write -nrFiles 64 -fileSize 16GB -resFile /tmp/TestDFSIOwrite.txt
...
20/07/14 10:41:02 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
20/07/14 10:41:02 INFO fs.TestDFSIO: Date & time: Tue Jul 14 10:41:02 PDT 2020
20/07/14 10:41:02 INFO fs.TestDFSIO: Number of files: 64
20/07/14 10:41:02 INFO fs.TestDFSIO: Total MBytes processed: 1048576
20/07/14 10:41:02 INFO fs.TestDFSIO: Throughput mb/sec: 381.28
20/07/14 10:41:02 INFO fs.TestDFSIO: Average IO rate mb/sec: 383.42
20/07/14 10:41:02 INFO fs.TestDFSIO: IO rate std deviation: 28.81
20/07/14 10:41:02 INFO fs.TestDFSIO: Test exec time sec: 2756.42
20/07/14 10:41:02 INFO fs.TestDFSIO:
```
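The numbers in the summary can be cross-checked with a little arithmetic: 64 files of 16 GB each should come to 1048576 MB, and dividing that by the test execution time gives a wall-clock rate. The sketch below does both, with hypothetical helper names (`dfsio_field`, `expected_mb`, `wall_clock_rate`). Note that, as far as I understand it, TestDFSIO derives its own "Throughput mb/sec" from the summed per-task I/O time rather than from wall-clock time, so the two figures need not match exactly.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one TestDFSIO summary field.
# $1 = label without the trailing colon, $2 = log or result file
dfsio_field() {
    awk -v label="$1:" 'index($0, label) { print $NF; exit }' "$2"
}

# Expected data volume in MB for $1 files of $2 GB each (1 GB = 1024 MB).
expected_mb() {
    echo $(( $1 * $2 * 1024 ))
}

# Data volume ($1, in MB) divided by elapsed time ($2, in seconds), in MB/s.
wall_clock_rate() {
    awk -v mb="$1" -v sec="$2" 'BEGIN { printf "%.2f\n", mb / sec }'
}

# Values taken from the run above.
expected_mb 64 16                  # prints 1048576
wall_clock_rate 1048576 2756.42    # prints 380.41
```

To read the values from the result file instead of typing them in, something like `dfsio_field "Test exec time sec" /tmp/TestDFSIOwrite.txt` should work, assuming the result file uses the same labels as the log output.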