Updated Hadoop Benchmark (markdown)

Chris Lu 2020-07-15 11:42:20 -07:00
parent 1e6f89ef0b
commit 3474f6cd06

@@ -1,4 +1,4 @@
## Setup Hadoop Benchmark
# Setup Hadoop Benchmark
Here are my steps. First, download the Hadoop 2.10.0 binary release, untar it, and cd into the hadoop directory.
```
@@ -29,20 +29,136 @@ cd share/hadoop/common/lib/
wget https://oss.sonatype.org/service/local/repositories/releases/content/com/github/chrislusf/seaweedfs-hadoop2-client/1.3.2/seaweedfs-hadoop2-client-1.3.2.jar
```
Start the TestDFSIO tests:
# TestDFSIO Benchmark
The TestDFSIO benchmark is used for measuring I/O (read/write) performance.
Start the TestDFSIO write tests:
```
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.10.0-tests.jar TestDFSIO -write -nrFiles 64 -fileSize 16GB -resFile /tmp/TestDFSIOwrite.txt
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.10.0-tests.jar TestDFSIO -write -nrFiles 16 -fileSize 16GB -resFile /tmp/TestDFSIOwrite.txt
...
20/07/14 18:27:34 INFO mapreduce.Job: map 100% reduce 100%
20/07/14 18:27:34 INFO mapreduce.Job: Job job_local485352420_0001 completed successfully
20/07/14 18:27:34 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=27928633
FILE: Number of bytes written=35966385
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
SEAWEEDFS: Number of bytes read=17111
SEAWEEDFS: Number of bytes written=2611340146618
SEAWEEDFS: Number of read operations=0
SEAWEEDFS: Number of large read operations=0
SEAWEEDFS: Number of write operations=0
Map-Reduce Framework
Map input records=16
Map output records=80
Map output bytes=1276
Map output materialized bytes=1532
Input split bytes=2054
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=1532
Reduce input records=80
Reduce output records=5
Spilled Records=160
Shuffled Maps =16
Failed Shuffles=0
Merged Map outputs=16
GC time elapsed (ms)=151632
Total committed heap usage (bytes)=39777730560
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1798
File Output Format Counters
Bytes Written=84
20/07/14 18:27:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
20/07/14 18:27:34 INFO fs.TestDFSIO: Date & time: Tue Jul 14 18:27:34 PDT 2020
20/07/14 18:27:34 INFO fs.TestDFSIO: Number of files: 16
20/07/14 18:27:34 INFO fs.TestDFSIO: Total MBytes processed: 262144
20/07/14 18:27:34 INFO fs.TestDFSIO: Throughput mb/sec: 310.47
20/07/14 18:27:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 315.63
20/07/14 18:27:34 INFO fs.TestDFSIO: IO rate std deviation: 43.43
20/07/14 18:27:34 INFO fs.TestDFSIO: Test exec time sec: 847.32
20/07/14 18:27:34 INFO fs.TestDFSIO:
```
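The summary lines at the end of a run are the values usually compared between runs. Here is a minimal sketch for pulling one of them out of captured output. It assumes the log format shown above, and `dfsio_metric` is a hypothetical helper name, not part of Hadoop:

```shell
# Hypothetical helper: print the value of one TestDFSIO summary metric.
# Reads log text on stdin; $1 is the label that appears before the colon.
dfsio_metric() {
  awk -v label="$1" '
    index($0, label ":") { value = $NF }   # keep the last matching line
    END { if (value != "") print value }
  '
}

# Sample summary lines copied from the write run above.
sample='20/07/14 18:27:34 INFO fs.TestDFSIO: Total MBytes processed: 262144
20/07/14 18:27:34 INFO fs.TestDFSIO: Throughput mb/sec: 310.47
20/07/14 18:27:34 INFO fs.TestDFSIO: Test exec time sec: 847.32'

printf '%s\n' "$sample" | dfsio_metric "Throughput mb/sec"
```

The helper matches on the label text rather than on column position, so it should also work on lines without the timestamp prefix, as long as the value is the last field.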
Start the TestDFSIO read tests:
```
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.10.0-tests.jar TestDFSIO -read -nrFiles 64 -fileSize 16GB -resFile /tmp/TestDFSIOwrite.txt
...
20/07/14 10:41:02 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
20/07/14 10:41:02 INFO fs.TestDFSIO: Date & time: Tue Jul 14 10:41:02 PDT 2020
20/07/14 10:41:02 INFO fs.TestDFSIO: Number of files: 64
20/07/14 10:41:02 INFO fs.TestDFSIO: Total MBytes processed: 1048576
20/07/14 10:41:02 INFO fs.TestDFSIO: Throughput mb/sec: 381.28
20/07/14 10:41:02 INFO fs.TestDFSIO: Average IO rate mb/sec: 383.42
20/07/14 10:41:02 INFO fs.TestDFSIO: IO rate std deviation: 28.81
20/07/14 10:41:02 INFO fs.TestDFSIO: Test exec time sec: 2756.42
20/07/14 10:41:02 INFO fs.TestDFSIO:
20/07/14 21:48:00 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=27928585
FILE: Number of bytes written=36004955
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
SEAWEEDFS: Number of bytes read=2611340133079
SEAWEEDFS: Number of bytes written=30649
SEAWEEDFS: Number of read operations=0
SEAWEEDFS: Number of large read operations=0
SEAWEEDFS: Number of write operations=0
Map-Reduce Framework
Map input records=16
Map output records=80
Map output bytes=1252
Map output materialized bytes=1508
Input split bytes=2054
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=1508
Reduce input records=80
Reduce output records=5
Spilled Records=160
Shuffled Maps =16
Failed Shuffles=0
Merged Map outputs=16
GC time elapsed (ms)=145687
Total committed heap usage (bytes)=38852886528
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1798
File Output Format Counters
Bytes Written=83
20/07/14 21:48:00 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
20/07/14 21:48:00 INFO fs.TestDFSIO: Date & time: Tue Jul 14 21:48:00 PDT 2020
20/07/14 21:48:00 INFO fs.TestDFSIO: Number of files: 16
20/07/14 21:48:00 INFO fs.TestDFSIO: Total MBytes processed: 262144
20/07/14 21:48:00 INFO fs.TestDFSIO: Throughput mb/sec: 22.14
20/07/14 21:48:00 INFO fs.TestDFSIO: Average IO rate mb/sec: 22.91
20/07/14 21:48:00 INFO fs.TestDFSIO: IO rate std deviation: 3.79
20/07/14 21:48:00 INFO fs.TestDFSIO: Test exec time sec: 11871.4
20/07/14 21:48:00 INFO fs.TestDFSIO:
```
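As a sanity check on the numbers above, the total size and a rough wall-clock rate can be recomputed from the run parameters. This is a sketch only: `wall_clock_rate` is a hypothetical helper, and its result is total MB divided by the reported "Test exec time sec". That is not necessarily how TestDFSIO derives its own "Throughput mb/sec" figure, so small differences are expected.

```shell
# Hypothetical helper: total MB divided by wall-clock seconds, two decimals.
wall_clock_rate() {
  awk -v mb="$1" -v sec="$2" 'BEGIN { printf "%.2f\n", mb / sec }'
}

# 16 files x 16 GB each, expressed in MB (1 GB = 1024 MB).
total_mb=$((16 * 16 * 1024))
echo "$total_mb"                       # 262144, matches "Total MBytes processed"

wall_clock_rate "$total_mb" 847.32     # write run, exec time from the log
wall_clock_rate "$total_mb" 11871.4    # read run, exec time from the log
```

For these two runs, the recomputed rates land close to the reported throughput values (310.47 for write, 22.14 for read).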
# MRbench Benchmark
```
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.10.0-tests.jar mrbench -inputLines 10000000 -inputType random -maps 10 -reduces 5
...
```