Updated Hadoop Benchmark (markdown)

Chris Lu 2020-07-15 11:42:20 -07:00
parent 1e6f89ef0b
commit 3474f6cd06

@ -1,4 +1,4 @@
# Setup Hadoop Benchmark
Here are my steps. First, download the Hadoop 2.10.0 binary, untar it, and cd into the Hadoop directory.
```
@ -29,20 +29,136 @@ cd share/hadoop/common/lib/
wget https://oss.sonatype.org/service/local/repositories/releases/content/com/github/chrislusf/seaweedfs-hadoop2-client/1.3.2/seaweedfs-hadoop2-client-1.3.2.jar
```
# TestDFSIO Benchmark
The TestDFSIO benchmark is used for measuring I/O (read/write) performance.
Start the TestDFSIO write tests:
```
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.10.0-tests.jar TestDFSIO -write -nrFiles 16 -fileSize 16GB -resFile /tmp/TestDFSIOwrite.txt
...
20/07/14 18:27:34 INFO mapreduce.Job: map 100% reduce 100%
20/07/14 18:27:34 INFO mapreduce.Job: Job job_local485352420_0001 completed successfully
20/07/14 18:27:34 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=27928633
FILE: Number of bytes written=35966385
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
SEAWEEDFS: Number of bytes read=17111
SEAWEEDFS: Number of bytes written=2611340146618
SEAWEEDFS: Number of read operations=0
SEAWEEDFS: Number of large read operations=0
SEAWEEDFS: Number of write operations=0
Map-Reduce Framework
Map input records=16
Map output records=80
Map output bytes=1276
Map output materialized bytes=1532
Input split bytes=2054
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=1532
Reduce input records=80
Reduce output records=5
Spilled Records=160
Shuffled Maps =16
Failed Shuffles=0
Merged Map outputs=16
GC time elapsed (ms)=151632
Total committed heap usage (bytes)=39777730560
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1798
File Output Format Counters
Bytes Written=84
20/07/14 18:27:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
20/07/14 18:27:34 INFO fs.TestDFSIO: Date & time: Tue Jul 14 18:27:34 PDT 2020
20/07/14 18:27:34 INFO fs.TestDFSIO: Number of files: 16
20/07/14 18:27:34 INFO fs.TestDFSIO: Total MBytes processed: 262144
20/07/14 18:27:34 INFO fs.TestDFSIO: Throughput mb/sec: 310.47
20/07/14 18:27:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 315.63
20/07/14 18:27:34 INFO fs.TestDFSIO: IO rate std deviation: 43.43
20/07/14 18:27:34 INFO fs.TestDFSIO: Test exec time sec: 847.32
20/07/14 18:27:34 INFO fs.TestDFSIO:
```
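The summary lines of the write run can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the log keeps the `label: value` shape shown above; the helper name `parse_summary` is hypothetical and not part of Hadoop. It confirms that 16 files of 16 GB add up to the reported total, and derives a wall-clock throughput from the total size and the execution time.

```python
import re

# Summary lines copied from the write run above.
LOG = """\
20/07/14 18:27:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
20/07/14 18:27:34 INFO fs.TestDFSIO: Number of files: 16
20/07/14 18:27:34 INFO fs.TestDFSIO: Total MBytes processed: 262144
20/07/14 18:27:34 INFO fs.TestDFSIO: Throughput mb/sec: 310.47
20/07/14 18:27:34 INFO fs.TestDFSIO: Test exec time sec: 847.32
"""


def parse_summary(text):
    """Return {label: float} for every 'label: number' TestDFSIO line."""
    pattern = re.compile(r"fs\.TestDFSIO:\s+(.+?):\s+([0-9.]+)\s*$")
    stats = {}
    for line in text.splitlines():
        match = pattern.search(line)
        if match:  # non-numeric lines such as the banner are skipped
            stats[match.group(1)] = float(match.group(2))
    return stats


stats = parse_summary(LOG)

# -nrFiles 16 -fileSize 16GB, expressed in MB.
expected_mb = int(stats["Number of files"]) * 16 * 1024

# Total data divided by the wall-clock time of the whole test.
wall_clock_mb_s = stats["Total MBytes processed"] / stats["Test exec time sec"]

print("total matches:", expected_mb == stats["Total MBytes processed"])
print("wall-clock MB/s:", round(wall_clock_mb_s, 2))
```

The wall-clock figure comes out close to the reported `Throughput mb/sec`. The two need not agree in general, since TestDFSIO derives its throughput from the time spent inside the map tasks rather than from the elapsed time of the job.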
Start the TestDFSIO read tests:
```
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.10.0-tests.jar TestDFSIO -read -nrFiles 16 -fileSize 16GB -resFile /tmp/TestDFSIOwrite.txt
...
20/07/14 21:48:00 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=27928585
FILE: Number of bytes written=36004955
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
SEAWEEDFS: Number of bytes read=2611340133079
SEAWEEDFS: Number of bytes written=30649
SEAWEEDFS: Number of read operations=0
SEAWEEDFS: Number of large read operations=0
SEAWEEDFS: Number of write operations=0
Map-Reduce Framework
Map input records=16
Map output records=80
Map output bytes=1252
Map output materialized bytes=1508
Input split bytes=2054
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=1508
Reduce input records=80
Reduce output records=5
Spilled Records=160
Shuffled Maps =16
Failed Shuffles=0
Merged Map outputs=16
GC time elapsed (ms)=145687
Total committed heap usage (bytes)=38852886528
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1798
File Output Format Counters
Bytes Written=83
20/07/14 21:48:00 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
20/07/14 21:48:00 INFO fs.TestDFSIO: Date & time: Tue Jul 14 21:48:00 PDT 2020
20/07/14 21:48:00 INFO fs.TestDFSIO: Number of files: 16
20/07/14 21:48:00 INFO fs.TestDFSIO: Total MBytes processed: 262144
20/07/14 21:48:00 INFO fs.TestDFSIO: Throughput mb/sec: 22.14
20/07/14 21:48:00 INFO fs.TestDFSIO: Average IO rate mb/sec: 22.91
20/07/14 21:48:00 INFO fs.TestDFSIO: IO rate std deviation: 3.79
20/07/14 21:48:00 INFO fs.TestDFSIO: Test exec time sec: 11871.4
20/07/14 21:48:00 INFO fs.TestDFSIO:
```
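To put the read and write runs side by side, here is a small sketch that uses only the numbers reported in the two summaries above. The `Run` class and its `variation` method are illustrative names, not part of TestDFSIO; the variation is simply the IO rate standard deviation divided by the average IO rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One TestDFSIO summary; values copied from the logs above."""
    name: str
    throughput_mb_s: float
    avg_io_rate_mb_s: float
    io_rate_stddev: float
    exec_time_s: float

    def variation(self) -> float:
        # Coefficient of variation of the per-file IO rate.
        return self.io_rate_stddev / self.avg_io_rate_mb_s


write = Run("write", 310.47, 315.63, 43.43, 847.32)
read = Run("read", 22.14, 22.91, 3.79, 11871.4)

# How many times lower the read throughput is than the write throughput.
slowdown = write.throughput_mb_s / read.throughput_mb_s
print(f"read throughput is {slowdown:.1f}x lower than write")

for run in (write, read):
    hours = run.exec_time_s / 3600
    print(f"{run.name}: {hours:.2f} h, variation {run.variation():.2f}")
```

With these figures the read run is roughly 14 times slower than the write run for the same 16 files of 16 GB, while the relative spread of the per-file IO rate is similar in both runs.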
# MRbench Benchmark
```
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.10.0-tests.jar mrbench -inputLines 10000000 -inputType random -maps 10 -reduces 5
...
```