mirror of https://github.com/seaweedfs/seaweedfs.git (synced 2024-01-19 02:48:24 +00:00)
Updated Independent Benchmarks (markdown)
parent c624ce79cd
commit 51b424d209
@@ -20,7 +20,7 @@ The basic configuration information of cluster is as follows:
 + Total disk capacity: 799TB
 + Replication policy: 010

-Here are the details and results of our test. At the beginning of the test, we put our data to both HDFS and HCFS. The amount of the data is 100 million records, and stroed in 200 parquet files. The size of each parquet file is about 89 MB. We ran spark on yarn with 20 executors. In spark, we got two DataFrames by reading parquet from HDFS and HCFS separately, then executed `count`, `group by` and `join` by 100 times , and `write` by 10 times, on each DataFrame.
+Here are the details and results of our test. At the beginning of the test, we put our data to both HDFS and HCFS. The amount of the data is 100 million records, and stored in 200 parquet files. The size of each parquet file is about 89 MB. We ran spark on yarn with 20 executors. In spark, we got two DataFrames by reading parquet from HDFS and HCFS separately, then executed `count`, `group by` and `join` by 100 times , and `write` by 10 times, on each DataFrame.

 As for `count`, HCFS's advantage is obvious. The average time of the DataFrame from HDFS is 4.05 seconds, while HCFS is only 0.659. Following is the result:
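The benchmark text in this hunk describes repeating each operation a fixed number of times and reporting the average. A minimal sketch of that measurement pattern follows, using only the Python standard library. The helper names `average_seconds` and `speedup` are hypothetical and do not come from the benchmark authors, and the stand-in workload replaces the actual Spark actions, which are not shown in the source.

```python
import time
from statistics import mean


def average_seconds(action, repeats):
    """Run `action` `repeats` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Stand-in workload (assumption): in the benchmark this would be a Spark
    # action such as `count` on a DataFrame read from HDFS or HCFS.
    avg = average_seconds(lambda: sum(range(100_000)), repeats=100)
    print(f"average: {avg:.6f} s")

    # Averages quoted in the text for `count`: 4.05 s (HDFS) and 0.659 s (HCFS).
    print(f"count speedup: {speedup(4.05, 0.659):.2f}x")
```

Timing each run separately, rather than timing the whole loop once, keeps the individual durations available in case a median or spread is wanted in addition to the mean.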