Updated Independent Benchmarks (markdown)

Chris Lu 2020-11-18 10:48:03 -08:00
parent c624ce79cd
commit 51b424d209

@@ -20,7 +20,7 @@ The basic configuration information of cluster is as follows:
+ Total disk capacity: 799TB
+ Replication policy: 010
- Here are the details and results of our test. At the beginning of the test, we put our data to both HDFS and HCFS. The amount of the data is 100 million records, and stroed in 200 parquet files. The size of each parquet file is about 89 MB. We ran spark on yarn with 20 executors. In spark, we got two DataFrames by reading parquet from HDFS and HCFS separately, then executed `count`, `group by` and `join` by 100 times , and `write` by 10 times, on each DataFrame.
+ Here are the details and results of our test. At the beginning of the test, we put our data into both HDFS and HCFS. The data is 100 million records, stored in 200 parquet files; each parquet file is about 89 MB. We ran Spark on YARN with 20 executors. In Spark, we created two DataFrames by reading the parquet files from HDFS and HCFS separately, then executed `count`, `group by`, and `join` 100 times each, and `write` 10 times, on each DataFrame.
As for `count`, HCFS's advantage is obvious: the average time for the HDFS DataFrame is 4.05 seconds, while for the HCFS DataFrame it is only 0.659 seconds. The result is as follows:
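For context, a minimal sketch of how such a comparison could be driven from Spark is shown below. This is not the original benchmark code: the paths, the `seaweedfs://` scheme, and the filer address are assumptions about the cluster layout, and only the `count` loop is shown; the same timing pattern would apply to `group by`, `join`, and `write`.

```scala
import org.apache.spark.sql.SparkSession

object HcfsBenchmark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HDFS vs HCFS parquet benchmark")
      .getOrCreate()

    // Hypothetical paths: the actual dataset location and the HCFS (SeaweedFS) filer
    // address depend on the cluster configuration.
    val hdfsDf = spark.read.parquet("hdfs:///benchmark/data")
    val hcfsDf = spark.read.parquet("seaweedfs://filer:8888/benchmark/data")

    // Time a single Spark action; the test above repeats each action 100 times
    // (10 times for `write`) and reports the average.
    def time[A](label: String)(action: => A): Unit = {
      val start = System.nanoTime()
      action
      val seconds = (System.nanoTime() - start) / 1e9
      println(f"$label%-16s $seconds%.3f s")
    }

    for (i <- 1 to 100) {
      time(s"count/HDFS #$i")(hdfsDf.count())
      time(s"count/HCFS #$i")(hcfsDf.count())
    }

    spark.stop()
  }
}
```

A job like this would be submitted with `spark-submit --master yarn --num-executors 20 ...` to match the 20-executor setup described above.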