diff --git a/Super-Large-Directories.md b/Super-Large-Directories.md
index 97bfe7f..b7a2775 100644
--- a/Super-Large-Directories.md
+++ b/Super-Large-Directories.md
@@ -11,7 +11,7 @@ For example, for Cassandra filer store, each entry has this schema:
 PRIMARY KEY (directory, name)
 ) WITH CLUSTERING ORDER BY (name ASC);
 ```
-The directory is the partitioning key. So the entries with the same directory is partitioned to the same data node. This is fine for most cases. However, if there are billions of child entries for one directory, the data node would not perform well.
+The directory is the partitioning key. So the entries with the same directory is partitioned to the same data node. This is fine for most cases. However, if there are billions of direct child entries under one directory, the data node would not perform well.
 
 This is actually a common case when user name, id, or UUID are used as child entries. Usually a separate index is built to translate names to file id, and use file id to access data directory, giving up all the convenience from a file system.
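
The skew described in the changed paragraph can be illustrated with a minimal sketch. It is not Cassandra's actual partitioner: `node_for`, `placement`, `NUM_NODES`, and the MD5-based hash are hypothetical stand-ins, chosen only to show that using `directory` as the partition key sends every direct child of one directory to the same node.

```python
import hashlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size, for illustration only


def node_for(partition_key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a partition key to a node index with a stable hash.

    Assumption: a simple hash-mod scheme stands in for a real partitioner;
    Cassandra's token ring works differently in detail.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def placement(entries):
    """Count entries per node when the directory is the partition key."""
    return Counter(node_for(directory) for directory, _name in entries)


# One directory with many direct children: all entries share one partition key.
hot = [("/users", f"user-{i}") for i in range(1000)]
# The same number of entries spread over 100 different directories.
spread = [(f"/data/{i % 100}", f"file-{i}") for i in range(1000)]

hot_counts = placement(hot)
spread_counts = placement(spread)
print("single directory:", dict(hot_counts))
print("many directories:", dict(spread_counts))
```

In this sketch the single-directory case puts all 1000 entries on one node, while the many-directory case distributes them across nodes. That is the imbalance the paragraph warns about, scaled down from billions of entries.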