Summary: | Random Forest has become a standard data analysis tool in computational biology.
However, extensions to existing implementations are often necessary to handle the
complexity of biological datasets and their associated research questions. The
growing size of these datasets requires high-performance implementations. We describe
CloudForest, a Random Forest package written in Go, which is particularly well suited
for large, heterogeneous, genetic and biomedical datasets. CloudForest includes
several extensions, such as support for unbalanced classes and missing values. Its
flexible design enables users to easily implement additional extensions. CloudForest
achieves fast running times through effective use of the CPU cache, optimizations for
different classes of features, and efficient multi-threading. CloudForest is available at https://github.com/ilyalab/CloudForest.
|
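The summary lists support for unbalanced classes as one of CloudForest's extensions but does not say how it works. One common technique for this problem is to weight each class when computing node impurity, so that a rare class counts as much as a frequent one when choosing splits. The Go sketch below shows class-weighted Gini impurity with inverse-frequency weights. It is an illustration under that assumption, not CloudForest's own method or API. The names `weightedGini` and `inverseFrequencyWeights` are hypothetical.

```go
package main

import "fmt"

// weightedGini is a hypothetical helper (not part of CloudForest's API).
// It returns the Gini impurity of a node from per-class sample counts,
// with each class's count multiplied by its class weight first.
func weightedGini(counts []int, weights []float64) float64 {
	total := 0.0
	for i, c := range counts {
		total += float64(c) * weights[i]
	}
	if total == 0 {
		return 0 // an empty (or zero-weight) node is treated as pure
	}
	sumSq := 0.0
	for i, c := range counts {
		p := float64(c) * weights[i] / total // weighted class proportion
		sumSq += p * p
	}
	return 1 - sumSq
}

// inverseFrequencyWeights is also hypothetical. It gives each class a
// weight of 1/count, so every non-empty class contributes the same total
// weight. Classes with zero samples get weight 0.
func inverseFrequencyWeights(counts []int) []float64 {
	w := make([]float64, len(counts))
	for i, c := range counts {
		if c > 0 {
			w[i] = 1 / float64(c)
		}
	}
	return w
}

func main() {
	counts := []int{90, 10} // an unbalanced two-class node
	unweighted := weightedGini(counts, []float64{1, 1})
	balanced := weightedGini(counts, inverseFrequencyWeights(counts))
	fmt.Printf("unweighted: %.3f, balanced: %.3f\n", unweighted, balanced)
	// prints: unweighted: 0.180, balanced: 0.500
}
```

Without weights, the 90/10 node already looks fairly pure (impurity 0.18), so splits that separate out the rare class gain little. With inverse-frequency weights, the same node is scored as maximally mixed (0.5), so separating the minority class is rewarded more.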