High Performance Distributed File System and Parallel Data Processing Engine

Sector/Sphere supports distributed data storage, distribution, and processing over large clusters of commodity computers. Sector is a high performance, scalable, and secure distributed file system. Sphere is a high performance parallel data processing engine that can process Sector data files with very simple programming interfaces. Sector/Sphere can be broadly compared to Google's GFS/MapReduce stack, but it differs in several key design choices and provides better performance. (Presentation: PPT 2.2MB / Poster: PDF 283KB)

Why Sector/Sphere?

High Performance. Sector and Sphere are highly optimized for data-intensive applications, even when the data is distributed across wide area networks (Sector uses UDT to enable high speed data transfer). Sector/Sphere can effectively process very large datasets that traditional tools (e.g., databases) cannot handle. Sector/Sphere is significantly faster than Hadoop (see benchmark).

Reliable. Data is automatically replicated in Sector for high reliability and availability. Both Sector slaves and masters can be added and removed at run time. Sector supports multiple active masters for high performance and availability.

Scalable. There is no single IO bottleneck in the system as clients communicate with storage nodes directly for data IO. Data is processed at local nodes to reduce IO cost.
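
The data path described above can be illustrated with a small toy model. This is only a sketch written in Python for illustration and is not the Sector/Sphere API: the names Master, StorageNode, client_read, and client_process are hypothetical, and real network transfer, replication, and security are left out. It shows the two ideas in the paragraph: the master answers location queries only, so file bytes never pass through it, and a processing function can be run on the node that already holds the data.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node ("slave") that keeps file contents locally."""
    name: str
    files: dict = field(default_factory=dict)
    bytes_served: int = 0

    def read(self, path):
        # Serve the file bytes straight to the caller and count them.
        data = self.files[path]
        self.bytes_served += len(data)
        return data

    def process(self, path, func):
        # Run func where the data lives; only the result leaves the node.
        return func(self.files[path])


@dataclass
class Master:
    """Hypothetical master: tracks file locations, never carries file data."""
    locations: dict = field(default_factory=dict)
    bytes_served: int = 0  # stays 0 because the master is not on the data path

    def register(self, path, node):
        self.locations.setdefault(path, []).append(node)

    def locate(self, path):
        # Return the first node known to hold the file.
        return self.locations[path][0]


def client_read(master, path):
    node = master.locate(path)  # metadata lookup only
    return node.read(path)      # data flows from the node to the client


def client_process(master, path, func):
    node = master.locate(path)
    return node.process(path, func)


master = Master()
node_a = StorageNode("node-a", {"/logs/day1.txt": b"alpha\nbeta\ngamma\n"})
master.register("/logs/day1.txt", node_a)

data = client_read(master, "/logs/day1.txt")
line_count = client_process(master, "/logs/day1.txt", lambda b: b.count(b"\n"))
```

In this toy model, adding storage nodes adds read capacity without adding load on the master, which is the property the paragraph describes.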

Easy to Use. Sector/Sphere runs in user space, and the OpenSSL library is the only dependency required to install the software. There are only a few parameters to configure, and Sector/Sphere requires no performance tuning. Programming with Sector/Sphere is also simple, and legacy applications can be called by Sphere without being rewritten.

Inexpensive. The open source Sector/Sphere software (BSD license) can be installed on clusters of commodity computers.

Feb. 11, 2010: Sector 2.0 Release! We have reached a major milestone with the release of version 2.0. Version 2.0 reorganizes the code structure so that the software is easier to deploy and extend (e.g., different security and metadata management modules can be supported). We also included several new features to improve performance, including support for in-memory objects.

Nov. 21, 2009: Sector featured in SC09 demos and BWC winning entry. We successfully presented and demonstrated Sector/Sphere at SC09. Sector/Sphere was used in one of our Bandwidth Challenge (BWC) demos to showcase its ability to distribute and process large datasets across wide area networks. We won the SC09 BWC together with Caltech and University of Tokyo. If you are interested in more details about Sector/Sphere, please email gu # lac.uic.edu.

Sep. 24, 2009: Sector is one of the finalists of SC09 Disruptive Technologies. "Generally speaking, a disruptive technology is a technological innovation or product that eventually overturns the existing dominant technology or product in the marketplace [SC09]." With Sector, we demonstrate the disruptive capacity to support large wide area data clouds. More to follow...

© 2009 National Center for Data Mining. All rights reserved.