High Performance Distributed File System and Parallel Data Processing Engine

Sector/Sphere supports distributed data storage, distribution, and processing over large clusters of commodity computers, either within a data center or across multiple data centers. Sector is a high performance, scalable, and secure distributed file system. Sphere is a high performance parallel data processing engine that can process Sector data files on the storage nodes through very simple programming interfaces. (Presentation: PDF, 608KB / Poster: PDF, 283KB)

Why Sector/Sphere?

High Performance. Sector and Sphere are highly optimized for data intensive applications. Sphere performs massively parallel in-storage data processing, enabled by Sector's unique application-aware data placement mechanism. In our benchmarks, Sphere consistently runs 2 to 4 times faster than Hadoop MapReduce (see benchmark).

WAN Support. Sector is one of the few file systems that can effectively support multiple data centers across wide area networks. Sector uses UDT for high speed data transfer, and its data placement strategy lets Sector work effectively as a content distribution network over a WAN.

Software Level Fault Tolerance. Sector does not require hardware RAID for reliability; instead, it automatically replicates data for high reliability and availability. Both Sector slaves and masters can be removed and inserted at run time. Sector also supports multiple active masters for high performance and availability.

Rule-based Data Management. For each file, users can control its replication factor, replication distance, and replication locations (when necessary). The rules can be changed at run time.
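As an illustration of what per-file rules of this kind might look like, here is a minimal Python sketch. It is not Sector's configuration format or API: the `ReplicationRule` and `RuleTable` names, the field names, the default values, and the choice to match rules by path prefix are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReplicationRule:
    # Hypothetical fields mirroring the three controls named above.
    factor: int = 2                  # number of replicas to keep (assumed default)
    distance: int = 1                # how far apart replicas should be placed
    locations: List[str] = field(default_factory=list)  # optional pinned sites


class RuleTable:
    """Hypothetical in-memory rule table; rules are matched by path prefix."""

    def __init__(self, default: ReplicationRule) -> None:
        self.default = default
        self.rules: Dict[str, ReplicationRule] = {}

    def set_rule(self, path_prefix: str, rule: ReplicationRule) -> None:
        # Adding or replacing a rule takes effect immediately (run-time change).
        self.rules[path_prefix] = rule

    def rule_for(self, path: str) -> ReplicationRule:
        # The longest matching prefix wins; otherwise fall back to the default.
        matches = [p for p in self.rules if path.startswith(p)]
        if not matches:
            return self.default
        return self.rules[max(matches, key=len)]


table = RuleTable(default=ReplicationRule())
table.set_rule("/survey/", ReplicationRule(factor=3, distance=2))
table.set_rule("/survey/raw/", ReplicationRule(factor=4, locations=["site-a", "site-b"]))
```

Calling `table.rule_for("/survey/raw/night1.dat")` returns the more specific rule with four replicas pinned to the two example sites, while a file outside `/survey/` falls back to the default. Calling `set_rule` again for an existing prefix models changing a rule at run time.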

Compatible with Legacy Systems. Many existing applications or job schedulers can continue to work with Sector files with little modification.

Apr. 15, 2011: Sector 2.6 Released!
Sector 2.6 arrives 6 months after 2.5. Version 2.6 significantly improves code quality and software reliability. We have also added usability features that make the system much easier to manage and use in real world settings. In addition, we have identified a list of features that we will gradually introduce in the near future, and the architecture has been updated so that each module can easily accept new algorithms.

Mar. 29, 2011: 147TB Scientific Data Delivered using Sector
A group of astronomers recently moved 147TB of data within 10 days from Knoxville, TN to Baltimore, MD using a Sector system installed in Chicago, IL (there is no direct link between Knoxville and Baltimore). The data was uploaded to Sector from the source in parallel and, at the same time, downloaded in parallel at the destination. In addition to the awesome performance, Sector provides a file system interface that lets both sides conveniently handle the data files.
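To put the figures above in perspective, the following short Python calculation works out the average end-to-end rate they imply. It assumes decimal terabytes (10^12 bytes) and a full 10 days of transfer; since the move finished within 10 days, the result is a lower bound on the average rate, not a measured figure.

```python
TB = 10**12                      # decimal terabyte (assumption)
total_bytes = 147 * TB           # data volume reported above
seconds = 10 * 24 * 60 * 60      # 10 days, taken as the full transfer window

bytes_per_second = total_bytes / seconds
megabytes_per_second = bytes_per_second / 10**6
gigabits_per_second = bytes_per_second * 8 / 10**9

print(f"{megabytes_per_second:.0f} MB/s, {gigabits_per_second:.2f} Gbit/s")
# prints "170 MB/s, 1.36 Gbit/s"
```

Because every byte was uploaded to Chicago and downloaded from it during the same window, the Sector system handled roughly this rate in each direction at once.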

June 10, 2010: Using Sector/Sphere for graph analysis
We have successfully used Sector/Sphere to perform a set of graph analysis jobs, including breadth-first search and clique enumeration. These are core algorithms for many applications that involve graph data structures, such as social network analysis. We have posted part of the results in the benchmark section.

© 2009 - 2010 The Sector Alliance. All rights reserved.