SDSS Data Distribution using Sector and UDT

We have used Sector to distribute Sloan Digital Sky Survey (SDSS) data to astronomers around the world. SDSS produces a data release each year. Currently, SDSS DR7 has a catalog of about 4TB and more than 10TB of images. We installed Sector on the Teraflow Network as a content distribution network. Download speeds range from 5Mb/s (India) to 900Mb/s (Russia). For more information, please visit http://sdss.ncdm.uic.edu.

In addition to SDSS, we have set up a Sector Public Cloud so that you can use this wide area file system as well. Click here to test drive now!

The Terasort Benchmark

The table below lists the performance (total processing time in seconds) of Sphere and Hadoop on the Terasort benchmark. The benchmark works as follows: given N nodes in the system, it generates a 10GB file on each node and then sorts the full N*10GB of data. Data generation time is excluded. Note that it is normal for the processing time to increase with the number of nodes, because the total amount of data grows proportionally.

The performance values listed on this page were obtained on the Open Cloud Testbed. Currently the testbed consists of 4 racks. Each rack has 32 nodes: 1 NFS server, 1 head node, and 30 compute/slave nodes. The head node is a Dell 1950 with dual dual-core Xeon 3.0GHz processors and 16GB RAM. The compute nodes are Dell 1435s, each with a single dual-core AMD Opteron 2.0GHz processor, 4GB RAM, and a single 1TB disk. The 4 racks are located at JHU (Baltimore), StarLight (Chicago), UIC (Chicago), and Calit2 (San Diego). The inter-rack bandwidth is 10GE, supported by CiscoWave deployed over National Lambda Rail.

 
Sites                            Sphere  Hadoop (3 replicas)  Hadoop (1 replica)
UIC                                1265                 2889                2252
UIC + StarLight                    1361                 2896                2617
UIC + StarLight + Calit2           1430                 4341                3069
UIC + StarLight + Calit2 + JHU     1526                 6675                3702
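To illustrate why a longer processing time at more sites does not by itself mean worse performance, the following minimal sketch converts the times in the table above into aggregate sort throughput. It assumes that all 30 compute nodes in each rack take part in the run; the page does not state the exact node count, so NODES_PER_RACK is a hypothetical parameter and the resulting figures are only indicative.

```python
# Processing times in seconds, copied from the table above.
# Keys are the number of sites (racks) used in the run.
TIMES = {
    1: {"Sphere": 1265, "Hadoop (3 replicas)": 2889, "Hadoop (1 replica)": 2252},
    2: {"Sphere": 1361, "Hadoop (3 replicas)": 2896, "Hadoop (1 replica)": 2617},
    3: {"Sphere": 1430, "Hadoop (3 replicas)": 4341, "Hadoop (1 replica)": 3069},
    4: {"Sphere": 1526, "Hadoop (3 replicas)": 6675, "Hadoop (1 replica)": 3702},
}

GB_PER_NODE = 10     # from the benchmark definition: one 10GB file per node
NODES_PER_RACK = 30  # ASSUMPTION: every compute node in each rack participates


def throughput_gb_per_s(racks, seconds, nodes_per_rack=NODES_PER_RACK):
    """Aggregate sort throughput: total data sorted divided by elapsed time."""
    total_gb = racks * nodes_per_rack * GB_PER_NODE
    return total_gb / seconds


def throughput_table():
    """Throughput in GB/s for every system and every number of racks."""
    return {
        racks: {name: throughput_gb_per_s(racks, secs) for name, secs in row.items()}
        for racks, row in TIMES.items()
    }


if __name__ == "__main__":
    for racks, row in throughput_table().items():
        cells = ", ".join(f"{name}: {gbps:.3f} GB/s" for name, gbps in row.items())
        print(f"{racks} rack(s): {cells}")
```

Under this assumption, the data volume grows by 300GB with each added rack, so a modest rise in total time can still correspond to a higher aggregate throughput.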

The benchmark uses the testfs/testdc examples of Sphere and randomwriter/sort examples of Hadoop. Hadoop parameters were tuned to reach good results.

Updated on Sep. 22, 2009: We have benchmarked the most recent versions of Sector/Sphere (1.24a) and Hadoop (0.20.1) on a new set of servers. Each server node costs $2,200 and consists of a single Intel Xeon E5410 2.4GHz CPU, 16GB RAM, 4*1TB disks in RAID0, and a 1Gb/s NIC. The 120 nodes are hosted on 4 racks within the same data center, and the inter-rack bandwidth is 20Gb/s.

The table below lists the time taken to sort 1TB of data using Sector/Sphere version 1.24a and Hadoop 0.20.1. Related Hadoop parameters have been tuned for better performance (e.g., a large block size), while Sector/Sphere does not require tuning. In addition, to achieve the highest performance, replication is disabled in both systems.

Number of Racks   Sphere    Hadoop
1                 28m 25s   85m 49s
2                 15m 20s   37m 0s
3                 10m 19s   25m 14s
4                 7m 56s    17m 45s
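As a rough way to read the scaling behavior from these numbers, the sketch below parses the "Xm Ys" durations and computes, for each system, the speedup relative to its own 1-rack run, as well as the Hadoop-to-Sphere time ratio at each rack count. The function names are illustrative only and are not part of Sector/Sphere or Hadoop.

```python
import re

# (Sphere, Hadoop) times for sorting 1TB, copied from the table above.
TIMES = {
    1: ("28m 25s", "85m 49s"),
    2: ("15m 20s", "37m 0s"),
    3: ("10m 19s", "25m 14s"),
    4: ("7m 56s", "17m 45s"),
}


def to_seconds(text):
    """Convert a duration such as '28m 25s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m\s+(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


def speedups(column):
    """Speedup over the 1-rack run; column 0 is Sphere, column 1 is Hadoop."""
    base = to_seconds(TIMES[1][column])
    return {racks: base / to_seconds(row[column]) for racks, row in TIMES.items()}


def hadoop_over_sphere():
    """Ratio of Hadoop time to Sphere time at each rack count."""
    return {
        racks: to_seconds(hadoop) / to_seconds(sphere)
        for racks, (sphere, hadoop) in TIMES.items()
    }


if __name__ == "__main__":
    for racks in TIMES:
        print(
            f"{racks} rack(s): Sphere speedup {speedups(0)[racks]:.2f}, "
            f"Hadoop speedup {speedups(1)[racks]:.2f}, "
            f"Hadoop/Sphere time ratio {hadoop_over_sphere()[racks]:.2f}"
        )
```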

The MalStone Benchmark

MalStone is a set of benchmarks developed by the Open Cloud Consortium.

The MalStone A-10 and B-10 benchmarks each consist of 10 billion records, with all timestamps falling within a one-year period. The MalStone A-10 benchmark computes a ratio for each site w as follows: aggregate all entities that visited the site at any time, and compute the percentage of visits for which the entity became compromised at any time after the visit.
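The MalStone A computation described above can be sketched in a few lines of single-machine Python. This is only an illustration of the ratio, not the benchmark implementation: the Visit record and the compromised_at mapping (entity to the time it became compromised) are a simplified, hypothetical data layout, and "after the visit" is interpreted here as strictly later than the visit time.

```python
from collections import defaultdict
from typing import NamedTuple


class Visit(NamedTuple):
    """Hypothetical simplified record: one visit of an entity to a site."""
    site: str
    entity: str
    time: int  # any comparable timestamp; plain integers in this sketch


def malstone_a(visits, compromised_at):
    """For each site, the percentage of visits whose entity was
    compromised at some time strictly after the visit."""
    total = defaultdict(int)
    bad = defaultdict(int)
    for visit in visits:
        total[visit.site] += 1
        when = compromised_at.get(visit.entity)  # None if never compromised
        if when is not None and when > visit.time:
            bad[visit.site] += 1
    return {site: 100.0 * bad[site] / total[site] for site in total}


if __name__ == "__main__":
    visits = [
        Visit("w1", "e1", 1),
        Visit("w1", "e2", 2),
        Visit("w2", "e1", 5),
        Visit("w1", "e1", 7),
    ]
    compromised_at = {"e1": 6}  # entity e1 becomes compromised at time 6
    for site, percent in sorted(malstone_a(visits, compromised_at).items()):
        print(f"{site}: {percent:.1f}% of visits later compromised")
```

A distributed implementation would typically group the visit records by site before counting, which is the part that Hadoop and Sphere parallelize.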

MalStone B-10 is similar, except that the ratio is computed for each week d: for each site w, and for all entities that visited the site in week d or earlier, compute the percentage of visits for which the entity became compromised at any time between the visit and the end of week d.
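The weekly variant can be sketched in the same style. The example below is a deliberately naive, single-machine illustration under stated assumptions: timestamps are given as days since the start of the period, week d covers days 7d through 7d+6, and the Visit record and compromised_on mapping are hypothetical simplifications rather than the actual MalStone record format.

```python
from collections import defaultdict
from typing import NamedTuple

DAYS_PER_WEEK = 7


class Visit(NamedTuple):
    """Hypothetical simplified record: one visit of an entity to a site."""
    site: str
    entity: str
    day: int  # ASSUMPTION: days since the start of the period, starting at 0


def malstone_b(visits, compromised_on, num_weeks):
    """For each (site, week d): among visits made in week d or earlier, the
    percentage whose entity was compromised after the visit and before the
    end of week d. Pairs with no eligible visits are omitted."""
    by_site = defaultdict(list)
    for visit in visits:
        by_site[visit.site].append(visit)

    result = {}
    for site, site_visits in by_site.items():
        for week in range(num_weeks):
            week_end = (week + 1) * DAYS_PER_WEEK  # first day after week d
            eligible = [v for v in site_visits if v.day < week_end]
            if not eligible:
                continue
            bad = 0
            for v in eligible:
                when = compromised_on.get(v.entity)  # None if never compromised
                if when is not None and v.day < when < week_end:
                    bad += 1
            result[(site, week)] = 100.0 * bad / len(eligible)
    return result


if __name__ == "__main__":
    visits = [Visit("w1", "e1", 1), Visit("w1", "e2", 3), Visit("w1", "e1", 9)]
    compromised_on = {"e1": 10}  # entity e1 becomes compromised on day 10
    for (site, week), percent in sorted(malstone_b(visits, compromised_on, 2).items()):
        print(f"site {site}, week {week}: {percent:.1f}%")
```

This version rescans every visit for every week; an implementation meant for 10 billion records would sort each site's visits by time and maintain running counts instead.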

The table below lists the results of three different implementations: 1) Hadoop; 2) Hadoop streaming with a Python implementation of MalStone; 3) Sector/Sphere. The results were obtained on a cluster of 20 nodes from the Open Cloud Testbed.

 
Implementation            MalStone A   MalStone B
Hadoop                    454m 13s     840m 50s
Hadoop Streaming/Python   87m 29s      142m 32s
Sector/Sphere             33m 40s      43m 44s

The MalStone benchmarks are open source and are available from http://code.google.com/p/malgen.

© 2009 National Center for Data Mining. All rights reserved.