"Flash" has many meanings, but I use it in two specific ways:
(1) Noun: a brief news dispatch or transmission; and
(2) Adverb: happening suddenly or very quickly.
Thus, as applied to flash dissemination, it means disseminating a relatively small amount of data very quickly. What counts as small is relative, but in my research I consider anything less than 25 MB small. This is in comparison to 'large' files such as movies and OS distros that are hundreds of MBs or even GBs in size.
Why is Flash Dissemination (FD) useful and where can it be applied?
When you think about it (and you can verify this), close to 90% of the files sitting on your computer (by count, and even excluding system files) are small. Thus, most information that people generate and store lives in small files. Even cell-phone video clips (which are becoming increasingly prevalent) are small files (less than 10 MB); so are YouTube clips. And when a piece of information suddenly becomes popular, you have to handle its distribution scalably. The premise of my thesis is that scalably disseminating small files requires a different approach and a different set of optimizations than large-file content distribution (e.g., BitTorrent, Joost, etc.).
In my PhD research, I've demonstrated two uses for FD. In one application, RapID (PDF file warning) distributes earthquake ShakeMaps to emergency-response organizations extremely fast. In experiments run on an Internet emulator, we show that RapID can disseminate ShakeMaps (about 200 KB each) almost twice as fast as any other system currently available.
In another application, Flashback (research paper), we show another reason why distributing small content fast is a unique and interesting problem. Again, Flashback performs much better than simply adapting a large-content distribution protocol (e.g., BitTorrent) for the purpose.
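To see why small-file dissemination is a different problem, a back-of-envelope model helps. The sketch below is my illustration, not from the RapID or Flashback papers, and every parameter value (RTT, bandwidth, fanout) is an assumption chosen only to make the point: for a 200 KB file, per-connection round-trip overhead is a large fraction of each spreading round, while for a large file the transfer time dwarfs it.

```python
# Hypothetical back-of-envelope model (not from the papers): compare how
# long it takes to push a file to N peers when round-trip overhead
# dominates (small files) versus when bandwidth dominates (large files).
# All parameter values are illustrative assumptions.

def rounds_to_reach(n_peers, fanout=2):
    """Epidemic-style spreading: each round, every current holder sends
    the file to `fanout` new peers, so coverage multiplies by (1 + fanout)."""
    rounds, covered = 0, 1
    while covered < n_peers:
        covered *= (1 + fanout)
        rounds += 1
    return rounds

def dissemination_time(file_mb, n_peers, rtt_s=0.1, bw_mbps=10, fanout=2):
    """Total time = rounds x (per-round handshake latency + transfer time)."""
    transfer_s = (file_mb * 8) / bw_mbps      # one full-file transfer
    return rounds_to_reach(n_peers, fanout) * (rtt_s + transfer_s)

small = dissemination_time(file_mb=0.2, n_peers=200)   # a 200 KB ShakeMap
large = dissemination_time(file_mb=700, n_peers=200)   # a CD-sized file
# For the small file, the 100 ms RTT is ~40% of each round's cost; for
# the large file it is negligible, so chunking/pipelining pays off there.
```

Under these assumed numbers, reaching 200 peers takes 5 rounds either way, but the small file's total time is dominated by latency rather than bandwidth, which is exactly the regime where BitTorrent-style chunking buys little.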
How do I believe your numbers (aka, show me the money)?
To make our numbers as convincing as possible, we built an Internet testbed we call COIE (for Crisis on Infinite Earths, or Cluster Of Ibm E-servers) using Modelnet. The testbed was built on a cluster of 15 IBM e-server nodes running Debian and SystemImager. It is capable of real-time emulation of up to 200 virtual Internet hosts, and each virtual node's bandwidth and inter-node latency can be set individually.
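To give a flavor of what "individually set each virtual node's bandwidth and latency" means, here is a small sketch. This is not Modelnet's actual topology file format or API; it is a hypothetical plain-data description, and the bandwidth and latency values are illustrative assumptions only.

```python
import random

random.seed(42)  # deterministic topology for repeatable experiments

# Hypothetical sketch (NOT Modelnet's real input format): describe 200
# virtual hosts, each with its own access-link bandwidth and latency --
# the two per-node knobs the testbed lets us set individually.
N_VIRTUAL = 200

def make_host(host_id):
    return {
        "id": host_id,
        "bw_kbps": random.choice([128, 384, 1000, 10000]),  # modem to LAN
        "latency_ms": random.randint(10, 200),              # access latency
    }

topology = [make_host(i) for i in range(N_VIRTUAL)]
```

A description like this lets one experiment mix dial-up-like and LAN-like peers in a single emulated run, which matters because dissemination protocols behave very differently across heterogeneous access links.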
We build real systems and compare them to other real, deployed systems on this testbed. As far as numbers go, we feel this is as close as you can get without actually deploying your system on the wide-area Internet. Of course, the final proof of the pudding will be real-world deployment. We are currently readying Flashback for such a deployment. You can help us by visiting the Flashback web page, giving it a try, and letting us know how it worked for you.
From a science perspective, I'm also extremely curious about the fundamental properties and characteristics of P2P networks and their relation to other types of networks, such as ecological, social, and technological networks (e.g., the WWW and router networks). The physics modeling of these networks is another topic that I try to follow to the best of my understanding :)
In a small study, I investigated how different networks evolve over time, using well-known models of network formation such as random graphs and power-law graphs. My primary focus was on P2P overlay networks, but the results showed that the structures which emerge are sometimes also observed in nature.
With regards to P2P overlays, the main conclusion that emerged is that it is better to design network protocols that do not, implicitly or explicitly, encourage power-law (or rich-get-richer) behavior. The link to this paper can be found here.
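The rich-get-richer behavior mentioned above is easy to see in simulation. The sketch below is my minimal illustration (not code from the paper): it grows two networks, one attaching new nodes uniformly at random and one attaching preferentially by degree, and the preferential variant develops much larger hubs, which is the power-law signature the paper argues protocols should avoid.

```python
import random

random.seed(0)  # deterministic run

# Minimal sketch of "rich get richer" (preferential attachment) growth
# versus uniform random attachment. Each new node adds one link.
def grow(n_nodes, preferential):
    degree = [1, 1]                  # start with two connected nodes
    for new in range(2, n_nodes):
        if preferential:
            # attach with probability proportional to existing degree
            target = random.choices(range(new), weights=degree[:new])[0]
        else:
            target = random.randrange(new)   # uniform random attachment
        degree[target] += 1
        degree.append(1)             # the new node itself has degree 1
    return degree

pref = grow(2000, preferential=True)
rand = grow(2000, preferential=False)
# max(pref) is far larger than max(rand): preferential attachment
# concentrates links on a few hub nodes.
```

In a P2P overlay, such hubs become both performance bottlenecks and single points of failure, which is why a protocol that implicitly rewards high-degree nodes ends up with a fragile topology.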
Previously, I was involved in enhancing the scalability of CORBA middleware as part of the
DOC Group led by
Prof. Douglas Schmidt.
I co-designed and implemented the first server-side Asynchronous Method Handling (AMH) mechanism for CORBA middleware, in TAO. AMH solves the problem of stack blow-up on the server side caused by a large number of long-standing requests on middle-tier servers. The solution explicitly encapsulates each server request into a heap-stored object, which frees up the server thread's stack space while also allowing it to do other work. AMH improves the throughput of middle-tier servers by over 10% compared to traditional synchronous threading designs.
The implementation involved designing a new specification for the CORBA server side, plus changes to the IDL compiler and the ORB core of TAO, an industrial-strength CORBA middleware of over 100K lines of C++ code.
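The core AMH idea above can be sketched language-neutrally. The real mechanism lives in TAO's C++ ORB; the Python below is only my illustration of the pattern, and all class and method names are hypothetical: each request is parked in a heap-stored handler object so the server thread returns immediately instead of blocking its stack on a long-standing request.

```python
import queue

# Hypothetical sketch (not TAO's C++ API) of the AMH pattern: wrap each
# in-flight request in a heap-stored handler; the serving thread returns
# at once, and the reply is sent later when the back-end answers.

class ResponseHandler:
    """Encapsulates one in-flight request; lives on the heap, not on a
    thread's stack, so thousands can be pending without stack blow-up."""
    def __init__(self, request_id):
        self.request_id = request_id
        self.reply = None

    def send_reply(self, value):
        self.reply = value           # deliver the deferred reply

class MiddleTierServer:
    def __init__(self):
        self.pending = queue.Queue() # handlers awaiting back-end replies

    def handle_request(self, request_id):
        # A synchronous design would block the thread here; the AMH-style
        # design just parks a handler and frees the thread for more work.
        handler = ResponseHandler(request_id)
        self.pending.put(handler)
        return handler

    def backend_completed(self, value):
        self.pending.get().send_reply(value)  # FIFO: oldest request first

server = MiddleTierServer()
handlers = [server.handle_request(i) for i in range(3)]  # never blocks
for i in range(3):
    server.backend_completed(i * 10)         # replies arrive later
```

The design choice is the same one AMH makes: moving per-request state from the thread's stack to the heap decouples "requests in flight" from "threads in use", which is what lets a middle-tier server hold many long-standing requests cheaply.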
Building and Testing the Next Killer P2P App
How Many Servers Does Google Have?