Tera-scale Computing - A Parallel Path to the Future
Ushering in the Era of Tera
Last fall, Paul Otellini and I announced initial results from Intel’s Tera-scale Research Program and the Polaris processor, the first in a series of many-core research processors. The goal of the Polaris project was to develop design technologies and methodologies tuned toward rapid tera-scale silicon development. The design team built Polaris as an array of 80 cores, tiled in two dimensions and interconnected through routers built into the silicon. The cores were far simpler than today’s Intel® processors, so we could focus on the challenges of building a large number of cores in a single package. Other objectives of Polaris were to minimize the global clock network design effort, reduce the clocking power budget, and bring fine-grain power management to a many-core processor.
Within two hours of receiving first silicon, Polaris delivered 1 TFLOP of performance while consuming less than 62 watts of power, which is less than the design power of our latest dual-core server processors. With Polaris, we achieved our first major objectives for a tera-scale processor.
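As a rough sanity check, the arithmetic below shows how 80 simple cores can add up to a TFLOP. The clock rate and per-core FLOP counts are illustrative assumptions in the ballpark of figures published for Polaris, not numbers from this announcement:

```python
# Back-of-the-envelope peak-throughput arithmetic for an 80-core chip.
# ASSUMPTIONS: two single-precision multiply-accumulate (MAC) units per
# core, each counting as 2 FLOPs per cycle, at an assumed ~3.16 GHz clock.
CORES = 80
FLOPS_PER_CYCLE_PER_CORE = 2 * 2   # two MAC units x 2 FLOPs per MAC
CLOCK_HZ = 3.16e9                  # assumed clock rate

peak_flops = CORES * FLOPS_PER_CYCLE_PER_CORE * CLOCK_HZ
print(f"Peak: {peak_flops / 1e12:.2f} TFLOP/s")   # ~1.01 TFLOP/s
```

Under those assumptions the chip crosses the teraFLOP line on peak arithmetic alone, which is the point: many simple cores, not a few complex ones.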
When we announced the Polaris research, there were numerous questions about what someone would do with all those cores, and who needs a TFLOP of computing power.
At Intel, we’ve spent several years considering, as part of our Tera-scale Research Program, what future applications will look like. Tomorrow’s applications will process terabytes of data at TFLOP rates. That demands a level of computing that today exists only in supercomputers, yet it will need to be available at the desktop to support these tera-scale applications.
Recognition, Mining, Synthesis
We’ve categorized a whole new breed of software under what we call Recognition, Mining, and Synthesis (RMS) applications. These are applications that not only benefit from tera-scale computing, they require it. RMS means:
- Recognition allows computers to examine data and construct mathematical models based on what they identify, such as a person’s face in a single picture.
- Mining extracts one or more instances of a specific model from massive amounts of environmental data, such as finding a person’s face in a large number of picture frames at various resolutions, under different lighting, and so on.
- Synthesis constructs new instances of the models, enabling what-if scenarios or projecting the model into new environments.
Consider the following example, which is an actual software project one of our research teams developed with RMS and tera-scale computing in mind.
Today, if you want to see sports highlights of your favorite team, you have to wait for the sports segment of your local TV news to come on, or visit a sports website and watch a video playing in a small window. Sports summarization, in which computer vision software mines hundreds of thousands of video frames for the short segments of action, currently takes hours. With a tera-scale processor, it could be done in real time as the game plays. You decide what to summarize: sport, team, or player. The recognition code creates models from a frame, and the mining code finds instances of those models through the rest of the frames, combining them into a summary ‘reel’ for you.
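To make the parallelism in that pipeline concrete, here is a minimal sketch of the mining step. Because each frame can be tested independently, matching a model against hundreds of thousands of frames is a natural data-parallel map across cores. The `matches_model` test and the integer "frames" are placeholders, not the actual computer-vision code described above:

```python
from multiprocessing import Pool

def matches_model(frame):
    """Placeholder for the vision test that checks one video frame
    against the recognized model (e.g., a chosen player or team)."""
    return frame % 10_000 == 0  # dummy criterion for illustration

def mine_highlights(frames, workers):
    """Data-parallel mining: each worker (core) scans a share of the frames."""
    with Pool(processes=workers) as pool:
        hits = pool.map(matches_model, frames)
    # Keep the frames where the model was found; splicing them together
    # in order yields the summary 'reel'.
    return [f for f, hit in zip(frames, hits) if hit]

if __name__ == "__main__":
    frames = range(100_000)   # stand-in for decoded video frames
    reel = mine_highlights(frames, workers=8)
    print(f"{len(reel)} matching frames found for the reel")
```

The more cores available, the more frames can be scanned at once, which is why this workload moves from hours of offline processing to real time as the game plays on a tera-scale processor.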
But what about the synthesis part?
We’ve demonstrated RMS in a motion capture research application that recognizes a person and his movements in a 3D space using four cameras and no markers on the person’s body, extracts a skeletal model of the person, and then uses ray tracing to synthesize the model in an entirely new environment, with lighting, shadows, and a new skin. Today, we have to do this offline. With a tera-scale processor, we could do it all in real time.
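Structurally, that demo is the same recognize/mine/synthesize flow arranged as a pipeline. The stub below only names the stages with dummy data; `recognize_pose`, `retarget`, and `render` are hypothetical stand-ins for the research code, not its actual interfaces:

```python
def recognize_pose(camera_views):
    """Stand-in for markerless pose estimation from four camera views:
    returns a skeletal model as joint-name -> 3D position."""
    return {"head": (0.0, 1.7, 0.0), "left_hand": (-0.4, 1.2, 0.1)}

def retarget(skeleton, environment):
    """Place the skeletal model into a new scene for synthesis."""
    return {"skeleton": skeleton, "scene": environment}

def render(scene):
    """Stand-in for ray tracing the retargeted model with new lighting,
    shadows, and skin; this is the stage a tera-scale processor would
    need to run at video rate instead of offline."""
    return f"frame: {len(scene['skeleton'])} joints rendered in {scene['scene']}"

# One frame through the pipeline; real-time use repeats this at video rate.
views = ["cam0", "cam1", "cam2", "cam3"]   # the four camera inputs
print(render(retarget(recognize_pose(views), "new environment")))
```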
Imagine the possibilities of RMS applications on tera-scale computers. These kinds of applications could have profound impacts on education and training, entertainment, scientific research, and birthday parties.
With tera-scale computing and RMS applications:
- Learners could be immersed in an environment, with their real actions part of the scenario, the ultimate learn-by-doing approach.
- Game players could become part of the excitement and adrenaline of the story without wearing a motion-sensing device.
- Consolidating 50 years’ worth of photos and home movies into a few minutes for a family member’s birthday celebration could be done at home in a short while.
Of course, there are many more possibilities, such as real-time analytics for government, energy, and retail; personal health visualization in medicine; and applications across a host of other industries. The really interesting applications for computing have yet to be imagined. Tera-scale computing will enable the innovators.
Intel’s Tera-scale Research Program has taken the first steps. There’s still a ways to go.
This level of computing cannot be achieved with just a few cores, or even a few multi-core processors. Tera-scale computing requires tens to hundreds of cores working in parallel to handle terabytes of data at TFLOP rates. Supporting those cores will require new and unique technologies to keep them from starving for memory access and I/O bandwidth, or waiting for messages to pass among the core array. The Tera-scale Research Program’s teams are working on some of these issues, including a new approach to a stacked memory/processor package, integrating a new network-on-chip, and exploring optical signaling. But the real enabling of tera-scale computing will come from the cool codes required to run massively parallel processing on many-core chips. That means changing the way software is designed today, from the BIOS code to virtual machines, operating systems, and end-user applications.
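A quick calculation illustrates the memory-starvation point. The bytes-per-FLOP ratio below is an assumed workload characteristic, and the bus figure is a rough contemporary reference point, not a measured number:

```python
# Rough memory-bandwidth arithmetic for a tera-scale workload.
# ASSUMPTIONS: 1 byte of memory traffic per FLOP (real codes vary widely)
# and ~10 GB/s for a conventional processor bus of this era.
SUSTAINED_FLOPS = 1e12    # 1 TFLOP/s compute rate
BYTES_PER_FLOP = 1.0      # assumed workload ratio
BUS_BW = 10e9             # assumed conventional bus bandwidth, bytes/s

needed_bw = SUSTAINED_FLOPS * BYTES_PER_FLOP
print(f"Needed: {needed_bw / 1e9:.0f} GB/s; "
      f"bus supplies: {BUS_BW / 1e9:.0f} GB/s; "
      f"shortfall: ~{needed_bw / BUS_BW:.0f}x")
```

A gap of that size is why approaches like stacked memory, which puts DRAM directly above or below the cores, are on the research agenda.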
The Future is Parallel
Many-core chips, parallel processing, and tera-scale computing require a paradigm shift. But that shift gives us the next level in what computing can and will do for our world. It places many challenges before us and opens a vast horizon of opportunities. Think in terms of when PCs first entered the marketplace decades ago and the inspiring applications that followed.
What will future tera-scale workloads look like? What parts of these workloads can be parallelized? And how will they benefit from a tera-scale processor and platform? The tera-scale research teams at Intel have engaged with industry and academia to explore these topics.