Conference Papers
"Gregarious data restructuring in a many core architecture" In The 17th IEEE International Conference on High Performance Computing and Communications, New York, USA, August, 2015. S. Shrestha, J. Manzano, A. Marquez, S. Song, S. Zuckerman, and G. R. Gao.
"Locality Aware Concurrent Start for Stencil Applications" In the Proceedings of the 13th International Symposium on Code Generation and Optimization, San Francisco, USA, February, 2015. S. Shrestha, J. Manzano, A. Marquez, J. Feo, and G. R. Gao.
"ACDT: Architected Composite Data Types Trading-in Unfettered Data Access for Improved Execution" In The 20th IEEE International Conference on Parallel and Distributed Systems, Hsinchu, Taiwan, December, 2014. A. Marquez, J. Manzano, S. Song, B. Meister, S. Shrestha, T. St. John and G. R. Gao.
"Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading" In the 27th International Workshop on Languages and Compilers for Parallel Computing, Hillsboro, OR, USA, September, 2014. S. Shrestha, J. Manzano, A. Marquez, J. Feo and G. R. Gao.
"On the Feasibility of a Codelet Based Multi-core Operating System" In 4th Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM'14). August 24, 2014, Edmonton, Alberta, Canada. Jack B. Dennis and Guang R. Gao.
"Toward a Self-Aware Codelet Execution Model" In 4th Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM'14). August 24, 2014, Edmonton, Alberta, Canada. Stéphane Zuckerman, Aaron Landwehr, Kelly Livingston, and Guang R. Gao.
"Position Paper: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading" In Proceedings of Workshop on Multi-Threaded Architectures and Applications (MTAAP 2014), May 2014. Jaime Arteaga, Stephane Zuckerman, Elkin Garcia, and Guang R. Gao.
"ASAFESSS: A Scheduler-driven Adaptive Framework for Extreme Scale Software Stacks" In Proceedings of the 4th International Workshop on Adaptive Self-Tuning Computing Systems (ADAPT'14); 9th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC'14), Vienna, Austria. January 20-22, 2014. Best Paper Award. Tom St. John, Benoit Meister, Andres Marquez, Joseph B. Manzano, Guang R. Gao, and Xiaoming Li.
"A Dynamic Schema to increase performance in Many-core Architectures through Percolation operations" In Proceedings of the 2013 IEEE International Conference on High Performance Computing (HiPC 2013), Hyderabad, India, December 18 - 21, 2013. Elkin Garcia, Daniel Orozco, Rishi Khan, Ioannis Venetis, Kelly Livingston, and Guang Gao.
"Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture" In Proceedings of the 26th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2013), Santa Clara, CA, September 25-27, 2013. Elkin Garcia, Jaime Arteaga, Robert Pavel, and Guang R. Gao.
"COStream: A Dataflow Programming Language and Compiler for Multi-Core Architecture" In Proceedings of Data-Flow Models (DFM) for extreme scale computing Workshop 2013 in conjunction with Parallel Architectures and Compilation Technologies (PACT 2013), Edinburgh, Scotland, September 8, 2013. Haitao Wei, Guang R. Gao, Weiwei Zhang, Junqing Yu.
"The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices" In Proceedings of the 16th Euromicro Conference on Digital System Design, Santander, Spain, September 4-6, 2013. Marco Solinas, Rosa M. Badia, François Bodin, Albert Cohen, Paraskevas Evripidou, Paolo Faraboschi, Bernhard Fechner, Guang R.
Gao, Arne Garbade, Sylvain Girbal, Daniel Goodman, Behran Khan, Souad Koliai, Feng Li, Mikel Luján, Laurent Morin, Avi Mendelson, Nacho Navarro, Antoniu Pop, Pedro Trancoso, Theo Ungerer, Mateo Valero, Sebastian Weis, Ian Watson, Stéphane Zuckerman, Roberto Giorgi.
"An Implementation of the Codelet Model" In Proceedings of 19th International European Conference on Parallel and Distributed Computing (Euro-Par 2013), Aachen, Germany. August 26th, 2013. Joshua Suetterlein, Stephane Zuckerman, Guang R. Gao.
"Toward a Self-aware System for Exascale Architectures" In Proceedings of Euro-Par 2013: Parallel Processing Workshops; the 1st Workshop on Runtime and Operating Systems for the Many-core Era (ROME 2013), Aachen, Germany. August 26th, 2013. Aaron Landwehr, Stephane Zuckerman, and Guang R. Gao.
"Automatic Locality Exploitation in the Codelet Model" In Proceedings of 11th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA-13), Melbourne, Australia, July, 2013. Chen Chen, Yao Wu, Joshua Suetterlein, Long Zheng, Minyi Guo, and Guang R. Gao.
"Towards Memory-Load Balanced Fast Fourier Transformations in Fine-grain Execution Models" In Proceedings of Workshop on Multithreaded Architectures and Applications (MTAAP 2013), May 24, 2013, Boston, Massachusetts, USA. Chen Chen, Yao Wu, Stephane Zuckerman, and Guang R. Gao.
"Strategies for improving Performance and Energy Efficiency on a Many-core" In Proceedings of 2013 ACM International Conference on Computer Frontiers (CF 2013), May 14-16, Ischia, Italy, ACM, 2013. Elkin Garcia and Guang R. Gao.
"Towards An Energy-Efficient Scheduler in the Codelet Model" Poster Paper. In Proceedings of IEEE Symposium on Low-Power and High-Speed Chips (IEEE COOL Chips XVI), April 17-19, 2013, Yokohama, Japan. C. Chen, Y. Wu, J. Suetterlein, L. Zheng and G. Gao.
"Determinacy and Repeatability of Parallel Program Schemata" In Proceedings of Second Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM 2012), Minneapolis, MN, USA, September 23, 2012. Jack B. Dennis, Guang R. Gao, and Vivek Sarkar.
"Demystifying Performance Predictions of Distributed FFT3D Implementations" In Proceedings of the 9th IFIP International Conference on Network and Parallel Computing (NPC 2012), Gwangju, Korea. September 6 - 8, 2012. Daniel Orozco, Elkin Garcia, Robert Pavel, Orlando Ayala, Lian-Ping Wang and Guang R. Gao.
"MODA: A Framework for Memory Centric Performance Characterization" In Proceedings of the 2nd International Workshop on High-Performance Infrastructure for Scalable Tools (WHIST 2012); 26th International Conference of Supercomputing (ICS'12), Venice, Italy. June 29, 2012. Sunil Shrestha, Chun-Yi Sun, Amanda White, Joseph Manzano, Andres Marquez, John Feo, Kirk Cameron and Guang R. Gao.
"A discussion in favor of Dynamic Scheduling for regular applications in Many-core Architectures" In Proceedings of 2012 Workshop on Multithreaded Architectures and Applications (MTAAP 2012); 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), Shanghai, China. May 21 - 25, 2012. Elkin Garcia, Daniel Orozco, Robert Pavel and Guang R. Gao.
"Dynamic Percolation: A case of study on the shortcomings of traditional optimization in Many-core Architectures" In Proceedings of 2012 ACM International Conference on Computer Frontiers (CF 2012), Cagliari, Italy. May 15 - 17, 2012. Elkin Garcia, Daniel Orozco, Rishi Khan, Ioannis Venetis, Kelly Livingston and Guang R. Gao.
"Massively Parallel Breadth First Search Using a Tree-Structured Memory Model" In Proceedings of International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2012); 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'12), New Orleans, LA, USA. February 25-29, 2012. Tom St.
John, Jack B. Dennis and Guang R. Gao.
"Toward High Throughput Algorithms on Many Core Architectures" In Proceedings of 7th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC 2012), Paris, France. January 23-25, 2012. Daniel Orozco, Elkin Garcia, Rishi Khan, Kelly Livingston and Guang R. Gao.
"TIDeFlow: The Time Iterated Dependency Flow Execution Model" In Proceedings of Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM 2011); 20th International Conference on Parallel Architectures and Compilation Techniques (PACT 2011), Galveston Island, TX, USA. October 10 - 14, 2011. Daniel Orozco, Elkin Garcia, Robert Pavel, Rishi Khan and Guang R. Gao.
"Exploring Fine-Grained Task-based Execution on Multi-GPU Systems" In Proceedings of Workshop on Parallel Programming on Accelerator Clusters (PPAC 2011); IEEE Cluster 2011. Austin, TX, USA. September 26, 2011. Long Chen, Oreste Villa and Guang R. Gao.
"Towards an integrated multiscale simulation of turbulent clouds on PetaScale computers" In Proceedings of 13th European Turbulence Conference (ETC13), Warsaw, Poland. September 12-15, 2011. Lian-Ping Wang, Orlando Ayala, Hossein Parishani, Wojciech W. Grabowski, Andrzej A. Wyszogrodzki, Zbigniew Piotrowski, Guang R. Gao, Chandra Kambhamettu, Xiaoming Li, Louis Rossi, Daniel Orozco and Claudio Torres.
"Polytasks: A Compressed Task Representation for HPC Runtimes" In Proceedings of the 24th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2011), Fort Collins, CO, USA. September 8-10, 2011. Daniel Orozco, Elkin Garcia, Robert Pavel, Rishi Khan and Guang R. Gao.
"OPELL and PM: A Case Study on Porting Shared Memory Programming Models to Accelerators Architectures" In Proceedings of the 24th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2011), Fort Collins, CO, USA. September 8-10, 2011. Joseph B. Manzano, Ge Gan, Juergen Ributzka, Sunil Shrestha and Guang R.
Gao.
"Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures" In Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par'11), Bordeaux, France. August 29 - September 2, 2011. Yonghong Yan, Sanjay Chatterjee, Daniel Orozco, Elkin Garcia, Zoran Budimlic, Jun Shirako, Robert Pavel, Guang R. Gao and Vivek Sarkar.
"Experiments with the Fresh Breeze Tree-Based Memory Model" In Proceedings of International Supercomputing Conference (ISC'11), Hamburg, Germany, June 19 - 23, 2011. Jack B. Dennis, Guang R. Gao and Xiao X. Meng.
"Position Paper: Using a 'Codelet' Program Execution Model for Exascale Machines" In Proceedings of ACM SIGPLAN 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era (EXADAPT 2011); Programming Language Design and Implementation (PLDI 2011). San Jose, CA, USA. June 5, 2011. Stephane Zuckerman, Joshua Suetterlein, Rob Knauerhase and Guang R. Gao.
"The Elephant and the Mice: Non-Strict Fine-Grain Synchronization for Many-Core Architectures" In Proceedings of 25th International Conference on Supercomputing (ICS'11), Tucson, AZ, USA. May 31 - June 4, 2011. Juergen Ributzka, Joseph B. Manzano, Yuhei Hayashi and Guang R. Gao.
"DEEP: An Iterative FPGA-based Many-Core Emulation System for Chip Verification and Architecture Research" In Proceedings of 19th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'11), Monterey, CA, USA. February 27 - March 1, 2011. Juergen Ributzka, Yuhei Hayashi, Fei Chen and Guang R. Gao.
"Energy efficient tiling on a Many-Core Architecture" In Proceedings of 4th Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG 2011); 6th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Heraklion, Greece. January 23, 2011. Elkin Garcia, Daniel Orozco and Guang R.
Gao.
"Locality Optimization of Stencil Applications using Data Dependency Graphs" In Proceedings of the 23rd International Workshop on Languages and Compilers for Parallel Computing (LCPC 2010), Houston, TX, USA. October 7-9, 2010. Daniel Orozco, Elkin Garcia and Guang R. Gao.
"Optimized Dense Matrix Multiplication on a Many-Core Architecture" In Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par'10), Ischia, Italy. August 31 - September 3, 2010. Elkin Garcia, Ioannis E. Venetis, Rishi Khan and Guang R. Gao.
"A Study of a Software Cache Implementation of the OpenMP Memory Model for Multicore and Manycore Architectures" In Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par'10), Ischia, Italy. August 31 - September 3, 2010. Chen Chen, Joseph B. Manzano, Ge Gan, Guang R. Gao and Vivek Sarkar.
"TiNy threads on BlueGene/P: Exploring many-core parallelisms beyond The traditional OS" In Proceedings of Workshop on Multithreaded Architectures and Applications (MTAAP); 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, USA. April 23, 2010. Handong Ye, Robert Pavel, Aaron Landwehr and Guang Gao.
"Minimizing Communication in Rate-Optimal Software Pipelining for Stream Programs" In Proceedings of Symposium on Code Generation and Optimization (CGO 2010), Toronto, Canada. April 24-28, 2010. Haitao Wei, Junqing Yu, Huafei Yu and Guang R. Gao.
"Dynamic Load Balancing on Single- and Multi-GPU Systems" In Proceedings of the 24th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, USA. April 19-23, 2010. Long Chen, Oreste Villa, Sriram Krishnamoorthy, and Guang R. Gao.
"Performance Analysis of Cooley-Tukey FFT Algorithms for a Many-core Architecture" In Proceedings of The High Performance Computing Symposium (HPC 2010), Orlando, FL, USA. April 12-15, 2010. Long Chen and Guang R.
Gao.
"MODA: A Memory Centric Performance Analysis Tool" In Proceedings of 11th LCI International Conference on High-Performance Clustered Computing, Pittsburgh, PA, USA. March 9-11, 2010. Joseph B. Manzano, Andres Marquez and Guang R. Gao.
"Iterative Layer-Based Raytracing on CUDA" In Proceedings of 28th IEEE International Performance Computing and Communications Conference (IPCCC 2009), Phoenix, AZ, USA. December 14-16, 2009. Alejandro Segovia, Xiaoming Li and Guang R. Gao.
"Mapping the FDTD Application to Many-Core Chip Architectures" In Proceedings of the 38th International Conference on Parallel Processing (ICPP 2009), Vienna, Austria. September 22-25, 2009. Daniel Orozco and Guang R. Gao.
"Tile Percolation: an OpenMP Tile Aware Parallelization Technique for the Cyclops-64 Multicore Processor" In Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par'09), Delft, The Netherlands. August 25-28, 2009. Ge Gan, Xu Wang, Joseph Manzano and Guang R. Gao.
"Tile reduction: the first step towards OpenMP tile aware parallelization" In Proceedings of the 5th International Workshop on OpenMP (IWOMP'09), Dresden, Germany, June 3-5, 2009. Ge Gan, Xu Wang, Joseph Manzano, Guang R. Gao.
"Mapping the LU Decomposition on a Many Core Architecture: Challenges and Solutions" In Proceedings of ACM International Conference on Computing Frontiers (CF 2009), Ischia, Italy. May 18-20, 2009. Ioannis E. Venetis and Guang R. Gao.
"Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture" In Proceedings of The 21st Annual Languages and Compilers for Parallel Computing Workshop (LCPC 2008), Alberta, Canada. July 31 - August 2, 2008. Guangming Tan, Vugranam Sreedhar, Guang R. Gao.
"Experience on Optimizing Irregular Computation for Memory Hierarchy in Manycore Architecture" Poster Paper. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2008), Salt Lake City, UT, USA.
February 20-23, 2008. Guangming Tan, Dongrui Fan, Junchao Zhang, Andrew Russo, Guang R. Gao.
"Performance Tuning of the Fast Fourier Transform on a Multi-core Architecture" In Proceedings of First Workshop on Programmability Issues for Multi-Core Computers (MULTIPROG 2008), Goteborg, Sweden. January 27, 2008. Liping Xue, Long Chen, Ziang Hu, Guang R. Gao.
"Server I/O Acceleration Using an Embedded Multi-core Architecture" In Proceedings of Workshop on Application Specific Processors (WASP 2007), Salzburg, Austria. October 4-5, 2007. Lurng-Kuo Liu, Fei Chen, Christos J. Georgiou and Guang R. Gao.
"Software-Pipelining on Multi-Core Architectures" In Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), Brasov, Romania. September 15-19, 2007. Alban Douillet and Guang R. Gao.
"Concurrency Analysis for Shared Memory Programs with Textually Unaligned Barriers" In Proceedings of The 20th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2007), Urbana, IL, USA. October 11-13, 2007. Yuan Zhang, Evelyn Duesterwald and Guang R. Gao.
"Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization for Many-Core Architectures" In Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), San Diego, CA, USA. June 9-13, 2007. Weirong Zhu, Vugranam C. Sreedhar, Ziang Hu, and Guang R. Gao. Available in pdf format.
"A Parallel Dynamic Programming Algorithm on a Multi-core Architecture" In Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), San Diego, CA, USA. June 9-11, 2007. Guangming Tan, Ninghui Sun, and Guang R. Gao.
"ParalleX: A Study of A New Parallel Computation Model" In Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007. Guang R.
Gao, Thomas Sterling, Rick Stevens, Mark Hereld and Weirong Zhu.
"On the Role of Deterministic Fine Grain Data Synchronization for Scientific Applications: A Revisit in the Emerging Many-Core Era" In Proceedings of First Workshop on Multithreaded Architectures and Applications in the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007. Weirong Zhu, Ziang Hu, and Guang R. Gao.
"Exploring a multithreaded Methodology to Implement a Network Communication Protocol on the Cyclops-64 Multithreaded Architecture" In Proceedings of First Workshop on Multithreaded Architectures and Applications in the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007. Ge Gan, Ziang Hu, Juan del Cuvillo, and Guang R. Gao. Also available in pdf format.
"Experience of Optimizing FFT on Intel Core Architecture" In Proceedings of Workshop on Performance Optimization for High-Level Languages and Libraries in the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007. Daniel Orozco, Liping Xue, Murat Bolat, Xiaoming Li and Guang Gao. Also available in pdf format.
"Automatic Program Segment Similarity Detection in Targeted Program Performance Improvement" In Proceedings of Workshop on Performance Optimization for High-Level Languages and Libraries in the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007. Haiping Wu, Eunjung Park, Mihailo Kaplarevic, Yingping Zhang, Murat Bolat, Xiaoming Li and Guang Gao. Also available in pdf format.
"Optimizing Fast Fourier Transform on a Multi-core Architecture" In Proceedings of Workshop on Performance Optimization for High-Level Languages and Libraries in the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007.
Long Chen and Ziang Hu. Also available in pdf format.
"Optimized lock assignment and allocation: a method for exploiting concurrency among critical sections" In the Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP 2007), San Jose, CA, USA, March 14 - 17, 2007. Yuan Zhang, Vugranam C. Sreedhar, Weirong Zhu, Vivek Sarkar and Guang R. Gao.
"Exploring Financial Applications on Many-core-on-a-chip Architecture: A First Experiment" In Proceedings of Workshop on Frontiers of High Performance Computing and Networking (FHPCN2006), 4th International Symposium on Parallel and Distributed Processing and Applications (ISPA 2006), Sorrento, Italy. December 4-7, 2006. Weirong Zhu, Parimala Thulasiraman, Ruppa K. Thulasiram and Guang R. Gao. Available in pdf format.
"Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences" In Proceedings of the 12th International European Conference on Parallel Processing (Euro-Par 2006), Dresden, Germany. August 29 - September 1, 2006. Ziang Hu, Juan del Cuvillo, Weirong Zhu, and Guang R. Gao. Also available in pdf format.
"Multi-Dimensional Kernel Generation for Loop Nest Software Pipelining" In Proceedings of the 12th International European Conference on Parallel Processing (Euro-Par 2006), Dresden, Germany. August 29 - September 1, 2006. Alban Douillet, Hongbo Rong, and Guang R. Gao. Also available in pdf format.
"A User-Friendly Methodology for Automatic Exploration of Compiler Options" In Proceedings of The International Conference on Programming Languages and Compilers (PLC06). Las Vegas, Nevada. June 26-29, 2006. Haiping Wu, Long Chen, Joseph Manzano and Guang R. Gao. Also available in pdf format.
"A User-Friendly Methodology for Automatic Exploration of Compiler Options: A Case Study on the Intel XScale Microarchitecture" In Proceedings of The International Conference on Programming Languages and Compilers (PLC06). Las Vegas, Nevada. June 26-29, 2006.
Haiping Wu, Eunjung Park, Long Chen, Juan del Cuvillo and Guang R. Gao. Also available in pdf format.
"Performance Characteristics of OpenMP Language Constructs on a Many-core-on-a-chip Architecture" In Proceedings of the 2nd International Workshop on OpenMP (IWOMP2006), Reims, France. June 12-15, 2006. Weirong Zhu, Juan del Cuvillo and Guang R. Gao. Also available in pdf format.
"Towards a Software Infrastructure for the Cyclops-64 Cellular Architecture" In Proceedings of the 20th International Symposium on High Performance Computing Systems and Applications (HPCS'06), St. John's, Canada. May 14 - 17, 2006. Juan del Cuvillo, Weirong Zhu, Ziang Hu and Guang R. Gao. Also available in pdf format.
"Landing OpenMP on Cyclops-64: An Efficient Mapping of OpenMP to a many-core System-on-a-chip" In Proceedings of the 3rd ACM International Conference on Computing Frontiers, Ischia, Italy. May 2-5, 2006. Juan del Cuvillo, Weirong Zhu and Guang R. Gao. Also available in pdf format.
"A Study of the On-Chip Interconnection Network for the IBM Cyclops-64 Multi-Core Architecture" In Proceedings of 20th International Parallel and Distributed Processing Symposium (IPDPS2006), Rhodes Island, Greece. April 25 - 29, 2006. Ying M. P. Zhang, Taikyeong Jeong, Fei Chen, Haiping Wu, Ronny Nitzsche and Guang R. Gao. Also available in pdf format.
"Hierarchical Multithreading: Programming Model and System Software" In Proceedings of Workshop on NSF Next Generation Software Program (NSFNGS'06), in conjunction with 20th International Parallel and Distributed Processing Symposium (IPDPS2006), Rhodes Island, Greece. April 25 - 29, 2006. Guang R. Gao, Thomas Sterling, Rick Stevens, Mark Hereld and Weirong Zhu.
"Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops-64" In Proceedings of Network and Parallel Computing (NPC 2005), Beijing, China. November 30 - December 3, 2005. Yanwei Niu, Ziang Hu, Kenneth Barner and Guang R.
Gao. Also available in pdf format.
"Register Pressure in Software-Pipelined Loop Nests: Fast Computation and Impact on Architecture Design" In Proceedings of the 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2005), Hawthorne, NY, USA. October 20-22, 2005. Alban Douillet and Guang R. Gao. Also available in pdf format.
"Identifying Multiply-Add Operations in Kylin Compiler" In Proceedings of the 2005 International Conference on Embedded Systems and Applications (ESA'05), Las Vegas, NV, USA. June 27-30, 2005. Haiping Wu, Ziang Hu, Joseph Manzano, Yingping Zhang and Guang R. Gao.
"Register Allocation for Software Pipelined Multi-dimensional Loops" In Proceedings of Conference on Programming Language Design and Implementation (PLDI 2005), Chicago, IL, USA. June 11 - 15, 2005. Hongbo Rong, Alban Douillet and Guang R. Gao. Also available in pdf format.
"FAST: A Functionally Accurate Simulation Toolset for the Cyclops-64 Cellular Architecture" In Proceedings of Workshop on Modeling, Benchmarking and Simulation (MoBS), held in conjunction with the 32nd Annual International Symposium on Computer Architecture (ISCA 2005), Madison, WI, USA. June 4, 2005. Juan del Cuvillo, Weirong Zhu, Ziang Hu and Guang R. Gao. Also available in pdf format.
"P3I: The Delaware Programmability, Productivity and Proficiency Inquiry" In Proceedings of the Second International Workshop On Software Engineering for High Performance Computing System Applications (SE-HPCS '05), St. Louis, MO, USA. May 15, 2005. Joseph B. Manzano, Yuan Zhang and Guang R. Gao.
"Atomic Section: Concept and Implementation" In Proceedings of Mid-Atlantic Student Workshop on Programming Languages and Systems (MASPLAS '05), Newark, DE, USA. April 30, 2005. Yuan Zhang, Joseph B. Manzano and Guang R.
Gao.
"TiNy Threads: a Thread Virtual Machine for the Cyclops-64 Cellular Architecture" In Proceedings of the Fifth Workshop on Massively Parallel Processing (WMPP), held in conjunction with the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), Denver, CO, USA. April 3 - 8, 2005. Juan del Cuvillo, Weirong Zhu, Ziang Hu and Guang R. Gao. Also available in pdf format.
"Performance Portability on EARTH: A Case Study across Several Parallel Architectures" In Proceedings of the 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS'05), in conjunction with IPDPS 2005, Denver, CO, USA. April 4 - 8, 2005. Weirong Zhu, Yanwei Niu and Guang Gao.
"Sequential Consistency Revisited: The Sufficient Conditions and Method to Reason Consistency Model of a Multiprocessor-on-a chip Architecture" In Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN2005), Innsbruck, Austria. February 15 - 17, 2005. Yuan Zhang, Weirong Zhu, Fei Chen, Ziang Hu and Guang R. Gao.
"If-Conversion in SSA Form" In Proceedings of the International European Conference on Parallel and Distributed Computing (Euro-Par 2004), Pisa, Italy. August 31 - September 3, 2004. Arthur Stoutchinin and Guang R. Gao.
"Single-Dimension Software Pipelining for Multi-Dimensional Loops" In Proceedings of International Symposium on Code Generation and Optimization (CGO 2004), San Jose, CA. March 21 - 24, 2004. Hongbo Rong, Zhizhong Tang, R. Govindarajan, Alban Douillet and Guang R. Gao. Also available in pdf format.
"Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops" In Proceedings of International Symposium on Code Generation and Optimization (CGO 2004), San Jose, CA. March 21 - 24, 2004. Hongbo Rong, Alban Douillet, R. Govindarajan and Guang R.
Gao. Also available in pdf format.
"DIMES: An Iterative Emulation Platform for Multiprocessor-System-on-Chip Designs" In Proceedings of IEEE International Conference on Field-Programmable Technology (FPT'03), Tokyo, Japan. December 15 - 17, 2003. Hirofumi Sakane, Levent Yakay, Vishal Karna, Clement Leung and Guang R. Gao.
"Code Size Oriented Memory Allocation for Temporary Variables" In Proceedings of Fifth Workshop on Media and Streaming Processors (MSP-5/MICRO-36), San Diego, CA, USA. December 1, 2003. Ziang Hu, Yan Xie and Guang R. Gao.
"Code Size Reduction with Global Code Motion" In Proceedings of Workshop on Compilers and Tools for Constrained Embedded Systems (CTCES/CASES) 2003, San Jose, CA, USA. October 29, 2003. Ziang Hu, Yuan Zhang, Hongbo Yang and Guang R. Gao.
"Performance Study of a Whole Genome Comparison Tool on a Hyper-Threading Multiprocessor" In Proceedings of Fifth International Symposium on High Performance Computing, Tokyo, Japan. October 20 - 22, 2003. Juan del Cuvillo, Xinmin Tian, Guang R. Gao and Millind Girkar.
"CARE: Overview of an Adaptive Multithreaded Architecture" In Proceedings of Fifth International Symposium on High Performance Computing, Tokyo, Japan. October 20 - 22, 2003. Andres Marquez and Guang R. Gao.
"Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation" In Proceedings of 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2003), College Station, TX, USA. October 2 - 4, 2003. Hongbo Yang, R. Govindarajan, Guang R. Gao and Ziang Hu.
"A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model" In Proceedings of Fifth IEEE International Conference on Cluster Computing (CLUSTER2003), Hong Kong, China. September 20-23, 2003. Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen and Guang R.
Gao.
"An Executable Analytical Performance Evaluation Approach for Early Performance Prediction" In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France. April 22 - 26, 2003. Adeline Jacquet, Vincent Janot, Clement Leung, Guang R. Gao, R. Govindarajan, Thomas L. Sterling.
"Programming Models and System Software for Future High-End Computing Systems: Work-in-Progress" In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France. April 22 - 26, 2003. Guang R. Gao, Kevin B. Theobald, R. Govindarajan, Clement Leung, Ziang Hu, Haiping Wu, Jizhu Lu, Juan del Cuvillo, Adeline Jacquet, Vincent Janot and Thomas L. Sterling.
"On Achieving Balanced Power Consumption in Software Pipelined Loops" In Proceedings of International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES 2002), Grenoble, France. October 8 - 11, 2002. Hongbo Yang, Guang R. Gao, Clement Leung, R. Govindarajan and Haiping Wu. Available as gzipped Postscript.
"Exploiting Schedule Slacks for Rate-Optimal Power-Minimum Software Pipelining" In Proceedings of 3rd Workshop on Compilers and Operating Systems for Low Power (COLP), held in conjunction with The 11th International Conference on Parallel Architecture and Compilation Techniques (PACT), Charlottesville, VA, USA. September 22 - 25, 2002. Hongbo Yang, R. Govindarajan, Guang R. Gao, George Cai and Ziang Hu. Available as gzipped Postscript.
"Power-Performance Trade-offs for Energy-Efficient Architectures: A Quantitative Study" In Proceedings of 20th International Conference on Computer Design (ICCD) 2002, Freiburg, Germany. September 16 - 18, 2002. Hongbo Yang, R. Govindarajan, Guang R. Gao and Kevin B. Theobald. Available as gzipped Postscript.
"Whole Genome Alignment using a Multithreaded Parallel Implementation" In Proceedings of Symposium on Computer Architecture and High Performance Computing, Pirenopolis, Brazil. September 10 - 12, 2001.
Wellington S. Martins, Juan del Cuvillo, Wenwu Cui and Guang R. Gao.
"Next Generation System Software for Future High-End Computing Systems" In Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS '02). IEEE Computer Society, Washington, DC, USA. April 15, 2002. Guang R. Gao, Kevin B. Theobald, Ziang Hu, Haiping Wu, Jizhu Lu, Keshav Pingali, Paul Stodghill, Thomas L. Sterling, Rick Stevens, and Mark Hereld.
"Power and Energy Impact by Loop Transformations" In Proceedings of Workshop on Compilers and Operating Systems for Low Power 2001, in conjunction with Parallel Architecture and Compilation Techniques 2001, Barcelona, Spain. September 8, 2001. Hongbo Yang, Guang R. Gao, Andres Marquez, George Cai and Ziang Hu. Available as gzipped Postscript.
"A Multi-Threaded Runtime System for a Multi-Processor/Multi-Node Cluster" In Proceedings of 15th Annual International Symposium on High Performance Computing Systems and Applications, Windsor, ON, Canada. June 18 - 20, 2001. Christopher J. Morrone, Jose N. Amaral, Guy Tremblay, and Guang R. Gao.
"Minimum Register Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs" In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS 2001), San Francisco, CA, USA. April 24 - 28, 2001. R. Govindarajan, Hongbo Yang, C. Zhang, Jose N. Amaral and Guang R. Gao. Available as gzipped Postscript.
"Multithreaded Algorithms for Pricing a Class of Complex Options" In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS 2001), San Francisco, CA, USA. April 24 - 28, 2001. Ruppa K. Thulasiram, Lubomir Litov, Hassan Nojumi, Christopher T. Downing and Guang R. Gao. Available as gzipped Postscript.
"Compiling Several Classes of Reductions on a Multithreaded Architecture" In Proceedings of the Mid-Atlantic Student Workshop on Programming Languages and Systems, Yorktown Heights, New York, IBM T. J. Watson Research Center, April 2001.
Rishi Kumar, Gagan Agrawal, Kevin Theobald, Gary M. Zoppetti, and Guang R. Gao "Speculative Prefetching of Induction Pointers" In Proceedings of International Conference on Compiler Construction (CC 2001), Genova, Italy. April 2 - 6, 2001. Artour Stoutchinin, Jose N. Amaral, Guang R. Gao, Jim Dehnert, Suneel Jain and Alban Douillet Available as gzipped Postscript. "Computer Detection of Single Nucleotide Polymorphisms (SNPs) in Maize ESTs" In Proceedings of Plant & Animal Genome IX Conference (PAG-IX), San Diego, CA, USA. January 13 - 17, 2001. F. Useche, M. Morgante, M. Hanafey, Scott Tingey, Guang R. Gao and Antoni Rafalski "A Multithreaded Parallel Implementation of a Dynamic Programming Algorithm for Sequence Comparison" In Proceedings of Pacific Symposium on Biocomputing (PSB 2001), pp. 311-322, Hawaii, HI, USA. January 3 - 7, 2001. W.S. Martins, J.B. del Cuvillo, F.J. Useche, K.B. Theobald and Guang R. Gao "Landing CG on EARTH: A Case Study of Fine-Grained Multithreading on an Evolutionary Path" In Proceedings of Super Computing (SC2000), Dallas, TX, USA. November 4-10, 2000. Kevin B. Theobald, Gagan Agrawal, Rishi Kumar, Gerd Heber, Guang R. Gao, Paul Stodghill and Keshav Pingali "Developing a Communication Intensive Application on the EARTH Multithreaded Architecture" In Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par 2000), Munchen, Germany. August 28 - September 1, 2000. Kevin B. Theobald, Rishi Kumar, Gagan Agrawal, Gerd Heber, Ruppa K. Thulasiram and Guang R. Gao "Multithreaded Algorithms for the Fast Fourier Transform" In Proceedings of the 12th Annual ACM Symposium on Parallel Algorithms and Architectures, Bar Harbor, Maine, pp. 176-185, July 2000. Parimala Thulasiraman, Kevin B. Theobald, Ashfaq A. Khokhar, and Guang R. Gao "Parallel FEM Simulation of Crack Propagation --Challenges, Status, and Perspectives" In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS'00), pp. 
443-449 Cancun, Mexico. May 1-5, 2000. Bruce Carter, Chuin-Shan Chen, L. Paul Chew, Nikos Chrisochoides, Guang R. Gao, Gerd Heber, Antony R. Ingraffea, Roland Krause, Chris Myers, Demian Nave, Keshav Pingali, Paul Stodghill, Stephen Vavasis, Paul A. Wawrzynek "Caching Single-Assignment Structures to Build a Robust Fine-Grain Multi-Threading System" In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS'00), pp. 589-594, Cancun, Mexico. May 1-5, 2000. Wen-Yen Lin, Jean-Luc Gaudiot, Jose N. Amaral and Guang R. Gao "Performance Analysis of the I-Structure Software Cache on Multi-Threading Systems" In Proceedings of 19th IEEE International Performance, Computing and Communication Conference (IPCCC2000), Phoenix, AZ, USA. February 20-22, 2000. Wen-Yen Lin, Jean-Luc Gaudiot, Jose N. Amaral and Guang R. Gao "A Comparative Performance Study of Fine-Grain Multi-threading on Distributed Memory Machines" In Proceedings of 19th IEEE International Performance, Computing and Communication Conference (IPCCC2000), Phoenix, AZ, USA. February 20-22, 2000. Prasad Kakulavarapu, Christopher J. Morrone, Kevin B. Theobald, Jose N. Amaral and Guang R. Gao "Coping With Very High Latencies in Petaflops Computer Systems" In Proceedings of High Performance Computing, Second International Symposium, ISHPC'99, Kyoto, Japan. May 26-28, 1999. Sean Ryan, Jose N. Amaral, Guang R. Gao, Zachary Ruiz, Andres Marquez and Kevin Theobald. "A Multithreading Parallel Computational Approach for Valuing Derivatives" In Proceedings of First WAFA Finance Research Conference, Fairfax, VA, USA. April 30, 1999. R.K. Thulasiram and Guang R. Gao "Load Adaptive Algorithms and Implementations for the 2D Discrete Wavelet Transform on Fine-Grain Multithreaded Architectures" In Proceedings of Workshop on SPDP '99, San Juan, Puerto Rico, April 12-16, 1999. Ashfaq A. Khokhar, Gerd Heber, Parimala Thulasiraman and Guang R. Gao Available as gzipped Postscript. 
"A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes" In Proceedings of Workshop on SPDP '99, San Juan, Puerto Rico, April 12-16, 1999. Gerd Heber, Rupak Biswas and Guang R. Gao. Available as gzipped Postscript. "Self-Avoiding Walks over Adaptive Unstructured Grids" In Proceedings of Workshop on Parallel Algorithms for Irregularly Structured Problems (IRREGULAR 1999), San Juan, Puerto Rico, April 12-16, 1999. Gerd Heber, Rupak Biswas and Guang R. Gao. Available as gzipped Postscript. "Efficient State-Diagram Construction Methods for Software Pipelining" In Proceedings of International Conference on Compiler Construction (CC 1999), Amsterdam, The Netherlands. March 20-28, 1999. Chihong Zhang, R. Govindarajan, Sean Ryan and Guang R. Gao. Available as gzipped Postscript. "HTMT-C: Proposing A Programming Language For A Petaflop Machine" In Proceedings of the Mid-Atlantic Student Workshop on Programming Languages and Systems (MASPLAS 1999), pp. 53-68, Baltimore, MD. March 27, 1999. Sean Ryan, Jose Nelson Amaral, Zachary Ruiz and Guang R. Gao "Superconducting Processors for HTMT: Issues and Challenges" In Proceedings of the 7th Symposium on the Frontiers of Massively Parallel Computation (FRONTIERS '99), pp. 260-267, Annapolis, MD, USA. February 21-25, 1999. Kevin B. Theobald, Guang R. Gao and Thomas L. Sterling. Available as gzipped Postscript. "Performance Prediction for the HTMT: A Programming Example" TFP3 '99, Annapolis, Maryland, February 22, 1999. Jose Nelson Amaral, Guang R. Gao, Phillip Merkey, Thomas Sterling, Zachary Ruiz and Sean Ryan. "A Superstrand Architecture and its Compilation" In Proceedings of Workshop on Multithreaded Execution, Architecture and Compilation, held in conjunction with HPCA-V, Orlando, FL, USA. January 9-12, 1999. Andres Marquez, Kevin B. Theobald, Xinan Tang and Guang R. 
Gao "Design and Evaluation of Dynamic Load Balancing Schemes under a Fine-grain Multithreaded Execution Model" In Proceedings of Workshop on Multithreaded Execution, Architecture and Compilation, held in conjunction with HPCA-V, Orlando, FL, USA. January 9-12, 1999. Haiying Cai, Olivier Maquelin, Prasad Kakulavarapu and Guang R. Gao. "An Implementation of a Hopfield Network Kernel on EARTH" Brazilian Symposium on Computer Architecture and High Performance Processing , Buzios, Brazil, September, 1998. Jose N. Amaral, Guang R. Gao and Xinan Tang Available as gzipped Postscript. "Using Multithreading for the Automatic Load Balancing of Adaptive Finite Element Meshes" In Proceedings of Workshop on Parallel Algorithms for Irregularly Structured Problems (IRREGULAR 1998), Berkeley, CA, USA. August 9-11, 1998. Gerd Heber, Rupak Biswas, Parimala Thulasiraman and Guang R. Gao Available as gzipped Postscript. "Elastic History Buffer: A Low-Cost Method to Improve Branch Prediction Accuracy" In Proceedings of International Conference on Computer Design: VLSI in Computers & Processors (ICCD 1997), Austin, TX, USA. October 12-15, 1997 Guang R. Gao, Maria-Dana Tarlescu and Kevin B. Theobald. Available as gzipped Postscript. "Thread Partitioning and Scheduling Based on Cost Model" In Proceedings of 9th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA 1997), Newport, RI, USA. June 22 - 25, 1997. Guang R. Gao, Xinan Tang, Jian Wang and Kevin B. Theobald. Available as gzipped Postscript.
Journal Papers
The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining International Journal of Parallel Programming, July 2015, ISSN 0885-7458. Daniel Orozco, Elkin Garcia, Robert Pavel, Jaime Arteaga, and Guang R. Gao TERAFLUX: Harnessing dataflow in next generation teradevices Microprocessors and Microsystems - Journal, Available online 18 April 2014, ISSN 0141-9331. Roberto Giorgi, Rosa M. Badia, François Bodin, Albert Cohen, Paraskevas Evripidou, Paolo Faraboschi, Bernhard Fechner, Guang R. Gao, Arne Garbade, Rahul Gayatri, Sylvain Girbal, Daniel Goodman, Behran Khan, Souad Koliai, Joshua Landwehr, Nhat Minh Lê, Feng Li, Mikel Luján, Avi Mendelson, Laurent Morin, Nacho Navarro, Tomasz Patejko, Antoniu Pop, Pedro Trancoso, Theo Ungerer, Ian Watson, Sebastian Weis, Stéphane Zuckerman, Mateo Valero Exploitation of Locality for Energy Efficiency for Breadth First Search in Fine-grain Execution Models Tsinghua Science and Technology - Journal, Volume 18, Number 3, June 2013. Chen Chen, Souad Koliai and Guang R. Gao. StreamTMC: Stream Compilation for Tiled Multi-core Architectures Elsevier Journal of Parallel and Distributed Computing (JPDC), Volume 73, Issue 4, April 2013, Pages 484-494. Haitao Wei, Mingkang Qin, Junqing Yu, Dongrui Fan and Guang R. Gao. Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures IEEE Transactions on Parallel and Distributed Systems, Vol. 23, No. 12, Dec. 2012, pp. 2338-2350. Haitao Wei, Junqing Yu, Huafei Yu, Mingkang Qin and Guang R. Gao Toward High Throughput Algorithms on Many Core Architectures ACM Transactions on Architecture and Code Optimization (TACO), Volume 8, Issue 4, January 2012, Article No. 49. Daniel Orozco, Elkin Garcia, Rishi Khan, Kelly Livingston and Guang R. Gao. Experiments with the Fresh Breeze tree-based memory model Computer Science - Research and Development, June 2011, Volume 26, Issue 3-4, pp 325-337. Jack B. Dennis, Guang R. Gao, Xiao X. Meng. 
Analysis and Performance Results of Computing Betweenness Centrality on IBM Cyclops64 ACM Journal of Supercomputing, Vol. 56, No. 1, April 2011, pp. 1-24. Guangming Tan, Vugranam C. Sreedhar and Guang R. Gao. Improving Performance of Dynamic Programming via Parallelism and Locality on Multi-core Architectures IEEE Transactions on Parallel and Distributed Systems, Vol. 20, No. 2, 2009, pp. 261-274. Guangming Tan, Ninghui Sun and Guang R. Gao Register allocation for software pipelined multidimensional loops ACM Trans. Program. Lang. Syst. 30(4), July 2008. Hongbo Rong, Alban Douillet and Guang R. Gao EnGENIUS - Environmental Genome Informational Utility System Journal of Bioinformatics and Computational Biology, JBCB-119R1, July 2008 M. Kaplarevic, A.E. Murray and Guang R. Gao Single-Dimension Software Pipelining for Multidimensional Loops ACM Transactions on Architecture and Code Optimization (TACO), Volume 4, Issue 1, March 2007, Article No. 7. Hongbo Rong, Zhizhong Tang, R. Govindarajan, Alban Douillet and Guang R. Gao Performance Portability on EARTH: A Case Study across Several Parallel Architectures Cluster Computing, Volume 10, Number 2, June, 2007, page 115-126. Weirong Zhu, Yanwei Niu and Guang R. Gao Madd Operation Aware Redundancy Elimination International Journal of Software Engineering and Knowledge Engineering, Vol. 15, No. 2, 2005, pp. 357-362 Haiping Wu, Ziang Hu, Joseph Manzano and Guang R. Gao. Improving Power Efficiency with Compiler-Assisted Cache Replacement Journal of Embedded Computing, 2005 Hongbo Yang, R. Govindarajan, Guang R. Gao and Ziang Hu A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model International Journal of High Performance Computing and Networking, Vol 2, Issue 2/3/4, 2004 Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen and Guang R. Gao, An Improved Hidden Markov Model for Transmembrane Protein Topology Prediction and Its Applications to Complete Genomes Bioinformatics, Volume 21, Number 9, pp. 
1853-1858, 2005 Robel Kahsay, Li Liao and Guang Gao Quasi-Consensus Based COMParison of Profile Hidden Markov Models for Protein Sequences Bioinformatics, Volume 21, Number 10, pp. 2287-2293, 2005 Robel Kahsay, Guoli Wang, Guang R. Gao, Li Liao and Roland Dunbrack. Efficient Multithreaded Algorithms for the Fast Fourier Transform Parallel and Distributed Computing Practices, Vol. 5, No. 2, Pages: 177-191, 2004 Parimala Thulasiraman, Kevin B. Theobald, Ashfaq A. Khokhar and Guang R. Gao A Fine-Grain Load Adaptive Algorithm of the 2D Discrete Wavelet Transform for Multithreaded Architectures Journal of Parallel and Distributed Computing (JPDC), Vol. 64, No. 1, Pages: 68-78, January 2004 Parimala Thulasiraman, Ashfaq A. Khokhar, Gerd Heber and Guang R. Gao Evaluation and Choice of Various Branch Predictors for Low-Power Embedded Processor Journal of Computer Science and Technology, Vol. 18, No. 6, Pages: 833-838, November, 2003 Dong Rui Fan, Hongbo Yang, Guang R. Gao and Rong Cai Zhao Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures IEEE Transactions on Computers, Vol. 52, No. 1, Pages: 4-20, January 2003 Ramaswamy Govindarajan, Hongbo Yang, Jose N Amaral, Chihong Zhang and Guang R. Gao Implementation of the EARTH Programming Model on SMP Clusters: a Multi-Threaded Language and Runtime System Concurrency and Computation: Practice and Experience, Vol. 15, No. 9, Pages: 821-844, August 2003 Guy Tremblay, Christopher J. Morrone, Jose N. Amaral and Guang R. Gao Minimizing Buffer Requirements in Rate-Optimal Schedules in Regular Dataflow Networks Journal of VLSI Signal Processing, Vol. 31, No. 3, Pages: 207-229, Jul 2002 Ramaswamy Govindarajan and Guang R. Gao Implementation and Evaluation of a Communication Intensive Application on the EARTH Multithreaded System Concurrency and Computation: Practice and Experience, 14(3):183-201, March 2002 Kevin B. Theobald, Rishi Kumar, Gagan Agrawal, Gerd Heber, Ruppa K. 
Thulasiram, and Guang R. Gao A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors Design Automation for Embedded Systems, Vol. 6, No. 3, Pages: 243-275, March 2002 Ramaswamy Govindarajan, Erik R. Altman and Guang R. Gao CASA: A Server for The Critical Assessment of Sequence Alignment Accuracy Bioinformatics, Vol. 18, No. 3, Pages: 496-497, March 2002 Robel Y. Kahsay, Nataraj Dongre, Guang R. Gao, Guoli Wang and Roland L. Dunbrack Jr. TROLL--Tandem Repeat Occurrence Locator Bioinformatics, Vol. 18, No. 4, Pages: 634-636, April 2002 Adalberto T. Castelo, Wellington S. Martins and Guang R. Gao Exploiting Locality in single Assignment Data Structures Updated through Split Phase Transactions Cluster Computing, Special issue on Internet Scalability: Advances in Parallel, Distributed and Mobile Systems, Vol. 4, No. 4, Pages: 281-293, October 2001 Jose N. Amaral, Wen-Yen Lin, Jean-Luc Gaudiot and Guang R. Gao Dynamic Load Balancers for a Multithreaded Multiprocessor System Parallel Processing Letters, Vol. 11, No. 1, Pages: 169-184, March 2001 Prasad Kakulavarapu, Olivier Maquelin, Jose N. Amaral and Guang R. Gao A New Memory Model and Cache Consistency Protocol IEEE Transactions on Computers, Vol. 49, No. 8, Pages: 798-813, August 2000 Guang R. Gao and Vivek Sarkar, Location Consistency Automatically Partitioning Threads for Multithreaded Architectures Special Issues on Compilation and Architectural Support for Parallel Applications, Journal of Parallel and Distributed Computing, Vol. 58, No. 2, Pages: 159-189, August 1999 Xinan Tang and Guang R. Gao Advances in the Dataflow Computational Model Parallel Computing, Vol. 25, No. 13-14, Pages: 1907-1927, 1999 Walid A. Najjar, Edward A. Lee and Guang R. Gao A New Framework for Elimination Based Data Flow Analysis Using DJ Graphs ACM Transactions on Programming Languages and Systems, Vol. 20, No. 2, Pages 388-435, March 1998 Vugranam C. Sreedhar, Guang R. 
Gao, and Yong-Fong Lee Optimal Modulo Scheduling Through Enumeration International Journal on Parallel Programming, Vol. 26, No. 2, Pages: 313-344, 1998 Erik R. Altman and Guang R. Gao A Unified Framework for Instruction Scheduling and Mapping for Function Units with Structural Hazards Journal of Parallel and Distributed Computing, Vol. 49, No. 2, Pages: 259-293, 1998 Erik R. Altman, Ramaswamy Govindarajan and Guang R. Gao Incremental Computation of Dominator Trees ACM Transactions on Programming Languages and Systems, Vol. 19, No. 2, Pages: 239-252, March 1997 Vugranam C. Sreedhar, Guang R. Gao and Yong-fong Lee A Quadratic Time Algorithm for Computing Multiple Node Immediate Dominators Journal of Programming Languages, 1996 Vugranam C. Sreedhar, Guang R. Gao and Yongfong Lee The W-Network: A Low-Cost Fault-Tolerant Multistage Interconnection Network for Fine-Grain Multiprocessing Concurrency: Practice and Experience, 8(6):415-428, July-August 1996 Kevin B. Theobald A Framework for Resource-constrained Rate-optimal Software Pipelining IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 11, Pages: 1133-1149, November 1996 Ramaswamy Govindarajan, Erik R. Altman and Guang R. Gao A Study of the EARTH-MANNA Multithreaded System International Journal of Parallel Programming, Vol. 24, No. 4, Page 319-347, August 1996 Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Guang R. Gao and Laurie J. Hendren Identifying Loops Using DJ Graphs ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 18, No. 6, Pages: 649-658, November 1996 Vugranam Sreedhar, Guang R. Gao and Yongfong Lee A Linear Time Algorithm for Placing φ-nodes Journal of Programming Languages, 1995. Accepted Vugranam C. Sreedhar and Guang R. Gao Automatic Data and Computation Decomposition for Distributed Memory Machines Parallel Processing Letters, Vol. 5, No. 4, Pages: 539-550, April 1995 Qi Ning, Vincent V. Dongen and Guang R. 
Gao Computing phi-nodes in Linear Time Using DJ Graphs Journal of Programming Languages, Vol. 3, Pages: 191-213, April 1995 Vugranam C. Sreedhar and Guang R. Gao ABC++: Concurrency by Inheritance in C++ IBM Systems Journal, Vol. 34, No. 1, Pages: 120-137, 1995 Eshrat Arjomandi, William O'Farrell, Ivan Kalas, Gita Koblents, Frank Ch. Eigler and Guang R. Gao Rate-optimal Schedule for Multi-rate DSP Computations Journal of VLSI Signal Processing, Vol. 9, No. 3, Pages: 211-232, April 1995 Ramaswamy Govindarajan and Guang R. Gao An Efficient Hybrid Dataflow Architecture Model Journal of Parallel and Distributed Computing, Vol. 19, No. 4, Pages: 293-307, December 1993 Guang R. Gao A Register Allocation Framework Based on Hierarchical Cyclic Interval Graphs The Journal of Programming Languages, Vol. 1, No. 3, Pages: 155-185, 1993 Laurie J. Hendren, Guang R. Gao, Erik R. Altman and Chandrika Mukerji Optimal Loop Storage Allocation for Argument-fetching Dataflow Machines International Journal of Parallel Programming, Vol. 21, No. 6, Pages: 421-448, December 1992 Qi Ning and Guang R. Gao A High-speed Memory Organization for Hybrid Dataflow/von Neumann Computing Future Generation Computer Systems, Vol. 8, Pages: 287-301, 1992 Herbert H. J. Hum and Guang R. Gao Toward Efficient Fine-grain Software Pipelining and the Limited Balancing Techniques International Journal of Mini and Microcomputers, Vol. 13, No. 2, Pages: 57-68, 1991 Guang R. Gao, Herbert H. J. Hum and Yue-Bong Wong Exploiting Fine-grain Parallelism on Dataflow Architectures Parallel Computing, Vol. 13, No. 3, Pages: 309-320, March 1990 Guang R. Gao
Technical Memos
CAPSL Technical Memo 140: clCodeletPipe API Documentation: Implementation of Codelet Pipe on Intel Iris Pro Architecture (Available upon request -- Please contact sraskar -AT- udel.edu) Siddhisanket Raskar, Thomas Applencourt, Kalyan Kumaran, and Guang Gao May 2021 CAPSL Technical Memo 139: Realization of Dataflow Software Pipelining for Codelet Model using Hardware-Software Co-design (Available upon request -- Please contact sraskar -AT- udel.edu) Siddhisanket Raskar, Thomas Applencourt, Kalyan Kumaran, and Guang Gao May 2021 CAPSL Technical Memo 138: DECARD & DEMAC, A Distributed Runtime and a Modular Cluster for Embedded Systems (Available upon request -- Please contact diegor -AT- udel.edu) Diego A. Roa Perdomo, Jose M. Monsalve Diaz, and Guang Gao December 2020 CAPSL Technical Memo 137: Towards Surrogate Model aware Program Execution Model (Available upon request -- Please contact sraskar -AT- udel.edu) Siddhisanket Raskar and Guang Gao January 2021 CAPSL Technical Memo 136: DEMAC and CODIR: A whole stack solution for a HW/SW co-design using an MLIR Codelet Model Dialect D. Roa Perdomo, R. Kabrick, J. Monsalve Diaz, S. Raskar, D. Fox, G. 
Gao May 2020 CAPSL Technical Memo 135: Study of Dataflow Software Pipelining under Codelet Model using Cannon's Algorithm (Available upon request -- Please contact sraskar -AT- udel.edu) Siddhisanket Raskar, Jose M Monsalve Diaz, Thomas Applencourt, Kalyan Kumaran, and Guang Gao February 2020 CAPSL Technical Memo 134: Brain-Flow : A brain inspired dataflow implementation using DEMAC Diego Roa, Ryan Kabrick, Siddhisanket Raskar, Jose M Monsalve Diaz and Guang Gao October 2019 CAPSL Technical Memo 133: Extending Codelet Model for Dataflow Software Pipelining using Software-Hardware Co-design (Available upon request -- Please contact sraskar -AT- udel.edu) Siddhisanket Raskar, Thomas Applencourt, Kalyan Kumaran, and Guang Gao June 2019 CAPSL Technical Memo 132: Sequential Codelet Model for Parallel Execution (Available upon request -- Please contact josem -AT- udel.edu) Jose M Monsalve Diaz, and Guang R Gao June 2019 CAPSL Technical Memo 131: Toward A Parallel Turing Machine Model Peng Qu, Jin Yan, and Guang Gao July 2016 CAPSL Technical Memo 130: Multigrain Parallelism: Compiling Coarse-Grain Parallel Programs for Fine-Grain Execution Jaime Arteaga, Stephane Zuckerman, and Guang Gao April 2016 CAPSL Technical Memo 129: A Multiscale Modeling and Simulation Methodology for Financial Market Stability and Risk Analysis Guang Gao, Paul Laux, Bintong Chen, Xiaoming Li and Stephane Zuckerman March 2016 CAPSL Technical Memo 128: Massively Multi-Core Systems and Virtual Memory Guang Gao and Jack B. Dennis April 2014 CAPSL Technical Memo 127: Architecture and Programming Model for High Performance Interactive Computation Jack B. Dennis, Arvind, Guang R. Gao, Xiaoming Li, and Lian-Ping Wang April 2014 Full Document available on request CAPSL Technical Memo 126: SPARTA: a Stream-based Processor And Run-Time Architecture Jean-Luc Gaudiot, Ahmed Louri, Guang R. 
Gao March 2014 Available on request CAPSL Technical Memo 125: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading Jaime Arteaga, Elkin Garcia, Stephane Zuckerman, Robert Pavel and Guang Gao January 2014 CAPSL Technical Memo 124: Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture Elkin Garcia, Jaime Arteaga, Robert Pavel and Guang R. Gao July 2013 CAPSL Technical Memo 123: Toward a Self-aware System for Exascale Architectures Aaron Landwehr, Stéphane Zuckerman and Guang R. Gao June 2013 CAPSL Technical Memo 122: SMART: a Stream-based Multi-core Architecture & Runtime Technology Jean-Luc Gaudiot, Guang R. Gao, Elkin Garcia, Ganghee Jang, Souad Koliai and Haitao Wei February 2013 Available on request CAPSL Technical Memo 121: Toward Exascale Systems: from Applications to Architectures Jean-Luc Gaudiot, Guang R. Gao, Elkin Garcia, Ganghee Jang and Souad Koliai October 2012 Available on request CAPSL Technical Memo 120: Leveraging Dataflow Execution Models for Exascale Performance Stephane Zuckerman, Marco Solinas, Souad Koliai, Guang R. Gao and Roberto Giorgi August, 2012 Available on request CAPSL Technical Memo 119: Determinacy and Repeatability of Parallel Program Schemata Jack B Dennis, Guang R. Gao and Vivek Sarkar August, 2012 Available on request CAPSL Technical Memo 118: Performance Modeling of Fine Grain Task Execution Models with Resource Constraints on Many-core Architectures Elkin Garcia, Robert Pavel, Daniel Orozco and Guang R. Gao June, 2012 Available on request CAPSL Technical Memo 117: Demystifying Performance Predictions of Distributed FFT3D Implementations Daniel Orozco, Elkin Garcia, Robert Pavel, Orlando Ayala, Lian-Ping Wang and Guang R. Gao June, 2012 CAPSL Technical Memo 116: MACO: MetadatA Coalescing and Optimizing Framework Juergen Ributzka, Aaron M. Landwehr, Sunil Shrestha and Guang R. 
Gao June, 2012 Available on request CAPSL Technical Memo 115: Design Manual for the Fresh Breeze Simulator Xiaoxuan Meng, Tom St. John, Jack B. Dennis and Guang R. Gao April, 2012 CAPSL Technical Memo 114: Massively Parallel Breadth-First Search Using a Tree-Structured Memory Model Tom St. John, Jack B. Dennis and Guang R. Gao April, 2012 CAPSL Technical Memo 113: Toward a Highly Parallel Framework for Discrete-Event Simulation Robert Pavel, Elkin Garcia, Daniel Orozco and Guang R. Gao April, 2012 Available on request CAPSL Technical Memo 112: A Fresh Foundation for Software/Hardware Co-Design of Exascale Computing Systems Jack B. Dennis, Guang R. Gao, Chengmo Yang, Xiaoming Li, Robert Pavel, Aaron Landwehr, Daniel Orozco and Kelly Livingston February, 2012 Available on request CAPSL Technical Memo 111: Toward Efficient Fine-grained Dynamic Scheduling on Many-Core Architectures Elkin Garcia, Daniel Orozco, Robert Pavel and Guang R. Gao February, 2012 CAPSL Technical Memo 110: SHF:Large:Collaborative Research: Power-Efficient Fault Resilience in Massively Parallel Computing Guang R. Gao, Jack B. Dennis and Chengmo Yang November, 2011 Available on request CAPSL Technical Memo 109: Comparative Evaluation of Alternative Program Execution Models Jack B. Dennis, Robert Pavel and Guang R. Gao September, 2011 Available on request CAPSL Technical Memo 108: Code Partition and Overlays: A Reintroduction to High Performance Computing Joseph B. Manzano, Ge Gan, Juergen Ributzka, Sunil Shrestha and Guang R. Gao August, 2011 CAPSL Technical Memo 107: TIDeFlow: The Time Iterated Dependency Flow Execution Model Daniel Orozco, Elkin Garcia, Robert Pavel, Rishi Khan and Guang R. Gao August, 2011 CAPSL Technical Memo 106: C64prof: A Parallel Profiling Environment for the Cyclops64 Architecture Mark Pellegrini and Guang R. 
Gao June, 2011 CAPSL Technical Memo 105: Polytasks: A Compressed Task Representation for HPC Runtimes Daniel Orozco, Elkin Garcia, Robert Pavel, Rishi Khan and Guang Gao June, 2011 CAPSL Technical Memo 104: Toward an Execution Model for Extreme-Scale Systems-Runnemede and Beyond Guang R. Gao, Joshua Suetterlein and Stephane Zuckerman April, 2011 Available on request CAPSL Technical Memo 103: High Throughput Queue Algorithms Daniel Orozco, Elkin Garcia, Rishi Khan, Kelly Livingston and Guang R. Gao January, 2011 CAPSL Technical Memo 102: Energy efficient tiling on a Many-Core Architecture Elkin Garcia, Daniel Orozco and Guang R. Gao October, 2010 CAPSL Technical Memo 101: Locality Optimization of Stencil Applications using Data Dependency Graphs Daniel Orozco, Elkin Garcia and Guang R. Gao October, 2010 CAPSL Technical Memo 100: Experiments with the Fresh Breeze Tree-Based Memory Model Jack B. Dennis, Guang R. Gao and Xiao X. Meng October, 2010 CAPSL Technical Memo 99 Revised: The Elephant and the Mouse: Non-Strict Fine-Grain Synchronization for Many-Core Architectures Juergen Ributzka, Yuhei Hayashi and Guang R. Gao April, 2011 CAPSL Technical Memo 99: The Elephant and the Mouse: Non-Strict Fine-Grain Synchronization for Many-Core Architectures Juergen Ributzka, Yuhei Hayashi and Guang R. Gao June, 2010 CAPSL Technical Memo 98: Dynamic Percolation - Mapping Dense Matrix Multiplication on a Many-Core Architecture Elkin Garcia, Rishi Khan, Kelly Livingston, Ioannis E. Venetis and Guang R. Gao June, 2010 CAPSL Technical Memo 97: TiNy Threads on BlueGene/P: Exploring Many-Core Parallelisms Beyond The Traditional OS Handong Ye, Robert Pavel, Aaron Landwehr, and Guang R. Gao May, 2010 CAPSL Technical Memo 96: Many-Core Chip Architecture - A Report on a Novel Architecture/Software Co-Verification Platform Juergen Ributzka, Yuhei Hayashi and Guang R. 
Gao April, 2010 CAPSL Technical Memo 95: Optimized Dense Matrix Multiplication on a Many-Core Architecture Elkin Garcia, Ioannis E. Venetis, Rishi Khan and Guang R. Gao February, 2010 CAPSL Technical Memo 94: Synchronization for Dynamic Task Parallelism on Manycore Architectures Yonghong Yan, Sanjay Chatterjee, Daniel Orozco, Elkin Garcia, Jun Shirako, Zoran Budimlic, Vivek Sarkar and Guang Gao February, 2010 CAPSL Technical Memo 93: A Study of a Software Cache Implementation of the OpenMP Memory Model for Multicore and Manycore Architectures Chen Chen, Joseph B Manzano, Ge Gan, Guang R. Gao, Vivek Sarkar February, 2010 CAPSL Technical Memo 92: Establishing Causality as a Desideratum for Memory Models and Transformations of Parallel Programs Chen Chen, Wenguang Chen, Vugranam Sreedhar, Rajkishore Barik, Vivek Sarkar and Guang Gao January, 2010 CAPSL Technical Memo 91: Diamond Tiling: A Tiling Framework for Time-iterated Scientific Applications. Daniel Orozco and Guang Gao December, 2009 CAPSL Technical Memo 90: Analysis and Performance Results of Computing Betweenness Centrality on IBM Cyclops64 Guangming Tan, Vugranam Sreedhar, Guang R. Gao October, 2009 CAPSL Technical Memo 89: Formalizing Causality as a Desideratum for Memory Models and Transformations of Parallel Programs Chen Chen, Wenguang Chen, Vugranam Sreedhar, Rajkishore Barik, Vivek Sarkar and Guang Gao July, 2009 CAPSL Technical Memo 88: Collaborative Research: Programming Models and Storage System for High Performance Computation with Many-Core Processors Jack B. Dennis, Guang R Gao and Vivek Sarkar May 11th, 2009 CAPSL Technical Memo 87: Mapping the FDTD Application to Many-Core Chip Architectures Daniel A. Orozco and Guang R. Gao. March 3rd, 2009 CAPSL Technical Memo 86: A Study of Different Instantiations of the OpenMP Memory Model and Their Software Cache Implementations Chen Chen, Joseph B Manzano, Ge Gan, Guang R. Gao and Vivek Sarkar. 
January, 2009 CAPSL Technical Memo 85: Tile Reduction: an OpenMP Extension for Tile Aware Parallelization Ge Gan, Xu Wang, Joseph B Manzano and Guang R. Gao December, 2008 CAPSL Technical Memo 84: Optimizing the LU Benchmark for the Cyclops-64 Architecture. Ioannis E. Venetis and Guang R. Gao July 8th, 2009 CAPSL Technical Memo 83: Analysis and Performance Results of Computing Betweenness Centrality on IBM Cyclops64 Guangming Tan, Andrew Russom, Vugranam Sreedhar and Guang R Gao April 9th, 2008 CAPSL Technical Memo 82: A New Cache Protocol Based on the Order Free Consistency Memory Model Chen Chen, Joseph B Manzano, Ge Gan, Guang R Gao and Vivek Sarkar May, 2008 CAPSL Technical Memo 81: Performance Tuning of the Fast Fourier Transform on a Multicore Architecture Liping Xue, Long Chen, Ziang Hu and Guang R Gao February 8th, 2008 CAPSL Technical Memo 80: Order Free Consistency: Towards a Fully Asynchronous Memory Model Chen Chen, Joseph B Manzano, Wenguang Chen and Guang R Gao November, 2007 CAPSL Technical Memo 79: Concurrency Analysis for Shared Memory Programs with Textually Unaligned Barriers Yuan Zhang, Evelyn Duesterwald and Guang R Gao November, 2007 CAPSL Technical Memo 78: Implementation of the Smith-Waterman Algorithm on A Reconfigurable Supercomputing Platform Peiheng Zhang, Guangming Tan and Guang R. Gao April 16th, 2007 CAPSL Technical Memo 77: A Study of Parallel Betweenness Centrality Algorithm on a Many-core architecture Guangming Tan and Guang R. Gao June 27th, 2007 CAPSL Technical Memo 76: FAME: Financial Application with Many-core-on-a-chip Architecture Weirong Zhu, Parimala Thulasiraman, Ruppa K. Thulasiram and Guang R. Gao February 17th, 2006 CAPSL Technical Memo 75: Optimizing the LU Benchmark for the Cyclops-64 Architecture Ioannis E. Venetis and Guang R. 
Gao February, 2007 CAPSL Technical Memo 74: Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the IBM Cyclops-64 Multithreaded Architecture Ge Gan, Ziang Hu, Juan del Cuvillo and Guang R. Gao January, 2007 CAPSL Technical Memo 73: A Parallel Dynamic Programming Algorithm on a Multi-core Architecture Guangming Tan and Guang R. Gao February, 2007 CAPSL Technical Memo 72: Automatic Program Segment Similarity Detection in Targeted Program Performance Improvement Haiping Wu, Eunjung Park, Mihailo Kaplarevic, Yingping Zhang, Murat Bolat and Guang R. Gao December 30, 2006 CAPSL Technical Memo 71: An Automatic Methodology for Program Segment-based Compiler Optimization Search Haiping Wu, Eunjung Park, Murat Bolat, Mihailo Kaplarevic, Yingping Zhang, Xiaoming Li and Guang R. Gao November 14, 2006 CAPSL Technical Memo 70: Handling Massive Parallelism Efficiently: Introducing Batches of Threads Ioannis E. Venetis, Theodore S. Papatheodorou and Guang R. Gao October 18, 2006 CAPSL Technical Memo 69: Software Pipelining On Multi-core Chip Architectures: A case study on IBM Cyclops-64 Chip Architecture Alban Douillet, Junmin Lin and Guang R. Gao February 14, 2006 CAPSL Technical Memo 68: Server I/O Acceleration Using an Embedded Multi-core Architecture Lurng-Kuo Liu, Fei Chen, Christos J. Georgiou and Guang R. Gao May 12, 2006 CAPSL Technical Memo 67 Revised: Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization on Many-Core Architectures Weirong Zhu, Vugranam C. Sreedhar, Ziang Hu and Guang R. Gao November 20, 2006 CAPSL Technical Memo 67: Efficient Fine-Grain Synchronization on a Multi-Core Chip Architecture: A Fresh Look Weirong Zhu, Ziang Hu, and Guang R. Gao July 17, 2006 CAPSL Technical Memo 66: An Efficient Communication Infrastructure for the IBM Cyclops-64 Computer System Ge Gan, Ziang Hu, Juan del Cuvillo and Guang R. 
Gao June 12, 2006 CAPSL Technical Memo 65: Optimized Lock Assignment and Allocation for Productivity: A Method for Exploiting Concurrency among Critical Sections Yuan Zhang, Vugranam C. Sreedhar, Weirong Zhu, Vivek Sarkar and Guang R. Gao May 10th, 2006 CAPSL Technical Memo 64: Multidimensional Kernel Generation for Loop Nest Software Pipelining Alban Douillet, Hongbo Rong and Guang R. Gao February 13th, 2006 CAPSL Technical Memo 63: "A New Framework for Analysis and Optimization of Shared Memory Parallel Programs" Vugranam C. Sreedhar, Yuan Zhang and Guang R. Gao July 18th, 2005 CAPSL Technical Memo 62: "FAST: A Functionally Accurate Simulation Toolset for the Cyclops-64 Cellular Architecture" Juan del Cuvillo, Weirong Zhu, Ziang Hu and Guang R. Gao June 17th, 2005 CAPSL Technical Memo 61: "P3I: Delaware's Programmability, Productivity and Proficiency Inquiry" Joseph B. Manzano, Yuan Zhang and Guang R. Gao June 10th, 2005 CAPSL Technical Memo 60: "Performance Analysis of Interconnection Network of Cyclops-64 Chip Architecture" Yingping Zhang, Taikyeong Jeong, Fei Chen, Ronny Nitzsche and Guang R. Gao June 1st, 2005 CAPSL Technical Memo 59: "Concurrency Analysis and Its Applications" Yuan Zhang and Guang Gao May 28th, 2005 CAPSL Technical Memo 58: "Register Pressure in Software Pipelined Loop Nests: Fast Computation and Impact on Architecture Design" Alban Douillet, Hongbo Rong and Guang R. Gao May 3rd, 2005 CAPSL Technical Memo 57: "Parallel Reconstruction for Parallel Imaging SPACERIP on Cellular Architecture" Yuanwei Niu, Ziang Hu and Guang R. Gao June 15, 2004 CAPSL Technical Memo 56: "Quasi consensus based comparison of profile hidden Markov models for protein sequences" Robel Y. Kahsay, Guoli Wang, Li Liao, Roland Dunbrack and Guang R. Gao May 28, 2004 CAPSL Technical Memo 55: "Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture" Juan B. del Cuvillo, Ziang Hu, Weirong Zhu, Fei Chen and Guang R. 
Gao April 26, 2004 CAPSL Technical Memo 54: "Speeding up CG on Cluster with Two Dimensional Blocking Method and EARTH Runtime Support" Fei Chen, Kevin B. Theobald and Guang R. Gao April 23, 2004 CAPSL Technical Memo 53: "Lamport Order Revisit: A Study on How to Efficiently Achieve Sequential Consistency on a Modern Multiprocessor-on-a-Chip Architecture" Yuan Zhang, Weirong Zhu, Fei Chen, Ziang Hu and Guang R. Gao March 01, 2004 CAPSL Technical Memo 52: "Analyzable Atomic Sections: Integrating Fine-Grained Synchronization and Weak Consistency Models for Scalable Parallelism" Vivek Sarkar and Guang R. Gao February 09, 2004 CAPSL Technical Memo 51: "Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops" Hongbo Rong, Alban Douillet, R. Govindarajan and Guang R. Gao September 26, 2003 CAPSL Technical Memo 49: "Single-Dimension Software Pipelining for Multi-Dimensional Loops" Hongbo Rong, Zhizhong Tang, R. Govindarajan, Alban Douillet and Guang R. Gao September 26, 2003 CAPSL Technical Memo 48: "Programming Method and software Infrastructure for Cellular Architecture" Guang R. Gao, Juan del Cuvillo, Ziang Hu, Robert Klosiewicz, Clement Leung, Jason McGuiness, Hirofumi Sakane, Yingping Zhang September 16, 2003 CAPSL Technical Memo 47: "Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation" Hongbo Yang, R. Govindarajan, Guang R. Gao and Ziang Hu September 9, 2003 CAPSL Technical Memo 45: "Selective Slim Scheduling: On Software Pipelining of Loop Nests" Hongbo Rong, Zhizhong Tang, R. Govindarajan, Guang R. Gao June 8, 2003 CAPSL Technical Memo 44: "Algorithms, Applications, and Environments for Emerging Petascale Architectures" R. Govindarajan, H. Tufo, S. Thomas, R. Loft, Guang R. Gao, J. Moreira and J. Castanos March 6, 2003 CAPSL Technical Memo 43: "Executable Performance Model and Evaluation of High Performance Architectures with Percolation" Adeline Jacquet, Vincent Janot, R. Govindarajan, Clement Leung, Guang R. 
Gao and Thomas Sterling November 21, 2002 CAPSL Technical Memo 42: "A Quantitative Study on Performance-Power Impact of Dual-Speed Pipeline Architectures" Hongbo Yang, R. Govindarajan, Guang R. Gao and Kevin B. Theobald June 13, 2002 CAPSL Technical Memo 41: "Maximizing Pipelined Functional Units Usage for Minimum Power Software Pipelining" Hongbo Yang, R. Govindarajan, Guang R. Gao and George Cai September 27, 2001 CAPSL Technical Memo 40: "New Normalization Method and Error Analysis for Gene Expression Microarray Data" Stanley D. Luck, Francisco Jose Useche G., Wellington S. Martins and Guang R. Gao December 11, 2000 CAPSL Technical Memo 39: "Threaded-C Language Reference Manual (Release 2.0)" Guy Tremblay, Kevin B. Theobald, Christopher J. Morrone, Mark D. Butala, Jose Nelson Amaral and Guang R. Gao September 23, 2000 CAPSL Technical Memo 38: "Automatic Prefetching of Induction Pointers" Artour Stoutchinin, Jose Nelson Amaral, Guang R. Gao, Jim Dehnert, Suneel Jain and Alban Douillet April 18, 2000 CAPSL Technical Memo 37: "Automatic Prefetching of Induction Pointers for Software Pipelining" Artour Stoutchinin, Jose Nelson Amaral, Guang R. Gao, Jim Dehnert and Suneel Jain November 12, 1999 CAPSL Technical Memo 36: "Minimum Register Instruction Sequence Problem: Revisiting Large Optimal" R. Govindarajan, Hongbo Yang, Chihong Zhang, Jose Nelson Amaral and Guang R. Gao November 12, 1999 CAPSL Technical Memo 35: "A Comparative Performance Study of Fine-Grain Multi-Threading on Distributed Memory Machines" Prasad Kakulavarapu, Christopher J. Morrone, Kevin B. Theobald, Jose Nelson Amaral and Guang R. Gao November 11, 1999 CAPSL Technical Memo 34: "Caching Single-Assignment Structures to Build a Robust Fine-Grain Multi-Threading System" Wen-Yen Lin, Jose Nelson Amaral, Jean-Luc Gaudiot and Guang Gao October 13, 1999 CAPSL Technical Memo 33: "Definition of the EARTH Model" Kevin B. 
Theobald October 6, 1999 CAPSL Technical Memo 32: "The Benefits of Hardware-Assisted Fine-Grain Multithreading" Kevin B. Theobald and Guang R. Gao July 20, 1999 CAPSL Technical Memo 31: "HTMT Phase 2 Report" Guang R Gao, Jose Nelson Amaral, Andres Marquez, Kevin B. Theobald, Sean Ryan, Zachary Ruiz, Thomas Geiger and Christopher J. Morrone July 19, 1999 CAPSL Technical Memo 30: "Design and Implementation of an Efficient Thread Partitioning Algorithm" Jose Nelson Amaral, Guang R. Gao, Erturk Dogan Kocalar, Patrick O'Neil and Xiang Tang July 1, 1999 CAPSL Technical Memo 29: "Advances in Dataflow Computational Model" Walid A Najjar, Edward A. Lee and Guang R. Gao April 1, 1999 CAPSL Technical Memo 28: "Efficient State-Diagram Construction Methods for Software Pipelining" Chihong Zhang, R. Govindarajan, Sean Ryan and Guang R. Gao March 5, 1999 CAPSL Technical Memo 27: "SEMi: A Simulator for EARTH, MANNA, and i860" Kevin Theobald March 1, 1999 CAPSL Technical Memo 26: "An HTMT Performance Prediction Case Study: Implementing Cannon's Dense Matrix Multiply Algorithm" Jose Nelson Amaral, Guang R. Gao, Phillip Merkey, Thomas Sterling, Zachary Ruiz and Sean Ryan February 17, 1999 CAPSL Technical Memo 25: "Option Pricing Problem on a Multithreaded Parallel Architecture" Ruppa K. Thulasiram and Guang R. Gao November 11, 1998 CAPSL Technical Memo 24: "Design of the Runtime System for the Portable Threaded-C Language" Prasad Kakulavarapu, Olivier Maquelin and Guang R. Gao July 21, 1998 CAPSL Technical Memo 23: "Automatically Partitioning Threads Based on Remote Paths" Xinan Tang and Guang R. Gao July 20, 1998 CAPSL Technical Memo 22: "A Refinement of the HTMT Program Execution Model" Guang Gao, Jose Nelson Amaral, Andres Marquez and Kevin Theobald July 13, 1998 CAPSL Technical Memo 21: "Self-Avoiding Walks Over Two-Dimensional Adaptive Unstructured Grids" Gerd Heber, Rupak Biswas and Guang R. 
Gao April 20, 1998 CAPSL Technical Memo 20: "Using Multithreading for the Automatic Load Balancing of 2-D Adaptive Finite Element Meshes" Gerd Heber, Rupak Biswas, Parimala Thulasiraman and Guang R. Gao March 16, 1998 CAPSL Technical Memo 19: "Overview of the Threaded-C Language" Kevin B. Theobald, Jose Nelson Amaral, Gerd Heber, Olivier Maquelin, Xinan Tang and Guang R. Gao March 16, 1998 CAPSL Technical Memo 18: "A Superstrand Architecture" Andres Marquez, Kevin B. Theobald, Xinan Tang, Thomas L. Sterling and Guang R. Gao March 14, 1998 CAPSL Technical Memo 17: "An Enhanced Co-Scheduling Method Using Reduced MS-State Diagrams" R. Govindarajan, N.S.S. Narasimha Rao, Erik R. Altman and Guang R. Gao February 18, 1998 CAPSL Technical Memo 16: "Location Consistency -- A New Memory Model and Cache Consistency Protocol" Guang R. Gao and Vivek Sarkar February 16, 1998 CAPSL Technical Memo 15: "Superconducting Processors for HTMT: Issues and Challenges" Kevin B. Theobald, Guang R. Gao and Thomas L. Sterling December 15, 1997 CAPSL Technical Memo 14: "A Superstrand Architecture" Andres Marquez, Kevin B. Theobald, Xinan Tang and Guang R. Gao December 1, 1997 CAPSL Technical Memo 13: "Partial Sampling with Reverse State Reconstruction: A New Technique for Branch Predictor Performance Estimation" Darren E. Vengroff and Guang R. Gao CAPSL Technical Memo 11: "Heap Analysis and Optimizations for Threaded Programs" Xinan Tang, Rakesh Ghiya, Laurie J. Hendren and Guang R. Gao November 7, 1997 CAPSL Technical Memo 10: "A Register Pressure Sensitive Instruction Scheduler for Dynamic Issue Processors" Raul Silvera, Jian Wang and Guang R. Gao CAPSL Technical Memo 09: "The HTMT Program Execution Model" Guang R. Gao, Kevin B. Theobald, Andres Marquez and Thomas Sterling July 18, 1997 CAPSL Technical Memo 08: "Benefits of Efficient Multithreading on Distributed Memory for the Parallelization of Communication-Intensive Applications" Angela C. Sodan and Guang R. 
Gao CAPSL Technical Memo 07: "An Integer Linear Programming Model of Software Pipelining for the MIPS R8000 Processor" Artour Stoutchinin CAPSL Technical Memo 06: "A New Fast Algorithm for Optimal Register Allocation in Modulo Scheduled Loops" Sylvain Lelait, Guang R. Gao and Christine Eisenbeis CAPSL Technical Memo 05: "Design and Evaluation of Dynamic Load Balancing Schemes under A Multithreaded Execution Model" Haiying Cai, Olivier Maquelin and Guang R. Gao CAPSL Technical Memo 04: "Non-Clustered Statistical Trace Sampling for Large Cache Design Space Exploration" Darren E. Vengroff, Kenneth Simpson and Guang R. Gao CAPSL Technical Memo 03: "Thread Partitioning and Scheduling Based on Cost Model" Xinan Tang, Jian Wang, Kevin B. Theobald and Guang R. Gao April 15, 1997 CAPSL Technical Memo 02: "Elastic History Buffer: A Low-Cost Method to Improve Branch Prediction Accuracy" Maria-Dana Tarlescu, Kevin B. Theobald and Guang R. Gao November 14, 1996 CAPSL Technical Memo 01: "Hybrid Technology Multithreaded Architecture" Guang R. Gao, Konstantin K. Likharev, Paul C. Messina and Thomas L. Sterling
Technical Notes
CAPSL Technical Note 23: "A Brief Overview of the PICASim Model and Framework - Draft" Robert Pavel May, 2014 CAPSL Technical Note 22: "Overview of the UHPC Execution Model" Guang Gao June, 2009 CAPSL Technical Note 21: "Experiences Porting Mstack to ParalleX" Mark Pellegrini August, 2008 CAPSL Technical Note 20: "The EDIF2KSF Converter" Jonathan Barton August, 2007 CAPSL Technical Note 19: "Mrs. Clops Tool Chain Manual" Matthew Wells March, 2006 CAPSL Technical Note 18: "ASAP Low-Level Connection Library" Inanc Dogru March, 2006 CAPSL Technical Note 17: "C64 DDR Verification and Critical Path Reduction" Michael Bodnar September, 2005 CAPSL Technical Note 16: "The Cyclops-E Emulation Environment" Juan del Cuvillo and Nathaniel Merritt. August, 2005 CAPSL Technical Note 15: "SLICED: a Source Level Interacting Cyclops-64 Effective Debugger" Geoff Gerfin and Ziang Hu. August 26, 2004 CAPSL Technical Note 14: "DISC64: A Disassembler for the Instruction Set of Cyclops-64" John Tully August 5, 2004 CAPSL Technical Note 13: "Generate the Multiple and Add Operation during the WHIRL Lowering Phase" Joseph Bryant Manzano Franco and Haiping Wu May 31, 2004 CAPSL Technical Note 12: "Integrate EBO with Pattern Matching" Divya Parthasarathi May 28, 2004 CAPSL Technical Note 11: "A DIMES Demonstration Application: Mandelbrot-Set Generation Using a Work-Stealing Algorithm" Jason M. McGuiness June 15, 2002 CAPSL Technical Note 10 Revised: "A Software Development Kit for CeDIMES" Juan del Cuvillo, Robert Klosiewicz and Yingping Zhang March 15, 2005 CAPSL Technical Note 10: "A Software Development Kit for CeDIMES" Juan del Cuvillo, Robert Klosiewicz and Yingping Zhang September 30, 2002 CAPSL Technical Note 09: "Threaded-C Release 2.0: Motivation, Description, and Rationale" Guy Tremblay June 15, 2000 CAPSL Technical Note 08: "Runtime Locality Transformations for NAS Conjugate Gradient (Sparse Matrix Computation)" Rishi Kumar, Nathaniel Johnson, Ruppa K. Thulasiram, Gagan Agrawal, Guang R. 
Gao December 17, 1999 CAPSL Technical Note 07: "Computational Financial Derivatives ---A Primer" Ruppa K. Thulasiram, Guang R. Gao October 9, 1998 CAPSL Technical Note 06: "Debugging: The `Feedback' Way" James P. Durbano October 9, 1998 CAPSL Technical Note 05: "Portable Threaded-C Release 1.1" José Nelson Amaral, Zachary Ruiz, Sean Ryan, Andres Marquez, Christopher Morrone, Prasad Kakulavarapu, Guang R. Gao October 8, 1998 CAPSL Technical Note 04: "Implementation of I-Structures as a Library of Functions in Portable Threaded-C" José Nelson Amaral, Guang R. Gao June 15, 1998 CAPSL Technical Note 03: "Proposed Changes to Threaded-C" Kevin B. Theobald January 20, 1998 CAPSL Technical Note 02: "A Portable Threaded-C Language for EARTH Multiprocessors" Xinan Tang, Olivier Maquelin, Kevin B. Theobald, Guang R. Gao, Prasad Kakulavarapu January 6, 1998 CAPSL Technical Note 01: "An Overview of the Threaded-C Language" Guang R. Gao, Xinan Tang, Parimala Thulasiraman, Kevin B. Theobald July 25, 1997
CAPSL Theses
Ph.D. Theses
"Toward High Performance and Energy Efficiency on Manycore Architectures" Elkin Garcia Summer 2014 "Concurrency and Synchronization in the Modern Many-Core Era: Challenges and Opportunities" Juergen Ributzka Spring 2013 "TIDeFlow: A Dataflow-inspired execution model for high performance computing programs" Daniel Orozco Spring 2012 "A comparison between virtual code management techniques" Joseph B. Manzano Summer 2011 "Exploring novel many-core architectures for scientific computing" Long Chen Fall 2010 "Programming Model and Execution Model for OpenMP on the Cyclops-64 many-core processor" Ge Gan Spring 2010 "Enabling System Validation for the many-core Supercomputer" Fei Chen Summer 2009 Available on request "Breaking away from the OS Shadow: A Program Execution Model Aware Thread Virtual Machine for Multicore Architectures" Juan del Cuvillo Summer 2008 "Static Analyses and Optimizations for Parallel Programs with Synchronization" Yuan Zhang Summer 2008 "Efficient Synchronization for a Large-Scale Multi-Core Chip Architecture" Weirong Zhu Spring 2007 "Advanced Protein Sequence Analysis Methods for Structure and Function Prediction" Robel Y. Kahsay Spring 2005 "The CARE Architecture" Andrés Marquez Winter 2004 "Power-Aware Compilation Techniques for High Performance Processors" Hongbo Yang Fall 2003 "Irregular Computations on Fine-Grain Multithreaded Architecture" Parimala Thulasiraman Fall 2000 "Compiling for Multithreaded Architectures" Xinan Tang Fall 1999 "EARTH: An Efficient Architecture for Running Threads" Kevin Bryan Theobald Spring 1999
Masters Theses
Spring 2014 "Memory Optimization in Codelet Execution Model on Many-Core Architectures" Yao Wu Spring 2014 "Tapestry: Weaving Execution and Synchronization Models" Joshua Landwehr Winter 2013 "Parallel Low-Overhead Data Collection Framework for a Resource Centric Performance Analysis Tool" Sunil Shrestha Spring 2012 "Memory State Flow Analysis and Its Application" Xiaomi An Winter 2011 "Toward a software pipelining framework for many-core chips" Juergen Ributzka Summer 2009 "Optimizing the Fast Fourier Transform on a Many core Architecture" Long Chen Winter 2008 "Design and Implementation of Tool-chain framework to support OpenMP Single Source Compilation on CELL platform" Yi Jiang Winter 2007 "A Study of Simulation and Verification of a Many-core Architecture on two modern reconfigurable platforms" Dimitrij Krepis Summer 2007 "Methodology of Dynamic Compiler Option Selection Based on Static Program Analysis - Implementation and Evaluation" Eun Jung Park Summer 2007 "Efficient Mapping of Fast Fourier Transform on the Cyclops-64 Multithreaded Architecture" Liping Xue Summer 2007 "Tower Methodology for Verification of Multi-Core Architecture - A Case Study" Divya Parthasarathi Summer 2005 "A Study of Architecture and Performance of IBM Cyclops-64 Interconnection Network" Yingping Zhang Summer 2005 "Quantitative Study of Human-Computer interaction in adaptive search on Mobile Handsets and its Localization for Mandarin Chinese" Xing Wang Fall 2004 "A Parallel Debugger for the Cyclops Architecture" Robert S. Klosiewicz Jr. 
Summer 2004 "Multithreaded Parallel Implementation of HPMMPFAM on EARTH" Weirong Zhu Spring 2004 "Implementing Parallel CG Algorithm on the EARTH Multithreaded Architecture" Fei Chen Spring 2004 "Code Size Oriented Memory Allocation for Temporary Variables" Yan Xie Winter 2004 "Binary Diffing" Kapil Khosla Fall 2003 "A Portable Runtime System and its Derivation for the Hardware SU Implementation" Chuan Shen Fall 2003 "A Interconnect Architecture for Commodity Off-the-shelf Multiprocessor Emulation Testbed" Mark Lawrence Legutko Spring 2002 "A Visual Perspective to Motif/Pattern Analysis" Praveen R Thiagarajan Summer 2001 "Automated Single Nucleotide Polymorphism Discovery Pipeline" Francisco Jose Useche Gomez Summer 2001 "Efficient Parallelization of Reductions and Loop Based Programs on EARTH" Rishi Kumar Summer 2001 "Whole Genome Comparison Using A Multithreaded Parallel Implementation" Juan Del Cuvillo Summer 2001 "A EARTH Runtime System For Multi-Processor/Multi-Node Beowulf Cluster" Christopher Jason Morrone Spring 2001 "Implementation Issues of a Hardware-Based EARTH Synchronization Unit" Thomas Geiger Spring 2001 "Register Stack and Optimal Allocation Instruction Placement" Alban Douillet Spring 2001 "Advanced Compilers, Architectures and Parallel Systems" ShaoHua Han Spring 2001 "Dynamic Load Balancing Issues in the EARTH Runtime System" Kamala Prasade Kakulavarapu Fall 1999 "Towards a Custom EARTH Synchronization Unit" Ian Stuart MacKenzie Walker Summer 1999 "Static Instruction Schedule For Dynamic Issue Processor" Raul E. Silvera Muñoz Spring 1997
© CAPSL 1996-2013. All Rights Reserved.
capslwww@capsl.udel.edu