
Third Conference on Parallel Computing Technologies (PaCT-95)

Abstracts


Authors
Daniel Etiemble, Cecile Germain
Title
Standard microprocessors versus custom processing elements for massively parallel architectures
Paper 321
Abstract
Choosing a standard microprocessor or a custom processing element as the CPU of a massively parallel architecture has a major impact on hardware and software costs. Standard microprocessors have a high performance/cost ratio, but the associated cache hierarchy forces a complete rethinking of how applications developed for vector supercomputers are programmed. Low hardware cost thus comes with a high software development cost. Newer approaches, such as decoupled or multithreaded architectures, have low software costs but high hardware costs.

Authors
Franco Gasperoni, Uwe Schwiegelshohn, John Ture
Title
Optimal Loop Scheduling on Multiprocessors: a Pumping Lemma for p-Processor Schedules.
Paper 51
Abstract
This paper addresses the problem of optimally scheduling a cyclic set of interdependent operations (or tasks), representing for instance a program loop. While the existence of optimum schedules has been demonstrated when processors are plentiful, the corresponding problem when the number of available processors is fixed remains unanswered.

In this work we show that if the operations' dependence graph is strongly connected, then there exists a p-processor optimum schedule, for any p, which is expressible in the form of a loop. To prove this result we have established a general pumping lemma for p-processor schedules akin to the classical pumping lemma for regular languages.


Authors
B. Goossens and D.T. Vu
Title
Further pipelining and multithreading to improve RISC processor speed. A proposed architecture and simulation results.
Paper 327
Abstract
This paper presents a new pipeline architecture which should give today's RISC processors, such as the MIPS R4x00 or DEC 21064, a speed improvement of nearly 60%. The pipe stage critical path, imposed by the 64-bit integer unit, has been cut in half by data slicing and pipelining the four basic arithmetic operators. Moreover, because these features impose a very long latency for external accesses on such a processor, a multithreading structure has been included. Up to four threads may run simultaneously with a zero-delay context switch. Thus, multithreading is mainly used as a latency hiding technique for external accesses. To estimate the real benefit of the design, a simulator has been built. Simulation results show the impact of the pipeline improvements without multithreading (29%) and with it (59% with four threads).

Authors
Helmar Burkhart
Title
HOW TO TEACH PARALLEL PROCESSING ?
Paper
Abstract
Parallel processing is a field that has seen continuous technological change in recent years. While hardware technology improved quickly, software concepts and, above all, abstract machine models made less progress. New generations of systems with ever-changing low-level software interfaces made it hard for programmers to deliver software on time. Thus, parallel processing is still an immature field that lacks widely accepted and standardized concepts. Teaching a field that develops so rapidly is hard. This paper analyzes the present situation and proposes a more structured approach that separates programming concepts from architectural details without losing contact with practice.

Authors
A.R. Hurson and B.U. Jun
Title
Optimization Scheme on Execution of Logic Program in a Dataflow Environment
Paper 204
Abstract
We have developed a technique that maps logic programs (database queries) onto a dataflow graph to support static scheduling in a multiprocessor environment. A dataflow graph explicitly shows the execution paths, data dependences, and synchronization points in a query. The scheme attempts to properly group fine-grain operations (select and join) into coarser grains as a means to exploit parallelism while minimizing communication costs. This leads to higher hardware utilization and performance.

In this paper, we expand the scope of our scheme by using a set of heuristic rules to assign processes to available processors more efficiently. This is made possible by analyzing the probability of success of each branch in the logic program. In a logic program, the early scheduling and execution of branches with a higher probability of failure makes it more likely that many other branches can be eliminated. On the other hand, early execution of branches with a higher probability of success can lead to higher hardware utilization. The extended scheme has been simulated, and its performance has been compared against the original model and the traditional parallel execution paradigm for logic programs.


Authors
P. Hartmann
Title
Parallel and Distributed Processing of Cellular Hypergraphs
Paper 57
Abstract
This paper explains how cellular hypergraphs (CHG) can be easily distributed over a network of processor nodes. Replacement systems (CHGRS) can be used to describe the dynamics of a CHG. A CHGRS can operate in a conflict-free and synchronous-parallel manner, and it can be implemented on a multiprocessor system with little overhead. As a consequence, CHGRS can be used as abstract models for many natural phenomena and can support their efficient simulation on a multiprocessor system. Two major questions are discussed: how a CHGRS can be implemented in a distributed manner even if it contains complex replacement rules, and how load balancing can be performed using a non-supervised algorithm.

Authors
O. Bessonov, V. Brailovskaya, V. Polezhaev, B. Roux
Title
Parallelization of the Solution of 3D Navier-Stokes Equations for Fluid Flow in a Cavity with Moving Covers
Paper 386
Abstract
This paper describes a numerical method for solving the 3D Navier-Stokes equations in a regular domain and a direct method for parallelizing the solution on distributed-memory computers. A vorticity-vector-potential formulation and a finite difference method are chosen, using a fractional-step ADI method for the vorticity equation and a Fourier method for the Poisson equation. Special attention is paid to single-processor optimization of the algorithm. The parallelization technique is described in detail, with the speedup and efficiency levels achieved on 2 and 4 processors. Numerical results are presented for different geometries and Reynolds numbers.

Authors
A.E. Doroshenko
Title
Programming Abstracts for Synchronization and Communication in Parallel Programs
Paper 157
Abstract
A class of distributed/shared memory parallel programs with a static, race-free structure of accesses to shared memory is considered, and programming abstracts in the form of regular expressions are proposed as synchronization facilities for these programs. Besides exposing more concurrency than semaphore-like facilities, they are applicable to designing efficient communication schemes for multilevel distributed/shared memory parallel programs.

Authors
V.A. Evstigneev and V.N. Kasyanov
Title
A Program Manipulation System for Fine-grained Architectures
Paper 163
Abstract
The PROGRESS system being implemented at the Institute of Informatics Systems in Novosibirsk is discussed. The system is intended to support rapid prototyping of compilers for high-level languages (e.g. Fortran-77, Modula-2, SISAL) and for a family of architectures exploiting fine-grained parallelism. The next goal of the project is to develop an environment for investigating optimizing and restructuring transformations of programs to be parallelized.

Keywords: parallel processing, fine-grain architectures, transformational approach, program restructuring, multifunctional cooperation

Authors
V.A. Nepomniaschy, G.I. Alekseev, A.V. Bystrov, T.G. Churina, S.P. Mylnikov, E.V. Okunishnikova
Title
Petri Net Modelling of Estelle-specified Communication Protocols
Paper 94
Abstract
In order to use net models for communication protocol verification, a method for automatic translation of Estelle protocol specifications into coloured Petri nets is proposed. A tool for simulation and analysis of the net models is outlined. The Stenning protocol is used to illustrate the method.

Authors
S.M. Achasova
Title
SYNCHRONOUS-ASYNCHRONOUS CELLULAR COMPUTATIONS
Paper 1
Abstract
The operation of parallel substitutions over cellular arrays in synchronous-asynchronous mode is studied. Correctness conditions for parallel substitution systems in this mode of execution are stated.

Authors
O.L. Bandman
Title
CELLULAR-NEURAL COMPUTATIONS. FORMAL MODEL AND POSSIBLE APPLICATIONS.
Paper 21
Abstract
A formal model of fine-grained parallel computations is presented, in which the connectionist method of Artificial Neural Networks is combined with the cellular-like structure of interneuron communication. The model is based on the concepts and formalisms of the Parallel Substitution Algorithm, which is considered to be the most theoretically advanced generalization of the Cellular Automaton. Some fields of application are discussed and computer simulation results are given.

Authors
V. Markova, S. Piskunov
Title
COMPUTER MODELS OF 3D CELLULAR STRUCTURES
Paper 70
Abstract
A computer technology for designing 3D cellular structures, based on a model of distributed computations (the Parallel Substitution Algorithm), is presented. The technology is demonstrated on two original 3D structures (universal and algorithm-oriented). It is shown that the structures can be converted into electrooptical devices with a simple topology of each layer and massive data exchanges between layers.

Authors
A.Sh. Nepomniaschaya
Title
COMPARISON OF TWO MST ALGORITHMS FOR ASSOCIATIVE PARALLEL PROCESSORS
Paper 85
Abstract
In this paper, we analyze procedures for finding a minimal spanning tree of a graph on an abstract associative STAR-machine with bit-serial processing. We compare the implementations of the Prim-Dijkstra algorithm and the Baase algorithm on the STAR-machine for the same graph representation. Then we briefly describe our STAR-system.
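
For readers unfamiliar with the Prim-Dijkstra algorithm named above, the following is a minimal sequential sketch in Python. It is not the associative STAR-machine implementation analyzed in the paper; the function name `prim_mst` and the adjacency-list graph representation are assumptions chosen for illustration only.

```python
import heapq

def prim_mst(graph, start):
    """Return (total_weight, edges) of a minimum spanning tree.

    graph: dict mapping vertex -> list of (weight, neighbour) pairs.
    The graph is assumed to be undirected and connected (hypothetical
    representation, not the one used on the STAR-machine).
    """
    visited = {start}
    # Candidate edges leaving the tree, ordered by weight.
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    total, edges = 0, []
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # edge would close a cycle inside the tree
        visited.add(v)
        total += w
        edges.append((u, v, w))
        for w2, x in graph[v]:
            if x not in visited:
                heapq.heappush(frontier, (w2, v, x))
    return total, edges

example = {
    "a": [(1, "b"), (4, "c")],
    "b": [(1, "a"), (2, "c")],
    "c": [(4, "a"), (2, "b")],
}
weight, tree = prim_mst(example, "a")
```

The tree grows from a single start vertex by repeatedly adding the cheapest edge that reaches a new vertex, which is the step an associative processor can evaluate for all candidate edges at once.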

Authors
A. Vazhenin and V. Morozov
Title
PARALLEL ITERATIVE SOLUTION OF SYSTEMS OF LINEAR EQUATIONS WITH DYNAMICALLY CHANGED LENGTH OF OPERANDS
Paper 294
Abstract
The paper deals with the development of parallel iterative algorithms for solving systems of linear equations on MIMD architectures. The problem is discussed taking into account the factors that determine both the time and the accuracy of the solution. A new parallel algorithm implementing multistep refinement of results is described. The speedup is achieved by using a small operand length at the early stages of the solution. Results of numerical experiments executed on a multitransputer system are presented.
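
The idea of using short operands in early iterations and longer ones later can be illustrated with a small sequential sketch. This is not the authors' algorithm: it assumes a plain Jacobi iteration, uses Python's `decimal` module to stand in for variable operand length, and the names `jacobi_growing_precision`, `digits_schedule`, and `sweeps_per_stage` are hypothetical.

```python
from decimal import Decimal, localcontext

def jacobi_growing_precision(A, b, digits_schedule, sweeps_per_stage):
    """Jacobi iteration whose working precision grows stage by stage.

    A is assumed to be strictly diagonally dominant so that the
    iteration converges. digits_schedule lists the number of
    significant decimal digits used in each stage.
    """
    A = [[Decimal(v) for v in row] for row in A]
    b = [Decimal(v) for v in b]
    n = len(b)
    x = [Decimal(0)] * n
    for digits in digits_schedule:
        with localcontext() as ctx:
            ctx.prec = digits  # operand length for this stage
            for _ in range(sweeps_per_stage):
                x_new = []
                for i in range(n):
                    s = sum((A[i][j] * x[j] for j in range(n) if j != i),
                            Decimal(0))
                    x_new.append((b[i] - s) / A[i][i])
                x = x_new
    return x

# Exact solution of this system is x = (1/11, 7/11).
solution = jacobi_growing_precision([[4, 1], [1, 3]], [1, 2],
                                    digits_schedule=(4, 8, 16),
                                    sweeps_per_stage=10)
```

Early stages only need to bring the iterate close to the solution, so cheap low-precision arithmetic suffices there; the final stage at full precision removes the remaining error.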

Authors
M. Royak, E. Shurina, Yu. Soloveichik, V. Malyshkin
Title
PARALLELIZATION OF COMPUTER CODE MASTAC THREE-DIMENSIONAL FINITE ELEMENTS METHOD IMPLEMENTING
Paper 305
Abstract
The special features of the computer code MASTAC for non-linear three-dimensional magnetostatic field calculations and the possibilities for its parallelization are considered. MASTAC is implemented on IBM-compatible computers in C++ and FORTRAN-77. It has been used to calculate the magnetic field of wigglers, curvilinear dipole magnets of a positron accumulator-cooler, and direct-current electric machines with a high degree of magnetic concentration. All these tasks are characterized by complex three-dimensional geometry with curvilinear surfaces that divide parts of the construction with different physical properties.

MASTAC has a convenient graphical preprocessor and calculates magnetostatic fields in complex three-dimensional constructions with high accuracy. These properties make MASTAC attractive for researchers who use such calculations to analyze and design complex technical constructions. Parallelizing the MASTAC calculation procedures on high-performance workstations enables researchers to solve many problems in the design, optimization, and analysis of complex technical constructions.


Authors
A. Kremlev, O. Monakhov, T. Thiel
Title
PARALLEL SEISMIC DATA PROCESSING METHOD FOR MEMSY MULTIPROCESSOR SYSTEM
Abstract
The results of testing parallel seismic data processing by the Wave analogy of the Common Depth Point (WCDP) method on the MEMSY multiprocessor system with a pyramidal architecture are presented.

Authors
D.A.Pospelov and Ya.I.Fet
Title
PARALLEL COMPUTING IN RUSSIA
Paper 465
Abstract
The paper presents an overview of the history and the state of the art of parallel computing in Russia. The valuable contribution of Russian scientists to theoretical computer science as well as to the architecture of high-performance parallel computing systems is emphasized. The most interesting works of Russian authors are briefly characterized.

48 ref.


Authors
T. Ludwig and S. Lamberts
Title
PFSLib - A Parallel File System for Workstation Clusters
Paper 249
Abstract
PFSLib is a parallel file system library developed by the parallel processing group at LRR-TUM. The primary design goal of the project, funded by the Intel Foundation, was to provide source code compatibility with Intel's parallel file system PFS. Thus, together with NXLib of LRR-TUM, it forms an emulator of an Intel Paragon supercomputer. In addition, PFSLib can be used as a stand-alone software product together with other parallel programming environments such as PVM or implementations of MPI. The user interface provides a set of file access modes suitable for various situations where parallel file I/O is indispensable. Internal mechanisms of file distribution guarantee locality of disk access and therefore improve scalability. PFSLib serves as a research platform to investigate issues of user interface design and the integration of parallel I/O into a parallel tools environment with various interactive and automatic tools. The paper describes design aspects of PFSLib, gives performance results, and demonstrates in detail the integration of PFSLib into projects at LRR-TUM.

Last update: November 23, 1996