
Current group: comp.parallel

[TCPP-announce] Call for papers (First Workshop on System Management Tools for ...

From: tcpp-announce at socorro.ece.unm.edu
Subject: [TCPP-announce] Call for papers (First Workshop on System Management Tools for ...
Date: 17 Dec 2004 09:51:39 -0700


*************************DEADLINE EXTENDED*********************************
Note new deadline: December 24, 2004
***************************************************************************

First Workshop on System Management Tools for Large-Scale Parallel Systems


(Held in Conjunction with IPDPS 2005, Denver, Colorado, April 8, 2005)

***************************************************************************

We are entering a new era in computing in which the size and complexity of
scientific and engineering simulations are growing at an unprecedented
pace. To meet the needs of these applications, "extreme-scale" parallel
systems are being designed and deployed. Although progress in hardware and
architecture design has made it possible to build machines with tens of
thousands of processors, the development of software tools for such systems
still lags behind. To name just a few examples, new operating-system-level
modifications are needed to use the massive computing and networking power
efficiently. In addition, sophisticated fault-tolerance tools are badly
needed to minimize performance loss when faults occur. The scale of these
systems also demands advanced power management tools. For both commodity
supercomputing clusters and custom-designed supercomputers, system
maintenance, reliability, and fault isolation, prevention, and control pose
huge challenges. Research is needed not only on the scale of these machines,
but also on its implications for system performance and utilization.

This workshop is intended to bring together researchers and practitioners to
begin identifying the new challenges imposed by this trend and investigating
efficient software tools to improve the performance, reliability and
operation of large-scale parallel systems.

Topics of interest include, but are not limited to:
· Scalable operating system design
· Scalable resource management tools
· Efficient failure diagnosis, failure prediction and failure recovery tools
· Scalable job scheduling tools
· Scalable parallel check-pointing tools
· Self-healing and self-management tools
· Power management for large-scale machines
· System bring-up and control tools
· Ease of system maintenance, services including system management experiences
· Performance, system utilization implications

Workshop Organizers:

Fabrizio Petrini, Los Alamos National Laboratory, New Mexico
(fabrizio@lanl.gov)
Ramendra K. Sahoo, IBM TJ Watson Research Center, Yorktown Heights, NY
(rsahoo@us.ibm.com)
Yanyong Zhang, Dept. of Electrical and Computer Engineering, Rutgers
University (yyzhang@ece.rutgers.edu)

Program Committee:

Ricardo Bianchini (Rutgers Univ., CS) ricardob@cs.rutgers.edu
Henri Casanova (UCSD) casanova@cs.ucsd.edu
Dror Feitelson (Hebrew University) feiteldg@vuse.vanderbilt.edu
Rahul Garg (IBM India) grahul@in.ibm.com
Jose E. Moreira (IBM Research) jmoreira@us.ibm.com
Manish Parashar (Rutgers Univ., ECE) parashar@caip.rutgers.edu
Kyung Ryu (IBM Research) kryu@us.ibm.com
Anand Sivasubramaniam (Penn. State Univ.) anand@cse.psu.edu
Rajeev Thakur (Argonne National Lab.) thakur@mcs.anl.gov
Jeff Vetter (Oak Ridge National Lab.) vetterjs@ornl.gov
Andy Yoo (Lawrence Livermore National Lab.) ayoo@llnl.gov
Xiaodong Zhang (College of William & Mary) zhang@cs.wm.edu

Important Dates:
Submission Date: 12/24/2004 (Extended!)
Notification Date: 1/07/2005
Camera-Ready Date: 1/21/2005

CONTACT INFO:
------------
web : http://www.ece.rutgers.edu/~yyzhang/ipdps-ws
email: yyzhang@ece.rutgers.edu


Informal proceedings will be handed out at the workshop and published
along with other IPDPS 05 publications. For full paper submissions, we
are also planning to publish formal proceedings as one issue of
Springer-Verlag's Lecture Notes in Computer Science (LNCS) series.

Ramendra K. Sahoo
IBM TJ Watson Research Center
1101 Kitchawan Road, Yorktown Heights, NY 10598
phone: 914-945-2936, T/L 8-862-2936
email: rsahoo@us.ibm.com
