Network failure in a rack causes job loss.
by Steinar Trædal-Henden, last modified Jun 23, 2010 09:20 PM
We lost a network switch in a rack, which caused job loss.
A switch connecting the blades c37-x, c38-x, c39-x and c40-x to the cluster was lost.
As a result, all jobs running on these nodes were lost.
We are terribly sorry for the inconvenience this may have caused.
For the HPC staff,
Steinar

