Note: This discussion is about an older version of the COMSOL Multiphysics® software. The information provided may be out of date.
Cluster job not saving output file model
Posted September 15, 2011, 10:46 GMT-4 | Results & Visualization | Version 4.2 | 3 Replies
Dear All,
I am facing a problem when running my model on a cluster.
I am using a distributed parametric study for a time-dependent problem. The model has ~450,000 DOFs. It runs smoothly when executed on a single computer with a single parameter.
However, when using a batch cluster procedure (Linux/Fedora), the calculation runs perfectly until the end, but the output file is not saved!
Sometimes I get an error message at the end, which looks like an MPI issue:
{
Assertion failed in file ../../socksm.c at line 2576: (it_plfd->revents & 0x008) == 0
internal ABORT - process 0
}
At other times nothing happens: the CPUs keep running after reaching 100% progress, without saving the output file even after 48 h.
It seems that COMSOL on the cluster cannot handle huge output files; a single-parameter study alone generates an output file of around 3.5 GB.
I have even reduced the number of parameters in the study (from 6 to 3) and the result was the same: no output file saved at the end!
Moreover, reducing the time range (running just 1 period, which reduces the output file size) does produce the output file at the end.
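As a rough sanity check on the file-size hypothesis, here is a minimal Python sketch of how large the combined output might get. It assumes the saved file grows linearly with the number of parameter values, which I have not verified; the function name `estimated_output_gb` is just illustrative, and only the 3.5 GB figure comes from my runs.

```python
# Rough estimate of the combined output size of the parametric study.
# ASSUMPTION: the saved file grows linearly with the number of parameter
# values; this has not been verified against actual COMSOL output.

SINGLE_RUN_GB = 3.5  # size observed for a single-parameter study


def estimated_output_gb(n_parameters, single_run_gb=SINGLE_RUN_GB):
    """Return the estimated output size in GB for n_parameters values."""
    return n_parameters * single_run_gb


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(f"{n} parameter value(s): ~{estimated_output_gb(n):.1f} GB")
```

Under that assumption, the 6-parameter study would be around 21 GB and the 3-parameter one around 10.5 GB, both far larger than the single-period runs that do get saved.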
Other, simpler calculations run smoothly on the cluster and finish by saving the output file.
I mention this to show that the problem is not coming from our cluster itself!
I wonder if anybody has faced such a problem!
Your comments and suggestions are welcome!
Cheers
3 Replies | Last Post September 30, 2011, 09:12 GMT-4