There is no inherent stability problem in a stack of this type: AWS + Linux + JVM. All of these components can be considered reliable, so if your application is killed it is not a stability issue. There is a reason, and a log file somewhere should contain a trace of what happened.
If you have examined the stdout/stderr output of the JVM without finding any information, the action came from outside the Java environment, i.e. from the system.
On Linux you should examine the system logs (/var/log/messages) and the kernel ring buffer (dmesg).
For example, to check whether the word "killed" appears:
sudo dmesg -T | grep killed
sudo cat /var/log/messages | grep killed
Since your machine has little memory, we can suspect the overcommit memory management. You should find out whether your application is the only one running on the machine, or whether other applications are likely to start at certain times.
This is because, with memory overcommit, when a process requests more space, Linux grants it, assuming that not all processes actually use all the memory they have requested. A process only really consumes the memory it requested when it actually touches it. This makes allocation fast, and allows more memory to be allocated than physically exists.
However, once processes start using this memory, if there is not enough of it, Linux kills a process to free some up. The choice of victim is based on a score that takes into account execution time (long-running processes are safer), memory usage (greedy processes are less safe), and a few other parameters the system administrator can adjust. It is the kernel's OOM Killer function that takes care of killing the process.
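You can read the score the kernel currently assigns to any process through the /proc filesystem. A minimal sketch, using this shell's own PID as a runnable stand-in for your JVM's PID:

```shell
# Example PID to inspect; $$ (this shell's own PID) is just a stand-in --
# replace it with the PID of your JVM process.
PID=$$
# oom_score is the kernel's current "badness" score for the process:
# the higher the value, the more likely the OOM Killer is to pick it.
cat /proc/$PID/oom_score
```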
The pseudo-files in the /proc/sys/vm directory let you read the various parameters or change them.
For example:
cd /proc/sys/vm
cat overcommit_memory
It displays the value 0, 1, or 2:
- with 0 or 1, the Linux system overcommits and uses the OOM Killer if necessary
- with 2, strict accounting is used: allocations fail instead of overcommitting, so the OOM Killer is not normally invoked
(there are other overcommit_xxx files that manage related parameters)
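As a sketch, reading the setting is harmless; changing it is system-wide and needs root, so that part is shown commented out:

```shell
# Read the current overcommit policy: 0, 1, or 2
cat /proc/sys/vm/overcommit_memory

# Switching to strict accounting (mode 2) is system-wide and needs root:
#   sudo sysctl -w vm.overcommit_memory=2                            # until reboot
#   echo "vm.overcommit_memory = 2" | sudo tee -a /etc/sysctl.conf   # persistent
```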
You can also check in dmesg or in /var/log/messages whether you find traces of the string "OOM", which would tend to confirm the reason why the process was killed.
If the problem really comes from overcommit (i.e. a value of 0 or 1 and traces of "kill"/"OOM" in the logs), the possible solutions are:
- Add more memory to the VM
- Don't launch other processes that could request memory and therefore trigger the kill
- Disable the OOM Killer (set overcommit_memory to 2)
- Adjust the OOM Killer parameters so that your process is not a preferred victim
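The last option can be sketched via the per-process oom_score_adj pseudo-file; the PID and the pgrep pattern below are placeholders you would adapt to your JVM process:

```shell
# Hypothetical: find the JVM's PID (adapt the pattern to your application)
#   PID=$(pgrep -f your-app.jar)
PID=$$   # this shell's own PID, used here only as a runnable stand-in

# Show the current adjustment (-1000..1000; lower = less likely to be killed)
cat /proc/$PID/oom_score_adj

# Lowering the value needs root; -1000 exempts the process entirely:
#   echo -1000 | sudo tee /proc/$PID/oom_score_adj
```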