B4J Question Websocket Server - Client connections - JVM not releasing memory

Status
Not open for further replies.

Jmu5667

Well-Known Member
Licensed User
Hello All

We have a web-socket server that uses an SSL connection for the clients. We were contacted by one of our customers informing us that the server was not working any more.

We looked at the server and it was not accepting any connections, and the existing connections had died. We started to investigate and built a test app that makes a connection to a ping class on on the server, once it has a connection, it disconnects and make the connection again. It will repeat this process forever unless you stop it. I have attached the test_ping app.

The problem we now face is that we are not sure what is causing the memory to keep growing. I have used visual VM on the server and all the threads being created are being release. We stripped down the ping class to its bare bones, here it is.

B4X:
'WebSocket class
Sub Class_Globals
 
   Private ws                    As WebSocket
 
End Sub

Public Sub Initialize

End Sub

Private Sub WebSocket_Connected (WebSocket1 As WebSocket)


    ws = WebSocket1
    ws.Session.MaxInactiveInterval = 30

End Sub

Private Sub WebSocket_Disconnected
 
    log("WebSocket_Disconnected")

End Sub

This is the command to start the server:

java -Xms32M -Xmx64M -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:parallelGCThreads=20 -XX:ConcGCThreads=5 -XX:InitiatingHeapOccupancyPercent=30 -jar ies_svr_aspect_hello.jar

Java Version :
1.8 1.8.0_202 http://java.sun.com/products/autodl/j2se C:\Program Files\Java\jre1.8.0_202\bin\javaw.exe true x86_64

Machine:
View attachment 77242

Main in the server app is as follows:


B4X:
'Non-UI application (console / server application)
' Version 1.0.0.8
#Region  Project Attributes
   #CommandLineArgs:
   #AdditionalJar: mssql-jdbc-6.2.2.jre8.jar
   #AdditionalJar: conscrypt-openjdk-uber-1.1.2.jar
   #AdditionalJar: bcprov-jdk15on-159.jar
#End Region

Sub Process_Globals
 
   Public DB_SQL As ConnectionPool
   Public srvr As Server
   Public svrConsole As ServerSocket
 
   ' // Global Map holding the settings
   Public SettingsMap As Map
   ' // Define the settings file which is read by the global settings map
   Public SettingsFile As String = "ies_svr_aspect_hello.ini"
   Public appPath As String
   Public logPath As String
 
   Private logStream    As OutputStream

End Sub

Sub AppStart (Args() As String)
 
   Dim res As Int
  
   Try
       appPath = mod_settings.get_app_path
       logPath = mod_settings.get_log_path
  
       mod_settings.load_settings
 
       writelog("AppStart, Version 1.0.0.8")
  
       ' // setup the SQL server
       res = ConfigureSQL
       If res > 0 Then
           ExitApplication2(res)
       End If
  
  
       ' // configure websocket  
       srvr.Initialize("")
       srvr.AddWebSocket("/ping", "ping")
       res = ConfigureSSL(SettingsMap.Get("ssl_port"))
       If res > 0 Then
           ExitApplication2(res)
       End If
       srvr.Port = SettingsMap.Get("http_port")
       srvr.Http2Enabled = True
       srvr.Start
                
       ' // start the message loop
       StartMessageLoop
  
   Catch
       writelog("Error " & LastException.Message)
       ExitApplication2(999)
   End Try
 
 
 
End Sub

private Sub ConfigureSQL As Int
 
 
   Dim res As Int = 0

   Try
       DB_SQL.Initialize("com.microsoft.sqlserver.jdbc.SQLServerDriver","jdbc:sqlserver:" & _
                       SettingsMap.Get("sql_host"), _
                       mod_security.decrypt(SettingsMap.Get("sql_usr")), _
                       mod_security.decrypt(SettingsMap.Get("sql_pwd")))
      
       If Not(DB_SQL.IsInitialized) Then
           res = 1       
           writelog("ConfigureSQL() error - " & SettingsMap.Get("sql_host") & ", Not Initialized")
           writelog("ConfigureSQL() error -  " & LastException.Message)
       Else
           writelog("ConfigureSQL() SQL IsInitialized = " & DB_SQL.IsInitialized)
       End If
      
      
       ' // set the connection pool size
       Dim jo As JavaObject = DB_SQL
       Dim pool As Int  = SettingsMap.Get("sql_pool")
       writelog("AppStart pool   = " & pool)
       jo.RunMethod("setMaxPoolSize", Array(pool))
  
   Catch
       writelog("ConfigureSQL() error -  " & LastException.message)
       res = 1
   End Try
  
  
   Return res
 
End Sub

Private Sub ConfigureSSL (SslPort As Int) As Int
 
   Dim res As Int = 0
 
   Try
       ' // SSL connector configuration
       Dim ssl As SslConfiguration
  
       ssl.Initialize
       ssl.SetKeyStorePath(File.GetFileParent(SettingsMap.Get("keystore")) , File.GetName(SettingsMap.Get("keystore")))
       ssl.KeyStorePassword = mod_security.decrypt(SettingsMap.Get("ssl_pwd"))
       ssl.KeyManagerPassword = mod_security.decrypt(SettingsMap.Get("ssl_pwd"))
       ssl.EnableConscryptProvider
       srvr.SetSslConfiguration(ssl, SslPort)
       writelog("ConfigureSSL() " & SslPort & ", Configured")
   Catch
       res = 2
       writelog("ConfigureSSL() " & SslPort & ", Error - " & LastException.Message)
   End Try
      
   Return res
      
End Sub


Sub writelog(pData As String)
 
   DateTime.DateFormat = "yyyy-MM-dd"
   DateTime.TimeFormat = "HH:mm:ss.SSS"
 
   RedirectOutput(logPath,"ies_svr_aspect_hello.log")
 
   If File.Size(logPath,"ies_svr_aspect_hello.log") > ((1024*1024)*10) Then
 
       logStream.Close
       ' // remove existing backup
       If File.Exists(logPath,"ies_svr_aspect_ptt_prev10mb.log") Then
           File.Delete(logPath,"ies_svr_aspect_ptt_prev10mb.log")
       End If
       ' // make new backup
       File.Copy(logPath,"ies_svr_aspect_hello.log",logPath,"ies_svr_aspect_hello_prev10mb.log")
       ' // delete current log
       File.Delete(logPath,"ies_svr_aspect_hello.log")
       RedirectOutput(logPath,"ies_svr_aspect_hello.log")
 
   End If

   Dim logString As String
   Dim Data() As Byte
   logString =  DateTime.date(DateTime.Now) & " " & DateTime.Time(DateTime.Now) & _
                   " " & pData & CRLF
   Data = logString.GetBytes("UTF8")

   logStream.WriteBytes(Data,0,Data.Length)
   logStream.Flush
   logStream.Close
 
   Log(logString)
 
End Sub


Sub RedirectOutput (Dir As String, FileName As String)
 
   logStream = File.OpenOutput(Dir,FileName,True)
 
End Sub

The server uses keystore so if anyone want to assist in helping you will need the same setup. As I said before we do not know why this is happening and I would like to hear from other who may have had this problem, or who have a production deployed solution similar to this but do not have this issue.

I will be back online Sunday 10th to work on this.

Regards

John.
 
Last edited:

Jmu5667

Well-Known Member
Licensed User
How do you see that the memory usage keeps growing?
In task Manager, it keeps growing and if I stop the test_ping app, the memory used show by task manager does not reduce, even if I call GC from within the server app. If left running eventually the server app will run out of memory.
 
Upvote 0

OliverA

Expert
Licensed User
In test_ping app, you open then then immediately close the WebSocket. Is that how WebSockets are used in production? You open a WebSocket, sent some information and then close a WebSocket? If so, at what rate is that happening?
 
Upvote 0

Jmu5667

Well-Known Member
Licensed User
In test_ping app, you open then then immediately close the WebSocket. Is that how WebSockets are used in production? You open a WebSocket, sent some information and then close a WebSocket? If so, at what rate is that happening?

Test ping is just to test the opening and closing of the connection. It is used really for testing the server load balancing, you can have loads of them running and they give the server a good trashing. I was of the belief that when the server socket went out of scope the memory was released, does not seem to be. This is a very simple test and the results so far are not promising.

The settings used to launch the jar, java -XX:MetaspaceSize=64m -XX:MaxMetaspaceSize=128m -XX:MinHeapFreeRatio=40 -XX:MaxHeapFreeRatio=70 -XX:NewRatio=8 -XX:SurvivorRatio=32 -jar ies_svr_aspect_hello.jar, seem to be ignored by the JVM.
 
Upvote 0

OliverA

Expert
Licensed User
I was asking since a usage case of open and closing a connection rapidly is really not what WebSockets were designed for. I initially ran your test (I stripped out al DB, SSL and log related code) and within a couple of minutes the server would start tripping over java.lang.RuntimeException: java.lang.NullPointerExceptionWebSocket_Disconnected errors. I noticed than in the client console that even though there was one wsh_Connected entry that there were suddenly 2, then 3, 4, up to 5 wsh_Closed messages, with only one containing an empty result. I've did not use any command line options to start the server. It grew to about 180MB before it started acting up. It would take a while to get more connections and then it grew to 200+MB. This happened in about a 10 minute span. I then threw a random 1-5 second delay before the client would call ws.Close. This has been running now for close to 2 hours without errors. The JVM is still growing, but it's currently only at 102.2MB.
Shooting from the hip: Your test really stresses the WebSocket server. WebSocket is a heavy protocol, with several handshakes for creating and closing a connection. Anything that interrupts the server at that load may cause issues. An aggressive GC will constantly interfere with a heavy workload. So idk if this test is good at testing WebSockets for a scenario that it is not designed for. OTH, I guess as is, WebSockets could be DOS'ed with this method.
 
Upvote 0

OliverA

Expert
Licensed User
Attached are the stripped out server and the modified client. See the server's output for port# it uses.
 

Attachments

  • wspingclient.zip
    3.4 KB · Views: 78
  • wspingserver.zip
    1.1 KB · Views: 82
Upvote 0

OliverA

Expert
Licensed User
After 13+ hours, no drastic change in heap space usage. GC seems to happen at regular intervals.
 

Attachments

  • 2019-02-10_medium.png
    2019-02-10_medium.png
    337.8 KB · Views: 104
Upvote 0

Erel

Administrator
Staff member
Licensed User
I was of the belief that when the server socket went out of scope the memory was released
You must close all sockets.

Why are you limiting the server heap size to 64mb?

The online server example which includes many handlers, doing all kinds of things is running for 80 days now. There are no memory leaks in jServer when it is used correctly.
https://b4x.com:51041/
 
Upvote 0

Jmu5667

Well-Known Member
Licensed User
I noticed using Visual VM when you do a heap dump it clears the memory used that is shown in Windows Task Manager, I also noticed in your test server that there is no session scavenger being used, got me thinking and I found this. https://www.codota.com/code/java/classes/org.eclipse.jetty.server.session.DefaultSessionIdManager

I need to use SSL & DB so it is important that is is used in the testing. I wonder is the SSL part caching memory or something like that.
You must close all sockets.

Why are you limiting the server heap size to 64mb?

The online server example which includes many handlers, doing all kinds of things is running for 80 days now. There are no memory leaks in jServer when it is used correctly.
https://b4x.com:51041/

In the ping class when the 'WebSocket_Disconnected' event is fired as a result of the client disconnecting is the socket not closed ? In relation to the heap size we are testing the viable amount of memory each B4J server app should be using, and there will be many.
 
Upvote 0

Erel

Administrator
Staff member
Licensed User
In the ping class when the 'WebSocket_Disconnected' event is fired as a result of the client disconnecting is the socket not closed ?
Where is the ping class?

1. I would start with the default JVM options. Remove all those arguments.
2. Worth testing with a larger heap size. Maybe 256mb. You need to see whether it continues to grow or not.
3. It is not related to SSL. This server with SSL is running for 80 days with no memory leaks: https://b4x.com:51041/
 
Upvote 0

Jmu5667

Well-Known Member
Licensed User
Where is the ping class?

1. I would start with the default JVM options. Remove all those arguments.
2. Worth testing with a larger heap size. Maybe 256mb. You need to see whether it continues to grow or not.
3. It is not related to SSL. This server with SSL is running for 80 days with no memory leaks: https://b4x.com:51041/

Ping Class is a websocket added in main

B4X:
srvr.AddWebSocket("/ping", "ping")

clients connect using :
B4X:
ws.Connect("wss://xxx.xxx.xxx.xxx:nnnn/ping")

I am now using suggestion 1
 
Upvote 0

Jmu5667

Well-Known Member
Licensed User
This is the ping class :

B4X:
'WebSocket class
Sub Class_Globals
 
   Private ws                    As WebSocket
   Private session_id_client   As String
   Private startTime            As Long = DateTime.Now
   Private tmrKill               As Timer
   Private threadName           As String
 
End Sub

Public Sub Initialize
 
   tmrKill.Initialize("tmrKill",10000)
   threadName = getcurrentThreadName

End Sub

Private Sub WebSocket_Connected (WebSocket1 As WebSocket)
 
   Try
       ws = WebSocket1     
       ws.Session.MaxInactiveInterval = 10
       ws.RunFunction("ServerReady", Array As Object("READY"))
       ws.Flush
       writelog("WebSocket_Connected, Remote IP = " & ws.UpgradeRequest.RemoteAddress)
       tmrKill.Enabled = True
   Catch
       writelog("WebSocket_Connected, error - " & LastException)
       close_socket
   End Try

End Sub


Private Sub WebSocket_Disconnected
 
   ' // turn off timer
   tmrKill.Enabled = False
   ws.Session.Invalidate
   writelog("WebSocket_Disconnected, Processing Time " & (DateTime.Now - startTime) & " ms")
 
End Sub

Sub tmrKill_Tick
 
   writelog("tmrKill_Tick, Killing Connection")
   tmrKill.Enabled = False
   close_socket
 
End Sub

Sub close_socket
 
   tmrKill.Enabled = False
   ws.Session.Invalidate
   ws.Close
 
End Sub


Sub update_ttl
 
     
   Dim rs As ResultSet, sSQL As String
   Dim valid As Boolean
 
   tmrKill.Enabled = False
 
   valid = True
   ' // get a SQL connection
   Dim sql   As SQL = Main.DB_SQL.GetConnection
 
   Try
       ' // do we have a connection
       If sql.IsInitialized Then
           Try
               ' // check f this user is online
               rs = sql.ExecQuery("SELECT xxxxx FROM [XXXX].dbo.xxxxx " & _
                                   "WHERE xxxx = '" & session_id_client & "'")
               If rs.NextRow Then
                   ' // update the TTL for this device
                   Dim ttl As Long
                 
                   ttl = DateTime.Now + ((Main.SettingsMap.Get("heartbeat") * DateTime.TicksPerMinute)*3) + DateTime.TicksPerMinute
                   sql.BeginTransaction
                   sSQL = "UPDATE [xxxxx].dbo.xxxxx " & _
                       "SET xxx = '" & DateTime.Date(ttl) & " " & DateTime.time(ttl) & "' " & _
                       "WHERE xxxx = '" & session_id_client & "'"
                     
                   sql.ExecNonQuery(sSQL)
                   sql.TransactionSuccessful
                   valid= True
               Else
                   'writelog("update_ttl() session is no longer valid " & session_id_client)
                   valid= False
               End If
               rs.Close
           Catch
               rs.close
               valid= False
               writelog("update_ttl() error = " & LastException.Message)
           End Try
       Else
           writelog("update_ttl() sql.IsInitialized = " & sql.IsInitialized)
           valid= False
       End If
   Catch
       writelog("update_ttl() Error - " & LastException.Message )
   End Try
         
   ' // close connection to sql
   sql.close
 
   If Not(valid) Then
       ' // send
       ws.RunFunction("ServerPing", Array As Object("FAILED"))
   Else
       ' // send
       ws.RunFunction("ServerPing", Array As Object("PINGOK"))
   End If
   ws.Flush
 
   ' // re-enable in case client does not close the connection
   tmrKill.Enabled = True
 
End Sub


Sub Login_Credentials(Params As Map)
 
   session_id_client = Params.Get("xxxxx")
   writelog("Login_Credentials, ws.Session.MaxInactiveInterval =" & ws.Session.MaxInactiveInterval)
   If session_id_client.Length > 0  Then
       CallSubDelayed(Me,"update_ttl")
   Else
       close_socket
   End If
 
 
End Sub

Sub getcurrentThreadName As String
 
   Dim t As JavaObject
   Return t.InitializeStatic("java.lang.Thread").RunMethodJO("currentThread", Null).RunMethod("getName", Null)
 
End Sub

Sub writelog(pData As String)
 
 
   CallSubDelayed2(Main,"writelog",threadName &  ":" & session_id_client & ":INFO:ping│" & pData)
 
End Sub




[code]
 
Upvote 0

OliverA

Expert
Licensed User
3. It is not related to SSL.
I agree
I need to use [...] DB
I've noticed you are using connection pooling. Through some reading and internal testing I came through the following conclusions:
1) Pools size sweet spot seems to be somewhere between 5-10. I used to have larger pool sizes until I read up on HikariCP (https://github.com/brettwooldridge/HikariCP) and used some internal testing to come to the same conclusion that larger pool sizes do not provide larger benefits. A super slow network and/or some super slow (really slow) queries may necessitate larger pool sizes, but those cases seem to be on the extreme end of pool size usage.
2) When the client and the server are on the same machine or located on the same fast (1Gbits+) network, pooling does not seem to give you any gains, especially if the queries are pretty quick (the ones listed above should be quick). In these cases, pooling just adds another layer of complexity and may add more pressure on memory (an item which you seem to care a lot about). In your code above
Dim sql As SQL = Main.DB_SQL.GetConnection
would just return a standard SQL object and in the WebSocket
sql.close
would just need to be commented out (you don't close the SQL object until the server object quits). It seems counter-intuitive, but my own internal testing has shown that pooling provided no benefit in these cases. At that point, SQL pooling does check if the connection is good for you, but you could handle that yourself (most pools just do a "SELECT 1" query against the server as a means to check if it is up and running).
Note: This is not a opinion of a DB expert, just some observations gained through reading and doing some inhouse testing.
 
Upvote 0

Erel

Administrator
Staff member
Licensed User
I don't think that it is related to the pool.
would just need to be commented out (you don't close the SQL object until the server object quits).
How can it be possible? The connection pool will treat it as an open connection and it will cause a connection leak.

@Jmu5667 you need to let the process crash with an out of memory error and analyze the memory dump.
 
Upvote 0

Jmu5667

Well-Known Member
Licensed User
I agree

I've noticed you are using connection pooling. Through some reading and internal testing I came through the following conclusions:
1) Pools size sweet spot seems to be somewhere between 5-10. I used to have larger pool sizes until I read up on HikariCP (https://github.com/brettwooldridge/HikariCP) and used some internal testing to come to the same conclusion that larger pool sizes do not provide larger benefits. A super slow network and/or some super slow (really slow) queries may necessitate larger pool sizes, but those cases seem to be on the extreme end of pool size usage.
2) When the client and the server are on the same machine or located on the same fast (1Gbits+) network, pooling does not seem to give you any gains, especially if the queries are pretty quick (the ones listed above should be quick). In these cases, pooling just adds another layer of complexity and may add more pressure on memory (an item which you seem to care a lot about). In your code above

would just return a standard SQL object and in the WebSocket

would just need to be commented out (you don't close the SQL object until the server object quits). It seems counter-intuitive, but my own internal testing has shown that pooling provided no benefit in these cases. At that point, SQL pooling does check if the connection is good for you, but you could handle that yourself (most pools just do a "SELECT 1" query against the server as a means to check if it is up and running).
Note: This is not a opinion of a DB expert, just some observations gained through reading and doing some inhouse testing.

If i do not close the sql in the class it will cause a leak. I close the sql as soon as I have finished the necessary sql functions. In the test I am doing the update_ttl, where the SQL is being used is not called. This is purely a connect/disconnect test.

I am currently running the b4j server with 9 clients connected, belting open and closes, the task man memory usage is at 670 mb within 5 mins. Waiting for the big crash !
 
Upvote 0

OliverA

Expert
Licensed User
How can it be possible? The connection pool will treat it as an open connection and it will cause a connection leak.
This is only done when not using a pool.
I don't think that it is related to the pool.
True. I just reported an observation. The OP seems to be concerned about memory usage. Modern JDBC drivers seem to be getting to the point that pooling is only necessary when necessary. Same host or fast local LAN seem to be two cases where it may not.
 
Last edited:
Upvote 0
Status
Not open for further replies.
Top