Bug #1062

Crash when persistency is sqlite and number of messages is large enough

Added by Felipe Sateler 3480 days ago. Updated 3468 days ago.

Status: Closed
Start: 05/24/2015
Priority: Urgent
Due date:
Assigned to: Mirco Bauer
% Done: 100%
Category: Engine
Target version: 1.0
Complexity: Medium
Found in Version:
Votes: 0

Description

When I use SQLite persistency, I get crashes on frontend reconnect. This only appears to happen once a certain number of messages is stored: if I clean up the buffer store, the problem goes away.

Attached are the gdb log and debug log (from make run-server).

I have noticed (but cannot guarantee this is always the case) that the crashes tend to occur when the frontend is starting up; if the connection is lost and the frontend reconnects, the crash is less frequent.

If there is anything else I can do to help debug this issue, do not hesitate to ask.

gdb.txt (49.2 KB) Felipe Sateler, 05/24/2015 10:38 PM

server.log (420.2 KB) Felipe Sateler, 05/24/2015 10:38 PM

thread-dump.txt - another thread-dump (3 KB) Mirco Bauer, 06/06/2015 12:06 PM

Associated revisions

Revision a84de167cd060b1412792964922caddb0fcaf712
Added by Mirco Bauer 3476 days ago

Engine: handle exceptions of SQLite message buffer during sync (refs: #1062)

Non-volatile backends have a higher chance of race conditions and other serious
bugs that can lead to crashes because their complexity is much higher. For that
reason Smuxi already handled exceptions of Db4o and should do the same for all
other non-volatile backends like SQLite.
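A minimal sketch of the idea behind this revision, written in Python with the standard sqlite3 module rather than Smuxi's actual C#/Mono.Data.Sqlite code: a failure while reading the persistent buffer for a reconnecting frontend is logged and turned into an empty sync instead of propagating up and taking the engine down. The helper name read_buffer_for_sync and the id/body columns are hypothetical; only the Messages table name comes from this report.

```python
import logging
import sqlite3

log = logging.getLogger("buffer-sync")


def read_buffer_for_sync(conn, limit):
    """Hypothetical helper: return up to `limit` of the most recent messages,
    oldest first, or an empty list if the SQLite backend raises an error."""
    try:
        rows = conn.execute(
            "SELECT id, body FROM Messages ORDER BY id DESC LIMIT ?", (limit,)
        ).fetchall()
        rows.reverse()  # replay oldest first, the way a frontend would show them
        return rows
    except sqlite3.Error:
        # Log with traceback and degrade: the frontend gets no history,
        # but the server process keeps running.
        log.exception("reading message buffer failed during sync")
        return []


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Messages (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO Messages (body) VALUES (?)", [("a",), ("b",), ("c",)])
    print(read_buffer_for_sync(conn, 2))  # [(2, 'b'), (3, 'c')]

    broken = sqlite3.connect(":memory:")  # no Messages table -> OperationalError
    print(read_buffer_for_sync(broken, 2))  # [] (error is logged, not raised)
```

Note that this only contains exceptions raised by the binding; it cannot help with a native SEGV inside the runtime, which is why the crash in this report needed a different workaround.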

Revision b4b87f46ec4457ca8da31329bb9f784c6e119fc1
Added by Mirco Bauer 3468 days ago

Server, Frontend-GNOME: use Boehm as GC instead of SGen (closes: #1062)

Mono 3.2.8 (and possibly newer versions) SEGVs in Mono.Data.Sqlite when the GC
is SGen and huge datasets are read from more than one thread at the same time
using different databases. With Boehm this does not happen. Since Smuxi switched
to SQLite by default, this is a show stopper for the 1.0 release, and thus Boehm
will be used to work around this issue until the Mono SGen GC or the
Mono.Data.Sqlite binding is fixed in a later Mono version, or until the SQLite
binding in use is replaced with sqlite-net [0] or Hyena's SQLite binding [1].

[0] https://github.com/praeclarum/sqlite-net
[1] https://github.com/GNOME/hyena/tree/master/Hyena.Data.Sqlite
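To make the trigger described in this revision concrete, the following Python sketch reproduces only the shape of the workload: several threads, each with its own connection to a separate SQLite database file, reading a large result set at the same time. It uses Python's sqlite3 module, not Mono.Data.Sqlite on SGen, so it is not expected to crash; make_db, read_all, concurrent_sync and the file names are hypothetical, and the 50000 rows only mirror the "Persistent Buffer Lines" value reported below.

```python
import os
import sqlite3
import tempfile
import threading


def make_db(path, rows):
    # Hypothetical stand-in for one per-buffer database file.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE Messages (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO Messages (body) VALUES (?)",
        (("msg %d" % i,) for i in range(rows)),
    )
    conn.commit()
    conn.close()


def read_all(path, results):
    # Each thread opens its own connection to its own database file
    # and materializes the whole result set at once.
    conn = sqlite3.connect(path)
    try:
        results[path] = len(conn.execute("SELECT id, body FROM Messages").fetchall())
    finally:
        conn.close()


def concurrent_sync(paths):
    # Start one reader thread per database, then wait for all of them.
    results = {}
    threads = [threading.Thread(target=read_all, args=(p, results)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = [os.path.join(d, "buffer%d.sqlite3" % n) for n in range(4)]
        for p in paths:
            make_db(p, 50000)
        print(concurrent_sync(paths))
```

Each thread writes a distinct key into the shared dict, which is safe under CPython; a port of this pattern to another runtime would need its own synchronization.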

History

Updated by Felipe Sateler 3480 days ago

Forgot to say that both client and server are built from today's git.

Updated by Mirco Bauer 3480 days ago

Felipe Sateler wrote:

if I clean up the buffer store, the problem goes away.

How do you clean up your buffer store, do you delete the sqlite files or do you truncate the Messages table?

Updated by Felipe Sateler 3480 days ago

I move the ircbuffers folder away.

Updated by Mirco Bauer 3480 days ago

  • Category set to Engine
  • Assigned to set to Mirco Bauer
  • Priority changed from Normal to Urgent
  • Target version set to 1.0
  • Complexity set to Medium

Updated by Felipe Sateler 3477 days ago

A new datapoint: connecting with "low bandwidth mode" makes smuxi not crash. This also suggests the problem is when syncing a relatively large number of messages.

Updated by Mirco Bauer 3476 days ago

Felipe Sateler wrote:

A new datapoint: connecting with "low bandwidth mode" makes smuxi not crash. This also suggests the problem is when syncing a relatively large number of messages.

Indeed, what are your buffer settings then? Please provide all numbers from Preferences -> Interface -> General -> Message Buffer

Updated by Felipe Sateler 3476 days ago

Buffer Lines: 6999
Engine Buffer Lines: 6999
Persistency: SQLite (recommended)
Volatile Buffer lines: 10000
Persistent Buffer Lines: 50000

Updated by Mirco Bauer 3476 days ago

  • Status changed from New to Assigned

I can finally reproduce this issue with your history and auto-connect + auto-join. It seems to be a stress/race issue.

meebey@redhorse:~$ mono -V
Mono JIT compiler version 3.2.8 (Debian 3.2.8+dfsg-9)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
    TLS:           __thread
    SIGSEGV:       altstack
    Notifications: epoll
    Architecture:  amd64
    Disabled:      none
    Misc:          softdebug 
    LLVM:          supported, not enabled.
    GC:            sgen
meebey@redhorse:~$ dpkg -l|grep libsqlite3
ii  libsqlite3-0:amd64                                          3.8.7.4-1                              amd64        SQLite 3 shared library
ii  libsqlite3-0:i386                                           3.8.7.4-1                              i386         SQLite 3 shared library
ii  libsqlite3-dev:amd64                                        3.8.7.4-1                              amd64        SQLite 3 development files
meebey@redhorse:~$ 

Updated by Felipe Sateler 3476 days ago

 Version table:
     3.8.10.2-1 0
        500 http://httpredir.debian.org/debian/ sid/main amd64 Packages
 *** 3.8.9-2 0
        100 /var/lib/dpkg/status

I am now updating to 3.8.10.2-1; if that doesn't make a difference, I will try downgrading to the version in jessie.

Updated by Felipe Sateler 3476 days ago

No, updating to 3.8.10.2-1 does not fix the issue.

Updated by Felipe Sateler 3475 days ago

And 3.8.7.1-1+deb8u1 does not fix it either.

Updated by Mirco Bauer 3475 days ago

Too bad; then I guess this issue has been there since day one. Odd that nobody hit it in the past 6 months of testing SQLite (on the Smuxi cloud).

Updated by Mirco Bauer 3468 days ago

00:33:39 <fsateler> meebey: FWIW, I have been running under boehm and all is well :)
00:33:45 <fsateler> since we discovered this
00:33:51 <meebey> fsateler: how many days?
00:35:43 <fsateler> active (running) since Fri 2015-05-29 19:52:26 CLT; 6 days ago
00:36:22 <meebey> fsateler: how often did it crash with sgen?
00:39:43 <fsateler> each time the frontend connected to the server
00:39:48 <meebey> fsateler: aahhh!
00:39:59 <meebey> fsateler: so it is fixed/workarounded guaranteed

Updated by Mirco Bauer 3468 days ago

So Smuxi 1.0 will ship using Mono with Boehm as the GC instead of SGen.

Updated by Mirco Bauer 3468 days ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

Updated by Mirco Bauer 3468 days ago

What happens on Windows using .NET needs to be evaluated. If the binding is doing something wrong, the issue should also show up on Windows; if Mono's GC is the issue, the crash shouldn't be reproducible on .NET.

Updated by Mirco Bauer 3468 days ago

Updated by Mirco Bauer 3468 days ago
