Bug #1062
Crash when persistency is sqlite and number of messages is large enough
Status: Closed
Start: 05/24/2015
Priority: Urgent
Due date:
Assigned to: Mirco Bauer
% Done: 100%
Category: Engine
Target version: 1.0
Complexity: Medium
Found in Version:
Votes: 0
Description
When I use SQLite persistency, I get crashes on frontend reconnect. This appears to happen only once a certain number of messages are stored: if I clean up the buffer store, the problem goes away.
Attached are the gdb log and debug log (from make run-server).
I have noticed (though I cannot guarantee this is always the case) that the crashes tend to occur when the frontend is starting up. If the connection is lost and the frontend reconnects, the crash is less frequent.
If there is anything else I can do to help debug this issue, do not hesitate to ask.
Associated revisions
Revision a84de167cd060b1412792964922caddb0fcaf712
Engine: handle exceptions of SQLite message buffer during sync (refs: #1062)
Non-volatile backends have higher chances of race conditions and other serious
bugs that can lead to crashes because their complexity is much higher. For that
reason Smuxi handled exceptions of Db4o and should also do for all other
non-volatile ones like SQLite.
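The approach described in this commit message, catching failures of a persistent message buffer during sync instead of letting them propagate, might look roughly like the sketch below. It is written in Python for illustration only and is not Smuxi's actual C# code: `SqliteMessageBuffer`, `sync_messages`, and the column layout of the `Messages` table are hypothetical names and assumptions.

```python
import sqlite3


class SqliteMessageBuffer:
    """Hypothetical persistent buffer; the table layout is an assumption."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS Messages "
            "(id INTEGER PRIMARY KEY, body TEXT)"
        )

    def add(self, body):
        self.conn.execute("INSERT INTO Messages (body) VALUES (?)", (body,))
        self.conn.commit()

    def get_range(self, offset, limit):
        rows = self.conn.execute(
            "SELECT body FROM Messages ORDER BY id LIMIT ? OFFSET ?",
            (limit, offset),
        )
        return [row[0] for row in rows]


def sync_messages(buffer, limit):
    """Return up to `limit` messages for a frontend sync.

    A database error is reported and turned into an empty result,
    so the caller keeps running instead of crashing.
    """
    try:
        return buffer.get_range(0, limit)
    except sqlite3.Error as exc:
        print(f"message buffer failed during sync: {exc}")
        return []
```

A guard like this only covers errors that surface as exceptions. It cannot intercept a native crash such as the SEGV addressed by the second revision.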
Revision b4b87f46ec4457ca8da31329bb9f784c6e119fc1
Server, Frontend-GNOME: use Boehm as GC instead of SGen (closes: #1062)
Mono 3.2.8 (and possibly newer versions) SEGVs in Mono.Data.Sqlite when the GC
is SGen and huge datasets are read from more than one thread at the same time
using different databases. With Boehm this does not happen. Since Smuxi switched
to SQLite by default this is a show stopper for the 1.0 release and thus Boehm
will be used to workaround this issue till the Mono SGen GC or the
Mono.Data.Sqlite binding will be fixed in a later Mono version. Or the used
SQLite binding will be replaced with sqlite-net [0] or Hyena's SQLite binding [1].
[0] https://github.com/praeclarum/sqlite-net
[1] https://github.com/GNOME/hyena/tree/master/Hyena.Data.Sqlite
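The access pattern named in this commit message, several threads each reading a large dataset from a different database at the same time, can be sketched as follows. This Python sketch only illustrates the shape of the workload. It does not involve Mono, SGen, or Mono.Data.Sqlite and is not expected to reproduce the crash. The function names, file names, table layout, and row counts are all assumptions.

```python
import os
import sqlite3
import tempfile
import threading


def make_db(path, rows):
    """Create one database file with a hypothetical Messages table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE Messages (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO Messages (body) VALUES (?)",
        ((f"message {i}",) for i in range(rows)),
    )
    conn.commit()
    conn.close()


def read_all(path, results, index):
    """Read every row; each thread uses its own connection and database."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("SELECT body FROM Messages").fetchall()
        results[index] = len(rows)
    finally:
        conn.close()


def read_concurrently(db_count=4, rows=10000):
    """Read `db_count` separate databases from `db_count` threads at once."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"chat{i}.sqlite") for i in range(db_count)]
        for path in paths:
            make_db(path, rows)
        results = [0] * db_count
        threads = [
            threading.Thread(target=read_all, args=(path, results, i))
            for i, path in enumerate(paths)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return results
```

Calling `read_concurrently()` returns the number of rows each thread read, one entry per database.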
History
Updated by Felipe Sateler 3523 days ago
Forgot to say that both client and server are git from today.
Updated by Mirco Bauer 3523 days ago
Felipe Sateler wrote:
if I clean up the buffer store, the problem goes away.
How do you clean up your buffer store? Do you delete the SQLite files or do you truncate the Messages table?
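The two cleanup options asked about here differ in what they leave behind: truncating keeps the database file and schema, while deleting removes the file entirely. A minimal Python sketch of both, assuming a database that contains a `Messages` table (the function names and file path are hypothetical, and the real location and schema of Smuxi's buffer files are not given in this report):

```python
import os
import sqlite3


def truncate_messages(db_path):
    """Empty the Messages table but keep the file; returns rows removed."""
    conn = sqlite3.connect(db_path)
    try:
        count = conn.execute("SELECT COUNT(*) FROM Messages").fetchone()[0]
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM Messages")
        return count
    finally:
        conn.close()


def delete_database(db_path):
    """Remove the whole database file; returns True if a file was removed."""
    if os.path.exists(db_path):
        os.remove(db_path)
        return True
    return False
```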
Updated by Mirco Bauer 3522 days ago
- Category set to Engine
- Assigned to set to Mirco Bauer
- Priority changed from Normal to Urgent
- Target version set to 1.0
- Complexity set to Medium
Updated by Felipe Sateler 3520 days ago
A new datapoint: connecting with "low bandwidth mode" makes smuxi not crash. This also suggests the problem is when syncing a relatively large number of messages.
Updated by Mirco Bauer 3519 days ago
Felipe Sateler wrote:
A new datapoint: connecting with "low bandwidth mode" makes smuxi not crash. This also suggests the problem is when syncing a relatively large number of messages.
Indeed. What are your buffer settings, then? Please provide all numbers from Preferences -> Interface -> General -> Message Buffer.
Updated by Felipe Sateler 3519 days ago
Buffer Lines: 6999
Engine Buffer Lines: 6999
Persistency: SQLite (recommended)
Volatile Buffer lines: 10000
Persistent Buffer Lines: 50000
Updated by Mirco Bauer 3519 days ago
- Status changed from New to Assigned
I can finally reproduce this issue with your history and auto-connect + auto-join. It seems to be a stress/race issue.
meebey@redhorse:~$ mono -V
Mono JIT compiler version 3.2.8 (Debian 3.2.8+dfsg-9)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
    TLS: __thread
    SIGSEGV: altstack
    Notifications: epoll
    Architecture: amd64
    Disabled: none
    Misc: softdebug
    LLVM: supported, not enabled.
    GC: sgen
meebey@redhorse:~$ dpkg -l|grep libsqlite3
ii libsqlite3-0:amd64 3.8.7.4-1 amd64 SQLite 3 shared library
ii libsqlite3-0:i386 3.8.7.4-1 i386 SQLite 3 shared library
ii libsqlite3-dev:amd64 3.8.7.4-1 amd64 SQLite 3 development files
meebey@redhorse:~$
Updated by Felipe Sateler 3518 days ago
Version table:
    3.8.10.2-1 0
        500 http://httpredir.debian.org/debian/ sid/main amd64 Packages
*** 3.8.9-2 0
        100 /var/lib/dpkg/status
I am now updating to 3.8.10.2-1. If that doesn't make a difference, I will try downgrading to the version in jessie.
Updated by Mirco Bauer 3518 days ago
Too bad. Then this issue must have been there since day one, I guess. Odd that nobody hit it in the past 6 months of testing SQLite (on the Smuxi cloud).
Updated by Mirco Bauer 3511 days ago
00:33:39 <fsateler> meebey: FWIW, I have been running under boehm and all is well :)
00:33:45 <fsateler> since we discovered this
00:33:51 <meebey> fsateler: how many days?
00:35:43 <fsateler> active (running) since Fri 2015-05-29 19:52:26 CLT; 6 days ago
00:36:22 <meebey> fsateler: how often did it crash with sgen?
00:39:43 <fsateler> each time the frontend connected to the server
00:39:48 <meebey> fsateler: aahhh!
00:39:59 <meebey> fsateler: so it is fixed/workarounded guaranteed
Updated by Mirco Bauer 3511 days ago
So Smuxi 1.0 will ship using Mono with Boehm as the GC instead of SGen.
Updated by Mirco Bauer 3511 days ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
Applied in changeset b4b87f46ec4457ca8da31329bb9f784c6e119fc1.
Updated by Mirco Bauer 3511 days ago
What happens on Windows using .NET needs to be evaluated. If the binding is doing something bad, then the issue should also show up on Windows. If Mono's GC is the issue, then the crash shouldn't be reproducible on .NET.
Updated by Mirco Bauer 3510 days ago
Reported Mono upstream here: https://bugzilla.xamarin.com/show_bug.cgi?id=30864
Updated by Mirco Bauer 3510 days ago
- File thread-dump.txt added