overpass - Re: [overpass] apply_osc_to_db not handling dispatcher problems

Subject: Overpass API developpement

  • From: mmd <>
  • To:
  • Subject: Re: [overpass] apply_osc_to_db not handling dispatcher problems
  • Date: Thu, 7 Sep 2017 20:01:21 +0200

Hi Roland,


Am 07.09.2017 um 17:48 schrieb Roland Olbricht:

>
> Thank you for the simulation. According to the information I find about
> SIGBUS, this happens if a shared memory file has been deleted. I found
> no information that it is related to sockets.
>
> Can you check whether both the shared memory and the socket exist before
> the incident? I.e. please run the commands
>

Both shared memory and the unix domain socket are still available after
the resume operation, and seem to be working ok.
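
A minimal sketch of such an existence check, in case someone wants to script it. The two paths are the ones that show up in the osm3s_query strace further down; the function name check_ipc_endpoints and the min_shm_size parameter are made up for this example and are not part of Overpass. The size check is there because touching a mapping beyond the end of its backing file is one documented way to get SIGBUS.

```python
import os
import stat


def check_ipc_endpoints(shm_path, socket_path, min_shm_size=1):
    """Report whether the shared memory file and the unix socket look usable."""
    report = {"shm_ok": False, "socket_ok": False}
    try:
        shm = os.stat(shm_path)
        # The backing file must be a regular file at least as long as the
        # length passed to mmap (65 bytes in the strace of this thread).
        report["shm_ok"] = stat.S_ISREG(shm.st_mode) and shm.st_size >= min_shm_size
    except OSError:
        pass  # missing or inaccessible: leave shm_ok as False
    try:
        report["socket_ok"] = stat.S_ISSOCK(os.stat(socket_path).st_mode)
    except OSError:
        pass  # missing or inaccessible: leave socket_ok as False
    return report


if __name__ == "__main__":
    # Paths copied from the strace output; adjust to the local setup.
    print(check_ipc_endpoints(
        "/dev/shm/osm3s_v0.7.54_osm_base",
        "/home/ubuntu/p//osm3s_v0.7.54_osm_base",
        min_shm_size=65,
    ))
```

Running it once before suspend and once after resume would show whether either endpoint disappeared or shrank in between.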

According to the straces below, which I collected on both the dispatcher
and the osm3s_query processes:
- osm3s_query can read from shared memory and send requests to the
correct unix domain socket without errors,
- the dispatcher correctly receives the data.

I tried the same without suspend/resume and the strace looks pretty much
the same.

The crash location template_db/dispatcher.cc:110 didn't make a lot of
sense to me, so I started looking for similar crash messages. One
notable find was a set of reports about Java applications triggering
SIGBUS with si_code=BUS_ADRERR, the same signal and code as in the
dispatcher trace below. They were related to a recent Linux kernel
regression, which is supposed to be fixed in the 4.4.0-83 version I ran
my tests on :/

See: https://usn.ubuntu.com/usn/usn-3344-1/

_USN 3328-1 fixed a vulnerability in the Linux kernel. However, that
fix introduced regressions for some Java applications. This update
addresses the issue. We apologize for the inconvenience._

Tested on: Linux version 4.4.0-83-generic (buildd@lgw01-29) (gcc version
5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #106-Ubuntu SMP Mon Jun
26 17:54:43 UTC 2017

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

Let's see if Igor can reproduce the issue...


--------------------------------------------------------------
dispatcher:
--------------------------------------------------------------
accept(3, 0x7ffcb16cad60, 0x7ffcb16cad14) = -1 EAGAIN (Resource
temporarily unavailable)
select(1024, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
accept(3, {sa_family=AF_LOCAL, NULL}, [2]) = 5
fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
recvfrom(5, "\6\35\0\0", 4, 0, NULL, NULL) = 4
recvfrom(5, "\311\0\0\0", 4, 0, NULL, NULL) = 4
open("/home/ubuntu/p/transactions.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 6
lseek(6, 0, SEEK_END) = 9221
write(6, "2017-09-07 17:04:39 [7279] waite"..., 55) = 55
close(6) = 0
recvfrom(5, "\264\0\0\0", 4, 0, NULL, NULL) = 4
recvfrom(5, "\0\0\0 ", 4, 0, NULL, NULL) = 4
recvfrom(5, "\0\0\0\0", 4, 0, NULL, NULL) = 4
recvfrom(5, "\0\0\0\0", 4, 0, NULL, NULL) = 4
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x407fa0} ---
+++ killed by SIGBUS (core dumped) +++



--------------------------------------------------------------
osm3s_query:
--------------------------------------------------------------

rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART,
0x7efc7dece4b0}, {SIG_DFL, [], 0}, 8) = 0
statfs("/dev/shm/", {f_type="TMPFS_MAGIC", f_bsize=4096,
f_blocks=126996, f_bfree=126994, f_bavail=126994, f_files=126996,
f_ffree=126993, f_fsid={0, 0}, f_namelen=255, f_frsize=4096,
f_flags=38}) = 0
futex(0x7efc7de98310, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/dev/shm/osm3s_v0.7.54_osm_base", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0666, st_size=65, ...}) = 0
mmap(NULL, 65, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7efc7f371000
socket(PF_LOCAL, SOCK_STREAM, 0) = 4
connect(4, {sa_family=AF_LOCAL,
sun_path="/home/ubuntu/p//osm3s_v0.7.54_osm_base"}, 110) = 0
sendto(4, "\6\35\0\0", 4, 0, NULL, 0) = 4
open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
fstat(5, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
read(5,
"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096)
= 127
lseek(5, -71, SEEK_CUR) = 56
read(5,
"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 71
close(5) = 0
open("/home/ubuntu/p/transactions.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 5
lseek(5, 0, SEEK_END) = 9165
write(5, "2017-09-07 17:04:39 [7430] reque"..., 56) = 56
close(5) = 0
sendto(4, "\311\0\0\0", 4, 0, NULL, 0) = 4
sendto(4, "\264\0\0\0", 4, 0, NULL, 0) = 4
sendto(4, "\0\0\0 \0\0\0\0", 8, 0, NULL, 0) = 8
sendto(4, "\0\0\0\0", 4, 0, NULL, 0) = 4
recvfrom(4, "", 4, 0, NULL, NULL) = 0
select(1024, NULL, NULL, NULL, {0, 300000}) = 0 (Timeout)
sendto(4, "\311\0\0\0", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=7430,
si_uid=1000} ---
futex(0x7efc7e478680, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/home/ubuntu/p/transactions.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 5
lseek(5, 0, SEEK_END) = 9276
write(5, "2017-09-07 17:04:40 [7430] Dispa"..., 117) = 117
close(5) = 0
write(2, "runtime error: ", 15) = 15
write(2, "open64: 32 Broken pipe /osm3s_v0"..., 97) = 97
write(2, "\n", 1) = 1
exit_group(1) = ?
+++ exited with 1 +++
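
The 4-byte payloads in both traces can be decoded to cross-check what was actually sent. The sketch below assumes they are little-endian unsigned 32-bit integers (an assumption, although the results fit the rest of the logs); decode_strace_u32 is a helper name invented for this example.

```python
import struct


def decode_strace_u32(escaped):
    """Decode a 4-byte strace string literal as a little-endian uint32."""
    # strace prints non-printable bytes as C-style octal escapes; Python's
    # unicode_escape codec understands the same notation.
    raw = escaped.encode("latin-1").decode("unicode_escape").encode("latin-1")
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    (value,) = struct.unpack("<I", raw)
    return value


if __name__ == "__main__":
    # Literals copied from the sendto/recvfrom lines above.
    for literal in (r"\6\35\0\0", r"\311\0\0\0", r"\264\0\0\0", r"\0\0\0 "):
        print(literal, "->", decode_strace_u32(literal))
```

Under that assumption the values come out as 7430, 201, 180 and 536870912. 7430 is the pid shown in the transactions.log line and in si_pid; 180 and 536870912 are the same numbers as time_units and max_space in the stack trace below. What 201 stands for cannot be derived from the traces alone.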


--------------------------------------------------------------
dispatcher stack trace for crash location
--------------------------------------------------------------

#0 Global_Resource_Planner::probe (this=this@entry=0x7ffc56ae5918,
pid=1638, client_token=client_token@entry=0,
time_units=time_units@entry=180, max_space=max_space@entry=536870912)
at template_db/dispatcher.cc:110
#1 0x000000000040909e in Dispatcher::standby_loop
(this=this@entry=0x7ffc56ae5770, milliseconds=milliseconds@entry=0) at
template_db/dispatcher.cc:659
#2 0x000000000040534f in main (argc=<optimized out>,
argv=0x7ffc56ae5b18) at overpass_api/dispatch/dispatcher_server.cc:472




cheers



