Bug #2928: slapd Upstart status is out of control if BDB is corrupted - NethServer 6 - NethServer.org

Bug #2928


slapd Upstart status is out of control if BDB is corrupted

Added by Davide Principi almost 7 years ago. Updated almost 7 years ago.

Status: CLOSED
Start date:
Priority: Normal
Due date:
Assignee:
% Done: 100%
Category: nethserver-directory
Target version: v6.5
Security class:
Resolution:
Affected version: v6.5-final
NEEDINFO:

Description

A temporary power loss left a BDB log file corrupted. The db_recover command repairs the database, but slapd still fails to restart cleanly afterwards.
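For reference, the recovery step could be scripted roughly as follows. This is only a sketch: the recover_bdb function and the RUN and LDAP_DIR variables are hypothetical names, and the exact db_recover binary name may differ between distributions. Setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper, not part of nethserver-directory.
# LDAP_DIR: BDB environment directory (the path used in this report).
# RUN: set to "echo" for a dry run that only prints the commands.
LDAP_DIR=${LDAP_DIR:-/var/lib/ldap}
RUN=${RUN:-}

recover_bdb() {
    # Stop the Upstart job; ignore the error if it is not running
    $RUN stop slapd || true
    # Keep a timestamped copy of the whole directory before touching it
    $RUN cp -a "$LDAP_DIR" "$LDAP_DIR.bak.$(date +%Y%m%d%H%M%S)" || return 1
    # -h selects the environment directory, -v prints progress
    $RUN db_recover -v -h "$LDAP_DIR" || return 1
    $RUN start slapd
}
```

If db_recover is run as root, the ownership of the regenerated files may need to be restored to the account slapd runs as before starting the daemon.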

I tried to reproduce the same condition manually:

Warning: make a backup of your BDB files first.

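  cd /var/lib/ldap
  # Keep a copy of the BDB environment and log files
  mkdir backup
  cp __db.00* log.* backup/
  # Overwrite each file with random data to simulate corruption
  for F in __db.* log.*; do dd if=/dev/urandom of="$F" count=10; done
  # Any query now makes slapd read the damaged files
  ldapsearch -Y EXTERNAL uid=admin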

BOOM

In /var/log/messages:

Oct 24 17:04:28 localhost kernel: slapd[15093] general protection ip:7f9f94c525c8 sp:7f9f90f7c310 error:0 in libdb-4.7.so[7f9f94c29000+16f000]
ldap_result: Can't contact LDAP server (-1)
[root@davidep2 ldap]# Oct 24 17:04:28 localhost init: slapd main process (15087) killed by SEGV signal
Oct 24 17:04:28 localhost init: slapd main process ended, respawning
Oct 24 17:04:28 localhost nslcd[15099]: caught signal SIGTERM (15), shutting down
Oct 24 17:04:28 localhost nslcd[15099]: version 0.7.5 bailing out
Oct 24 17:04:28 localhost init: nslcd main process (15099) terminated with status 1
Oct 24 17:04:29 localhost init: slapd main process (15116) terminated with status 1
Oct 24 17:04:29 localhost init: slapd main process ended, respawning

On root's console:

[root@davidep2 ldap]# ps axf
[...]
15117 ?        Ss     0:00 /bin/sh -e /dev/fd/10
15122 ?        S      0:00  \_ sleep 3
[root@davidep2 ldap]# ldapsearch -Y EXTERNAL uid=admin
ldap_sasl_interactive_bind_s: Can't contact LDAP server (-1)
[root@davidep2 ldap]# status slapd
slapd respawn/post-start, (post-start) process 15117

The slapd Upstart job is stuck in the respawn/post-start state: its post-start script loops forever on ldapwhoami commands. See slapd.conf, line 51.
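One way to avoid an endless wait of this kind is to bound the number of probe attempts. The following is a minimal sketch under that assumption, not the code from slapd.conf or from the fix; wait_for_probe is a hypothetical name and the attempt count and delay are illustrative.

```shell
#!/bin/sh
# Poll a probe command until it succeeds, giving up after a fixed number of tries.
# Usage: wait_for_probe MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_probe() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # Run the probe quietly; success means the server answered
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    # Probe never succeeded: report failure instead of looping forever
    return 1
}
```

A post-start script could then call, for example, `wait_for_probe 10 3 ldapwhoami -Y EXTERNAL -H ldapi:///` and exit non-zero when the probe never succeeds, so the job is reported as failed rather than left hanging.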

Related issues

Associated revisions

Revision 90a5d20f
Added by Davide Principi almost 7 years ago

hosts.allow/deny templates: removed slapd fragment. Refs #2928 #2785

Revision 92acb087
Added by Davide Principi almost 7 years ago

slapd: send "stat" log messages to syslog, in /var/log/messages. Refs #2928

Revision 566dae2a
Added by Davide Principi almost 7 years ago

slapd.conf Upstart script: avoid infinite loop on post-start. Refs #2928

- Kill dangling post-start scripts on pre-stop stanza.
- Stop respawning after 4 attempts in a minute.
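In Upstart, a limit like the one described in this commit message is usually expressed with the respawn limit stanza. The fragment below is an illustrative sketch only, not the actual slapd.conf from this revision:

```
# Hypothetical Upstart job fragment
respawn
# Give up if the job is respawned more than 4 times within 60 seconds
respawn limit 4 60
```

When the limit is exceeded, Upstart stops the job instead of restarting it again.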

Revision 5ac1d1d4
Added by Davide Principi almost 7 years ago

fix_accounts script. Refs #2928

This helper script completes the creation of user and group accounts,
wherever the Accounts DB entry is not present in system databases (see
getent).

Revision 0e1f6444
Added by Davide Principi almost 7 years ago

fix_accounts helper script. Refs #2928

This helper script completes the creation of user and group accounts,
wherever the Accounts DB entry is not present in system databases (see
getent).

Revision a7606cda
Added by Davide Principi almost 7 years ago

Merge branch 'b2928'. Refs #2928

Revision 30a41214
Added by Davide Principi almost 7 years ago

Reverted original slapd/LogLevel value. Refs #2928

Default LogLevel=0

Revision ac17966e
Added by Davide Principi almost 7 years ago

Simplified slapd LogLevel semantics. Refs #2928

- removed rsyslog configuration template
- LogLevel prop value is passed as-is to slapd daemon
- LogLevel default is 0
- rsyslogd is restarted at the end of update event
- log messages are directed to /var/log/slapd
- added logrotate configuration

History

#1 Updated by Davide Principi almost 7 years ago

Related to Enhancement #2785: Drop TCP wrappers hosts.allow hosts.deny templates added

#2 Updated by Davide Principi almost 7 years ago

Status changed from TRIAGED to ON_DEV
Assignee set to Davide Principi
% Done changed from 20 to 30

#3 Updated by Davide Principi almost 7 years ago

Description updated

#4 Updated by Davide Principi almost 7 years ago

Status changed from ON_DEV to MODIFIED
Assignee deleted (Davide Principi)
% Done changed from 30 to 60

In branch b2928 (see revision history for details).

#5 Updated by Davide Principi almost 7 years ago

Subject changed from Upstart status slapd corrupted BDB to slapd Upstart status is out of control if BDB is corrupted
Description updated

#6 Updated by Davide Principi almost 7 years ago

Status changed from MODIFIED to ON_QA
% Done changed from 60 to 70

Also added the fix_accounts helper script to the doc directory (see the git changelog for details).

In nethserver-testing:
nethserver-directory-2.0.3-1.7gita7606cd.ns6.noarch.rpm

#7 Updated by Giacomo Sanchietti almost 7 years ago

Assignee set to Giacomo Sanchietti

#8 Updated by Giacomo Sanchietti almost 7 years ago

Status changed from ON_QA to VERIFIED
Assignee deleted (Giacomo Sanchietti)
% Done changed from 70 to 90

VERIFIED

After breaking the db, the server will fail to start:

[root@localhost ldap]# start slapd
start: Job failed to start
[root@localhost ldap]#

If the db is good the server can be correctly started/restarted/stopped.

#9 Updated by Giacomo Sanchietti almost 7 years ago

Status changed from VERIFIED to ON_QA
% Done changed from 90 to 70

#10 Updated by Giacomo Sanchietti almost 7 years ago

Status changed from ON_QA to TRIAGED
% Done changed from 70 to 20

Back to TRIAGED just to change this:

- remove patch for syslog
- default LogLevel to 0 is good

#11 Updated by Davide Principi almost 7 years ago

Status changed from TRIAGED to ON_DEV
Assignee set to Davide Principi
% Done changed from 20 to 30

#12 Updated by Davide Principi almost 7 years ago

Status changed from ON_DEV to MODIFIED
Assignee deleted (Davide Principi)
% Done changed from 30 to 60

See commit message for details

#13 Updated by Davide Principi almost 7 years ago

Status changed from MODIFIED to ON_QA
% Done changed from 60 to 70

In nethserver-testing:
nethserver-directory-2.0.3-1.10git4d83597.ns6.noarch.rpm

#14 Updated by Giacomo Sanchietti almost 7 years ago

Assignee set to Giacomo Sanchietti

#15 Updated by Giacomo Sanchietti almost 7 years ago

Status changed from ON_QA to VERIFIED
Assignee deleted (Giacomo Sanchietti)
% Done changed from 70 to 90