- 09-22-2012 #1 Linux Newbie
- Join Date
- May 2012
- Posts
- 109
Production server reboot issue?
Hi,
What is this error?
The server rebooted automatically, and the servers are in a cluster.
Sep 13 15:00:01 Dillbesg2 1[29182]: Sync interrupted, [[ERROR=1]]
Sep 13 15:00:56 Dillbesg2 MR_MONITOR[17983]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-18,15:00:56 1638029 Seconds
Sep 13 15:06:02 Dillbesg2 1[32275]: Sync interrupted, [[ERROR=1]]
Sep 13 15:12:01 Dillbesg2 1[2656]: Sync interrupted, [[ERROR=1]]
Sep 13 15:18:01 Dillbesg2 1[5474]: Sync interrupted, [[ERROR=1]]
Sep 13 15:24:01 Dillbesg2 1[8258]: Sync interrupted, [[ERROR=1]]
Sep 13 15:30:01 Dillbesg2 1[11037]: Sync interrupted, [[ERROR=1]]
Sep 13 15:36:01 Dillbesg2 1[14814]: Sync interrupted, [[ERROR=1]]
Sep 13 15:42:02 Dillbesg2 1[17880]: Sync interrupted, [[ERROR=1]]
Sep 13 15:48:01 Dillbesg2 1[21030]: Sync interrupted, [[ERROR=1]]
Sep 13 15:54:01 Dillbesg2 1[24031]: Sync interrupted, [[ERROR=1]]
Sep 13 16:00:01 Dillbesg2 1[26820]: Sync interrupted, [[ERROR=1]]
Sep 13 16:00:56 Dillbesg2 MR_MONITOR[17983]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-18,16:00:56 1641629 Seconds
Sep 13 16:06:01 Dillbesg2 1[29936]: Sync interrupted, [[ERROR=1]]
Sep 13 16:12:01 Dillbesg2 1[32668]: Sync interrupted, [[ERROR=1]]
Sep 13 16:18:01 Dillbesg2 1[3163]: Sync interrupted, [[ERROR=1]]
Sep 13 16:24:01 Dillbesg2 1[5909]: Sync interrupted, [[ERROR=1]]
Sep 13 16:30:01 Dillbesg2 1[8597]: Sync interrupted, [[ERROR=1]]
Sep 13 16:36:01 Dillbesg2 1[11523]: Sync interrupted, [[ERROR=1]]
Sep 13 16:42:01 Dillbesg2 1[15323]: Sync interrupted, [[ERROR=1]]
Sep 13 16:48:01 Dillbesg2 1[19188]: Sync interrupted, [[ERROR=1]]
Sep 13 16:54:01 Dillbesg2 1[23285]: Sync interrupted, [[ERROR=1]]
Sep 13 17:00:01 Dillbesg2 1[26721]: Sync interrupted, [[ERROR=1]]
Sep 13 17:00:56 Dillbesg2 MR_MONITOR[17983]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-18,17:00:56 1645229 Seconds
Sep 13 17:06:01 Dillbesg2 1[30146]: Sync interrupted, [[ERROR=1]]
Sep 13 17:12:01 Dillbesg2 1[5542]: Sync interrupted, [[ERROR=1]]
Sep 13 17:18:01 Dillbesg2 1[10553]: Sync interrupted, [[ERROR=1]]
Sep 13 17:24:01 Dillbesg2 1[14683]: Sync interrupted, [[ERROR=1]]
Sep 13 17:30:01 Dillbesg2 1[19017]: Sync interrupted, [[ERROR=1]]
Sep 13 17:36:01 Dillbesg2 1[24616]: Sync interrupted, [[ERROR=1]]
Sep 13 17:42:02 Dillbesg2 1[29131]: Sync interrupted, [[ERROR=1]]
Regards,
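One thing the log above does show is a regular cadence: the "Sync interrupted" lines arrive roughly every six minutes, which would fit a scheduled job that fails each time it runs. That reading is only a guess, since the log does not say which program writes these lines. A minimal Python sketch to measure the gaps, assuming classic syslog timestamps and the year 2012 (the year is not in the log; it is taken from the post date, and `sync_gaps` is a name made up for this example):

```python
from datetime import datetime

ASSUMED_YEAR = 2012  # assumption: syslog lines carry no year; taken from the post date


def sync_gaps(lines, marker="Sync interrupted"):
    """Return the gaps in seconds between consecutive lines containing marker."""
    stamps = []
    for line in lines:
        if marker not in line:
            continue
        # The first 15 characters of a classic syslog line are "Mon DD HH:MM:SS".
        stamp = datetime.strptime(
            f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S"
        )
        stamps.append(stamp)
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


sample = [
    "Sep 13 15:00:01 Dillbesg2 1[29182]: Sync interrupted, [[ERROR=1]]",
    "Sep 13 15:00:56 Dillbesg2 MR_MONITOR[17983]: <MRMON044> ...",
    "Sep 13 15:06:02 Dillbesg2 1[32275]: Sync interrupted, [[ERROR=1]]",
    "Sep 13 15:12:01 Dillbesg2 1[2656]: Sync interrupted, [[ERROR=1]]",
]
print(sync_gaps(sample))  # [361, 359]
```

A steady gap of about 360 seconds points at something running on a timer, so the crontab entries on that host may be worth checking.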
- 09-22-2012 #2 Just Joined!
- Join Date
- Sep 2012
- Location
- Finland
- Posts
- 88
It looks like your servers are out of sync with each other, or something is breaking the link between them.
I need more information about your system to find out why.
- 09-23-2012 #3 Linux Newbie
- Join Date
- May 2012
- Posts
- 109
Sep 23 04:03:02 Dillbesg6 syslogd 1.4.1: restart.
Sep 23 05:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,05:00:32 303132 Seconds
Sep 23 06:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,06:00:32 306732 Seconds
Sep 23 07:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,07:00:32 310332 Seconds
Sep 23 08:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,08:00:32 313932 Seconds
Sep 23 09:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,09:00:32 317532 Seconds
Sep 23 10:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,10:00:32 321132 Seconds
Sep 23 11:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,11:00:32 324732 Seconds
Sep 23 12:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,12:00:32 328332 Seconds
Sep 23 13:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,13:00:32 331932 Seconds
Sep 23 14:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,14:00:32 335532 Seconds
Sep 23 15:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,15:00:32 339132 Seconds
Sep 23 16:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,16:00:32 342732 Seconds
Sep 23 17:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,17:00:32 346332 Seconds
Sep 23 18:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,18:00:32 349932 Seconds
What are these errors?
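The MRMON044 lines look like hourly status messages rather than errors: the counter at the end grows by exactly 3600 seconds from one line to the next. Assuming MR_MONITOR is the RAID controller's monitoring daemon and the counter is the controller's time since power-on, the power-on moment can be worked out by subtracting the counter from the timestamp inside the message. A small Python sketch under those assumptions (`controller_power_on` is a hypothetical helper name):

```python
import re
from datetime import datetime, timedelta

# Matches the embedded timestamp and the seconds counter of an <MRMON044> message.
MRMON044 = re.compile(
    r"Time (\d{4}-\d{2}-\d{2}),(\d{2}:\d{2}:\d{2}) (\d+) Seconds"
)


def controller_power_on(line):
    """Return (power_on_time, uptime) implied by one MRMON044 line, or None."""
    m = MRMON044.search(line)
    if m is None:
        return None
    reported = datetime.strptime(
        f"{m.group(1)} {m.group(2)}", "%Y-%m-%d %H:%M:%S"
    )
    uptime = timedelta(seconds=int(m.group(3)))
    return reported - uptime, uptime


line = (
    "Sep 23 05:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 "
    "Time established since power on: Time 2012-09-23,05:00:32 303132 Seconds"
)
power_on, uptime = controller_power_on(line)
print(power_on)  # 2012-09-19 16:48:20
print(uptime)    # 3 days, 12:12:12
```

If the counter means what is assumed here, the controller on Dillbesg6 was powered on around 16:48 on 19 September. That is the controller's power-on time, which is not necessarily the moment the operating system rebooted.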
- 09-23-2012 #4 Linux Newbie
- Join Date
- May 2012
- Posts
- 109
The cluster software installed on the servers is OCFS2 (Oracle Cluster File System 2).
- 09-23-2012 #5 Linux Newbie
- Join Date
- May 2012
- Posts
- 109
Some more log entries:
Sep 17 23:32:09 Dillbesg6 kernel: (o2net,12668,2):ocfs2_dlm_eviction_cb:98 device (65,17): dlm has evicted node 1
Sep 17 23:32:09 Dillbesg6 kernel: kjournald starting. Commit interval 5 seconds
Sep 17 23:32:09 Dillbesg6 kernel: kjournald starting. Commit interval 5 seconds
Sep 17 23:32:09 Dillbesg6 kernel: (o2net,12668,2):ocfs2_dlm_eviction_cb:98 device (65,81): dlm has evicted node 1
Sep 17 23:32:09 Dillbesg6 kernel: (o2net,12668,2):ocfs2_dlm_eviction_cb:98 device (65,161): dlm has evicted node 1
Sep 17 23:32:09 Dillbesg6 kernel: (o2net,12668,2):ocfs2_dlm_eviction_cb:98 device (66,145): dlm has evicted node 1
Sep 17 23:32:09 Dillbesg6 kernel: kjournald starting. Commit interval 5 seconds
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13304,0):dlm_flush_asts:604 ERROR: status = -107
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_reco_thread,12983,1):dlm_get_lock_resource:84 4 EE23ADAF3E0D4F129648842B36A54F39:$RECOVERY: at least one node (1) to recover before lock mastery can begin
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13290,0):dlm_send_proxy_ast_msg:457 ERROR: status = -107
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13290,0):dlm_flush_asts:604 ERROR: status = -107
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,12786,0):dlm_send_proxy_ast_msg:457 ERROR: status = -107
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,12786,0):dlm_flush_asts:604 ERROR: status = -107
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13122,0):dlm_send_proxy_ast_msg:457 ERROR: status = -107
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13122,0):dlm_flush_asts:604 ERROR: status = -107
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_reco_thread,12801,3):dlm_get_lock_resource:84 4 024E544C12614474AE17F834D3DF9A44:$RECOVERY: at least one node (1) to recover before lock mastery can begin
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_reco_thread,12801,3):dlm_get_lock_resource:87 8 024E544C12614474AE17F834D3DF9A44: recovery map is not empty, but must master $RECOVERY lock now
Sep 17 23:32:09 Dillbesg6 kernel: (dlm_reco_thread,12801,3):dlm_do_recovery:524 (12801) Node 2 is the Recovery Master for the Dead Node 1 for Domain 024E544C12614474AE17F834D3DF9A44
Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,12829,0):dlm_get_lock_resource:84 4 670B5DAF23634DA7BF8945C07A27F14E:$RECOVERY: at least one node (1) to recover before lock mastery can begin
Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,12829,0):dlm_get_lock_resource:87 8 670B5DAF23634DA7BF8945C07A27F14E: recovery map is not empty, but must master $RECOVERY lock now
Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,12829,0):dlm_do_recovery:524 (12829) Node 2 is the Recovery Master for the Dead Node 1 for Domain 670B5DAF23634DA7BF8945C07A27F14E
Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,13179,0):dlm_get_lock_resource:84 4 EF545CBC2C874314AAA9EA02171438FD:$RECOVERY: at least one node (1) to recover before lock mastery can begin
Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,13179,0):dlm_get_lock_resource:87 8 EF545CBC2C874314AAA9EA02171438FD: recovery map is not empty, but must master $RECOVERY lock now
Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,13179,0):dlm_do_recovery:524 (13179) Node 2 is the Recovery Master for the Dead Node 1 for Domain EF545CBC
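Two things stand out in the kernel messages above: the DLM reports that it has evicted node 1, and several calls fail with status = -107. On Linux, errno 107 is ENOTCONN ("Transport endpoint is not connected"), which fits a node that has dropped off the cluster network. A rough Python sketch for pulling both facts out of a larger log; the regular expressions are written against the lines quoted in this thread only, and `summarize` is a made-up helper name:

```python
import re

# Linux errno value for ENOTCONN; the kernel logs it negated as "status = -107".
LINUX_ERRNO_NAMES = {107: "ENOTCONN (Transport endpoint is not connected)"}

EVICTED = re.compile(r"dlm has evicted node (\d+)")
STATUS = re.compile(r"ERROR: status = -(\d+)")


def summarize(lines):
    """Collect evicted node numbers and negative status codes from kernel log lines."""
    evicted, statuses = set(), set()
    for line in lines:
        m = EVICTED.search(line)
        if m:
            evicted.add(int(m.group(1)))
        m = STATUS.search(line)
        if m:
            statuses.add(int(m.group(1)))
    named = {code: LINUX_ERRNO_NAMES.get(code, "unknown") for code in statuses}
    return evicted, named


sample = [
    "Sep 17 23:32:09 Dillbesg6 kernel: (o2net,12668,2):ocfs2_dlm_eviction_cb:98 "
    "device (65,17): dlm has evicted node 1",
    "Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13304,0):dlm_flush_asts:604 "
    "ERROR: status = -107",
]
evicted, named = summarize(sample)
print(evicted)  # {1}
print(named)    # {107: 'ENOTCONN (Transport endpoint is not connected)'}
```

Read this way, the messages describe Dillbesg6 reacting to node 1 disappearing. They do not say why node 1 went away, so the logs of node 1 itself from just before 23:32 on 17 September would be the next place to look.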
- 09-23-2012 #6 Linux Newbie
- Join Date
- May 2012
- Posts
- 109
What is the above error?
Why are the servers not in sync?
What is making the clustered servers keep rebooting?
- 09-23-2012 #7 Just Joined!
- Join Date
- Sep 2012
- Location
- Finland
- Posts
- 88
Did you do any updates?
How many computers do you have in the cluster?
Judging by your logs, it looks like a computer or a hard drive is not starting.
Make sure ALL of your systems are running.
Last edited by Peconet009; 09-23-2012 at 07:58 PM. Reason: Addition
- 09-24-2012 #8 Linux Newbie
- Join Date
- May 2012
- Posts
- 109
[root@Dillbesg6 etc]# cd ocfs2
[root@Dillbesg6 ocfs2]# ls
cluster.conf
[root@Dillbesg6 ocfs2]# more cluster.conf
cluster:
    node_count=3
    name=ocfs2
node:
    ip_port=7777
    ip_address=192.168.156.121
    number=0
    name=Dillbesg4
    cluster=ocfs2
node:
    ip_port=7777
    ip_address=192.168.156.123
    number=1
    name=Dillbesg5
    cluster=ocfs2
node:
    ip_port=7777
    ip_address=192.168.156.124
    number=2
    name=Dillbesg6
    cluster=ocfs2
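The node numbers in this file can be matched against the earlier kernel messages: "node 1" is Dillbesg5, and "Node 2", the recovery master, is Dillbesg6, the host that wrote the log. A minimal Python sketch of that lookup, assuming only the stanza layout shown above (`parse_cluster_conf` and `node_names` are names invented for this example, not part of any OCFS2 tool):

```python
def parse_cluster_conf(text):
    """Parse stanza-style text into a list of (stanza_name, {key: value}) pairs."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A line such as "node:" opens a new stanza.
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            key, _, value = line.partition("=")
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def node_names(text):
    """Map each node number to its host name."""
    return {
        int(fields["number"]): fields["name"]
        for kind, fields in parse_cluster_conf(text)
        if kind == "node"
    }


conf = """\
cluster:
node_count=3
name=ocfs2
node:
ip_port=7777
ip_address=192.168.156.121
number=0
name=Dillbesg4
cluster=ocfs2
node:
ip_port=7777
ip_address=192.168.156.123
number=1
name=Dillbesg5
cluster=ocfs2
node:
ip_port=7777
ip_address=192.168.156.124
number=2
name=Dillbesg6
cluster=ocfs2
"""
print(node_names(conf))  # {0: 'Dillbesg4', 1: 'Dillbesg5', 2: 'Dillbesg6'}
```

So the node that was evicted at 23:32 on 17 September is Dillbesg5, and that is the machine whose logs and network link deserve a closer look.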
- 09-24-2012 #9 Linux Newbie
- Join Date
- May 2012
- Posts
- 109
Is the OCFS2 filesystem the problem?
Could this filesystem be causing the reboots?
- 09-24-2012 #10 Just Joined!
- Join Date
- Sep 2012
- Location
- Finland
- Posts
- 88
Your filesystem could be part of the problem, but it is hard to say without knowing how your hardware is set up.
You will have to read up on how Oracle suggests resolving this; some of their software is closed source.


