  1. #1
    Linux Newbie
    Join Date
    May 2012
    Posts
    110

    Production server reboot Issue ?


    Hi,

    What is this error?

    The server rebooted automatically, and the servers are in a cluster.

    Sep 13 15:00:01 Dillbesg2 1[29182]: Sync interrupted, [[ERROR=1]]
    Sep 13 15:00:56 Dillbesg2 MR_MONITOR[17983]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-18,15:00:56 1638029 Seconds
    Sep 13 15:06:02 Dillbesg2 1[32275]: Sync interrupted, [[ERROR=1]]
    Sep 13 15:12:01 Dillbesg2 1[2656]: Sync interrupted, [[ERROR=1]]
    Sep 13 15:18:01 Dillbesg2 1[5474]: Sync interrupted, [[ERROR=1]]
    Sep 13 15:24:01 Dillbesg2 1[8258]: Sync interrupted, [[ERROR=1]]
    Sep 13 15:30:01 Dillbesg2 1[11037]: Sync interrupted, [[ERROR=1]]
    Sep 13 15:36:01 Dillbesg2 1[14814]: Sync interrupted, [[ERROR=1]]
    Sep 13 15:42:02 Dillbesg2 1[17880]: Sync interrupted, [[ERROR=1]]
    Sep 13 15:48:01 Dillbesg2 1[21030]: Sync interrupted, [[ERROR=1]]
    Sep 13 15:54:01 Dillbesg2 1[24031]: Sync interrupted, [[ERROR=1]]
    Sep 13 16:00:01 Dillbesg2 1[26820]: Sync interrupted, [[ERROR=1]]
    Sep 13 16:00:56 Dillbesg2 MR_MONITOR[17983]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-18,16:00:56 1641629 Seconds
    Sep 13 16:06:01 Dillbesg2 1[29936]: Sync interrupted, [[ERROR=1]]
    Sep 13 16:12:01 Dillbesg2 1[32668]: Sync interrupted, [[ERROR=1]]
    Sep 13 16:18:01 Dillbesg2 1[3163]: Sync interrupted, [[ERROR=1]]
    Sep 13 16:24:01 Dillbesg2 1[5909]: Sync interrupted, [[ERROR=1]]
    Sep 13 16:30:01 Dillbesg2 1[8597]: Sync interrupted, [[ERROR=1]]
    Sep 13 16:36:01 Dillbesg2 1[11523]: Sync interrupted, [[ERROR=1]]
    Sep 13 16:42:01 Dillbesg2 1[15323]: Sync interrupted, [[ERROR=1]]
    Sep 13 16:48:01 Dillbesg2 1[19188]: Sync interrupted, [[ERROR=1]]
    Sep 13 16:54:01 Dillbesg2 1[23285]: Sync interrupted, [[ERROR=1]]
    Sep 13 17:00:01 Dillbesg2 1[26721]: Sync interrupted, [[ERROR=1]]
    Sep 13 17:00:56 Dillbesg2 MR_MONITOR[17983]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-18,17:00:56 1645229 Seconds
    Sep 13 17:06:01 Dillbesg2 1[30146]: Sync interrupted, [[ERROR=1]]
    Sep 13 17:12:01 Dillbesg2 1[5542]: Sync interrupted, [[ERROR=1]]
    Sep 13 17:18:01 Dillbesg2 1[10553]: Sync interrupted, [[ERROR=1]]
    Sep 13 17:24:01 Dillbesg2 1[14683]: Sync interrupted, [[ERROR=1]]
    Sep 13 17:30:01 Dillbesg2 1[19017]: Sync interrupted, [[ERROR=1]]
    Sep 13 17:36:01 Dillbesg2 1[24616]: Sync interrupted, [[ERROR=1]]
    Sep 13 17:42:02 Dillbesg2 1[29131]: Sync interrupted, [[ERROR=1]]





    Regards,

  2. #2
    Just Joined!
    Join Date
    Sep 2012
    Location
    Finland
    Posts
    96
    It looks like your servers are out of sync with each other, or something is breaking the link between them.
    I need more info on your system to find out why.
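
    One thing that can already be read from the log: the MRMON044 lines look like hourly status messages from the RAID controller monitor (MR_MONITOR), not errors. Assuming the number before "Seconds" is the controller's uptime, subtracting it from the time in the same message gives the last power-on. A minimal Python sketch (the function name is made up for illustration):

```python
import re
from datetime import datetime, timedelta

# Matches the controller's own timestamp and the uptime counter in an MRMON044 line.
MRMON044 = re.compile(r"Time (\d{4}-\d{2}-\d{2}),(\d{2}:\d{2}:\d{2}) (\d+) Seconds")

def controller_power_on(line):
    """Return the estimated power-on time, or None if the line does not match.

    Assumption: the counter is seconds since the controller powered on.
    """
    m = MRMON044.search(line)
    if m is None:
        return None
    reported = datetime.strptime(m.group(1) + " " + m.group(2), "%Y-%m-%d %H:%M:%S")
    return reported - timedelta(seconds=int(m.group(3)))

line = ("Sep 13 15:00:56 Dillbesg2 MR_MONITOR[17983]: <MRMON044> Controller ID: 0 "
        "Time established since power on: Time 2012-09-18,15:00:56 1638029 Seconds")
print(controller_power_on(line))
```

    If that counter ever drops back to a small value, the machine was probably power-cycled around then. Also note that the syslog date (Sep 13) and the controller date (2012-09-18) in your log do not agree, so the clocks are worth checking too.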

  3. #3
    Linux Newbie
    Join Date
    May 2012
    Posts
    110
    Sep 23 04:03:02 Dillbesg6 syslogd 1.4.1: restart.
    Sep 23 05:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,05:00:32 303132 Seconds
    Sep 23 06:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,06:00:32 306732 Seconds
    Sep 23 07:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,07:00:32 310332 Seconds
    Sep 23 08:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,08:00:32 313932 Seconds
    Sep 23 09:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,09:00:32 317532 Seconds
    Sep 23 10:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,10:00:32 321132 Seconds
    Sep 23 11:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,11:00:32 324732 Seconds
    Sep 23 12:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,12:00:32 328332 Seconds
    Sep 23 13:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,13:00:32 331932 Seconds
    Sep 23 14:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,14:00:32 335532 Seconds
    Sep 23 15:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,15:00:32 339132 Seconds
    Sep 23 16:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,16:00:32 342732 Seconds
    Sep 23 17:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,17:00:32 346332 Seconds
    Sep 23 18:00:32 Dillbesg6 MR_MONITOR[5862]: <MRMON044> Controller ID: 0 Time established since power on: Time 2012-09-23,18:00:32 349932 Seconds


    What are these errors?

  5. #4
    Linux Newbie
    Join Date
    May 2012
    Posts
    110
    The cluster installed on the servers is OCFS2 (Oracle Cluster File System 2).

  6. #5
    Linux Newbie
    Join Date
    May 2012
    Posts
    110
    Some more log files:


    Sep 17 23:32:09 Dillbesg6 kernel: (o2net,12668,2):ocfs2_dlm_eviction_cb:98 device (65,17): dlm has evicted node 1
    Sep 17 23:32:09 Dillbesg6 kernel: kjournald starting. Commit interval 5 seconds
    Sep 17 23:32:09 Dillbesg6 kernel: kjournald starting. Commit interval 5 seconds
    Sep 17 23:32:09 Dillbesg6 kernel: (o2net,12668,2):ocfs2_dlm_eviction_cb:98 device (65,81): dlm has evicted node 1
    Sep 17 23:32:09 Dillbesg6 kernel: (o2net,12668,2):ocfs2_dlm_eviction_cb:98 device (65,161): dlm has evicted node 1
    Sep 17 23:32:09 Dillbesg6 kernel: (o2net,12668,2):ocfs2_dlm_eviction_cb:98 device (66,145): dlm has evicted node 1
    Sep 17 23:32:09 Dillbesg6 kernel: kjournald starting. Commit interval 5 seconds
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13304,0):dlm_flush_asts:604 ERROR: status = -107
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_reco_thread,12983,1):dlm_get_lock_resource:84 4 EE23ADAF3E0D4F129648842B36A54F39:$RECOVERY: at least one node (1) to recover before lock mastery can begin
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13290,0):dlm_send_proxy_ast_msg:457 ERROR: status = -107
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13290,0):dlm_flush_asts:604 ERROR: status = -107
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,12786,0):dlm_send_proxy_ast_msg:457 ERROR: status = -107
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,12786,0):dlm_flush_asts:604 ERROR: status = -107
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13122,0):dlm_send_proxy_ast_msg:457 ERROR: status = -107
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13122,0):dlm_flush_asts:604 ERROR: status = -107
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_reco_thread,12801,3):dlm_get_lock_resource:84 4 024E544C12614474AE17F834D3DF9A44:$RECOVERY: at least one node (1) to recover before lock mastery can begin
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_reco_thread,12801,3):dlm_get_lock_resource:87 8 024E544C12614474AE17F834D3DF9A44: recovery map is not empty, but must master $RECOVERY lock now
    Sep 17 23:32:09 Dillbesg6 kernel: (dlm_reco_thread,12801,3):dlm_do_recovery:524 (12801) Node 2 is the Recovery Master for the Dead Node 1 for Domain 024E544C12614474AE17F834D3DF9A44
    Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,12829,0):dlm_get_lock_resource:84 4 670B5DAF23634DA7BF8945C07A27F14E:$RECOVERY: at least one node (1) to recover before lock mastery can begin
    Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,12829,0):dlm_get_lock_resource:87 8 670B5DAF23634DA7BF8945C07A27F14E: recovery map is not empty, but must master $RECOVERY lock now
    Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,12829,0):dlm_do_recovery:524 (12829) Node 2 is the Recovery Master for the Dead Node 1 for Domain 670B5DAF23634DA7BF8945C07A27F14E
    Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,13179,0):dlm_get_lock_resource:84 4 EF545CBC2C874314AAA9EA02171438FD:$RECOVERY: at least one node (1) to recover before lock mastery can begin
    Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,13179,0):dlm_get_lock_resource:87 8 EF545CBC2C874314AAA9EA02171438FD: recovery map is not empty, but must master $RECOVERY lock now
    Sep 17 23:32:10 Dillbesg6 kernel: (dlm_reco_thread,13179,0):dlm_do_recovery:524 (13179) Node 2 is the Recovery Master for the Dead Node 1 for Domain EF545CBC
    --More--(42%)

  7. #6
    Linux Newbie
    Join Date
    May 2012
    Posts
    110
    What is the above error?

    Why are the servers not in sync?

    What is making the servers in the cluster keep rebooting?

  8. #7
    Just Joined!
    Join Date
    Sep 2012
    Location
    Finland
    Posts
    96
    Did you do any updates?

    How many computers do you have in the cluster?
    Looking at your log, it seems that a computer or a hard drive is not starting up.
    Make sure ALL of your systems are running.
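
    To see which node is dropping out, the kernel lines from post #5 can be summarised. A rough Python sketch, assuming the message formats stay as pasted there; the errno table is hard-coded for Linux, where 107 is ENOTCONN (transport endpoint is not connected), and the function name is only illustrative:

```python
import re
from collections import Counter

# Linux errno value seen in the log; hard-coded so the sketch runs anywhere.
LINUX_ERRNO = {107: "ENOTCONN"}

EVICTED = re.compile(r"dlm has evicted node (\d+)")
STATUS = re.compile(r"ERROR: status = -(\d+)")

def summarise(lines):
    """Count evictions per node number and error statuses per errno name."""
    evicted, statuses = Counter(), Counter()
    for line in lines:
        m = EVICTED.search(line)
        if m:
            evicted[int(m.group(1))] += 1
        m = STATUS.search(line)
        if m:
            code = int(m.group(1))
            # Fall back to the raw number for codes not in the table.
            statuses[LINUX_ERRNO.get(code, str(code))] += 1
    return evicted, statuses

sample = [
    "Sep 17 23:32:09 Dillbesg6 kernel: (o2net,12668,2):ocfs2_dlm_eviction_cb:98 device (65,17): dlm has evicted node 1",
    "Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13304,0):dlm_flush_asts:604 ERROR: status = -107",
    "Sep 17 23:32:09 Dillbesg6 kernel: (dlm_thread,13290,0):dlm_send_proxy_ast_msg:457 ERROR: status = -107",
]
evicted, statuses = summarise(sample)
print(dict(evicted), dict(statuses))
```

    In the part of the log you posted, every eviction is for node 1 and every status is -107, which would fit a lost network connection to that node.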
    Last edited by Peconet009; 09-23-2012 at 08:58 PM. Reason: Addition

  9. #8
    Linux Newbie
    Join Date
    May 2012
    Posts
    110
    [root@Dillbesg6 etc]# cd ocfs2
    [root@Dillbesg6 ocfs2]# ls
    cluster.conf
    [root@Dillbesg6 ocfs2]# more cluster.conf
    cluster:
    node_count=3
    name=ocfs2

    node:
    ip_port=7777
    ip_address=192.168.156.121
    number=0
    name=Dillbesg4
    cluster=ocfs2

    node:
    ip_port=7777
    ip_address=192.168.156.123
    number=1
    name=Dillbesg5
    cluster=ocfs2

    node:
    ip_port=7777
    ip_address=192.168.156.124
    number=2
    name=Dillbesg6
    cluster=ocfs2

  10. #9
    Linux Newbie
    Join Date
    May 2012
    Posts
    110
    Was the OCFS2 filesystem the problem?

    Can this filesystem be a problem?

  11. #10
    Just Joined!
    Join Date
    Sep 2012
    Location
    Finland
    Posts
    96
    Your filesystem could be part of the problem, but it is hard to say without knowing how your hardware is set up.
    You will have to read up on how Oracle recommends resolving this; sometimes their software is closed source.
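
    For what it is worth, the node numbers in the kernel log can be matched to host names using the cluster.conf from post #8. A small Python sketch, assuming the file has the stanza layout shown there (a "cluster:" or "node:" header followed by key=value lines); the parser is a hypothetical helper, not an Oracle tool:

```python
def parse_cluster_conf(text):
    """Parse o2cb-style stanzas into a list of (kind, settings) pairs."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A header such as "node:" starts a new stanza.
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            key, value = line.split("=", 1)
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas

def node_names(stanzas):
    """Map node number -> node name for every node stanza."""
    return {int(s["number"]): s["name"] for kind, s in stanzas if kind == "node"}

conf = """\
cluster:
node_count=3
name=ocfs2

node:
ip_port=7777
ip_address=192.168.156.123
number=1
name=Dillbesg5
cluster=ocfs2

node:
ip_port=7777
ip_address=192.168.156.124
number=2
name=Dillbesg6
cluster=ocfs2
"""
print(node_names(parse_cluster_conf(conf)))
```

    With your file, node 1 is Dillbesg5, so the "dlm has evicted node 1" messages on Dillbesg6 (node 2) are about Dillbesg5. That is the machine whose logs and network link I would check first.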
