  1. #1
    Just Joined!

    Making sure only one instance of a script runs concurrently


    Hi,

    I need to write a shell script that will be run from cron every minute or so. The script will use rsync to keep some folders across several machines in sync. I am very new to shell scripting so some answers might seem obvious but they are not to me:

    1. How do I check at the beginning of the script whether the script is already running, and abort if it is? I saw that most processes use a PID file for that, but I am not sure this is the right case for it, or how to do it exactly.
    2. Assuming the list of target machines/dirs to sync is supposed to come from some config file, how should I code that? Just use the normal shell tools I already know (cut etc.), or is there a better way? (A sketch touching on both points follows this list.)
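
    A minimal sketch of both points, using flock(1) for the locking; the file names, config format, and rsync options here are assumptions for illustration, not something prescribed in the thread:

    Code:
    #!/bin/bash
    # sync.sh -- hypothetical cron script: aborts if another instance is
    # still running, then rsyncs a source dir to each configured target.

    LOCKFILE=/var/lock/sync.lock       # assumed lock location
    CONFIG=/etc/sync-targets.conf      # assumed format: "<host> <remote_dir>" per line

    # Open the lock file on descriptor 9 and take an exclusive, non-blocking lock.
    # If another instance already holds it, exit instead of piling up.
    exec 9>"$LOCKFILE"
    if ! flock -n 9; then
        echo "another instance is already running, exiting" >&2
        exit 0
    fi

    # Read host/dir pairs from the config file; blank lines and '#' comments are skipped.
    while read -r host dir; do
        case "$host" in ''|\#*) continue ;; esac
        rsync -az --delete /data/www/ "${host}:${dir}/"
    done < "$CONFIG"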

    Thanks,
    DB

  2. #2
    Trusted Penguin Irithori
    Not exactly an answer to your question, but you might want to look at this nice tool:
    lsyncd - Project Hosting on Google Code

  3. #3
    Just Joined!
    Thanks for that Irithori.

    I actually posted a different question a while back asking about which tools I can use to pull this off. lsyncd did not come up.

    Seems like it's exactly what I need. One thing bothers me though... Reading the lsyncd project page, I see in the "When to use" section:

    Lsyncd is designed to synchronize a local directory tree with low profile of expected changes to a remote mirror. Lsyncd is especially useful to sync data from a secure area to a not-so-secure area.
    Specifically, the "low profile of expected changes" makes me wonder. I intend to use this to sync one master web server to multiple slave web servers, all of which (master and slaves) are hit very hard, but on a relatively small number of files of relatively small size (a few MB max each).

    [EDIT]
    Forgot to mention that updates to these files are rare, which I think is more to the point.
    [/EDIT]

    So does it fit?


    Cheers,
    DB

  4. #4
    Trusted Penguin Irithori
    No way of telling before testing, as this depends not only on lsyncd, but also on your setup, network, server performance, etc.

    But to suggest one more idea:
    I run about 60 webservers for one website. (not including the DB servers + backend processing)

    In the past, we had a similar setup with rsync.
    - One master server running rsyncd
    - Each of the 60 webservers had a 15min cronjob (with a variation based on the last octet of its IP to distribute the load) to connect to rsyncd and update (a sketch of that staggering follows this list).
    - Via pssh, we were able to trigger a sync manually
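
    As a sketch of that kind of staggering (the hostnames, rsync module name, and paths are assumptions), each slave could derive its delay from its own address before pulling:

    Code:
    #!/bin/sh
    # pull-sync.sh -- hypothetical 15min cron job on each web server.
    # Sleep 0-59s depending on the last octet of this host's IP, so the
    # slaves do not all hit the rsync server in the same second.
    IP=$(hostname -i | awk '{print $1}')
    LAST_OCTET=${IP##*.}
    sleep $(( LAST_OCTET % 60 ))

    # Pull from the master's rsync daemon (module name "www" is assumed)
    rsync -az --delete rsync://master.example.com/www/ /var/www/html/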


    Disadvantages:
    - This procedure lacks a backchannel, aka "Was it really updated on all servers, and completely?"
    It always was, but the topic was discussed more than once between sysadmins and devs/QA when problems with a new version occurred.
    - There is no easy way of knowing what really is installed.
    - The rsync server is a single point of failure:
    It being down is the slightest problem.
    But if someone for whatever reason copies unwanted content onto it, it is distributed without further checking.


    So we enhanced the procedure by producing RPMs of our php application.
    All our servers are kickstarted, registered to spacewalk and controlled by cfengine anyway, so this was a natural development.

    Of course this setup is initially more work, but RPMs offer advantages (an admin-side sketch follows the list):
    - RPMs are native to the (SuSE|RedHat|CentOS|fedora) package manager
    - Dependencies on other RPM-packaged apps or system RPMs can be declared
    - Defined users and permissions
    - %pre and %post scripts
    - Installation is atomic. Either the RPM is installed or not.
    - The RPMs can be tested by QA before they go live.
    Thus enabling a controlled release process.
    - Versions can be gathered and summarized by spacewalk and/or pssh,
    so you know exactly which server has what version.
    - Rollbacks on one or all servers are easy
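
    For illustration of the last two points (the package name, host file, and paths are made up), the admin side might look like this:

    Code:
    # Ask every web server which version of the app package it is running
    pssh -h webservers.txt -i 'rpm -q mywebapp'

    # Roll back one host to a previous, still-available package version
    ssh web01 'rpm -Uvh --oldpackage /var/cache/rpms/mywebapp-1.2.0-1.noarch.rpm'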

  5. #5
    Just Joined!
    This "low profile thing" from the homepage is to be considered when comparing to when using DRBD or GlusterFS. A periodic cronjob will be in anyway inferior to any of the three solutions. Lsyncd has the advantage its easier to install than DRBD or GlusterFS, and does not affect local performance as it works on "a staff" instead of putting itself between application and harddisc like DRBD or GlusterFS do. These two might be more approperiate if you sync databases or want to do other more complicated stuff.

    A relative small number of relaitve small size is "low profile" IMHO.

  6. #6
    Just Joined!
    OK, points all taken.

    As another approach... anyone ever consider using SVN (or another form of version control) for this task? I can imagine scenarios where I'd use SVN exclusively or in conjunction with something like lsyncd. This way I get not only the replication part but also content management, the ability to roll back production and the ability to audit changes.

    The only piece missing for me is how I'd go about automating the whole thing (when using SVN).
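
    One way to sketch the automation, assuming each web server already has a working copy checked out and read-only access to the repository (the paths and log file here are assumptions):

    Code:
    #!/bin/sh
    # svn-deploy.sh -- hypothetical cron job on each web server
    WC=/var/www/html    # assumed working-copy location

    # --non-interactive keeps cron from hanging on auth or conflict prompts
    svn update --non-interactive "$WC" >> /var/log/svn-deploy.log 2>&1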

  7. #7
    Trusted Penguin Irithori
    svn is technically possible, and you can update multiple machines either with cfengine/puppet or a terminal multiplexer like pssh/cssh.

    But I see three problems:
    1) Unlike rsync and friends, and also RPM, svn will stop if it encounters locally modified files.
    So an update is not in every case just an "svn up".
    You will have cases that need manual intervention.
    One thinkable case: a co-worker manually fiddled with the files on a production box. (A small guard sketch follows these points.)

    Suppose one of the production servers got compromised:
    2) Then the attacker not only has a snapshot of your data/sourcecode,
    but also the whole history, logs, etc.

    3) You should at least avoid using a system account to connect to the svn server.
    (Please note, I didn't say: do not use authentication.
    I said: do not use a system account.)

    Because with a system account for svn, the attacker is then already close to having an account on your backend system (the svn server).
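
    Regarding point 1, that case can at least be detected up front; a rough guard (paths assumed), which could be pushed out to all servers via pssh:

    Code:
    # Refuse to update a working copy that has local modifications, so a
    # fiddled-with production box fails loudly instead of half-merging.
    if [ -n "$(svn status -q /var/www/html)" ]; then
        echo "$(hostname): local modifications present, not updating" >&2
        exit 1
    fi
    svn update --non-interactive /var/www/html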


    So no: I wouldn't use svn in that way.
