  1. #1 · Just Joined! (Join Date: Mar 2003 · Posts: 5)

    General FTP & Scripting problem


    Hi,
    General FTP Question
    -------------------------
    I have a problem where text files uploaded to my FTP server are only
    transmitted partially. In other words, if I open the text file I might only
    have half of the actual text in the file. Is there any way I can
    prevent this from happening, so that I only receive complete files?

    FTP Script Problem
    ----------------------
    Once a file has been uploaded to my box, a script takes the file and
    processes it. (The data from the text file is imported into a database and
    the text file is archived.) If I receive a big file, it might still be
    transferring when my script processes it, which causes some problems.
    (My script will only import the file partially and then archive it.) How can
    I check that the file is complete before I process it? I thought about
    using a cron job to move uploaded files to another directory and then
    process them, but I need to process each file immediately once it is uploaded.

    FTP Server: vsFTPd or WU-FTPD
    OS: Red Hat Linux 8

    Any suggestions?

    Kind Regards,
    Jason

  2. #2 · Linux Engineer (Join Date: Jan 2003 · Location: Lebanon, PA · Posts: 994)
    There are many ways of doing this. I had to do something similar with Apache logs when generating web stats. I had a script on my webstats server that would use rsync to pull the Apache logs off the webhosting server. The script would then process all the logs, create the web stats pages in HTML, and send them back to the webhosting server with rsync.

    You could do something similar with a script that pulls the files with rsync (you could also use wget or scp) and then runs your other script to process them. That way the processing script always runs after all the files have been transferred. When your text files are uploaded, are they from random clients or always from the same ones, and at the same time?

    Another idea is to use MD5 sums to determine whether a file is done uploading. The client uploads an md5sum along with the text file, and your script verifies the MD5 generated from the uploaded file against it to see if the file is complete.
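
    To illustrate that md5sum idea, here is a minimal sketch with hypothetical paths and filenames; it assumes the client uploads data.txt together with a data.txt.md5 companion in the usual "checksum  filename" format, and enterdb stands in for whatever command imports the file into the database:
    Code:
    #!/bin/sh
    # rough sketch: only process a file once its uploaded .md5 companion verifies
    cd /home/ftp/uploads || exit 1
    for sum in *.md5; do
        [ -f "$sum" ] || continue            # no .md5 files present
        file=`basename "$sum" .md5`
        [ -f "$file" ] || continue           # data file not uploaded yet
        if md5sum -c "$sum" >/dev/null 2>&1; then
            enterdb "$file"                  # hypothetical import command
            mv "$file" "$sum" /home/user/archive/
        fi
    done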

  3. #3 · Linux Guru (Join Date: Oct 2001 · Location: Täby, Sweden · Posts: 7,578)
    This might only be me, but I'd alter the ftp daemon source to make it run a script when a file is completed. I thought of examining the logs too, which would of course work perfectly, but I'd prefer having the script run on notification from the ftp daemon. Uploading an MD5 sum seems like too much work to me; it takes too much effort to upload two files every time.
    And I know this is really cheating, but you could check the /proc/*/fd directories of the ftp daemons to see if they are currently holding the file open.
    Alternatively, you could also just slightly alter the ftp daemon to make it flock() the file while it's uploading, and then have the script test that lock.
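
    If the daemon were patched to hold such a lock, the check on the processing side could be as small as this sketch (hypothetical path; it assumes a flock(1) utility is available, otherwise a few lines of C around flock(2) would do the same job):
    Code:
    # succeeds only if nothing else currently holds an exclusive flock() on the file
    if flock -n /home/ftp/uploads/data.txt -c true; then
        echo "upload finished, safe to process"
    else
        echo "still being uploaded"
    fi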

  4. #4 · Just Joined! (Join Date: Mar 2003 · Posts: 5)
    Thanks for your replies.

    genlee, if I use rsync to pull the files off the server, I might still get an incomplete file if it is still busy uploading.

    The files are uploaded from specific clients all day long; a minimum of 10 files per minute is uploaded. (Yes, quite a busy box.) Using an md5 file together with the uploaded data file would be great, but the clients would not be happy, so that is not an option.

    Dolda2000, I agree that changing the source would be perfect. However, the ftp server we are using is part of a custom solution, and any modification to the source will void all support contracts, which prevents me from changing the source code. Correct me if I'm wrong, but checking /proc/*/fd will show me what locks are currently held by the ftp daemon; on a very busy server there will always be locks, which would make it quite hard to find the right files.

    Any other ideas?

    Kind Regards,
    Jason

  5. #5 · Linux Enthusiast (Join Date: Jun 2002 · Location: San Antonio · Posts: 621)
    Why not use an NFS filesystem instead of FTPing hundreds of files a day? I know NFS has support for file locking, so you can tell whether the NFS daemon is writing the file at any given time. This might be a little better of a solution than FTP, but who am I to say?
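
    For what it's worth, a minimal sketch of that alternative with hypothetical hostnames and paths: the server exports the upload directory, and the clients mount it and copy their files straight in instead of FTPing them.
    Code:
    # on the server: export the upload directory (a line in /etc/exports),
    # then re-export everything
    /home/ftp/uploads   client1.example.com(rw,sync)
    exportfs -ra

    # on each client: mount the export and copy files directly into it
    mount -t nfs server.example.com:/home/ftp/uploads /mnt/uploads
    cp report.txt /mnt/uploads/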

  6. #6 · Linux Engineer (Join Date: Jan 2003 · Location: Lebanon, PA · Posts: 994)
    You can have rsync copy the files over to a temp dir, and after the copy is complete it will move them into the dir that you want. Use the -T option with rsync to specify the temp directory to use.
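
    For example (hypothetical hostnames and paths), rsync stages each file under the temp dir while it transfers and only renames it into the destination once the copy is complete, so the processing script never sees a half-copied file from rsync itself:
    Code:
    # pull uploads into /data/incoming, staging partial transfers in /data/tmp
    rsync -av --temp-dir=/data/tmp ftpserver:/home/ftp/uploads/ /data/incoming/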

  7. #7 · Linux Guru (Join Date: Oct 2001 · Location: Täby, Sweden · Posts: 7,578)
    Well, you could still check the logs to see if a file has been completed. It's clearly the cleanest way to do it.
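
    For illustration, a rough sketch of that log-based approach, assuming the daemon writes a standard xferlog (e.g. /var/log/xferlog, which both wu-ftpd and vsftpd can produce): in that format the filename is the 9th field, the direction is the 12th ("i" for an incoming upload), and the last field is "c" for a completed transfer. The enterdb command and the archive directory are hypothetical placeholders.
    Code:
    #!/bin/sh
    # watch the transfer log and handle each upload as soon as the daemon
    # logs it as complete
    tail -F /var/log/xferlog | while read line; do
        file=`echo "$line" | awk '{print $9}'`
        direction=`echo "$line" | awk '{print $12}'`
        status=`echo "$line" | awk '{print $NF}'`
        if [ "$direction" = "i" ] && [ "$status" = "c" ]; then
            enterdb "$file"                  # hypothetical import command
            mv "$file" /home/user/archive/   # hypothetical archive dir
        fi
    done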
    You really shouldn't use the /proc/*/fd method, since it's quite dirty, but it is very possible to do if you want to anyway. I don't know what locks you are referring to; I know of no locks being listed in /proc/*/fd. The fd directories contain symlinks named after each file descriptor that the process holds, and each symlink points to the path that was originally used to open the file. Therefore, if you want to see whether a given file is in use by the FTP daemon, you could just use this function in a shell script:
    Code:
    # returns success (0) if any running ftpd process has $1 open
    function inuse()
    {
        for pid in `pidof ftpd`; do
            # each entry in /proc/$pid/fd is a symlink to the open file's path
            if ls -l /proc/$pid/fd 2>/dev/null | grep -q "$1"; then
                return 0
            fi
        done
        return 1
    }
    
    if inuse /home/user/file; then echo "In use!"; else enterdb /home/user/file; rm /home/user/file; fi
    As you can surely see, it is a little dirty, but that's just what you'll have to put up with if you want to use that method.

  8. #8 · Just Joined! (Join Date: Mar 2003 · Posts: 5)
    Thanks for your detailed response, I appreciate it. The script you included would be perfect.

    As you probably noticed, I'm not the most experienced Linux admin around. (3 days of experience and counting.)

    Based on the responses I got, I basically have two options that are applicable to my setup:

    1. Monitor the log files
    2. /proc/*/fd

    You said that method two is quite shaky; does that mean I should rather use method one? I think method two would be the quicker of the two.

    Regards,
    Jason

  9. #9 · Linux Guru (Join Date: Oct 2001 · Location: Täby, Sweden · Posts: 7,578)
    Yes, method two is both easier and faster, but so is building a house by just placing bricks on top of each other. The house is much sturdier if you cement them into place as well.
    It's possible that I'm being over-cautious, though. This is mainly a portability issue, after all.
