- 03-13-2003 #1 Just Joined!
- Join Date
- Mar 2003
- Posts
- 5
General FTP & Scripting problem
Hi,
General FTP Question
-------------------------
I have a problem where text files uploaded to my FTP server are only
transmitted partially. In other words if I open the text file I might only
have half of the actual text inside the file. Is there any way that I can
prevent this from happening so that I only receive complete files?
FTP Script Problem
----------------------
Once a file has been uploaded to my box, a script takes the file and
processes it. (The data from the text file is imported into a database and
the text file is archived.) If I receive a big file, it might still be
transferring when my script processes it, which causes problems: my script
imports only part of the file and then archives it. How can
I check that the file is complete before I process it? I thought about
using a cron job to move uploaded files to another directory and then
process them, but I need to process each file immediately once it is uploaded.
FTP Server: vsFTPd or WU-FTPD
OS: Redhat Linux 8
Any suggestions?
Kind Regards,
Jason
- 03-13-2003 #2 Linux Engineer
- Join Date
- Jan 2003
- Location
- Lebanon, pa
- Posts
- 994
There are many ways of doing this. I had to do something similar with apache logs and webstats. I had a script on my webstats server that would use rsync to pull the apache logs off the webhosting server. Then the script would process all the logs, create web stats pages in html and send them back to the webhosting server with rsync. You could do something similar with a script that pulls the files with rsync (you could also use wget or scp) and then runs your other script to process them. That way the script always runs after all files are transferred.
When your text files are uploaded, are they from random clients or always from the same ones and at the same time?
Another idea is to use md5 sums to determine if the file is done uploading. You can upload the md5sum of the text file and have your script compare it against the md5 generated from the uploaded file to see if it is complete.
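As a rough sketch of the md5 idea: this assumes a hypothetical convention where the client uploads file.txt and then file.txt.md5 (the output of `md5sum file.txt`). The `.md5` naming and the function name `upload_complete` are made up for illustration.

```shell
#!/bin/sh
# Assumed convention (not from the thread): every upload FILE is followed
# by a sidecar FILE.md5 containing the output of `md5sum FILE`.

# upload_complete FILE
# Succeeds only if FILE.md5 exists and its checksum matches FILE.
upload_complete() {
    file=$1
    sumfile="$file.md5"
    [ -f "$file" ] && [ -f "$sumfile" ] || return 1
    # First field of the sidecar is the checksum the client computed.
    expected=$(awk '{ print $1; exit }' "$sumfile")
    # Checksum of what has actually arrived so far.
    actual=$(md5sum < "$file" | awk '{ print $1 }')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

The processing script would then only touch a file once `upload_complete /path/to/file` succeeds. A half-transferred data file, or a half-transferred sidecar, simply fails the comparison and gets retried later.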
- 03-14-2003 #3 Linux Guru
- Join Date
- Oct 2001
- Location
- Täby, Sweden
- Posts
- 7,578
This might only be me, but I'd alter the ftp daemon source to make it run a script when a file is completed. I thought of examining the logs, too, which would, of course, work perfectly, but I'd prefer having the script run on notification from the ftp daemon. Uploading an MD5 sum seems to be too much work to me. It takes too much energy uploading two files every time.
And I know this is really cheating, but you could check the /proc/*/fd directories of the ftp daemons to see if they are currently holding the file open.
Alternatively, you could also just slightly alter the ftp daemon to make it flock() the file while it's uploading, and then have the script test that lock.
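The script side of the flock() idea might look something like this. It is only a sketch: it assumes the daemon has been patched to hold an exclusive flock() for the whole upload (stock daemons are not known to do this), and it uses the flock(1) utility from util-linux. The function name `upload_unlocked` is hypothetical.

```shell
#!/bin/sh
# upload_unlocked FILE
# Succeeds if an exclusive lock on FILE can be taken right now, i.e. no
# other process (such as a patched FTP daemon) holds a flock() on it.
upload_unlocked() {
    # flock(1) would create a missing file, so check that it exists first.
    [ -f "$1" ] || return 1
    # -n: fail immediately instead of waiting for the lock.
    flock -n "$1" true
}
```

The lock is released again as soon as `true` exits, so the check itself does not block the daemon for any noticeable time.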
- 03-14-2003 #4 Just Joined!
- Join Date
- Mar 2003
- Posts
- 5
Thanks for your replies.
genlee, if I use rsync to pull the files off the server I might still get an incomplete file if it is still being uploaded.
The files are uploaded from specific clients all day long. A minimum of 10 files per minute is uploaded. (Yes, quite a busy box) Using an md5 file together with the uploaded data file would be great but clients would not be happy so that is not an option.
Dolda2000, I agree that changing the source would be perfect. However, the ftp server that we are using is part of a custom solution, and any modification to the source will void all support contracts. This prevents me from changing the source code. Correct me if I'm wrong, but checking /proc/*/fd will show me what locks are currently held by the ftp daemon. On a very busy server there will always be locks, which would make it quite hard to find the right files.
Any other ideas?
Kind Regards,
Jason
- 03-14-2003 #5 Linux Enthusiast
- Join Date
- Jun 2002
- Location
- San Antonio
- Posts
- 621
Why not use an NFS file system instead of FTPing hundreds of files a day? I know that it has support for file locking, so you would know whether the NFS daemon is writing the file at any time. This might be a somewhat better solution than FTP, but who am I to say?
I respectfully decline the invitation to join your delusion.
- 03-14-2003 #6 Linux Engineer
- Join Date
- Jan 2003
- Location
- Lebanon, pa
- Posts
- 994
You can have rsync copy the files over to a temp dir and after copy is complete, it will move them to a dir that you want. Use the -T option with rsync to specify the temp directory you want to use.
- 03-14-2003 #7 Linux Guru
- Join Date
- Oct 2001
- Location
- Täby, Sweden
- Posts
- 7,578
Well, you could still check the logs to see if a file has been completed. It's clearly the cleanest way to do it.
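For the log approach, a minimal sketch could filter the transfer log for finished uploads. This assumes the wu-ftpd style xferlog format (vsftpd can write it too when `xferlog_std_format=YES` is set), where field 9 is the filename, field 12 is the direction (`i` for incoming) and the last field is the completion status (`c` for complete). Check those positions against your own log before relying on them; the function name `completed_uploads` is made up.

```shell
#!/bin/sh
# completed_uploads
# Reads xferlog-format lines on stdin and prints the path of every
# incoming transfer that is marked as complete.
completed_uploads() {
    # $12 = direction (i = upload), $NF = completion status (c = complete),
    # $9 = filename. fflush() keeps output moving when fed from tail.
    awk '$12 == "i" && $NF == "c" { print $9; fflush() }'
}
```

Fed from something like `tail -F /var/log/xferlog | completed_uploads` (the log path is an assumption, it depends on the server configuration), each printed path is a file the daemon has already finished writing, so it can be handed straight to the import script.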
You really shouldn't use the /proc/*/fd method since it's quite dirty, but it is very possible to do it if you want to anyway. I don't know what locks you are referring to, but I know of no locks being listed in /proc/*/fd. The fd directories contain symlinks named after each file descriptor that the process holds. They each point to the path that was originally used to open the file. So, if you want to see whether a given file is in use by the FTP daemon, you could just use this function in a shell script:
Code:
# Succeeds (returns 0) if any ftpd process holds the given path open.
# "ftpd" is the daemon's process name; adjust it for your server (e.g. vsftpd).
inuse() {
    for pid in $(pidof ftpd); do
        # Every entry in /proc/PID/fd is a symlink to an open file.
        if ls -l "/proc/$pid/fd" 2>/dev/null | grep -q "$1"; then
            return 0
        fi
    done
    return 1
}

if inuse /home/user/file; then
    echo "In use!"
else
    enterdb /home/user/file
    rm /home/user/file
fi
As you can surely see, it is a little dirty, but that's just what you'll have to stand if you want to use that method.
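One thing to watch with a grep-based check is that it matches substrings, so /home/user/file would also match an open /home/user/file2. A stricter variant, offered only as a sketch, compares each symlink target exactly with readlink. The function name `held_open_by` is hypothetical, and reading another user's /proc/PID/fd normally requires root.

```shell
#!/bin/sh
# held_open_by FILE PID...
# Succeeds if any of the given processes has exactly FILE open.
held_open_by() {
    # Resolve FILE to an absolute path, as the fd symlinks are absolute.
    target=$(readlink -f "$1") || return 1
    shift
    for pid in "$@"; do
        for fd in /proc/"$pid"/fd/*; do
            # Each fd entry is a symlink to the file it refers to.
            [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] && return 0
        done
    done
    return 1
}
```

It would be called as `held_open_by /home/user/file $(pidof ftpd)`, with the process name adjusted to whatever the daemon is actually called.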
- 03-15-2003 #8 Just Joined!
- Join Date
- Mar 2003
- Posts
- 5
Thanks for your detailed response, I appreciate it. The script you included would be perfect.
As you probably noticed, I'm not the most experienced Linux admin around. (3 days' experience and counting)
Based on the responses I got, I basically have two options that are applicable to my setup:
1. Monitor the log files
2. /proc/*/fd
You said that method two is quite shaky. Does that mean I should use method one instead? I think that method two would be the quicker of the two.
Regards,
Jason
- 03-15-2003 #9 Linux Guru
- Join Date
- Oct 2001
- Location
- Täby, Sweden
- Posts
- 7,578
Yes, method two is both easier and faster, but so is building a house if you just place bricks on top of each other. The house is much more sturdy if you cement them into place as well.
It's possible that I'm being over-cautious, though. This is mainly a portability issue, after all.


