- 02-10-2009 #1
need help; shell script for editing large txt files
I'm new to shell scripts and I need one that can edit a large txt file. Every line in the file has a large number on it, and I need to remove every line whose number contains the same digit 4 or more times in a row, anywhere within the number. For example, 123456789 would be fine, but 111123456, 023444456, and 018399991 would not. I tried reading a guide on shell scripting, but it's taking me too long to figure it out and I need this done for work ASAP. Any help is very much appreciated. Sorry if my post and explanation sound rushed; they were.
- 02-10-2009 #2
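Welcome to the forum.
You can use awk's interval expressions, like this:
awk --posix '!/1{4}|4{4}|9{4}/ {print}' <filename
The leading ! inverts the match, so awk prints only the lines that do not contain such a run. Add the other alternatives (0{4}, 2{4}, and so on) as you like. r{4} matches four r's in a row, so it hits any line containing at least 4 consecutive r's.
the sun is new every day (heraclitus)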
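For comparison, the same "four of one digit in a row" test can be written once for all ten digits with a backreference. This is only a sketch in Python, not part of the awk answer above; the names REPEATED_DIGIT and has_repeated_digit are made up for illustration.

```python
import re

# One digit, then three more copies of that same digit: a run of 4 or more.
REPEATED_DIGIT = re.compile(r"(\d)\1{3}")

def has_repeated_digit(line):
    """Return True if the line contains the same digit 4+ times in a row."""
    return REPEATED_DIGIT.search(line) is not None

# The sample numbers from the original question.
samples = ["123456789", "111123456", "023444456", "018399991"]
kept = [s for s in samples if not has_repeated_digit(s)]
print(kept)  # ['123456789']
```

Only 123456789 survives, which matches what the question asks for.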
- 02-10-2009 #3
- 02-10-2009 #4
Hi otkaz,
I have seen that you already received the help of tpl...
I don't have time to do your homework as you requested (shell script)...
But being an exercise for me, I solved your problem in python.
I always use Python (instead of bash) whenever possible.
Code:
# rmlines.py
# load text file
f=open('xxx.txt')
sF=f.read() # all the file in one big string (sF)
f.close()
for n in range(10): # n = 0, 1...9
    sC4 = str(n) * 4 # sC4 -> 0000, 1111, 2222 ...
    if sF.find(sC4) > -1: # if you find a number with a repeated sequence
        sF = sF.replace(sC4, '~') # replace the sequence everywhere in sF with tilde
lF=sF.split('\n') # generate a list of all the lines (w/out LF)
lNew=[] # init. new list w/out the wrong lines (those w/ prohibited numbers)
for sLine in lF:
    if sLine.find('~') == -1: # if the line doesn't contain tilde
        lNew.append(sLine) # add one line to the new list
fOut=open('yyy.txt','w') # save new file in yyy.txt
fOut.write('\n'.join(lNew))
fOut.close()
# run with python rmlines.py from the dir where there is xxx.txt
# For ex. xxx.txt is:
# I'm new to shell scripts
# and I need one that can edit a large txt file.
# Every line in the file has a large number on it,
# and I need to remove lines containing any number
# with 4 or more repeating characters anywhere
# within the number ie 123456789 would be fine
# but 111123456,
# or also
# 023444456,
# and 018399991 all would not.
# I tried reading a guide on shell scripting,
# but its taking me too long to figure it out and I need this done for work asap.
# Any help is very appreciated.
# Sorry if my post and explanation sounds rushed, but it was.
# you obtain yyy.txt:
# I'm new to shell scripts
# and I need one that can edit a large txt file.
# Every line in the file has a large number on it,
# and I need to remove lines containing any number
# with 4 or more repeating characters anywhere
# within the number ie 123456789 would be fine
# or also
# I tried reading a guide on shell scripting,
# but its taking me too long to figure it out and I need this done for work asap.
# Any help is very appreciated.
# Sorry if my post and explanation sounds rushed, but it was.
Generally every Linux distribution has it (check by entering python in the shell).
The script seems long, but (taking away the comments) I think its length is
comparable to a shell script (also using awk)... and it is faster IMHO.
Bye.
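A possible variation on the script above, in case the file is too large to hold comfortably in memory or may itself contain a ~: read it one line at a time and test each line directly. This is a minimal sketch, not the poster's script; filter_file and RUNS are hypothetical names, and the demo writes throwaway temp files rather than touching a real xxx.txt.

```python
import os
import tempfile

# The ten forbidden runs: "0000", "1111", ..., "9999".
RUNS = [str(d) * 4 for d in range(10)]

def filter_file(src_path, dst_path):
    """Copy src to dst, skipping lines that contain a forbidden run.

    Returns the number of lines that were dropped.
    """
    dropped = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:  # iterating a file reads one line at a time
            if any(run in line for run in RUNS):
                dropped += 1
            else:
                dst.write(line)
    return dropped

# Small demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "xxx.txt")
    dst = os.path.join(tmp, "yyy.txt")
    with open(src, "w") as f:
        f.write("123456789\n111123456\n023444456\n018399991\n")
    dropped = filter_file(src, dst)
    with open(dst) as f:
        result = f.read()

print(dropped, repr(result))
```

For real use you would call filter_file('xxx.txt', 'yyy.txt') from the directory holding the input file.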
- 02-10-2009 #5
thanks!
I'm sorry if I was rude asking for someone to do my work for me. This whole time I have been reading tutorials and trying to figure it out myself. I'm just too new to shell scripts to figure it out in time to have this ready. You won't see me in here asking a question this way again; I just got caught in a pinch. I'll continue my reading and learn how to do this myself for the future. Thanks so much for the Python script, I'm going to go try it now.


