  1. #1

    xargs with command group?


    Hi,

    I can't manage to get either of the following to work:

    find . -exec ( head -n 10 {} | grep -q XXX && some other command )

    or

    find . | xargs -I {} ( head -n 10 {} | grep -q XXX & some other command )

    I'm not that good at bash programming, so there's something I'm missing here. How can one combine several commands through pipes and pass the whole group to -exec or xargs?

    I tried defining a function and calling it, but that also failed.

    Thanks in advance

  2. #2
    Quote Originally Posted by gandalf1024 View Post
    Hi,

    I can't manage to get either of the following to work:

    find . -exec ( head -n 10 {} | grep -q XXX && some other command )

    or

    find . | xargs -I {} ( head -n 10 {} | grep -q XXX & some other command )

    I'm not that good at bash programming, so there's something I'm missing here. How can one combine several commands through pipes and pass the whole group to -exec or xargs?

    I tried defining a function and calling it, but that also failed.
    Yeah - functions are sort of like shell built-in commands, but they're local to the shell session in which they're defined. A program running in the shell (xargs, in this case) can't reach back into the shell that launched it (at least, current shells define no such mechanism), so xargs has no way of calling a function you've defined in bash. Even if find or xargs started a new bash process to run its command, that new bash process wouldn't normally know about a function defined in its grandparent process, either...
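
    (One partial workaround, assuming you're invoking bash rather than a plain sh: Bash's export -f puts a function definition into the environment, so a child bash started by find or xargs can pick it up. A rough sketch - the function name "check" and the pattern XXX are just placeholders:)

    Code:
    # "check" is a hypothetical helper; export -f makes it visible to child bash processes
    check() { head -n 10 "$1" | grep -q XXX && printf '%s\n' "$1"; }
    export -f check
    # the trailing 'bash' fills $0, so each filename from xargs lands in $1
    find . -type f -print0 | xargs -0 -n 1 bash -c 'check "$1"' bash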

    Also, note that the default behavior of xargs (and, according to the man page, of "find -exec ... {} +") is to collect a batch of arguments before running the command, to reduce the number of individual invocations of that command. (For instance, "find | xargs echo" may only output one line, even if "find" finds a hundred files...) This is a performance optimization - it can be limited using the -n/--max-args option to xargs...
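
    For example (a sketch, assuming GNU findutils):

    Code:
    find . | xargs echo        # typically one echo invocation listing many filenames
    find . | xargs -n 1 echo   # one echo invocation per filename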

    I think you would have to create a shell script (on disk) and supply the name of that script to xargs... I thought you might be able to create a script inline:

    Code:
    find -print0 | xargs -n 1 -0 bash -c "cmd $1" --
    But that doesn't seem to work either...

    Another option, instead of using xargs, is to use Bash's built-in for loop:

    Code:
    for f in $(find); do cmd1 "$f"; cmd2 "$f"; cmd3 "$f" | cmd4 --arg "$f"; done
    The problem there, obviously, is that Bash will split the output of "find" not according to newlines but according to whitespace... So if any filenames include whitespace, you're hosed...

    I had thought you could get around that by setting $IFS, Bash's field separator variable:

    Code:
    IFS=$(echo -ne '\0')
    for f in $(find -print0); do cmd "$f"; done
    unset IFS
    But this doesn't seem to work... The shell treats $IFS as an empty string rather than one containing the null character - which makes sense in hindsight, since shell variables are C strings and can't hold a NUL byte, and command substitution strips it anyway. But I also was unable to do a similar command with IFS as the newline character - so maybe I've got something else going on, too...
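
    For what it's worth, Bash's $'...' quoting does let you put a literal newline into IFS, which at least copes with spaces in filenames (a sketch - it still breaks on filenames containing newlines, and I haven't tried it at scale):

    Code:
    # set -f turns off globbing so filenames containing * or ? aren't expanded again
    old_ifs=$IFS
    set -f
    IFS=$'\n'
    for f in $(find .); do
        head -n 10 "$f" | grep -q XXX && printf '%s\n' "$f"
    done
    set +f
    IFS=$old_ifs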

  3. #3
    Hi and thanks for replying

    I finally got around this by calling a shell:

    find -type f -exec sh -c 'head -n 10 {} | grep -q XXXXXXXX' \; -print

    I think a for loop is a bad option. If I remember right, it has limitations on the number of files it will process, so it won't work for me. There are hundreds of thousands of files I have to process this way.

    Also, I couldn't call a script on disk for every file, for performance reasons, since as already said the number of files is very big.

  4. #4
    Quote Originally Posted by gandalf1024 View Post
    Hi and thanks for replying

    I finally got around this by calling a shell:

    find -type f -exec sh -c 'head -n 10 {} | grep -q XXXXXXXX' \; -print
    That's cool. I wasn't able to get this to work - this sort of approach, I mean, not this specific command... My trouble was that I was trying to pass the filenames to the inline shell script as command arguments - like "sh -c 'echo $1' $f" - which didn't work at first because I was putting double quotes around the 'echo $1', causing $1 to be interpolated by the outer shell before the subshell was ever run...
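
    For the record, the xargs version does seem to work once the script body is single-quoted and you remember that the first argument after the script fills $0 (a sketch, assuming GNU findutils):

    Code:
    # the trailing 'sh' becomes $0, so the filename supplied by xargs lands in $1
    find . -type f -print0 | xargs -0 -n 1 sh -c 'head -n 10 "$1" | grep -q XXXXXXXX && printf "%s\n" "$1"' sh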

    BTW, I tested a similar script on my machine - if there's a possibility of whitespace in a filename, you need to have double quotes around the {}. Otherwise a filename containing whitespace - say "foo\nbar", with an embedded newline - would cause the shell to run:

    Code:
    head -n 10 foo
    bar | grep -q XXXXXXXX
    So the script should be like this:

    Code:
    find -type f -exec sh -c 'head -n 10 "{}" | grep -q XXXXXXXX' \; -print
    Or if you wanted to use argument passing - which is also a bit safer, since the filename is never pasted into the script text itself - this should work, too:

    Code:
    find -type f -exec sh -c 'head -n 10 "$1" | grep -q XXXXXXXX' 'sh' '{}' \; -print
    I think a for loop is a bad option. If I remember right, it has limitations on the number of files it will process, so it won't work for me. There are hundreds of thousands of files I have to process this way.
    You may be right there - if you do a "for f in $(find)" then bash evaluates this by running $(find) first, buffering up all of its output, and then substituting it into the command line to create one huge command... If "for" were an ordinary executable, that would surely exceed the kernel's limit on argument length. Since "for" is a builtin, bash might be able to handle it (it's not impossible, but I wouldn't count on it either), and even so it's still a very undesirable mode of operation - if nothing else because you don't get any results until the entire find operation has finished... exec-related issues with "find" are actually an example of one of my biggest complaints about the traditional Unix shell: dealing with a stream of filenames is a brain-dead simple problem, conceptually speaking, but to actually get it right (given the possibility of whitespace in the filenames) you need at least to use -print0...
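
    For reference, the limit that applies when exec'ing an external program can be checked with getconf (the exact value varies by system):

    Code:
    getconf ARG_MAX    # maximum bytes of argument list plus environment for exec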

    I have actually been working on a design for a shell - one of my goals is to deal well with these kinds of issues. The problem of looping over the output of "find" in general is something I've never been entirely happy with. A case like yours, where you want to perform a test on each file's contents, is a part of the problem I'd never actually recognized before...

    Also, I couldn't call a script on disk for every file, for performance reasons, since as already said the number of files is very big.
    Yeah, I understand completely. Even apart from that performance issue, I don't like the idea of having to write a script out to disk for something like this anyway. But I was honestly stumped on how to come up with a good solution to this problem, so I was sort of going through all the options I could think of. I couldn't get that method to work anyway - it was splitting on whitespace instead of newlines or null characters - and my efforts to change $IFS didn't solve the problem...
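
    If the per-file shell startup ever does become the bottleneck, one way to amortize it (a sketch, assuming GNU findutils, with XXXXXXXX standing in for your pattern) is to let xargs hand a whole batch of filenames to a single sh and loop over them inside it:

    Code:
    # one sh per batch of filenames rather than one per file; the trailing 'sh' fills $0
    find . -type f -print0 | xargs -0 sh -c 'for f in "$@"; do head -n 10 "$f" | grep -q XXXXXXXX && printf "%s\n" "$f"; done' sh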

    I'm glad you got it solved - As a bonus, I've learned that you can use -exec and -print together in a find command to perform tests using a subshell. That's good to know.

  5. #5
    Another solution to this problem that I picked up - less robust than using "-exec", but more elegant IMO:

    Code:
    find | (while read x; do ...  ; done)
    This approach is probably the best balance between elegance and robustness I have seen for this sort of problem in the shell. The main limitation is you can't have a newline within a filename... But since "read" reads a line of data, other forms of whitespace and special characters in filenames are handled correctly - and it's a very straightforward way to solve the problem.
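
    Filled in with the test from this thread, that looks something like this (a sketch; -r keeps read from treating backslashes specially, and the leading IFS= preserves leading and trailing spaces in names):

    Code:
    find . -type f | while IFS= read -r x; do
        head -n 10 "$x" | grep -q XXXXXXXX && printf '%s\n' "$x"
    done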

    Now if only "read" allowed you to specify the null byte as a delimiter, then you could use that with find's "-print0" and have a solution that's as robust as xargs or -exec...

  6. #6
    All I said about performance is just a thought. I'm not sure which approach is best till I experiment a lot. But I guess that for 2,000,000 files a, say, 2 ms per-file overhead is a big deal, because it would mean roughly an hour more for my mail server to complete this task (2,000,000 x 2 ms = 4,000 seconds, about 67 minutes). At night, of course, but it's an overhead I would like to avoid. Anyway, I'm thinking of a headgrep command written in C that would be a very primitive grep with a -q and a -n parameter, where -n would stand for the number of lines, as in head.

    You're right about the spaces in filenames - I'm already aware of that possibility. I just didn't include the quotes because in this case I'm pretty sure the filenames will not have spaces in them, since they are email files with a very specific format, inside maildirs. But you're absolutely right - better safe than sorry.

    I have used the "while read" approach a thousand times in the past, so I'm familiar with it. Strange that I didn't think of it in the first place. It must be that I've been trying to get very familiar with the "-exec" and "| xargs" approaches because I'm a little behind on those, so I'm glad we're discussing all of this - it reminds me of things.

    The main limitation is you can't have a newline within a filename
    Maybe we could get around this limitation with a new IFS = '\0'? Or is a newline always a separator for read?

    Have you read the Unix haters handbook?

  7. #7
    Quote Originally Posted by gandalf1024 View Post
    Maybe we could get around this limitation with a new IFS = '\0'? Or is a newline always a separator for read?
    I tried that - I really couldn't set $IFS to a whitespace character and make it work, let alone the null character...

    What I did manage, however:

    Code:
    find -print0 | while read -d "$(echo -ne '\000')" x; do echo a $x; done
    This actually works. I'm pretty surprised by this, given how little luck I've had so far with this sort of thing...
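
    In hindsight I think I see why: command substitution can't return a NUL byte, so $(echo -ne '\000') actually expands to an empty string - and Bash's read happens to treat an empty -d argument as "delimit on the NUL byte". Which means the same thing can be written a bit more simply (a sketch):

    Code:
    # read -d '' means NUL-delimited; IFS= and -r keep the filenames intact
    find -print0 | while IFS= read -r -d '' x; do echo "a $x"; done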

    Have you read the Unix haters handbook?
    Yeah, some of it anyway. I should probably go over it again actually, it's been a while. My interest in the book was that I believe in the fundamental approach of Unix, but I want to understand what it got wrong and why. I think the book was somewhat helpful there, but I can't remember.

    The general spirit of Unix Haters seems to be that the system is rotten to the core. Personally I don't agree with this. I think some fairly fundamental philosophies in the system are wrong (primarily, the use of text as the "common data format" - it's simple until your field payloads start including your delimiter characters. That's the core problem of this whole thread...) but other bits, like the idea of the shell providing a collection of powerful, single-purpose, simple tools meant to be used together, are great ideas. I just think the means of using these tools together needs to be improved.

    It certainly doesn't help matters that the one character you would expect to be really useful as a delimiter, the null character, is only more-or-less available. POSIX, I believe, doesn't even require that tools handle the null character correctly when it's part of your data... You can pass it over a pipe, of course, but you can't pass it as part of a command argument or to another process as an environment variable, since both are implemented as C strings... If more tools supported the use of null as a delimiter in piped data, at least, I think a lot of the difficulties would go away...
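
    Quite a few GNU tools do already speak NUL-delimited streams, at least, so some pipelines can stay NUL-clean end to end (a sketch, assuming GNU grep, sort and findutils; 'cur/' is just a stand-in pattern):

    Code:
    # -print0 / -z / -0 keep the filename stream NUL-delimited at every stage
    find . -type f -print0 | grep -z 'cur/' | sort -z | xargs -0 -r ls -ld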

  8. #8
    Quote Originally Posted by tetsujin View Post
    The general spirit of Unix Haters seems to be that the system is rotten to the core. Personally I don't agree with this. I think some fairly fundamental philosophies in the system are wrong (primarily, the use of text as the "common data format" - it's simple until your field payloads start including your delimiter characters. That's the core problem of this whole thread...) but other bits, like the idea of the shell providing a collection of powerful, single-purpose, simple tools meant to be used together, are great ideas. I just think the means of using these tools together needs to be improved.
    Well, I actually mentioned it as a pretty entertaining and amusing book I once stumbled upon, and I haven't had the chance to read all of it since then. I would like to some time - not because I'm a Unix hater, of course, but because it helped me see things differently. I used to assume that everything in Unix is wisely implemented, and I realized it was just my previous bad experience with Windows that made me think this way.

    I have a print on my wall with the words of Ken Pier:
    I liken starting one's computing career with Unix, say as an undergraduate, to being born in East Africa. It is intolerably hot, your body is covered with lice and flies, you are malnourished and you suffer from numerous curable diseases. But, as far as young East Africans can tell, this is simply the natural condition and they live within it. By the time they find out differently, it is too late. They already think that the writing of shell scripts is a natural act.


    Anyway, Unix and Linux in particular are great. And the fact is that I love all the little tools that can be combined together to build something bigger very easily. It's just that some things need to be understood very well before using them in advanced (for my level of experience) ways. Otherwise they can prove very tricky and - sometimes - evil. But I guess it's the same with all things advanced ..

    I remember searching for IFS and having some trouble finding the right string to give it to make it work properly - for non-alphanumeric characters, I mean. I will take a look and see; it's interesting to know.
