  1. #1
    Just Joined!
    Join Date
    Jul 2009
    Posts
    3

    [SOLVED] Dealing with spaces in awk print

    I am trying to get my Google Mini to index my Red Hat box. It needs to follow links to index a folder that is just full of PDF files.

    I used:

    Code:
    ls | awk {'print "<a href=\"mydocs/"$1"\">"$1"</a>"'} >> jump.html
    to create a page full of links that the mini can find and follow.

    80% of my filenames do not have spaces in them, but the ones that do are not written to jump.html correctly: all I get is the characters before the first space.

    audit_report_399.pdf shows up like:

    Code:
    <a href="/mydocs/audit_report_399.pdf">audit_report_399.pdf</a>
    but audit report 400.pdf shows up like:

    Code:
    <a href="/mydocs/audit">audit</a>
    Is there any way to change the awk {print} command I am using so that it uses %20 for the spaces? Or another solution?

    Greg

  2. #2
    tpl
    Linux User
    Join Date
    Jan 2007
    Location
    cleveland
    Posts
    452
    Welcome to the forum.

    One option would be to run the ls output through sed,
    replacing all blanks with something else, like this:

    ls | sed 's/ /_/g'

    With the underscore, no name in the listing has any blanks;
    then proceed as before.
    the sun is new every day (heraclitus)

  3. #3
    Just Joined!
    Join Date
    Jul 2009
    Posts
    3
    I was thinking along those lines with sed, but if I actually rename the files it will break the links to them in my CMS (Joomla).

    If I can replace the spaces with %20, then the links from the CMS should still work...

    Hmm....
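A minimal sketch of doing that inside awk itself: take the whole line ($0) rather than the first field, percent-encode only the copy used in the href, and keep the readable name as the link text. The function name make_links is hypothetical, and the mydocs/ prefix is copied from the original command; adjust both for your setup.

```shell
#!/bin/sh
# Hypothetical helper: reads one filename per line on stdin, prints one link
# per line. Only the href copy is encoded; the visible text keeps its spaces.
make_links() {
  awk '{
    href = $0                  # $0 is the whole line, spaces included
    gsub(/%/, "%25", href)     # encode any literal % first...
    gsub(/ /, "%20", href)     # ...then the spaces, so %20 is not re-encoded
    print "<a href=\"mydocs/" href "\">" $0 "</a>"
  }'
}

# Usage, same idea as the original command:
#   ls | make_links >> jump.html
```

This only handles spaces and %; other characters that are special in URLs (#, ?, &) or in HTML (<, &, ") would need the same treatment.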

  4. #4
    Linux User
    Join Date
    Aug 2006
    Posts
    458
    then try using $0 instead of $1
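Why $0 helps: by default awk splits every input line into fields on runs of blanks, so $1 is only the text before the first space, while $0 is the entire unsplit line. A small illustration using one of the filenames from the first post:

```shell
#!/bin/sh
# awk splits each input line into fields on runs of blanks (spaces/tabs) by default.
name='audit report 400.pdf'

# $1 is only the first field, i.e. the text before the first space.
printf '%s\n' "$name" | awk '{print $1}'    # prints: audit

# NF is the number of fields awk found on the line.
printf '%s\n' "$name" | awk '{print NF}'    # prints: 3

# $0 is the whole line, so the full filename survives.
printf '%s\n' "$name" | awk '{print $0}'    # prints: audit report 400.pdf
```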

  5. #5
    Just Joined!
    Join Date
    Jul 2009
    Posts
    58
    If you're trying to build a searchable index, then you can't change the name of the file or remove the spaces, unless you do that on the file system as well.

    To keep the special characters, you have to percent-encode them: replace each one with % followed by its character code (its "ord" value) in hexadecimal.

    So create an encode.sed file with:

    Code:
    s/%/%25/g
    s/ /%20/g
    s/\t/%09/g
    s/!/%21/g
    s/"/%22/g
    s/#/%23/g
    s/\$/%24/g
    s/&/%26/g
    s/'/%27/g
    s/(/%28/g
    s/)/%29/g
    s/\*/%2a/g
    s/+/%2b/g
    s/,/%2c/g
    s/-/%2d/g
    s/\./%2e/g
    s/\//%2f/g
    s/:/%3a/g
    s/;/%3b/g
    s/</%3c/g
    s/=/%3d/g
    s/>/%3e/g
    s/?/%3f/g
    s/@/%40/g
    s/\[/%5b/g
    s/\\/%5c/g
    s/\]/%5d/g
    s/\^/%5e/g
    s/_/%5f/g
    s/`/%60/g
    s/{/%7b/g
    s/|/%7c/g
    s/}/%7d/g
    s/~/%7e/g
    Then you can do:

    Code:
    ENCODED=$(printf '%s\n' "$FOO" | sed -f encode.sed)
    Here $FOO holds the bare filename (not the whole path, since encode.sed also encodes /). Now $ENCODED should be the encoded string that you can store and request from the web server.
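The same "replace each character with its code" idea can also be written as one small awk function instead of one sed line per character. This is a swapped-in alternative, not the approach from the post above, and the function name urlencode is hypothetical. It assumes one bare filename per line: every byte outside the RFC 3986 "unreserved" set (letters, digits, -, ., _ and ~) becomes % plus two uppercase hex digits.

```shell
#!/bin/sh
# Hypothetical percent-encoder. LC_ALL=C makes awk work on single bytes,
# so multibyte (e.g. UTF-8) names are encoded byte by byte.
urlencode() {
  LC_ALL=C awk '
    BEGIN {
      # lookup table: character -> its byte value (the "ord" value)
      for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
    }
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c ~ /[A-Za-z0-9._~-]/)
          out = out c                             # unreserved: keep as-is
        else
          out = out sprintf("%%%02X", ord[c])     # "%%" prints a literal %
      }
      print out
    }'
}

# Usage:
#   ENCODED=$(printf '%s\n' "$FOO" | urlencode)
```

Unlike encode.sed, this leaves -, ., _ and ~ alone; RFC 3986 treats the encoded and unencoded forms of those characters as equivalent, so either choice should work.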

  6. #6
    Linux Newbie
    Join Date
    Mar 2009
    Posts
    228
    Quote Originally Posted by ghostdog74
    then try using $0 instead of $1
    Which might work if you have the ls output one file per line which can be done with the -1 option. Try this:

    Code:
    ls -1 | awk {'print "<a href=\"mydocs/"$0"\">"$0"</a>"'}

  7. #7
    Linux User
    Join Date
    Aug 2006
    Posts
    458
    Quote Originally Posted by lomcevak
    Which might work if you have the ls output one file per line which can be done with the -1 option. Try this:

    Code:
    ls -1 | awk {'print "<a href=\"mydocs/"$0"\">"$0"</a>"'}
    try this
    Code:
    ls | awk '{print $0}'
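The point behind that last reply: when ls writes to a pipe rather than a terminal, it already prints one name per line, so -1 is optional here and awk's $0 sees each whole filename. A small sketch in a throwaway directory (the directory and filenames are made up for the demo):

```shell
#!/bin/sh
# Hypothetical demo directory with one plain and one spaced filename.
demo_dir=$(mktemp -d)
touch "$demo_dir/audit_report_399.pdf" "$demo_dir/audit report 400.pdf"

# ls writes one name per line when stdout is a pipe, so no -1 is needed;
# $0 is the whole line, i.e. the whole filename including spaces.
links=$(cd "$demo_dir" && ls | awk '{print "<a href=\"mydocs/" $0 "\">" $0 "</a>"}')
printf '%s\n' "$links"

rm -rf "$demo_dir"
```

The href still contains literal spaces here; percent-encoding them, as discussed earlier in the thread, is the stricter choice. A filename containing a newline would still break any one-name-per-line pipeline like this.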

  8. #8
    Just Joined!
    Join Date
    Jul 2009
    Posts
    3
    Quote Originally Posted by ghostdog74
    then try using $0 instead of $1
    This was the solution.
