  1. #1
    Just Joined!
    Join Date
    Jul 2009
    Posts
    70

    regular expressions and foreign language


    Hello all,
    I read somewhere that regular expressions work with the ASCII table, so when I type
    Code:
    grep "[a-z][a-z]*" file_name
    it uses the values from ASCII decimal 97 (a) to decimal 122 (z), right?
    But what if I have a file containing diacritics, for example these ordinary Slovak characters:
    Code:
    marek@cepi:~$ cat diakritika 
    ťľľščťž
    ŤĽĽŠČŤŽ
    
    marek@cepi:~$ grep -o "[a-z][a-z]*" diakritika 
    ťľľščť
    
    
    
    Why does this regexp match characters with diacritics? And why does it match only the lower-case line, and not "ž"? This seems strange to me. A friend told me it could have something to do with $LANG. My $LANG is:
    Code:
    marek@cepi:~$ echo $LANG
    en_US.UTF-8
    I would also like to ask about converting the file to upper case, diacritics included. I type:
    Code:
    marek@cepi:~$ cat diakritika | tr "[:lower:]" "[:upper:]"
    ťľľščťž
    ŤĽĽŠČŤŽ
    Why does it not change lower case to upper case?
    Thanks a lot for any reply.
    PS: I hope the characters display properly.

  2. #2
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,601
    You probably need to change your LOCALE which I think you can do by setting the LANG environment variable as you may have surmised. Upper/Lower case issues are language specific so you need to change the locale to the language you are interested in. To see what languages are compiled into your system execute the command "locale -a". To see what your current locale information is just execute the command "locale".
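    One way to test the locale theory is to rerun the search under the C locale, where a range like [a-z] is a plain byte range and should not reach any accented letter. A minimal sketch, assuming a grep that honours LC_ALL; the temporary file and the variable name c_matches are made up for illustration:

```shell
# Recreate the sample file from the question in a temporary location
sample=$(mktemp)
printf 'ťľľščťž\nŤĽĽŠČŤŽ\n' > "$sample"

# Under LC_ALL=C, [a-z] covers only bytes 97..122; count the matching lines
c_matches=$(LC_ALL=C grep -c "[a-z]" "$sample" || true)
echo "lines matching [a-z] in the C locale: $c_matches"

# Check whether any Slovak locale is installed
locale -a 2>/dev/null | grep -i '^sk' || echo "no Slovak locale found"

rm -f "$sample"
```

    If the count is 0 here while the original command matched under en_US.UTF-8, the difference most likely comes from the locale's collation order rather than from grep itself.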
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    Just Joined!
    Join Date
    Jul 2009
    Posts
    70
    Hi. I think I know why "ž" is different: it sorts "behind" "z", so it is outside the range [a-z] and did not match the regexp.

    I ran the command locale -a:
    Code:
    C
    en_AU.utf8
    en_BW.utf8
    en_CA.utf8
    en_DK.utf8
    en_GB.utf8
    en_HK.utf8
    en_IE.utf8
    en_IN
    en_NG
    en_NZ.utf8
    en_PH.utf8
    en_SG.utf8
    en_US.utf8
    en_ZA.utf8
    en_ZW.utf8
    POSIX
    and locale
    Code:
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    I see that there is no Slovak locale in the output of locale -a. Could that be the reason why I get strange characters in the ID3 tags of Slovak music in Rhythmbox on Xubuntu? How can I fix this problem, and how can I install the Slovak language? Thank you so much.

  4. #4
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,601
    I'm not sure which locale code is for Slovakian. I have a lot more locales installed on my system. What distribution+version of Linux are you running? In Red Hat systems, I think the extra locales are installed with the mono-locale-extras package, but I'm not certain about that. The only locales I am personally interested in are English and Spanish for the most part.

  5. #5
    Just Joined!
    Join Date
    Jul 2009
    Posts
    70
    Hi, I am running Xubuntu.

    cat /boot/grub/menu.lst | grep title
    title Ubuntu 9.04, kernel 2.6.28-15-generic

    cat /proc/version
    Linux version 2.6.28-14-generic (buildd@palmer) (gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) ) #47-Ubuntu SMP Sat Jul 25 00:28:35 UTC 2009


    Could the missing locale be the reason why some programs that display Slovak text have problems with the characters?

  6. #6
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,601
    That would be a reasonable assumption. You need to get the Slovak locale database installed and then see what happens. Some tools are ASCII only, but most these days are locale-aware.
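    tr looks like one of the tools that still works byte by byte: as far as I know, GNU tr does not handle multibyte UTF-8 characters, which would explain why the accented letters came through unchanged. A sketch of an alternative using awk's toupper(), assuming gawk and a UTF-8 locale (other awk implementations may convert only ASCII letters); to_upper is a hypothetical helper name:

```shell
# Upper-case standard input with awk's toupper().
# Assumption: gawk in a UTF-8 locale, so non-ASCII letters are converted too.
to_upper() {
    awk '{ print toupper($0) }'
}

# Try it on the lower-case line from the question
printf 'ťľľščťž\n' | to_upper
```

    The function can stand in for the tr call from the first post, for example: to_upper < diakritika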

  7. #7
    Just Joined!
    Join Date
    Jul 2009
    Posts
    70
    Thanks. Can you please explain how to install the locale database on Xubuntu? On the internet I only found something about Oracle.

  8. #8
    Just Joined!
    Join Date
    Jul 2009
    Posts
    70
    Nobody? Could someone please post a tutorial or something? The more I read, the less I know, and my knowledge of language support in Linux is very poor.

  9. #9
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,601
    You should be able to get the locale repository from the Ubuntu web site. Then, you may need to compile the ones you want - the locale tool will do that for you once you have the repository.
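    A rough sketch of how that check could look on Ubuntu, assuming the locale-gen tool is present and that the wanted locale is named sk_SK.UTF-8 (both are assumptions, not something confirmed in this thread). The helper names canon and have_locale are made up for illustration. Note that locale -a prints names such as en_US.utf8 while LANG uses en_US.UTF-8, so the comparison ignores case and the hyphen:

```shell
# Assumption: the Slovak locale is called sk_SK.UTF-8
wanted="sk_SK.UTF-8"

# Reduce a locale name to a comparable form: lower case, no hyphens
canon() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g'
}

# Succeed if the given locale appears in the output of locale -a
have_locale() {
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | sed 's/-//g' | grep -qxF "$(canon "$1")"
}

if have_locale "$wanted"; then
    echo "$wanted is already installed"
else
    # Needs root, so it is only printed here rather than run
    echo "$wanted is missing; on Ubuntu, try: sudo locale-gen $wanted"
fi
```

    Once the locale shows up in locale -a, it can be tried for a single command first, for example LC_ALL=sk_SK.UTF-8 followed by the grep command from the first post, before changing LANG for the whole session.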
