Page 1 of 2
Results 1 to 10 of 12
  1. #1
    Just Joined!
    Join Date: Nov 2008
    Posts: 5

    Weirdness with BASH wildcards

    So this is really bugging me. Why is [a-z] not case sensitive, but [A-Z] is? For example:

    # ls -l
    total 0
    -rw-r--r-- 1 root root 0 Nov 20 12:22 xa
    -rw-r--r-- 1 root root 0 Nov 20 12:22 xA

    # ls -l x[a-z]
    -rw-r--r-- 1 root root 0 Nov 20 12:22 xa
    -rw-r--r-- 1 root root 0 Nov 20 12:22 xA

    # ls -l x[A-Z]
    -rw-r--r-- 1 root root 0 Nov 20 12:22 xA

    Any ideas?

    grendelos

  2. #2
    Just Joined!
    Join Date: Nov 2008
    Posts: 26
    Interesting. And of course, `man bash` makes no mention of this.

    I have always disliked bash passionately since it seems to be a cheap knockoff of a solution that's already free (ksh). Why do we need a cheap knockoff for a free entity? Who cares that we don't own the source, if we're not planning on changing it?!?!!

    If you install pdksh, and switch to that as your shell, you will find that it operates more like you'd expect. (I tried it. Your first example yields two lines. Your 2nd/3rd examples yield one line each.)

  3. #3
    Linux Engineer (wje_lf)
    Join Date: Sep 2007
    Location: Mariposa
    Posts: 1,192
    Your system works differently from mine, so I'm playing blind here. In the same directory you've been working in, do this:
    Code:
    touch xa; touch xb; touch xc
    touch xA; touch xB; touch xC
    echo $LANG
    ls x*
    What output do you get?
    --
    Bill

    Old age and treachery will overcome youth and skill.

  4. #4
    Just Joined!
    Join Date: Nov 2008
    Posts: 5
    Here is what I get.

    # touch xa; touch xb; touch xc

    # touch xA; touch xB; touch xC

    # echo $LANG
    en_US.UTF-8

    # ls x*
    xa xA xb xB xc xC

    grendelos

  5. #5
    Linux Engineer (wje_lf)
    Join Date: Sep 2007
    Location: Mariposa
    Posts: 1,192
    The issue here is not case sensitivity, but the collating sequence: the order in which the letters sort. In your case, the order is:
    Code:
    aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
    So the range [a-z] covers everything from a through z in that sequence, that is, all of it except the final Z:
    Code:
    aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYz
    and the range [A-Z] covers everything from A through Z, that is, all of it except the initial a:
    Code:
    AbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
    This excludes the lower case a, which explains why xa was excluded in your original [A-Z] experiment.
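One way to picture this: a range such as [A-Z] is not a list of letters, it is everything that collates from its first endpoint through its second. The bash sketch below models that, assuming the collating sequence is just the 52-letter string above (real locale definitions cover many more characters and compare in several passes). range_members is a made-up helper name, not anything provided by bash or glibc.

```shell
#!/bin/bash
# Toy model: a bracket range [LO-HI] matches every character whose position
# in the collating sequence lies between LO and HI, inclusive.
# range_members is a hypothetical helper, not part of bash or glibc.
range_members() {
  local seq=$1 lo=$2 hi=$3
  local before_lo=${seq%%"$lo"*}   # everything that collates before LO
  local before_hi=${seq%%"$hi"*}   # everything that collates before HI
  local start=${#before_lo} end=${#before_hi}
  printf '%s\n' "${seq:start:end-start+1}"
}

# The en_US.UTF-8 ordering described above (letters only).
utf8_order=aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ

range_members "$utf8_order" a z   # ends at z, so Z is left out
range_members "$utf8_order" A Z   # starts at A, so a is left out
```

Handing it the ASCII order ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz instead returns exactly A through Z, which is the behaviour that looks case sensitive.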

    Now try this:
    Code:
    export LANG=en_US
    ls x*
    You should get
    Code:
    xA xB xC xa xb xc
    because the collating sequence is
    Code:
    ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
    so the range [A-Z] will get you only
    Code:
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    and the range [a-z] will get you only
    Code:
    abcdefghijklmnopqrstuvwxyz
    Then if you try this:
    Code:
    ls x[a-z]
    ls x[A-Z]
    you should get what looks like case sensitivity in both cases.
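Whether LANG=en_US gives you ASCII order depends on which locales are installed on your system and how they are defined, so it may not work everywhere. Setting LC_ALL=C is the more dependable route. A minimal bash sketch that recreates the two files from the first post in a scratch directory from mktemp and expands both patterns under the C locale:

```shell
#!/bin/bash
# Expand the globs from the first post under the C locale, where a range
# like [a-z] means "byte values from a to z", i.e. plain ASCII order.
export LC_ALL=C   # bash re-reads the locale when LC_ALL is assigned

dir=$(mktemp -d)
cd "$dir" || exit 1
touch xa xA

lower=$(echo x[a-z])
upper=$(echo x[A-Z])
echo "x[a-z] -> $lower"
echo "x[A-Z] -> $upper"

cd / && rm -rf "$dir"
```

This should print xa for the first pattern and xA for the second.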

    Hope this helps.
    --
    Bill

    Old age and treachery will overcome youth and skill.

  6. #6
    Just Joined!
    Join Date: Nov 2008
    Posts: 5
    Thanks. That is a very good explanation. I have also stumbled across this suggestion which works as well:

    export LC_ALL=C

    But I am not certain what this variable represents exactly. Are LC_ALL=C and LANG=en_US related somehow or other?

    grendelos

  7. #7
    Linux Newbie
    Join Date: Jul 2008
    Posts: 181
    The problem here is the unexpected "collation order". The bash manpage does actually mention the "LC_COLLATE" variable, but if you do not know what it does beforehand, you will probably not understand the explanation. See "man 7 glob" for some more information on collation order.

    You can set the collation order to "C" to get the old-style ASCII order. If you want to refer to upper case letters, lower case letters or all letters, you should probably use named character classes such as "[[:lower:]]" anyway; note that a class name like "[:lower:]" is only valid inside a bracket expression, hence the double brackets.
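A small bash sketch of the character class approach, using made-up file names in a scratch directory. [[:upper:]] and [[:lower:]] select characters by type rather than by position in the collating sequence, so for plain ASCII letters the result should be the same under C and under en_US.UTF-8:

```shell
#!/bin/bash
# Character classes select by character type, not by a range in the
# collating sequence, so aAbB... versus AB...ab ordering does not matter.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch xa xb xA xB

upper_matches=$(echo x[[:upper:]])
lower_matches=$(echo x[[:lower:]])
echo "x[[:upper:]] -> $upper_matches"
echo "x[[:lower:]] -> $lower_matches"

cd / && rm -rf "$dir"
```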

  8. #8
    Linux Newbie
    Join Date: Jul 2008
    Posts: 181
    Quote, originally posted by delovelady:
    Interesting. And of course, `man bash` makes no mention of this.

    I have always disliked bash passionately since it seems to be a cheap knockoff of a solution that's already free (ksh). Why do we need a cheap knockoff for a free entity? Who cares that we don't own the source, if we're not planning on changing it?!?!!

    If you install pdksh, and switch to that as your shell, you will find that it operates more like you'd expect. (I tried it. Your first example yields two lines. Your 2nd/3rd examples yield one line each.)
    Clearly, you do not know what you are talking about.

    First of all, bash is not a "knockoff" of ksh. Both the name of the "Bourne-Again SHell" and the description section of the man page should have told you that. Secondly, although the man page does not explain the effect in sufficient detail, it is clearly not a bug, but an internationalization feature.

  9. #9
    Linux Engineer (wje_lf)
    Join Date: Sep 2007
    Location: Mariposa
    Posts: 1,192
    Quoth the highly esteemed delovelady:
    I have always disliked bash passionately since it seems to be a cheap knockoff
    ... and the equally highly esteemed burschik:
    Clearly, you do not know what you are talking about.
    Um, folks, let's push beyond the pleasantries and get to the heart of the matter, mmmkay?

    First, I must apologize for not running the following tests with pdksh. It's not installed on my system, and I was too lazy to install it just for the purpose of these tests.

    Instead, I use the standard AT&T version of ksh. Like pdksh, it's free (as in "beer" and "speech"). Like pdksh, the source is freely available and compilable.

    Moving right along:

    Several environment variables may be involved here. All of them accept the same set of possible values, and a given value means the same thing in each of them. Here they are:
    Code:
    LANG
    LC_ALL
    LC_COLLATE
    LC_CTYPE
    LC_MESSAGES
    LC_MONETARY
    LC_NUMERIC
    LC_TIME
    These environment variables address internationalization issues and character set issues.
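How the variables interact is worth spelling out, since it also answers the question of how LC_ALL=C and LANG=en_US are related. POSIX defines a lookup order: a non-empty LC_ALL overrides everything; otherwise the category's own variable (LC_COLLATE for sorting and ranges) is used if it is non-empty; otherwise LANG; and with none of them set, glibc uses the C locale. The sketch below models only that lookup for the collation category, taking the three values as arguments. effective_collate is a hypothetical name, and the function does not check whether a locale is actually installed.

```shell
#!/bin/bash
# Model of the POSIX lookup order for the collation category.
# effective_collate is a hypothetical helper; the C library does the real
# lookup itself. Arguments: the values of LC_ALL, LC_COLLATE and LANG.
effective_collate() {
  local all=$1 collate=$2 lang=$3
  if [ -n "$all" ]; then
    echo "$all"        # a non-empty LC_ALL overrides every category
  elif [ -n "$collate" ]; then
    echo "$collate"    # then the category-specific variable
  elif [ -n "$lang" ]; then
    echo "$lang"       # LANG is only the fallback
  else
    echo C             # nothing set: the C (POSIX) locale
  fi
}

effective_collate ""  ""          en_US.UTF-8   # en_US.UTF-8
effective_collate ""  C           en_US.UTF-8   # C
effective_collate C   en_US.UTF-8 en_US.UTF-8   # C
effective_collate ""  ""          ""            # C

# What applies in the current shell:
effective_collate "${LC_ALL-}" "${LC_COLLATE-}" "${LANG-}"
```

So LC_ALL=C wins over LANG=en_US no matter what LANG says, which is why it fixed the globbing.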

    The simplest value of each of these variables is C, or POSIX; both values mean the same thing. It basically says that no internationalization customization is to be done, and the collating sequence is that of the standard ASCII character set. To see the order of characters in the standard ASCII character set, do this at the command line:
    Code:
    man ascii
    If your system does not have this man page, you can search the web for it:
    Code:
    Linux man ascii
    Note that all the upper case letters appear before any of the lower case letters.
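You can also ask the shell directly. In bash's printf, a numeric argument that starts with a quote prints the code of the character that follows it, and sort under LC_ALL=C shows the resulting order. A minimal sketch:

```shell
#!/bin/bash
# In ASCII, every upper case letter has a smaller code than any lower case one.
for ch in A Z a z; do
  printf '%s = %d\n' "$ch" "'$ch"   # "'X" makes printf print X's code
done

# The C locale sorts by those codes, so the upper case letters come first.
sorted=$(printf '%s\n' b B a A c C | LC_ALL=C sort | paste -s -d ' ' -)
echo "$sorted"
```

This prints A = 65, Z = 90, a = 97 and z = 122, followed by A B C a b c.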

    Other interesting values are en_US (which means English, as spoken in the US) and en_US.UTF-8. The .UTF-8 part names the character set (the encoding), and on my systems it is also what switches the collating sequence, so that the letters are ordered thus:
    Code:
    aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
    (Look familiar?)

    What can we use instead of en_US? That depends on your system. I'm using Slackware 12.1. The following files are in directory /usr/share/i18n/locales:
    Code:
    aa_DJ        es_MX               nl_BE
    aa_ER        es_NI               nl_BE@euro
    aa_ER@saaho  es_PA               nl_NL
    aa_ET        es_PE               nl_NL@euro
    af_ZA        es_PR               nn_NO
    am_ET        es_PY               nr_ZA
    an_ES        es_SV               nso_ZA
    ar_AE        es_US               oc_FR
    ar_BH        es_UY               om_ET
    ar_DZ        es_VE               om_KE
    ar_EG        et_EE               or_IN
    ar_IN        eu_ES               pa_IN
    ar_IQ        eu_ES@euro          pap_AN
    ar_JO        fa_IR               pa_PK
    ar_KW        fi_FI               pl_PL
    ar_LB        fi_FI@euro          POSIX
    ar_LY        fil_PH              pt_BR
    ar_MA        fo_FO               pt_PT
    ar_OM        fr_BE               pt_PT@euro
    ar_QA        fr_BE@euro          ro_RO
    ar_SA        fr_CA               ru_RU
    ar_SD        fr_CH               ru_UA
    ar_SY        fr_FR               rw_RW
    ar_TN        fr_FR@euro          sa_IN
    ar_YE        fr_LU               sc_IT
    as_IN        fr_LU@euro          se_NO
    ast_ES       fur_IT              sid_ET
    az_AZ        fy_DE               si_LK
    be_BY        fy_NL               sk_SK
    be_BY@latin  ga_IE               sl_SI
    ber_DZ       ga_IE@euro          so_DJ
    ber_MA       gd_GB               so_ET
    bg_BG        gez_ER              so_KE
    bn_BD        gez_ER@abegede      so_SO
    bn_IN        gez_ET              sq_AL
    br_FR        gez_ET@abegede      sr_ME
    br_FR@euro   gl_ES               sr_RS
    bs_BA        gl_ES@euro          sr_RS@latin
    byn_ER       gu_IN               ss_ZA
    ca_AD        gv_GB               st_ZA
    ca_ES        ha_NG               sv_FI
    ca_ES@euro   he_IL               sv_FI@euro
    ca_FR        hi_IN               sv_SE
    ca_IT        hr_HR               ta_IN
    crh_UA       hsb_DE              te_IN
    csb_PL       hu_HU               tg_TJ
    cs_CZ        hy_AM               th_TH
    cy_GB        i18n                ti_ER
    da_DK        id_ID               ti_ET
    de_AT        ig_NG               tig_ER
    de_AT@euro   ik_CA               tk_TM
    de_BE        is_IS               tl_PH
    de_BE@euro   iso14651_t1         tn_ZA
    de_CH        iso14651_t1_common  translit_circle
    de_DE        iso14651_t1_pinyin  translit_cjk_compat
    de_DE@euro   it_CH               translit_cjk_variants
    de_LU        it_IT               translit_combining
    de_LU@euro   it_IT@euro          translit_compat
    dz_BT        iu_CA               translit_font
    el_CY        iw_IL               translit_fraction
    el_GR        ja_JP               translit_hangul
    el_GR@euro   ka_GE               translit_narrow
    en_AU        kk_KZ               translit_neutral
    en_BW        kl_GL               translit_small
    en_CA        km_KH               translit_wide
    en_DK        kn_IN               tr_CY
    en_GB        ko_KR               tr_TR
    en_HK        ku_TR               ts_ZA
    en_IE        kw_GB               tt_RU
    en_IE@euro   ky_KG               tt_RU@iqtelif
    en_IN        lg_UG               ug_CN
    en_NG        li_BE               uk_UA
    en_NZ        li_NL               ur_PK
    en_PH        lo_LA               uz_UZ
    en_SG        lt_LT               uz_UZ@cyrillic
    en_US        lv_LV               ve_ZA
    en_ZA        mai_IN              vi_VN
    en_ZW        mg_MG               wa_BE
    es_AR        mi_NZ               wa_BE@euro
    es_BO        mk_MK               wal_ET
    es_CL        ml_IN               wo_SN
    es_CO        mn_MN               xh_ZA
    es_CR        mr_IN               yi_US
    es_DO        ms_MY               yo_NG
    es_EC        mt_MT               zh_CN
    es_ES        nb_NO               zh_HK
    es_ES@euro   nds_DE              zh_SG
    es_GT        nds_NL              zh_TW
    es_HN        ne_NP               zu_ZA
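Note that /usr/share/i18n/locales holds locale source definitions; a locale can only be used once it has been generated (compiled), and locale -a lists the ones that have been. The bash sketch below checks a name against that list. has_locale and normalize are hypothetical helpers, and the normalization (lowercase, drop hyphens) is only an approximation of how glibc spells codesets in locale -a output, for example en_US.utf8 rather than en_US.UTF-8.

```shell
#!/bin/bash
# normalize NAME: lowercase and drop '-', so en_US.UTF-8 compares equal to
# en_US.utf8 (an approximation of glibc's codeset spelling).
normalize() {
  local s=${1,,}
  printf '%s\n' "${s//-/}"
}

# has_locale NAME: succeed if NAME is among the locales `locale -a` reports.
has_locale() {
  local want line
  want=$(normalize "$1")
  while IFS= read -r line; do
    [ "$(normalize "$line")" = "$want" ] && return 0
  done < <(locale -a 2>/dev/null)
  return 1
}

for name in C en_US en_US.UTF-8; do
  if has_locale "$name"; then
    echo "$name: available"
  else
    echo "$name: not installed"
  fi
done
```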
    What can we use instead of UTF-8? Well, we can leave it (and the preceding period) out, or again use whatever's installed on the system. For Slackware 12.1, I see the following files in /usr/share/i18n/charmaps:
    Code:
    ANSI_X3.110-1983.gz    IBM869.gz
    ANSI_X3.4-1968.gz      IBM870.gz
    ARMSCII-8.gz           IBM871.gz
    ASMO_449.gz            IBM874.gz
    BIG5.gz                IBM875.gz
    BIG5-HKSCS.gz          IBM880.gz
    BRF.gz                 IBM891.gz
    BS_4730.gz             IBM903.gz
    BS_VIEWDATA.gz         IBM904.gz
    CP10007.gz             IBM905.gz
    CP1125.gz              IBM918.gz
    CP1250.gz              IBM922.gz
    CP1251.gz              IEC_P27-1.gz
    CP1252.gz              INIS-8.gz
    CP1253.gz              INIS-CYRILLIC.gz
    CP1254.gz              INIS.gz
    CP1255.gz              INVARIANT.gz
    CP1256.gz              ISIRI-3342.gz
    CP1257.gz              ISO_10367-BOX.gz
    CP1258.gz              ISO_10646.gz
    CP737.gz               ISO_11548-1.gz
    CP775.gz               ISO_2033-1983.gz
    CP949.gz               ISO_5427-EXT.gz
    CSA_Z243.4-1985-1.gz   ISO_5427.gz
    CSA_Z243.4-1985-2.gz   ISO_5428.gz
    CSA_Z243.4-1985-GR.gz  ISO_646.BASIC.gz
    CSN_369103.gz          ISO_646.IRV.gz
    CWI.gz                 ISO_6937-2-25.gz
    DEC-MCS.gz             ISO_6937-2-ADD.gz
    DIN_66003.gz           ISO_6937.gz
    DS_2089.gz             ISO-8859-10.gz
    EBCDIC-AT-DE-A.gz      ISO-8859-11.gz
    EBCDIC-AT-DE.gz        ISO-8859-13.gz
    EBCDIC-CA-FR.gz        ISO-8859-14.gz
    EBCDIC-DK-NO-A.gz      ISO-8859-15.gz
    EBCDIC-DK-NO.gz        ISO-8859-16.gz
    EBCDIC-ES-A.gz         ISO_8859-1,GL.gz
    EBCDIC-ES.gz           ISO-8859-1.gz
    EBCDIC-ES-S.gz         ISO-8859-2.gz
    EBCDIC-FI-SE-A.gz      ISO-8859-3.gz
    EBCDIC-FI-SE.gz        ISO-8859-4.gz
    EBCDIC-FR.gz           ISO-8859-5.gz
    EBCDIC-IS-FRISS.gz     ISO-8859-6.gz
    EBCDIC-IT.gz           ISO-8859-7.gz
    EBCDIC-PT.gz           ISO-8859-8.gz
    EBCDIC-UK.gz           ISO-8859-9E.gz
    EBCDIC-US.gz           ISO-8859-9.gz
    ECMA-CYRILLIC.gz       ISO_8859-SUPP.gz
    ES2.gz                 ISO-IR-197.gz
    ES.gz                  ISO-IR-209.gz
    EUC-JISX0213.gz        ISO-IR-90.gz
    EUC-JP.gz              IT.gz
    EUC-JP-MS.gz           JIS_C6220-1969-JP.gz
    EUC-KR.gz              JIS_C6220-1969-RO.gz
    EUC-TW.gz              JIS_C6229-1984-A.gz
    GB18030.gz             JIS_C6229-1984-B-ADD.gz
    GB_1988-80.gz          JIS_C6229-1984-B.gz
    GB2312.gz              JIS_C6229-1984-HAND-ADD.gz
    GBK.gz                 JIS_C6229-1984-HAND.gz
    GEORGIAN-ACADEMY.gz    JIS_C6229-1984-KANA.gz
    GEORGIAN-PS.gz         JIS_X0201.gz
    GOST_19768-74.gz       JOHAB.gz
    GREEK7.gz              JUS_I.B1.002.gz
    GREEK7-OLD.gz          JUS_I.B1.003-MAC.gz
    GREEK-CCITT.gz         JUS_I.B1.003-SERB.gz
    HP-ROMAN8.gz           KOI-8.gz
    IBM037.gz              KOI8-R.gz
    IBM038.gz              KOI8-RU.gz
    IBM1004.gz             KOI8-T.gz
    IBM1026.gz             KOI8-U.gz
    IBM1047.gz             KSC5636.gz
    IBM1124.gz             LATIN-GREEK-1.gz
    IBM1129.gz             LATIN-GREEK.gz
    IBM1132.gz             MAC-CENTRALEUROPE.gz
    IBM1133.gz             MAC-CYRILLIC.gz
    IBM1160.gz             MACINTOSH.gz
    IBM1161.gz             MAC-IS.gz
    IBM1162.gz             MAC-SAMI.gz
    IBM1163.gz             MAC-UK.gz
    IBM1164.gz             MIK.gz
    IBM256.gz              MSZ_7795.3.gz
    IBM273.gz              NATS-DANO-ADD.gz
    IBM274.gz              NATS-DANO.gz
    IBM275.gz              NATS-SEFI-ADD.gz
    IBM277.gz              NATS-SEFI.gz
    IBM278.gz              NC_NC00-10.gz
    IBM280.gz              NEXTSTEP.gz
    IBM281.gz              NF_Z_62-010_1973.gz
    IBM284.gz              NF_Z_62-010.gz
    IBM285.gz              NS_4551-1.gz
    IBM290.gz              NS_4551-2.gz
    IBM297.gz              PT154.gz
    IBM420.gz              PT2.gz
    IBM423.gz              PT.gz
    IBM424.gz              RK1048.gz
    IBM437.gz              SAMI.gz
    IBM500.gz              SAMI-WS2.gz
    IBM850.gz              SEN_850200_B.gz
    IBM851.gz              SEN_850200_C.gz
    IBM852.gz              SHIFT_JIS.gz
    IBM855.gz              SHIFT_JISX0213.gz
    IBM856.gz              T.101-G2.gz
    IBM857.gz              T.61-7BIT.gz
    IBM860.gz              T.61-8BIT.gz
    IBM861.gz              TCVN5712-1.gz
    IBM862.gz              TIS-620.gz
    IBM863.gz              TSCII.gz
    IBM864.gz              UTF-8.gz
    IBM865.gz              VIDEOTEX-SUPPL.gz
    IBM866.gz              VISCII.gz
    IBM866NAV.gz           WINDOWS-31J.gz
    IBM868.gz
    What do the various environment variables do with these values? This post is already getting too long, so go here for the answer.

    Now let's explore the question of whether these environment variables are used by ls, or by the shell. The short answer is: both!

    If you want to see ls sort the filenames, do this at the command line:
    Code:
    ls
    Note that with this command, no shell globbing is done, because you haven't used anything like * or ? or x[A-Z].

    If you want the shell to sort the filenames, do this at the command line:
    Code:
    echo *
    Moving right along, I wrote a shell script which demonstrates that the behavior is the same, whether the filenames are sorted by ls, bash, or ksh. Here's the script:
    Code:
    #!/bin/bash
    
    #-----------------------------------------------------------------------------
    
    try_both()
    {
      cat > collation_script
      echo === we are about to run this script:
      cat collation_script
      echo === here is the output for bash:
      bash < collation_script 2>&1
      echo === here is the output for ksh:
      ksh < collation_script 2>&1
    }
    
    #-----------------------------------------------------------------------------
    
    rm -rf collation_experiment
    
    umask 077
    
    mkdir collation_experiment
    touch collation_experiment/a
    touch collation_experiment/b
    touch collation_experiment/c
    touch collation_experiment/A
    touch collation_experiment/B
    touch collation_experiment/C
    
    unset LANG
    unset LC_ALL
    unset LC_COLLATE
    unset LC_CTYPE
    unset LC_MESSAGES
    unset LC_MONETARY
    unset LC_NUMERIC
    unset LC_TIME
    unset NLSPATH
    
    echo +++ Demonstrate that we really are executing each shell.
    
    try_both <<EOD
    help
    EOD
    
    echo +++ Use no environment variables.
    
    try_both <<EOD
    cd collation_experiment
    echo /// ls handles the sorting:
    ls
    echo /// The shell handles the sorting:
    echo *
    EOD
    
    echo +++ Use LANG=C\; LANG=POSIX would do the same thing.
    echo +++ We\'ll see the same output as above.
    
    export LANG=C
    
    try_both <<EOD
    cd collation_experiment
    echo /// ls handles the sorting:
    ls
    echo /// The shell handles the sorting:
    echo *
    EOD
    
    echo +++ Use LANG=en_US.UTF-8.
    
    export LANG=en_US.UTF-8
    
    try_both <<EOD
    cd collation_experiment
    echo /// ls handles the sorting:
    ls
    echo /// The shell handles the sorting:
    echo *
    EOD
    I got the following output from this script:
    Code:
    +++ Demonstrate that we really are executing each shell.
    === we are about to run this script:
    help
    === here is the output for bash:
    GNU bash, version 3.1.17(2)-release (i486-slackware-linux-gnu)
    These shell commands are defined internally.  Type `help' to see this list.
    Type `help name' to find out more about the function `name'.
    Use `info bash' to find out more about the shell in general.
    Use `man -k' or `info' to find out more about commands not in this list.
    
    A star (*) next to a name means that the command is disabled.
    
     JOB_SPEC [&]                       (( expression ))
     . filename [arguments]             :
     [ arg... ]                         [[ expression ]]
     alias [-p] [name[=value] ... ]     bg [job_spec ...]
     bind [-lpvsPVS] [-m keymap] [-f fi break [n]
     builtin [shell-builtin [arg ...]]  caller [EXPR]
     case WORD in [PATTERN [| PATTERN]. cd [-L|-P] [dir]
     command [-pVv] command [arg ...]   compgen [-abcdefgjksuv] [-o option
     complete [-abcdefgjksuv] [-pr] [-o continue [n]
     declare [-afFirtx] [-p] [name[=val dirs [-clpv] [+N] [-N]
     disown [-h] [-ar] [jobspec ...]    echo [-neE] [arg ...]
     enable [-pnds] [-a] [-f filename]  eval [arg ...]
     exec [-cl] [-a name] file [redirec exit [n]
     export [-nf] [name[=value] ...] or false
     fc [-e ename] [-nlr] [first] [last fg [job_spec]
     for NAME [in WORDS ... ;] do COMMA for (( exp1; exp2; exp3 )); do COM
     function NAME { COMMANDS ; } or NA getopts optstring name [arg]
     hash [-lr] [-p pathname] [-dt] [na help [-s] [pattern ...]
     history [-c] [-d offset] [n] or hi if COMMANDS; then COMMANDS; [ elif
     jobs [-lnprs] [jobspec ...] or job kill [-s sigspec | -n signum | -si
     let arg [arg ...]                  local name[=value] ...
     logout                             popd [+N | -N] [-n]
     printf [-v var] format [arguments] pushd [dir | +N | -N] [-n]
     pwd [-LP]                          read [-ers] [-u fd] [-t timeout] [
     readonly [-af] [name[=value] ...]  return [n]
     select NAME [in WORDS ... ;] do CO set [--abefhkmnptuvxBCHP] [-o opti
     shift [n]                          shopt [-pqsu] [-o long-option] opt
     source filename [arguments]        suspend [-f]
     test [expr]                        time [-p] PIPELINE
     times                              trap [-lp] [arg signal_spec ...]
     true                               type [-afptP] name [name ...]
     typeset [-afFirtx] [-p] name[=valu ulimit [-SHacdfilmnpqstuvx] [limit
     umask [-p] [-S] [mode]             unalias [-a] name [name ...]
     unset [-f] [-v] [name ...]         until COMMANDS; do COMMANDS; done
     variables - Some variable names an wait [n]
     while COMMANDS; do COMMANDS; done  { COMMANDS ; }
    === here is the output for ksh:
    ksh: line 1: help: not found
    +++ Use no environment variables.
    === we are about to run this script:
    cd collation_experiment
    echo /// ls handles the sorting:
    ls
    echo /// The shell handles the sorting:
    echo *
    === here is the output for bash:
    /// ls handles the sorting:
    A  B  C  a  b  c
    /// The shell handles the sorting:
    A B C a b c
    === here is the output for ksh:
    /// ls handles the sorting:
    A  B  C  a  b  c
    /// The shell handles the sorting:
    A B C a b c
    +++ Use LANG=C; LANG=POSIX would do the same thing.
    +++ We'll see the same output as above.
    === we are about to run this script:
    cd collation_experiment
    echo /// ls handles the sorting:
    ls
    echo /// The shell handles the sorting:
    echo *
    === here is the output for bash:
    /// ls handles the sorting:
    A  B  C  a  b  c
    /// The shell handles the sorting:
    A B C a b c
    === here is the output for ksh:
    /// ls handles the sorting:
    A  B  C  a  b  c
    /// The shell handles the sorting:
    A B C a b c
    +++ Use LANG=en_US.UTF-8.
    === we are about to run this script:
    cd collation_experiment
    echo /// ls handles the sorting:
    ls
    echo /// The shell handles the sorting:
    echo *
    === here is the output for bash:
    /// ls handles the sorting:
    a  A  b  B  c  C
    /// The shell handles the sorting:
    a A b B c C
    === here is the output for ksh:
    /// ls handles the sorting:
    a  A  b  B  c  C
    /// The shell handles the sorting:
    a A b B c C
    Three loose ends.

    Loose end one.

    You'll recall that for Slackware 12.1 I mentioned these directories:
    Code:
    /usr/share/i18n/locales
    /usr/share/i18n/charmaps
    You may well ask: what's the 18 doing in there? It stands for the 18 letters between the initial i and the final n, which have been left out. Here's the count:
    Code:
    internationalization
     000000000111111111
     123456789012345678
    internationalisation
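The same rule (first letter, number of letters left out, last letter) produces similar abbreviations such as l10n for localization. A tiny bash sketch, with numeronym as a made-up function name:

```shell
#!/bin/bash
# numeronym WORD: first letter + count of the letters in between + last letter.
# numeronym is a hypothetical helper name used only for this illustration.
numeronym() {
  local word=$1
  printf '%s%d%s\n' "${word:0:1}" $(( ${#word} - 2 )) "${word: -1}"
}

numeronym internationalization   # i18n
numeronym internationalisation   # i18n
numeronym localization           # l10n
```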
    Loose end two.

    The danger with all of this is that if filenames are globbed differently from system to system, you can accidentally delete files you didn't intend to. My wife's system uses en_US.UTF-8; mine uses en_US. I wanted to delete all files in a particular directory which began with upper case letters, so I did this:
    Code:
    rm [A-Z]*
    That would have been fine on my system, but on hers, it also deleted every file that began with a lower case letter, except those beginning with a.

    Sigh.
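A sketch of two safer habits, run against throwaway files in a scratch directory: preview what a pattern expands to before handing it to rm, and express "upper case" with a character class instead of a range. Bash 4.3 and later also have a globasciiranges shell option (shopt -s globasciiranges) that makes pattern ranges use ASCII order whatever the locale, and bash 5.0 turned it on by default; the sketch below does not rely on it.

```shell
#!/bin/bash
# Throwaway files: two start with an upper case letter, three with lower case.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch Apple Banana apple banana zebra

# 1. Preview the expansion before deleting anything.
printf 'would remove: %s\n' [[:upper:]]*

# 2. Match by character type; unlike [A-Z], this does not depend on the
#    collating sequence. "--" guards against names that start with "-".
rm -- [[:upper:]]*

remaining=$(echo *)
echo "left: $remaining"

cd / && rm -rf "$dir"
```

Only Apple and Banana are removed; apple, banana and zebra are left, whatever the locale.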

    Loose end three.

    delovelady, I would request that you run this same script using pdksh instead of the standard ksh, and report back whether you observe the same behavior, or different behavior.

    If the behavior is different, I would call that a bug in pdksh. But I'm guessing that the behavior is the same.
    --
    Bill

    Old age and treachery will overcome youth and skill.

  10. #10
    Linux Newbie
    Join Date: Jul 2008
    Posts: 181
    Thanks for the lucid exposition.

    Quote, originally posted by wje_lf:
    The danger with all of this is that if filenames are globbed differently from system to system, you can accidentally delete files you didn't intend to.
    This is clearly bad, but what are the alternatives? The old ASCII behaviour is only suitable for English, and possibly other languages that use exactly the same alphabet (offhand, I can't think of any). If someone speaking German or French, for example, used a glob like "[a-z]", he or she would naturally expect accented or umlaut vowels to be included in that range. On the other hand, someone speaking English would probably be surprised to see these characters included in the range.

    Similarly, people not interested in the (ancient) history of computing might be surprised by a collation order that sorts "a" before "Z", since that is not what dictionaries and similar works have been doing for centuries.

    However, the situation would be (slightly) improved if legacy character sets went the way of the dodo. And in the context of Linux, everything except UTF-8 should be considered a legacy character set (except maybe for East Asian languages; I am a bit hazy on that point).

...