I noticed some strange behaviour on Red Hat Enterprise Linux 4.7.
Can someone help me understand it?
If I have a string containing a non-ASCII character (let's say decimal 212, hex d4), I expect that substring extraction, through ${string:position:length}, should work, provided that the current encoding setting includes that character.
So UTF-8 will be misled, because it expects a valid multibyte sequence after x'd4', but both 850 and iso-8859-1 should work, because they include 'd4', albeit with a different appearance (an accented E and a circumflex-accented O, respectively).
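To make that expectation concrete, here is a minimal sketch of what I mean by "should work". It assumes bash, and it uses the C locale purely as an illustration (it is not one of the locales tested in the script): there, every byte is one character, so the substring should simply be counted in bytes. The variable names len and first4 are made up for this example.

```shell
# Assumes bash (the ${var:offset:length} form is not plain POSIX sh).
# The C locale is an assumption for illustration: it is always available
# and treats every byte as one character.
LC_ALL=C

# Same 7 bytes as the script's test string: "ABC", byte 0xd4 (octal 324), "DEF".
x=$(printf 'ABC\324DEF')

# Length in characters; in the C locale this equals the byte count.
len=${#x}

# First four characters, dumped as hex with the spaces and newline stripped.
first4=$(printf '%s' "${x:0:4}" | od -An -tx1 | tr -d ' \n')

echo "length=$len first4=$first4"
```

With a single-byte encoding I would expect this to print length=7 first4=414243d4, that is, the d4 byte is counted and extracted like any other character.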
Yet there are strange inconsistencies, shown in the attached script.
At the end of the script I have included, as a comment, the output that I get.
Max
-------------------------------------
# script to show encoding/substring extraction problems

echo lang is initially $LANG

# create a file containing character 212
echo dummy | awk '{print "ABC" "\xd4" "DEF";}' > pippo

x="$(cat pippo)"

echo ====
echo lang is $LANG
echo x is:
echo "$x" | od -tx1z
echo substring of x is:
echo "${x:0:7}" | od -tx1z


LANG=en_US.850

echo ====
echo lang is $LANG
echo x is:
echo "$x" | od -tx1z
echo substring of x is:
echo "${x:0:7}" | od -tx1z


LANG=en_US.iso-8859-1

echo ====
echo lang is $LANG
echo x is:
echo "$x" | od -tx1z
echo substring of x is:
echo "${x:0:7}" | od -tx1z


LANG=en_US.850

echo ====
echo lang is $LANG
echo x is:
echo "$x" | od -tx1z
echo substring of x is:
echo "${x:0:7}" | od -tx1z

LANG=en_US.utf-8

echo ====
echo lang is $LANG
echo x is:
echo "$x" | od -tx1z
echo substring of x is:
echo "${x:0:7}" | od -tx1z

# this is the output that I get
# lang is initially en_US.utf-8
# ====
# lang is en_US.utf-8
# x is:
# 0000000 41 42 43 d4 44 45 46 0a >ABC.DEF.<
# 0000010
# substring of x is:
# 0000000 41 42 43 0a >ABC.<
# 0000004
# ====
# lang is en_US.850
# x is:
# 0000000 41 42 43 d4 44 45 46 0a >ABC.DEF.<
# 0000010
# substring of x is:
# 0000000 41 42 43 0a >ABC.<
# 0000004
# ====
# lang is en_US.iso-8859-1
# x is:
# 0000000 41 42 43 d4 44 45 46 0a >ABCoDEF.<
# 0000010
# substring of x is:
# 0000000 41 42 43 d4 44 45 46 0a >ABCoDEF.<
# 0000010
# ====
# lang is en_US.850
# x is:
# 0000000 41 42 43 d4 44 45 46 0a >ABC.DEF.<
# 0000010
# substring of x is:
# 0000000 41 42 43 d4 44 45 46 0a >ABC.DEF.<
# 0000010
# ====
# lang is en_US.utf-8
# x is:
# 0000000 41 42 43 d4 44 45 46 0a >ABC.DEF.<
# 0000010
# substring of x is:
# 0000000 41 42 43 0a >ABC.<
# 0000004