JDK 1.8 Source Code 03: java.lang.String

2020-03-29


String is another class in the java.lang package, and probably the class used most often in everyday code.

1. Definition

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {
    // ...
}

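Like the Integer class covered in the previous post, String is declared final, so no class can extend it. Moreover, once a String object has been created, the character sequence it contains can never change: none of the class's methods modify the object in place for as long as it lives. This is worth keeping in mind. (Some methods appear to change a string, but internally they all create a new one; this is pointed out as each method is discussed below.)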

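Next, the class implements Serializable, a marker interface for serialization, and Comparable<String>, which compares two strings lexicographically by the UTF-16 values of their chars (for ASCII text this coincides with ASCII order). The implementation is covered below.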

Finally, it implements CharSequence, which represents a readable, ordered sequence of char values; this is also covered below.

2. Fields

    /** The value is used for character storage. */
    // The char array that stores the string's characters
    private final char value[];

    /** Cache the hash code for the string */
    // Caches the string's hash code
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    // Serialization version identifier
    private static final long serialVersionUID = -6849794470754667710L;

In JDK 1.8, a String is essentially a char array.

3. Constructors

String has many constructors: a String object can be created from another string, a char array, a byte array, and more.

String str1 = "abc";
String str2 = new String("abc");
String str3 = new String(new char[]{'a','b','c'});
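The text above also mentions byte arrays. As a minimal sketch, assuming JDK 1.8 and a hypothetical wrapper class named ConstructorDemo, here are a few more standard constructors building the same content:

```java
import java.nio.charset.StandardCharsets;

class ConstructorDemo {
    // Builds the content "abc" in several ways and returns every result
    static String[] build() {
        char[] chars = {'x', 'a', 'b', 'c', 'y'};
        byte[] utf8 = "abc".getBytes(StandardCharsets.UTF_8);
        return new String[] {
            "abc",                                           // literal
            new String("abc"),                               // copy of a literal
            new String(chars, 1, 3),                         // char[] slice: offset 1, count 3
            new String(utf8, StandardCharsets.UTF_8),        // decode bytes with an explicit charset
            new String(new StringBuilder("ab").append('c'))  // from a StringBuilder
        };
    }

    public static void main(String[] args) {
        for (String s : build()) {
            System.out.println(s + " equals abc? " + s.equals("abc")); // true for every entry
        }
    }
}
```

Passing an explicit Charset to the byte[] constructor avoids depending on the platform's default encoding.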

4. The equals(Object anObject) method

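String overrides equals to compare the two strings character by character: it returns true if the argument is a String with exactly the same character sequence, and false otherwise.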

/**
 * Compares this string to the specified object.  The result is {@code
 * true} if and only if the argument is not {@code null} and is a {@code
 * String} object that represents the same sequence of characters as this
 * object.
 *
 * @param  anObject
 *         The object to compare this {@code String} against
 *
 * @return  {@code true} if the given object represents a {@code String}
 *          equivalent to this string, {@code false} otherwise
 *
 * @see  #compareTo(String)
 * @see  #equalsIgnoreCase(String)
 */
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}
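A short sketch (the class EqualsDemo is illustrative) contrasting equals with reference comparison via ==, and showing the early exits in the source: a null argument and a non-String argument both fail the instanceof check.

```java
class EqualsDemo {
    static boolean[] compare() {
        String a = "abc";
        String b = new String("abc");              // distinct object, same characters
        return new boolean[] {
            a == b,                                // false: different references
            a.equals(b),                           // true: same character sequence
            a.equals(null),                        // false: null is not an instance of String
            a.equals(new StringBuilder("abc"))     // false: not a String, even with the same chars
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(compare())); // [false, true, false, false]
    }
}
```

To compare a String's contents with an arbitrary CharSequence such as a StringBuilder, contentEquals is the method meant for that.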

5. The hashCode() method

/**
 * Returns a hash code for this string. The hash code for a
 * {@code String} object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using {@code int} arithmetic, where {@code s[i]} is the
 * <i>i</i>th character of the string, {@code n} is the length of
 * the string, and {@code ^} indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return  a hash code value for this object.
 */
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

The hashCode algorithm of String is simple; the core is the for loop in the middle, which computes:

$$h = s[0]\cdot 31^{n-1} + s[1]\cdot 31^{n-2} + \cdots + s[n-1]$$

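Here s is the val array in the source, i.e. the char array that makes up the string, and n is its length; the sum is computed with int arithmetic.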

Why is the number 31 used as the multiplier here?

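There are two main reasons:

- 31 is a prime that is neither too large nor too small, and is one of the preferred primes for a hashCode multiplier.
- The JVM can optimize 31 * i into (i << 5) - i, because a shift and a subtraction are cheaper than a multiplication.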

For details, see the article "Why String uses 31 as the hash multiplier": https://www.cnblogs.com/nullllun/p/8350178.html
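To check the formula by hand, the sketch below (hypothetical class HashDemo) re-implements the loop from hashCode(), once with 31 * h and once with the (h << 5) - h rewrite, and compares both against String.hashCode(). Both forms wrap around identically on int overflow, so they always agree; whether the JIT actually emits a shift is up to the VM.

```java
class HashDemo {
    // Same loop as the JDK 1.8 source: h = 31 * h + c
    static int hashMultiply(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Same computation with 31 * h rewritten as (h << 5) - h
    static int hashShift(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = ((h << 5) - h) + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // "abc": 97 * 31^2 + 98 * 31 + 99 = 93217 + 3038 + 99 = 96354
        System.out.println(hashMultiply("abc")); // 96354
        System.out.println(hashShift("abc"));    // 96354
        System.out.println("abc".hashCode());    // 96354
    }
}
```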

6. The charAt(int index) method

/**
 * Returns the {@code char} value at the
 * specified index. An index ranges from {@code 0} to
 * {@code length() - 1}. The first {@code char} value of the sequence
 * is at index {@code 0}, the next at index {@code 1},
 * and so on, as for array indexing.
 *
 * <p>If the {@code char} value specified by the index is a
 * <a href="Character.html#unicode">surrogate</a>, the surrogate
 * value is returned.
 *
 * @param      index   the index of the {@code char} value.
 * @return     the {@code char} value at the specified index of this string.
 *             The first {@code char} value is at index {@code 0}.
 * @exception  IndexOutOfBoundsException  if the {@code index}
 *             argument is negative or not less than the length of this
 *             string.
 */
public char charAt(int index) {
    // If the index is negative or not less than the string's length, throw an index-out-of-bounds exception
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    // Return the single char at the given index
    return value[index];
}

A string is made up of a char array; this method takes an index (an array subscript) and returns the single char at that position.
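A sketch (hypothetical class CharAtDemo) of the bounds check and of the surrogate note in the Javadoc: for a character outside the Basic Multilingual Plane, charAt returns one half of the surrogate pair rather than the whole character.

```java
class CharAtDemo {
    // Returns true if s.charAt(index) throws StringIndexOutOfBoundsException
    static boolean throwsFor(String s, int index) {
        try {
            s.charAt(index);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(s.charAt(1));      // b
        System.out.println(throwsFor(s, 3));  // true: index == length() is out of range
        System.out.println(throwsFor(s, -1)); // true: negative index

        // U+1F600 lies outside the BMP, so it is stored as two chars (a surrogate pair)
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                             // 2
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true: charAt returns the raw surrogate
        System.out.println(emoji.codePointAt(0) == 0x1F600);            // true: codePointAt reads the full character
    }
}
```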

7. The compareTo() methods

7.1 compareTo(String anotherString)

/**
 * ...
 * @param   anotherString   the {@code String} to be compared.
 * @return  the value {@code 0} if the argument string is equal to
 *          this string; a value less than {@code 0} if this string
 *          is lexicographically less than the string argument; and a
 *          value greater than {@code 0} if this string is
 *          lexicographically greater than the string argument.
 */
     
public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) {
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    }
    return len1 - len2;
}

This source is also easy to follow: the method compares two strings lexicographically, based on the Unicode (UTF-16) value of each char.

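At the first position where the two strings differ, it returns the difference between the char values at that position. If no position within the shorter length differs (one string is a prefix of the other, or they are equal), it returns the difference between the two lengths.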
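A few comparisons, with the return values worked out from the rules above (the class CompareToDemo is only an illustrative wrapper):

```java
class CompareToDemo {
    public static void main(String[] args) {
        System.out.println("apple".compareTo("banana")); // -1: 'a' - 'b' at index 0
        System.out.println("abc".compareTo("abd"));      // -1: 'c' - 'd' at index 2
        System.out.println("abc".compareTo("abcde"));    // -2: common prefix, so 3 - 5
        System.out.println("A".compareTo("a"));          // -32: 'A' (65) - 'a' (97)
        System.out.println("abc".compareTo("abc"));      // 0: equal strings
    }
}
```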

7.2 compareToIgnoreCase(String str)

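compareToIgnoreCase(String str) is compareTo with case differences ignored. Recall that for ASCII letters, an uppercase letter's Unicode value is 32 less than that of the corresponding lowercase letter.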

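Under the hood it delegates to CASE_INSENSITIVE_ORDER: when two chars differ, both are converted to uppercase and compared, and if they still differ, both are converted to lowercase and compared again.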

public int compareToIgnoreCase(String str) {
    return CASE_INSENSITIVE_ORDER.compare(this, str);
}


    /**
     * A Comparator that orders {@code String} objects as by
     * {@code compareToIgnoreCase}. This comparator is serializable.
     * <p>
     * Note that this Comparator does <em>not</em> take locale into account,
     * and will result in an unsatisfactory ordering for certain locales.
     * The java.text package provides <em>Collators</em> to allow
     * locale-sensitive ordering.
     *
     * @see     java.text.Collator#compare(String, String)
     * @since   1.2
     */
    public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                         = new CaseInsensitiveComparator();
    private static class CaseInsensitiveComparator
            implements Comparator<String>, java.io.Serializable {
        // use serialVersionUID from JDK 1.2.2 for interoperability
        private static final long serialVersionUID = 8575799808933029326L;

        public int compare(String s1, String s2) {
            int n1 = s1.length();
            int n2 = s2.length();
            int min = Math.min(n1, n2);
            for (int i = 0; i < min; i++) {
                char c1 = s1.charAt(i);
                char c2 = s2.charAt(i);
                if (c1 != c2) {
                    c1 = Character.toUpperCase(c1);
                    c2 = Character.toUpperCase(c2);
                    if (c1 != c2) {
                        c1 = Character.toLowerCase(c1);
                        c2 = Character.toLowerCase(c2);
                        if (c1 != c2) {
                            // No overflow because of numeric promotion
                            return c1 - c2;
                        }
                    }
                }
            }
            return n1 - n2;
        }

        /** Replaces the de-serialized object. */
        private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
    }
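Because CASE_INSENSITIVE_ORDER is a public Comparator, it can also be used directly for sorting. A minimal sketch, with IgnoreCaseDemo and its sorted helper as illustrative names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class IgnoreCaseDemo {
    // Returns a sorted copy of the input; a null order means natural order (compareTo)
    static List<String> sorted(List<String> input, Comparator<String> order) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Cherry", "apple");
        // compareTo: 'C' (67) sorts before every lowercase letter
        System.out.println(sorted(words, null));                          // [Cherry, apple, banana]
        // CASE_INSENSITIVE_ORDER, the comparator behind compareToIgnoreCase
        System.out.println(sorted(words, String.CASE_INSENSITIVE_ORDER)); // [apple, banana, Cherry]
        System.out.println("Hello".compareToIgnoreCase("hELLO"));         // 0
    }
}
```

As the Javadoc warns, this comparator ignores locale; java.text.Collator is the locale-aware alternative.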

8. The concat(String str) method

This method appends the specified string to the end of this string.

/**
 * Concatenates the specified string to the end of this string.
 * <p>
 * If the length of the argument string is {@code 0}, then this
 * {@code String} object is returned. Otherwise, a
 * {@code String} object is returned that represents a character
 * sequence that is the concatenation of the character sequence
 * represented by this {@code String} object and the character
 * sequence represented by the argument string.<p>
 * Examples:
 * <blockquote><pre>
 * "cares".concat("s") returns "caress"
 * "to".concat("get").concat("her") returns "together"
 * </pre></blockquote>
 *
 * @param   str   the {@code String} that is concatenated to the end
 *                of this {@code String}.
 * @return  a string that represents the concatenation of this object's
 *          characters followed by the string argument's characters.
 */

public String concat(String str) {
    int otherLen = str.length();
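    // First check whether the string to append is empty; if so, return the original string
    if (otherLen == 0) {
        return this;
    }
    int len = value.length;
    // Otherwise Arrays.copyOf creates a new char array whose length is the sum of both lengths:
    // the front holds the original chars and the tail is still empty
    char buf[] = Arrays.copyOf(value, len + otherLen);
    // getChars then copies the appended string into the empty tail of the new array
    str.getChars(buf, len);
    // Note the return value new String(buf, true): a brand-new string is created and the original is unchanged.
    // This is what "once a String object is created, its character sequence cannot change" means.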
    return new String(buf, true);
}

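The method first checks whether the string to append has length 0; if so, it returns the original string. Otherwise it uses the copyOf method of the Arrays utility class (covered in detail in a later post) to create a new char array whose length is the sum of the two lengths, with the original string's chars at the front and an empty tail. It then uses getChars to copy the appended string into that empty tail.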

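Note: the return value is new String(buf, true), meaning a new string is created with the new keyword and the original string stays unchanged. This is what we said earlier: once a String object has been created, the character sequence it contains cannot be changed.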
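A quick way to see that concat never modifies the receiver (ConcatDemo is an illustrative wrapper; the == results follow directly from the source above):

```java
class ConcatDemo {
    public static void main(String[] args) {
        String base = "to";
        String joined = base.concat("get").concat("her");
        System.out.println(joined);                  // together
        System.out.println(base);                    // to: the receiver is unchanged
        System.out.println(base.concat("") == base); // true: an empty argument returns this
        System.out.println(joined == base);          // false: concat built a new String
    }
}
```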

9. The indexOf() methods

9.1 indexOf(int ch)

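In indexOf(int ch), the parameter ch is really a character's Unicode code point; a single char can be passed as well, since it is widened to int automatically. The method returns the index of the first occurrence of the specified character in this string, or -1 if it does not occur.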

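Internally it calls indexOf(int ch, int fromIndex) with fromIndex = 0, since the search starts at index 0. indexOf(int ch, int fromIndex) likewise returns the index of the first occurrence of the character, but starts searching at the specified index.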

public int indexOf(int ch) {
    return indexOf(ch, 0);
}

9.2 indexOf(int ch,int fromIndex)

/**
 * ...
 * <p>All indices are specified in {@code char} values
 * (Unicode code units).
 *
 * @param   ch          a character (Unicode code point).
 * @param   fromIndex   the index to start the search from.
 * @return  the index of the first occurrence of the character in the
 *          character sequence represented by this object that is greater
 *          than or equal to {@code fromIndex}, or {@code -1}
 *          if the character does not occur.
 */
     
public int indexOf(int ch, int fromIndex) {
    // max is the length of the char array
    final int max = value.length;
    // A negative fromIndex is treated as 0
    if (fromIndex < 0) {
        fromIndex = 0;
    } else if (fromIndex >= max) {
        // fromIndex is at or past the end (the last valid index is max - 1),
        // so return -1 immediately
        // Note: fromIndex might be near -1>>>1.
        return -1;
    }

    // A char occupies two bytes (16 bits).
    // Code points below 2^16 (65536), i.e. BMP characters, cover the vast majority of text
    if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
        // handle most cases here (ch is a BMP code point or a
        // negative value (invalid code point))
        final char[] value = this.value;
        // Scan the array for a char equal to ch
        for (int i = fromIndex; i < max; i++) {
            // Found: return the index of the first occurrence and stop
            if (value[i] == ch) {
                return i;
            }
        }
        // No match: return -1
        return -1;
    } else {
        // For code points of 65536 and above (supplementary characters), indexOfSupplementary
        // first checks that ch is a valid code point, then searches for its surrogate pair
        return indexOfSupplementary(ch, fromIndex);
    }
}
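Some calls that exercise each branch of indexOf(int ch, int fromIndex), including the supplementary-character path (IndexOfDemo is an illustrative wrapper):

```java
class IndexOfDemo {
    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.indexOf('l'));     // 2: first 'l'
        System.out.println(s.indexOf('l', 3));  // 3: the search starts at index 3
        System.out.println(s.indexOf('h', -5)); // 0: a negative fromIndex is treated as 0
        System.out.println(s.indexOf('o', 99)); // -1: fromIndex >= length
        System.out.println(s.indexOf('z'));     // -1: not found

        // A supplementary code point goes through indexOfSupplementary
        String t = "a" + new String(Character.toChars(0x1F600)) + "b";
        System.out.println(t.indexOf(0x1F600)); // 1: index of the high surrogate
        System.out.println(t.indexOf('b'));     // 3: the emoji occupies two chars
    }
}
```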

10. The split() methods

10.1 split(String regex)

split(String regex) splits this string around matches of the given regular expression.

There is not much to say about split(String regex): internally it just calls split(regex, 0):

/**
 * ...
 * @param  regex
 *         the delimiting regular expression
 *
 * @return  the array of strings computed by splitting this string
 *          around matches of the given regular expression
 *
 * @throws  PatternSyntaxException
 *          if the regular expression's syntax is invalid
 *
 * @see java.util.regex.Pattern
 *
 * @since 1.4
 * @spec JSR-51
 */
     
public String[] split(String regex) {
    return split(regex, 0);

10.2 split(String regex,int limit)

split(String regex, int limit) works the same way, except that limit falls into one of three cases:

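limit > 0: the pattern is applied at most limit - 1 times, so the array has at most limit elements and the last element holds all remaining input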

String str = "a,b,c";
String[] c1 = str.split(",", 2);
System.out.println(c1.length);   // 2
System.out.println(Arrays.toString(c1));  // [a, b,c]

limit = 0: the pattern is applied as many times as possible, and trailing empty strings are discarded

String str2 = "a,b,c,,";
String[] c2 = str2.split(",", 0);
System.out.println(c2.length);   // 3
System.out.println(Arrays.toString(c2));  // [a, b, c]

limit < 0: the pattern is applied as many times as possible, and trailing empty strings are kept

String str3 = "a,b,c,,";
String[] c3 = str3.split(",", -1);
System.out.println(c3.length);  // 5
System.out.println(Arrays.toString(c3));    // [a, b, c, , ]

Now let's look at the underlying implementation, focusing on split(String regex, int limit):

/**
 * ...
 * @param  regex
 *         the delimiting regular expression
 *
 * @param  limit
 *         the result threshold, as described above
 *
 * @return  the array of strings computed by splitting this string
 *          around matches of the given regular expression
 *
 * @throws  PatternSyntaxException
 *          if the regular expression's syntax is invalid
 *
 * @see java.util.regex.Pattern
 *
 * @since 1.4
 * @spec JSR-51
 */
     
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
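    // Fastpath case 1: a single char that is not one of ".$|()[{^?*+\\"
    // Fastpath case 2: two chars, where the first is a backslash and the second is not an ASCII digit or letter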
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        // limited is true when limit > 0, false otherwise
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            // Split here if limit <= 0 or the list has fewer than limit - 1 elements
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                // Now list.size() == limit - 1: the rest of the string becomes the last element
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        // No match at all: return a one-element array holding this string itself
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        // If limit <= 0 (limited == false) or the list has fewer than limit elements, add the remaining substring
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        // If limit == 0, drop trailing empty strings by shrinking resultSize until the last element is non-empty
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
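The fastpath only applies to simple delimiters; anything else goes through Pattern. A sketch (SplitDemo is illustrative) of a common pitfall with ".", which is a regex metacharacter and therefore matches every character:

```java
import java.util.Arrays;

class SplitDemo {
    public static void main(String[] args) {
        // "," is one char and not a regex metacharacter: handled by the fastpath
        System.out.println(Arrays.toString("a,b,c".split(",")));   // [a, b, c]

        // "." matches every char, so every piece is empty,
        // and limit 0 strips all the trailing empty strings
        System.out.println("a.b.c".split(".").length);             // 0

        // "\\." is a backslash followed by a non-alphanumeric char: fastpath again
        System.out.println(Arrays.toString("a.b.c".split("\\."))); // [a, b, c]

        // No match at all: a one-element array holding the original string
        System.out.println(Arrays.toString("abc".split(";")));     // [abc]
    }
}
```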

11. The replace() methods

11.1 replace(char oldChar, char newChar)

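replace(char oldChar, char newChar) replaces every occurrence of oldChar in the original string with newChar and returns a new string (or the original string itself if oldChar does not occur).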

/**
 * Returns a string resulting from replacing all occurrences of
 * {@code oldChar} in this string with {@code newChar}.
 * <p>
 * If the character {@code oldChar} does not occur in the
 * character sequence represented by this {@code String} object,
 * then a reference to this {@code String} object is returned.
 * Otherwise, a {@code String} object is returned that
 * represents a character sequence identical to the character sequence
 * represented by this {@code String} object, except that every
 * occurrence of {@code oldChar} is replaced by an occurrence
 * of {@code newChar}.
 * <p>
 * Examples:
 * <blockquote><pre>
 * "mesquite in your cellar".replace('e', 'o')
 *         returns "mosquito in your collar"
 * "the war of baronets".replace('r', 'y')
 *         returns "the way of bayonets"
 * "sparring with a purple porpoise".replace('p', 't')
 *         returns "starring with a turtle tortoise"
 * "JonL".replace('q', 'x') returns "JonL" (no change)
 * </pre></blockquote>
 *
 * @param   oldChar   the old character.
 * @param   newChar   the new character.
 * @return  a string derived from this string by replacing every
 *          occurrence of {@code oldChar} with {@code newChar}.
 */
public String replace(char oldChar, char newChar) {
    if (oldChar != newChar) {
        int len = value.length;
        int i = -1;
        char[] val = value; /* avoid getfield opcode */

        while (++i < len) {
            if (val[i] == oldChar) {
                break;
            }
        }
        if (i < len) {
            char buf[] = new char[len];
            for (int j = 0; j < i; j++) {
                buf[j] = val[j];
            }
            while (i < len) {
                char c = val[i];
                buf[i] = (c == oldChar) ? newChar : c;
                i++;
            }
            return new String(buf, true);
        }
    }
    return this;
}
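A sketch (ReplaceCharDemo is illustrative) showing that replace(char, char) builds a new string only when oldChar actually occurs and differs from newChar, and otherwise returns this:

```java
class ReplaceCharDemo {
    public static void main(String[] args) {
        String s = "mesquite in your cellar";
        System.out.println(s.replace('e', 'o')); // mosquito in your collar
        System.out.println(s);                   // mesquite in your cellar (unchanged)

        String jon = "JonL";
        System.out.println(jon.replace('q', 'x') == jon); // true: no 'q', so this is returned
        System.out.println(jon.replace('n', 'n') == jon); // true: oldChar == newChar
    }
}
```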

11.2 String replaceAll(String regex, String replacement)

Replaces every substring that matches the regular expression regex with the replacement string and returns a new string.

/**
 * ...
 * @see java.util.regex.Pattern
 *
 * @since 1.4
 * @spec JSR-51
 */
public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}


It delegates to java.util.regex.Matcher.replaceAll:

/**
 * ...
 * @param  replacement
 *         The replacement string
 *
 * @return  The string constructed by replacing each matching subsequence
 *          by the replacement string, substituting captured subsequences
 *          as needed
 */
public String replaceAll(String replacement) {
    reset();
    boolean result = find();
    if (result) {
        StringBuffer sb = new StringBuffer();
        do {
            appendReplacement(sb, replacement);
            result = find();
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}


The regex itself is compiled by Pattern.compile:

/**
 * Compiles the given regular expression into a pattern.
 *
 * @param  regex
 *         The expression to be compiled
 * @return the given regular expression compiled into a pattern
 * @throws  PatternSyntaxException
 *          If the expression's syntax is invalid
 */
public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}
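Because replaceAll compiles its first argument as a regex, and treats $ and \ in the replacement specially, its results can differ from the literal replace(CharSequence, CharSequence). A sketch, with ReplaceAllDemo as an illustrative wrapper:

```java
import java.util.regex.Matcher;

class ReplaceAllDemo {
    public static void main(String[] args) {
        // The first argument is a regex: "\\d+" matches each run of digits
        System.out.println("a1b22c333".replaceAll("\\d+", "#")); // a#b#c#

        // "." is a regex metacharacter, so replaceAll replaces every char,
        // while replace(CharSequence, CharSequence) treats it literally
        System.out.println("x.y".replaceAll(".", "-"));          // ---
        System.out.println("x.y".replace(".", "-"));             // x-y

        // In the replacement, '$' refers to a group; quoteReplacement makes it literal
        System.out.println("cost: N".replaceAll("N", Matcher.quoteReplacement("$5"))); // cost: $5
    }
}
```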

12. The substring() methods

12.1 substring(int beginIndex)

substring(int beginIndex) returns the substring that starts at index beginIndex and extends to the end of the string.

/**
  * Returns a string that is a substring of this string. The
  * substring begins with the character at the specified index and
  * extends to the end of this string. <p>
  * Examples:
  * <blockquote><pre>
  * "unhappy".substring(2) returns "happy"
  * "Harbison".substring(3) returns "bison"
  * "emptiness".substring(9) returns "" (an empty string)
  * </pre></blockquote>
  *
  * @param      beginIndex   the beginning index, inclusive.
  * @return     the specified substring.
  * @exception  IndexOutOfBoundsException  if
  *             {@code beginIndex} is negative or larger than the
  *             length of this {@code String} object.
  */

public String substring(int beginIndex) {
    // A negative index throws immediately
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    // subLen is the string length minus the index
    int subLen = value.length - beginIndex;
    // A negative subLen (beginIndex > length) also throws
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    // 1. If beginIndex == 0, return the original string itself
    // 2. Otherwise, return a new string from beginIndex to the end
    return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}

12.2 substring(int beginIndex, int endIndex)

substring(int beginIndex, int endIndex) returns the substring that starts at beginIndex (inclusive) and ends at endIndex (exclusive).

/**
 * Returns a string that is a substring of this string. The
 * substring begins at the specified {@code beginIndex} and
 * extends to the character at index {@code endIndex - 1}.
 * Thus the length of the substring is {@code endIndex-beginIndex}.
 * <p>
 * Examples:
 * <blockquote><pre>
 * "hamburger".substring(4, 8) returns "urge"
 * "smiles".substring(1, 5) returns "mile"
 * </pre></blockquote>
 *
 * @param      beginIndex   the beginning index, inclusive.
 * @param      endIndex     the ending index, exclusive.
 * @return     the specified substring.
 * @exception  IndexOutOfBoundsException  if the
 *             {@code beginIndex} is negative, or
 *             {@code endIndex} is larger than the length of
 *             this {@code String} object, or
 *             {@code beginIndex} is larger than
 *             {@code endIndex}.
 */
public String substring(int beginIndex, int endIndex) {
    // The start index is negative
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    
    // The end index is past the length
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    
    // subLen is the end index minus the start index
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    // 1. If beginIndex == 0 and endIndex == the string's length, return the string itself
    // 2. Otherwise, return a new string
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}
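A sketch (SubstringDemo is illustrative) covering the Javadoc examples and the two early returns of this. In JDK 1.8, the new String(value, beginIndex, subLen) branch copies the selected range into a fresh array rather than sharing the original one.

```java
class SubstringDemo {
    public static void main(String[] args) {
        System.out.println("unhappy".substring(2));             // happy
        System.out.println("hamburger".substring(4, 8));        // urge
        System.out.println("emptiness".substring(9).isEmpty()); // true: beginIndex == length() gives ""

        String s = "smiles";
        System.out.println(s.substring(1, 5));               // mile
        System.out.println(s.substring(0) == s);             // true: beginIndex == 0 returns this
        System.out.println(s.substring(0, s.length()) == s); // true: the full range returns this

        try {
            s.substring(4, 2);                               // beginIndex > endIndex, so subLen < 0
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("caught StringIndexOutOfBoundsException");
        }
    }
}
```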

13. The string constant pool

When covering the constructors, we saw that the two most common ways to declare a string object are:

Assigning a literal directly:

String str = "hello";

Calling a constructor with the new keyword:

String str = new String("hello");

So what is the difference between the two?

Before answering, let's look at the JVM memory layout before JDK 1.7 (1.7 itself excluded):

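- Program counter (the PC register): holds the address of the instruction currently being executed (equivalently, the address of the storage unit that holds the next instruction). When an instruction is to be executed, its address is taken from the program counter and the instruction is fetched; the counter then advances by one or follows a branch target to the next instruction, and this repeats until all instructions have run. Each thread has its own.
- Java virtual machine stack: holds a frame for each method call, whose local variables contain values of primitive types and object references. Each thread has its own.
- Native method stack: the Java stack serves Java methods, while the native method stack serves native methods. The JVM specification does not mandate how the native method stack is implemented or which data structures it uses, so each virtual machine is free to choose; HotSpot simply merges the native method stack and the Java stack into one.
- Method area: stores per-class information (class structure, methods, fields), static variables, constants, and code produced by the just-in-time compiler. Note that besides descriptions of a class's fields, methods, and interfaces, a class file also contains a constant pool, which stores the literals and symbolic references generated at compile time.
- Heap: stores the objects themselves, including arrays (the references to arrays live in the Java stack).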

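Starting with JDK 1.7, the string constant pool was moved out of the method area and into the heap, as shown below:

[Figure: JVM memory layout from JDK 1.7 onward, with the string constant pool in the heap]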

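String constant pool: at run time, Java maintains a String Pool, a kind of "string cache". It holds string literals and strings added through intern(), and never contains two equal strings.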

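- Strings created from literals, or by concatenating only literals (constants), are looked up in the pool first: if an equal string exists, the pooled reference is used directly; otherwise the string is created in the pool. This avoids creating duplicate objects.
- new always creates a new object on the heap, and the variable refers to that new object. In JDK 1.8 the String(String original) constructor reuses original's char array, so new String("hello") shares its character data with the pooled literal "hello". Conversely, a string created with new from non-literal content (a char[], for example) is not added to the pool automatically.
- When a String is created from an expression that contains a variable, the concatenation happens at run time, so the result is a new object on the heap and is not placed in the pool; only the literal operands are pooled.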

// Created from literals (compile-time constants)
String str1 = "hello";
String str2 = "hello";
// Created with new
String str3 = new String("hello");

System.out.println(str1 == str2);  // true
System.out.println(str1 == str3);  // false
System.out.println(str2 == str3);  // false

System.out.println(str1.equals(str2));  // true
System.out.println(str1.equals(str3));  // true
System.out.println(str2.equals(str3));  // true

Step by step, here is what happens in the example above:

First, String str1 = "hello" checks the constant pool for an equal object. None exists yet, so a "hello" object is created in the pool and its reference is assigned to str1.

Next, for the second literal, String str2 = "hello";, the object is found in the pool, so its reference is assigned directly to str2.

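Finally, for String str3 = new String("hello");, the pool already contains "hello", so nothing new is created there. A new object is created on the heap and its reference is assigned to str3; that heap object in turn refers to the pooled string's character data.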

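The process is illustrated in the figure below:

[Figure: str1 and str2 reference the pooled "hello"; str3 references a heap object linked to it]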

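Note the arrow from the heap "hello" to the pooled "hello" in the figure: when a string object is created with new and an equal string already exists in the pool, the heap object is linked to the pooled one. This can be examined further with the intern() method, covered below.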

Creating an object from an expression that contains a variable:

public static void main(String[] args) {
    String str1 = "hello";
    String str2 = "helloworld";
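    String str3 = str1 + "world";    // the compiler cannot treat this as a constant (a new String object is created on the heap)
    String str4 = "hello" + "world"; // the compiler folds this into a constant and references the pool directly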

    System.out.println(str2 == str3);  // false
    System.out.println(str2 == str4);  // true
    System.out.println(str3 == str4);  // false
}

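In the example above, str3 contains the variable str1, so the compiler cannot determine that it is a constant, and a String object is created on the heap. str4 is the sum of two constants, so it simply references the object in the constant pool.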
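One refinement: a local variable declared final and initialized with a literal is a compile-time constant, so concatenating it is folded by javac just like str4. A sketch, assuming a hypothetical class ConstantFoldingDemo:

```java
class ConstantFoldingDemo {
    static boolean[] check() {
        String str2 = "helloworld";
        final String finalHello = "hello";    // constant variable: final and initialized with a constant
        String folded = finalHello + "world"; // constant expression, folded by javac into "helloworld"
        String hello = "hello";               // not final, so not a constant variable
        String runtime = hello + "world";     // concatenated at run time into a new heap object
        return new boolean[] {
            folded == str2,           // true: both refer to the pooled literal
            runtime == str2,          // false: runtime is a separate heap object
            runtime.intern() == str2  // true: intern() returns the pooled "helloworld"
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(check())); // [true, false, true]
    }
}
```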

14. The intern() method

This is a native method:

/**
 * Returns a canonical representation for the string object.
 * <p>
 * A pool of strings, initially empty, is maintained privately by the
 * class {@code String}.
 * <p>
 * When the intern method is invoked, if the pool already contains a
 * string equal to this {@code String} object as determined by
 * the {@link #equals(Object)} method, then the string from the pool is
 * returned. Otherwise, this {@code String} object is added to the
 * pool and a reference to this {@code String} object is returned.
 * <p>
 * It follows that for any two strings {@code s} and {@code t},
 * {@code s.intern() == t.intern()} is {@code true}
 * if and only if {@code s.equals(t)} is {@code true}.
 * <p>
 * All literal strings and string-valued constant expressions are
 * interned. String literals are defined in section 3.10.5 of the
 * <cite>The Java&trade; Language Specification</cite>.
 *
 * @return  a string that has the same contents as this string, but is
 *          guaranteed to be from a pool of unique strings.
 */
public native String intern();

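When intern is called, if the pool already contains a string equal to this String object (as determined by equals(Object)), the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to it is returned.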

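What does that mean? When intern() is called on a String object and the pool already holds an equal string, that string's reference is returned directly (if the pooled entry refers to an object on the heap, the heap reference is returned; if it was created in the pool, the pool reference is returned). If not, this object is added to the pool and the pooled reference is returned.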

The following example makes this concrete:

public static void main(String[] args) {
    // A literal: the object is created only in the constant pool
    String str1 = "hello";
    String str2 = str1.intern();
    System.out.println(str1 == str2);  // true

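    // new creates a separate object on the heap; the literal "world" itself is already in the pool
    String str3 = new String("world");  // reference to the heap object
    String str4 = str3.intern();    // the pool already holds the literal "world", so the pooled reference is returned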
    System.out.println(str3 == str4);   // false

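    // A concatenation involving variables creates a new object on the heap only ("helloworld" is not pooled yet)
    String str5 = str1 + str3;
    // The pool has no "helloworld", so since JDK 1.7 intern() records str5's own reference and returns it (the heap object)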
    String str6 = str5.intern();
    System.out.println(str5 == str6);  // true


    // Concatenating literals is folded at compile time into the literal "hello1hello2", which lives in the pool
    String str7 = "hello1" + "hello2";
    String str8 = str7.intern();
    System.out.println(str7 == str8); // true
}
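Conceptually, the pool behaves like a map from content to a canonical instance. The sketch below is a hypothetical, simplified model (SimpleStringPool is not a JDK class, and the JVM's native string table works differently); it only mirrors the contract described above, where the first instance registered becomes the canonical one, as with str5 in the example:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of a string pool: NOT how the JVM implements intern()
class SimpleStringPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to s, registering s itself if none exists yet
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing == null ? s : existing;
    }

    public static void main(String[] args) {
        SimpleStringPool p = new SimpleStringPool();
        String a = new String("abc");
        String b = new String("abc");
        System.out.println(p.intern(a) == a); // true: a came first, so it became canonical
        System.out.println(p.intern(b) == a); // true: b equals a, so a is returned
        System.out.println(b == a);           // false: b itself is still a separate object
    }
}
```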

15. Is String really immutable?

Answer: the reference is immutable, but the contents can be changed through reflection.

As covered earlier, String is declared with the final keyword, so we regard it as an immutable object. But is it really immutable?

Every string is made up of individual characters, and as we saw, the source stores them in the char[] value array.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0
    // ...
}

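value is declared final, which only guarantees that the reference cannot be reassigned. The real data is the array on the heap that value points to, and anyone who can reach that array can still change the data.
Moreover, value is an array of a primitive type, so its elements are mutable; even though the field is private, it can be reached through reflection.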

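import java.lang.reflect.Field;

public class ReflectStringDemo {
    // Works on JDK 1.8, where String stores its data in a private char[] field named value
    public static void main(String[] args) throws NoSuchFieldException, IllegalAccessException {

        String str = "vue";

        // Print the original string
        System.out.println(str);

        // Look up the private value field via reflection
        Field fieldStr = String.class.getDeclaredField("value");

        // Bypass the private access check
        fieldStr.setAccessible(true);

        // Read the value array of the str object
        char[] value = (char[]) fieldStr.get(str);

        // Change the first char of the array to 'V'
        value[0] = 'V';

        // Print the modified char array (string): Vue
        System.out.println(str);
    }
}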

Comparing the two printed results shows that the String was changed. In practice, though, code almost never manipulates strings through reflection, so we treat String as immutable.

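So why was String designed to be immutable? The reasons fall under security and performance:

Security:
- Avoiding security issues: for example, a database username and password are passed as strings to obtain a connection, and in socket programming the host name and port are passed as strings. Because strings are immutable, these values cannot be changed once they are checked; otherwise an attacker could exploit the gap by changing the value the string refers to and open a security hole.
- Thread safety: when multiple threads read and write a shared resource at the same time, race conditions can arise. Because a String cannot change, it can be shared between threads without such problems.
- Hash codes: a String caches its hash code the first time hashCode() is called, and the hash depends on value. If String were mutable, its hash code would change as well. Containers such as Map and Set need their keys to stay unique and consistent, so String's immutability makes it better suited as a key than almost any other object.
Performance:
- The string constant pool only makes sense if strings are immutable.
- The constant pool avoids creating many strings with the same literal value: different references point to the same pooled string, which saves a lot of heap memory at run time.
- If strings were mutable, the constant pool would be pointless and intern(), which is built on it, would no longer work; every new String would take fresh heap space and use more memory.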
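The hash-code argument can be made concrete with a hypothetical mutable key class. This sketch (MutableKey and MutableKeyDemo are illustrative names) stores an entry in a HashMap and then changes the key; the entry stays filed under the old hash and can no longer be found. String avoids this because its contents, and therefore its cached hash, never change.

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // Hypothetical mutable key: equals and hashCode depend on a field that can change
    static final class MutableKey {
        String text;

        MutableKey(String text) { this.text = text; }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).text.equals(text);
        }

        @Override
        public int hashCode() { return text.hashCode(); }
    }

    // Stores an entry, mutates its key, and reports whether the entry can still be found
    static boolean entryStillReachable() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey("alice");
        map.put(key, "admin");
        key.text = "mallory"; // the key's hash changes, but the entry stays filed under the old hash
        // Neither the mutated key nor a fresh "alice" key matches the stored entry any more
        return map.get(key) != null || map.get(new MutableKey("alice")) != null;
    }

    public static void main(String[] args) {
        System.out.println(entryStillReachable()); // false
    }
}
```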

References:

https://docs.oracle.com/javase/8/docs/api/java/lang/String.html

Copyright notice: unless otherwise stated, all articles on this blog are licensed under the Apache License 2.0. Please credit the source when reposting!