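====== Chapter 9: String Handling ======

===== 9.1 String Types in Rust =====

Rust has two main string types:

  * **String**: a growable, owned, UTF-8 encoded string
  * **str**: a string slice, usually used in its borrowed, immutable form &str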

===== 9.2 The String Type =====

==== Creating a String ====

<code rust>
// Empty string
let mut s = String::new();

// From a string literal
let s = String::from("hello");
let s = "hello".to_string();

// From other values via formatting
let s = format!("{}-{}-{:?}", "hello", 42, true);  // "hello-42-true"
</code>

==== Modifying a String ====

**Appending:**

<code rust>
let mut s = String::from("foo");
s.push_str("bar");  // foobar

let mut s = String::from("lo");
s.push('l');  // lol; push appends a single char
</code>

**Concatenating:**

<code rust>
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2;  // s1 is moved, s2 is only borrowed

// format! takes no ownership of its arguments
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");
let s = format!("{}-{}-{}", s1, s2, s3);  // s1, s2 and s3 are all still valid
</code>
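When there are many pieces to combine, chaining `+` or writing a long `format!` template becomes awkward. The sketch below shows two alternatives from the standard library: `join` on a slice of string slices, and appending into one pre-sized buffer. The function names `join_with` and `concat_all` are hypothetical helpers chosen for this illustration.

```rust
/// Joins the parts with a separator; the inputs are only borrowed.
/// (`join_with` is a hypothetical helper name.)
fn join_with(parts: &[&str], sep: &str) -> String {
    parts.join(sep)
}

/// Concatenates the parts into a single buffer that is allocated once.
/// (`concat_all` is a hypothetical helper name.)
fn concat_all(parts: &[&str]) -> String {
    // Sum the byte lengths so the buffer does not need to grow while appending.
    let total: usize = parts.iter().map(|p| p.len()).sum();
    let mut out = String::with_capacity(total);
    for p in parts {
        out.push_str(p);
    }
    out
}
```

For example, `join_with(&["tic", "tac", "toe"], "-")` yields `"tic-tac-toe"`, and none of the inputs are moved.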

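**Inserting and removing:**

<code rust>
let mut s = String::from("Hello");

// Insert a char at a byte index
s.insert(5, ',');  // "Hello,"

// Insert a string slice at a byte index
s.insert_str(6, " world");  // "Hello, world"

// Remove the char at a byte index (panics if the index is not on a char boundary)
s.remove(0);  // removes and returns 'H', leaving "ello, world"

// Clear
s.clear();  // s is now empty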
</code>
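Because `insert` and `remove` take byte indices, passing a character position to them only works for ASCII text. A minimal sketch of removing by character position instead, assuming a hypothetical helper named `remove_nth_char`, translates the position into a byte offset with `char_indices()` first:

```rust
/// Removes the `n`-th character (counted in chars, not bytes) and returns it.
/// Returns None if the string has fewer than `n + 1` characters.
/// (`remove_nth_char` is a hypothetical helper name.)
fn remove_nth_char(s: &mut String, n: usize) -> Option<char> {
    // char_indices() yields (byte_offset, char) pairs, so nth(n) finds
    // the byte offset at which the n-th character starts.
    let (byte_idx, _) = s.char_indices().nth(n)?;
    Some(s.remove(byte_idx))
}
```

Calling it with `n = 1` on `"中国人"` removes `'国'` and leaves `"中人"`, whereas `s.remove(1)` on the same string would panic.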

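===== 9.3 String Indexing =====

**Rust does not support indexing a string with an integer:**

<code rust>
let s = String::from("hello");
// let h = s[0];  // compile error: a String cannot be indexed by an integer
</code>

**Reason: UTF-8 is a variable-width encoding**

A character takes between 1 and 4 bytes, so a byte offset does not correspond to a character position.

<code rust>
let hello = "中国人";

// Each of these Chinese characters takes 3 bytes
println!("bytes: {}", hello.len());  // 9

// Iterate over the raw bytes
for b in hello.bytes() {
    println!("{}", b);
}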
</code>

**Iterating over Unicode scalar values with chars():**

<code rust>
for c in "中国人".chars() {
    println!("{}", c);
}
// Prints 中, 国 and 人, one per line
</code>

**Using char_indices():**

<code rust>
for (i, c) in "中国人".char_indices() {
    println!("index {}: char '{}'", i, c);
}
// index 0: char '中'
// index 3: char '国'
// index 6: char '人'
</code>
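Since `s[0]` is not available, positional access has to go through an iterator or the byte slice. The following sketch shows both; `nth_char` and `first_byte` are hypothetical helper names. Note that `chars().nth(n)` walks the string from the start, so it is linear in `n` rather than constant time.

```rust
/// Returns the `n`-th Unicode scalar value, or None if the string is shorter.
/// (`nth_char` is a hypothetical helper name.)
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Returns the first raw byte, or None for an empty string.
/// (`first_byte` is a hypothetical helper name.)
fn first_byte(s: &str) -> Option<u8> {
    s.as_bytes().first().copied()
}
```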

===== 9.4 String Slices =====

**Creating a slice:**

<code rust>
let s = String::from("hello world");
let hello = &s[0..5];
let world = &s[6..11];
</code>

**Mind the char boundaries:**

<code rust>
let s = "中国人";
let zhong = &s[0..3];  // "中", OK
// let bad = &s[0..2];  // panics at runtime: byte 2 is not a char boundary
</code>
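When the range comes from untrusted input, a panic is usually not what you want. The standard `str::get` method performs the same slicing but returns `None` for a range that is out of bounds, inverted, or not on char boundaries. A minimal sketch, with `try_slice` as a hypothetical wrapper name:

```rust
/// Slices `s` by byte range without panicking.
/// (`try_slice` is a hypothetical wrapper around the standard `str::get`.)
fn try_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}
```

With this, `try_slice("中国人", 0, 3)` gives `Some("中")` while `try_slice("中国人", 0, 2)` gives `None`.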

===== 9.5 Iterating over Strings =====

==== By character ====

<code rust>
for c in "hello 中国人".chars() {
    println!("{}", c);
}
</code>

==== By byte ====

<code rust>
for b in "hello".bytes() {
    println!("{}", b);
}
</code>

==== By line ====

<code rust>
let text = "line1\nline2\nline3";
for line in text.lines() {
    println!("{}", line);
}
</code>

==== By word ====

<code rust>
let text = "hello world from Rust";
for word in text.split_whitespace() {
    println!("{}", word);
}
</code>

===== 9.6 String Methods =====

==== Finding and replacing ====

<code rust>
let s = String::from("hello world");

// Find the byte offset of a substring
let pos = s.find("world");  // Some(6)
let pos = s.find("xyz");    // None

// Replace every occurrence
let new_s = s.replace("world", "Rust");  // "hello Rust"

// Replace at most n occurrences
let new_s = s.replacen("l", "L", 1);  // "heLlo world"
</code>
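The offset returned by `find` is a byte offset that always falls on a char boundary, so it can be used directly for slicing. As an illustration, the hypothetical helper `after_marker` below returns whatever follows the first occurrence of a marker:

```rust
/// Returns the part of `s` that follows the first occurrence of `marker`.
/// (`after_marker` is a hypothetical helper name.)
fn after_marker<'a>(s: &'a str, marker: &str) -> Option<&'a str> {
    let pos = s.find(marker)?;
    // pos + marker.len() is the byte offset just past the match,
    // which is also a char boundary.
    Some(&s[pos + marker.len()..])
}
```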

==== Splitting ====

<code rust>
let s = "a,b,c,d";

// Split on a char
let parts: Vec<&str> = s.split(',').collect();  // ["a", "b", "c", "d"]

// Split on any run of whitespace
let s = "a b\tc\nd";
let parts: Vec<&str> = s.split_whitespace().collect();  // ["a", "b", "c", "d"]

// Split on a string
let s = "hello::world::Rust";
let parts: Vec<&str> = s.split("::").collect();  // ["hello", "world", "Rust"]

// Limit the number of pieces
let s = "a,b,c,d";
let parts: Vec<&str> = s.splitn(2, ',').collect();  // ["a", "b,c,d"]
</code>
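For the common case of cutting a string into exactly two parts, the standard `split_once` method avoids collecting into a `Vec`. A small sketch, assuming a hypothetical `key = value` line format and a hypothetical helper name `parse_key_value`:

```rust
/// Splits a line at the first '=' and trims both halves.
/// Returns None when the line contains no '='.
/// (`parse_key_value` and the line format are assumptions for this example.)
fn parse_key_value(line: &str) -> Option<(&str, &str)> {
    let (key, value) = line.split_once('=')?;
    Some((key.trim(), value.trim()))
}
```

Only the first separator is used, so `parse_key_value("a=b=c")` yields `Some(("a", "b=c"))`.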

==== Trimming whitespace ====

<code rust>
let s = "  hello world  ";

// Both ends
let trimmed = s.trim();        // "hello world"

// Leading only
let trimmed = s.trim_start();  // "hello world  "

// Trailing only
let trimmed = s.trim_end();    // "  hello world"

// Trim a specific char from both ends
let s = "xxxhello worldxxx";
let trimmed = s.trim_matches('x');  // "hello world"
</code>
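`trim_matches` removes every repetition of the pattern, which is not always the intent. To remove a prefix or suffix exactly once, the standard `strip_prefix` and `strip_suffix` methods can be used. The sketch below, with the hypothetical name `strip_quotes`, removes one pair of surrounding double quotes and leaves the input untouched if the pair is incomplete:

```rust
/// Removes exactly one pair of surrounding double quotes, if both are present.
/// (`strip_quotes` is a hypothetical helper name.)
fn strip_quotes(s: &str) -> &str {
    s.strip_prefix('"')
        .and_then(|rest| rest.strip_suffix('"'))
        .unwrap_or(s)
}
```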

==== Case conversion ====

<code rust>
let s = "Hello";

println!("{}", s.to_uppercase());  // "HELLO"
println!("{}", s.to_lowercase());  // "hello"
</code>
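Case conversion is Unicode-aware, which has two consequences worth knowing: the result can differ in length from the input (for example `"ß"` uppercases to `"SS"`), and comparing lowercased strings is only an approximation of case-insensitive matching, not full Unicode case folding. A rough sketch, with `eq_ignore_case` as a hypothetical helper name:

```rust
/// Compares two strings after lowercasing both.
/// This is a simple approximation, not full Unicode case folding.
/// (`eq_ignore_case` is a hypothetical helper name.)
fn eq_ignore_case(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}
```

For ASCII-only input, the standard `eq_ignore_ascii_case` method does the same comparison without allocating.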

==== Checks ====

<code rust>
let s = "hello world";

// Starts with ...
assert!(s.starts_with("hello"));

// Ends with ...
assert!(s.ends_with("world"));

// Contains a substring
assert!(s.contains("lo wo"));

// Count non-overlapping matches of a pattern
assert_eq!(s.matches("l").count(), 3);
</code>

===== 9.7 Converting Between String and &str =====

==== String to &str ====

<code rust>
let s = String::from("hello");

// Borrow with &
let slice: &str = &s;

// as_str()
let slice = s.as_str();

// Deref coercion
fn takes_str(s: &str) {}
takes_str(&s);  // &String is converted to &str automatically
</code>

==== &str to String ====

<code rust>
let slice = "hello";

// to_string()
let s = slice.to_string();

// String::from()
let s = String::from(slice);

// into()
let s: String = slice.into();
</code>

===== 9.8 Working with UTF-8 =====

==== Counting characters ====

<code rust>
let s = "中国人";

// Number of bytes
println!("bytes: {}", s.len());  // 9

// Number of chars (Unicode scalar values)
println!("chars: {}", s.chars().count());  // 3
</code>
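A Unicode scalar value is not always what a reader perceives as one character. An accented letter, for instance, can be written either as a single precomposed scalar value or as a base letter followed by a combining mark. The sketch below makes the difference visible; `byte_and_char_len` is a hypothetical helper name. The standard library does not segment text into user-perceived characters (grapheme clusters); a third-party crate such as unicode-segmentation is one option if that is needed.

```rust
/// Returns (byte length, number of Unicode scalar values).
/// (`byte_and_char_len` is a hypothetical helper name.)
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// "é" as one precomposed scalar value (U+00E9).
const PRECOMPOSED: &str = "\u{e9}";
/// "é" as 'e' followed by a combining acute accent (U+0301).
const DECOMPOSED: &str = "e\u{301}";
```

`byte_and_char_len(PRECOMPOSED)` is `(2, 1)` and `byte_and_char_len(DECOMPOSED)` is `(3, 2)`, although both usually render as the same glyph.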

==== Checking char boundaries ====

<code rust>
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    // start <= end guards against an inverted range, which would panic
    if start <= end && s.is_char_boundary(start) && s.is_char_boundary(end) {
        Some(&s[start..end])
    } else {
        None
    }
}

let s = "中国人";
println!("{:?}", safe_slice(s, 0, 3));   // Some("中")
println!("{:?}", safe_slice(s, 0, 2));   // None
</code>
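Another common use of `is_char_boundary` is shortening a string to a byte budget without cutting a character in half. A minimal sketch, assuming a hypothetical helper named `truncate_to_boundary`, walks backwards from the limit until it lands on a boundary:

```rust
/// Returns the longest prefix of `s` that is at most `max_bytes` long
/// and ends on a char boundary.
/// (`truncate_to_boundary` is a hypothetical helper name.)
fn truncate_to_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // Offset 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```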

===== 9.9 Strings and Ownership =====

==== Methods that consume ownership ====

<code rust>
let s = String::from("hello");

// into_bytes() consumes the String
let bytes = s.into_bytes();
// s can no longer be used here
</code>
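The reverse direction, from bytes back to a `String`, also takes ownership of the buffer, but it can fail because arbitrary bytes are not necessarily valid UTF-8. A short sketch using the standard `String::from_utf8`; the helper name `decode_utf8` is hypothetical:

```rust
/// Converts an owned byte buffer into a String, or None if it is not valid UTF-8.
/// (`decode_utf8` is a hypothetical helper name.)
fn decode_utf8(bytes: Vec<u8>) -> Option<String> {
    // from_utf8 validates the bytes and reuses the allocation on success.
    String::from_utf8(bytes).ok()
}
```

A complete three-byte sequence such as `[0xE4, 0xB8, 0xAD]` decodes to `"中"`, while the truncated `[0xE4, 0xB8]` is rejected.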

==== Avoiding unnecessary clones ====

<code rust>
// Wasteful
let s1 = String::from("hello");
let s2 = s1.clone();  // deep copy of the heap buffer
let slice = &s2[0..2];

// Better
let s1 = String::from("hello");
let slice = &s1[0..2];  // just a borrow
</code>
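The same idea applies to function signatures: a parameter of type `&str` accepts both string literals and borrowed `String` values, so callers never need to clone just to make a call. A small sketch, with `first_word` as a hypothetical example function:

```rust
/// Returns the first whitespace-separated word, or "" if there is none.
/// Taking &str means the caller keeps ownership of its String.
/// (`first_word` is a hypothetical example function.)
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
```

Given `let owned = String::from("hello world");`, the call `first_word(&owned)` returns `"hello"` and `owned` remains usable afterwards.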

===== Exercises =====

==== Exercise 9.1: Reverse a string ====

<code rust>
fn reverse(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    let s = "hello 中国";
    println!("original: {}", s);
    println!("reversed: {}", reverse(s));
}
</code>

==== Exercise 9.2: Palindrome check ====

<code rust>
fn is_palindrome(s: &str) -> bool {
    // Keep letters and digits only, lowercased, so case and punctuation are ignored
    let s: String = s.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect();

    s == s.chars().rev().collect::<String>()
}

fn main() {
    println!("{}", is_palindrome("A man, a plan, a canal: Panama"));  // true
    println!("{}", is_palindrome("race a car"));  // false
    println!("{}", is_palindrome("中国人国中"));  // true
}
</code>

==== Exercise 9.3: String statistics ====

<code rust>
fn analyze(s: &str) {
    println!("string: {}", s);
    println!("bytes: {}", s.len());
    println!("chars: {}", s.chars().count());
    println!("words: {}", s.split_whitespace().count());
    println!("lines: {}", s.lines().count());

    // Count characters by category
    let mut letters = 0;
    let mut digits = 0;
    let mut spaces = 0;
    let mut others = 0;

    for c in s.chars() {
        if c.is_alphabetic() {
            letters += 1;
        } else if c.is_numeric() {
            digits += 1;
        } else if c.is_whitespace() {
            spaces += 1;
        } else {
            others += 1;
        }
    }

    println!("letters: {}, digits: {}, whitespace: {}, others: {}",
             letters, digits, spaces, others);
}

fn main() {
    // The second line is Chinese for "This is line 2."
    let text = "Hello, Rust 2024!\n这是第2行。";
    analyze(text);
}
</code>

==== Exercise 9.4: Simplified CSV parsing ====

<code rust>
fn parse_csv_line(line: &str) -> Vec<&str> {
    line.split(',').map(|s| s.trim()).collect()
}

fn main() {
    let csv = "name, age, city\nAlice, 30, New York\nBob, 25, London";
    
    for line in csv.lines() {
        let fields = parse_csv_line(line);
        println!("{:?}", fields);
    }
}
</code>

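===== Chapter Summary =====

This chapter covered string handling in Rust:

  * **String**: a growable, owned string
  * **&str**: a string slice, an immutable borrow
  * **Indexing limits**: strings cannot be indexed by an integer because UTF-8 is a variable-width encoding
  * **Iteration**: chars(), bytes(), lines() and others
  * **Common methods**: find, replace, split, trim and others
  * **UTF-8 handling**: respect char boundaries and check them with is_char_boundary

Understanding the difference between String and &str, and how UTF-8 encoding behaves, is the key to working with strings in Rust.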