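====== Chapter 9: String Handling ======
===== 9.1 String Types in Rust =====
Rust has two main string types:
* **String**: a growable, owned, UTF-8 encoded string
* **str**: an immutable string slice, usually used through a reference as &str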
===== 9.2 The String Type =====
==== Creating a String ====
```rust
// An empty string
let mut s = String::new();
// From a string literal
let s = String::from("hello");
let s = "hello".to_string();
// From other types, via formatting
let s = format!("{}-{}-{:?}", "hello", 42, true); // "hello-42-true"
```
==== Modifying a String ====
**Appending:**
```rust
let mut s = String::from("foo");
s.push_str("bar"); // "foobar"; push_str appends a &str
let mut s = String::from("lo");
s.push('l'); // "lol"; push appends a single char
```
**Concatenating:**
```rust
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2; // s1 is moved, s2 is only borrowed
// format! does not take ownership of its arguments
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");
let s = format!("{}-{}-{}", s1, s2, s3); // s1, s2 and s3 all remain valid
```
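A small sketch of the ownership difference between `+` and `format!`. The helper names `join_with_plus` and `join_with_format` are made up for illustration. `String + &str` consumes the left-hand `String` and reuses its buffer. A `&String` argument is accepted where `&str` is expected because of deref coercion.

```rust
/// Joins three parts with '-' using `+`: consumes `a`, only borrows `b` and `c`.
fn join_with_plus(a: String, b: &str, c: &str) -> String {
    // Each `+` takes ownership of the String on its left and returns it extended.
    a + "-" + b + "-" + c
}

/// Same result via format!, which only borrows its arguments.
fn join_with_format(a: &str, b: &str, c: &str) -> String {
    format!("{}-{}-{}", a, b, c)
}

fn main() {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");
    let borrowed = join_with_format(&s1, &s2, &s3); // s1 is still usable afterwards
    let moved = join_with_plus(s1, &s2, &s3); // s1 is moved here
    assert_eq!(borrowed, moved);
    println!("{}", moved); // tic-tac-toe
}
```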
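**Inserting and removing:**
```rust
let mut s = String::from("Hello");
// Insert a char (the index is a byte offset)
s.insert(5, ','); // "Hello,"
// Insert a string
s.insert_str(6, " world"); // "Hello, world"
// Remove the char that starts at the given byte offset; the removed char is returned
s.remove(0); // removes 'H', leaving "ello, world"
// Clear
s.clear(); // s is now empty
```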
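The same edit sequence is written out below with assertions, so you can check each intermediate state. It also uses two related `String` methods that the text above does not mention: `pop` and `truncate`. The last lines show that insert positions are byte offsets. With "中国", only 0, 3 and 6 are valid positions, and any other index panics.

```rust
fn main() {
    let mut s = String::from("Hello");
    s.insert(5, ','); // index 5 == len, so this appends
    assert_eq!(s, "Hello,");
    s.insert_str(6, " world");
    assert_eq!(s, "Hello, world");
    let removed = s.remove(0); // remove returns the char it took out
    assert_eq!(removed, 'H');
    assert_eq!(s, "ello, world");
    assert_eq!(s.pop(), Some('d')); // pop removes and returns the last char
    s.truncate(4); // keep only the first 4 bytes
    assert_eq!(s, "ello");
    s.clear();
    assert!(s.is_empty());

    // Byte offsets: each Chinese char here is 3 bytes, so 3 is a valid position.
    let mut zh = String::from("中国");
    zh.insert(3, '-');
    assert_eq!(zh, "中-国");
}
```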
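===== 9.3 String Indexing =====
**Rust does not let you index a string with an integer:**
```rust
let s = String::from("hello");
// let h = s[0]; // compile error: `String` cannot be indexed by an integer
```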
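**The reason is UTF-8 encoding:**
```rust
let hello = "中国人";
// Each of these Chinese characters takes 3 bytes
println!("bytes: {}", hello.len()); // 9
// Iterate over the raw bytes
for b in hello.bytes() {
    println!("{}", b); // 9 values: 228, 184, 173, ...
}
```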
**Use chars() to iterate over Unicode scalar values:**
```rust
for c in "中国人".chars() {
    println!("{}", c);
}
// Output: 中, 国, 人
```
**Use char_indices() to get each char together with its byte offset:**
```rust
for (i, c) in "中国人".char_indices() {
    println!("index {}: char '{}'", i, c);
}
// index 0: char '中'
// index 3: char '国'
// index 6: char '人'
```
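If you really need "the n-th character" or "the n-th byte", here is a sketch of both. The helper name `nth_char` is hypothetical. `chars().nth(n)` walks the string from the beginning, so it costs O(n) rather than O(1). Byte access through `as_bytes()` is O(1), but it returns raw UTF-8 bytes, not characters.

```rust
/// Returns the n-th Unicode scalar value (0-based), or None if the string is shorter.
/// This walks the string from the start, so it is O(n), not O(1).
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let hello = "中国人";
    // Byte-level indexing is allowed through as_bytes()
    let first_byte = hello.as_bytes()[0];
    println!("first byte: {:#x}", first_byte); // '中' is U+4E2D, encoded as E4 B8 AD
    println!("{:?}", nth_char(hello, 1)); // Some('国')
    println!("{:?}", nth_char(hello, 5)); // None
}
```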
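===== 9.4 String Slices =====
**Creating a slice:**
```rust
let s = String::from("hello world");
let hello = &s[0..5]; // "hello"
let world = &s[6..11]; // "world"
```
**Mind the byte boundaries:**
```rust
let s = "中国人";
let zhong = &s[0..3]; // "中": correct
// let guo = &s[0..2]; // panics at runtime: byte 2 is not a UTF-8 char boundary
```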
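A non-panicking alternative to range indexing is `str::get`. It returns `None` when a bound is not a char boundary or is out of range, and `Some(&str)` otherwise. A minimal sketch:

```rust
fn main() {
    let s = "中国人";
    // get() returns None instead of panicking on a bad range
    assert_eq!(s.get(0..3), Some("中"));
    assert_eq!(s.get(0..2), None); // byte 2 falls inside '中'
    assert_eq!(s.get(3..), Some("国人"));
    assert_eq!(s.get(0..100), None); // out of range is also None
}
```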
===== 9.5 Iterating over Strings =====
==== By character ====
```rust
for c in "hello 中国人".chars() {
    println!("{}", c);
}
```
==== By byte ====
```rust
for b in "hello".bytes() {
    println!("{}", b); // 104, 101, 108, 108, 111
}
```
==== By line ====
```rust
let text = "line1\nline2\nline3";
for line in text.lines() {
    println!("{}", line);
}
```
==== By word ====
```rust
let text = "hello world from Rust";
for word in text.split_whitespace() {
    println!("{}", word);
}
```
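Two details of the iterators above are easy to miss:
* `lines()` accepts both "\n" and "\r\n" line endings and ignores a single trailing line ending.
* `split(' ')` yields empty pieces between repeated spaces, whereas `split_whitespace()` skips them and also treats tabs and newlines as separators.

The sketch below checks both behaviors:

```rust
fn main() {
    // lines(): "\r\n" is handled, and the trailing "\n" does not add an empty line
    let text = "line1\r\nline2\nline3\n";
    let lines: Vec<&str> = text.lines().collect();
    assert_eq!(lines, ["line1", "line2", "line3"]);

    // split(' ') keeps the empty piece between the two spaces and does not split on '\t'
    let words = "hello  world\tRust";
    let by_space: Vec<&str> = words.split(' ').collect();
    assert_eq!(by_space, ["hello", "", "world\tRust"]);

    // split_whitespace() treats any run of Unicode whitespace as one separator
    let by_ws: Vec<&str> = words.split_whitespace().collect();
    assert_eq!(by_ws, ["hello", "world", "Rust"]);
}
```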
===== 9.6 String Methods =====
==== Finding and replacing ====
```rust
let s = String::from("hello world");
// Byte offset of a substring
let pos = s.find("world"); // Some(6)
let pos = s.find("xyz"); // None
// Replace all matches (returns a new String)
let new_s = s.replace("world", "Rust"); // "hello Rust"
// Replace only the first n matches
let new_s = s.replacen("l", "L", 1); // "heLlo world"
```
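`find` returns a **byte** offset, not a character index. That offset always falls on a char boundary, so you can safely slice with it. In this sketch, `after` is a hypothetical helper that returns the text following the first match. The last lines confirm that `replace` leaves the original string unchanged.

```rust
/// Returns the text after the first occurrence of `pat`, or None if `pat` is absent.
/// find() yields a byte offset at a char boundary, so the slice below cannot panic.
fn after<'a>(s: &'a str, pat: &str) -> Option<&'a str> {
    s.find(pat).map(|pos| &s[pos + pat.len()..])
}

fn main() {
    assert_eq!("你好世界".find("世"), Some(6)); // byte offset, not char index 2
    assert_eq!(after("你好世界", "好"), Some("世界"));
    assert_eq!(after("hello world", "xyz"), None);

    // replace() builds a new String; the original is untouched
    let s = String::from("hello world");
    let t = s.replace("world", "Rust");
    assert_eq!(s, "hello world");
    assert_eq!(t, "hello Rust");
}
```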
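==== Splitting ====
```rust
let s = "a,b,c,d";
// Split on a single char
let parts: Vec<&str> = s.split(',').collect(); // ["a", "b", "c", "d"]
// Split on any whitespace (spaces, tabs, newlines)
let s = "a b\tc\nd";
let parts: Vec<&str> = s.split_whitespace().collect(); // ["a", "b", "c", "d"]
// Split on a string
let s = "hello::world::Rust";
let parts: Vec<&str> = s.split("::").collect(); // ["hello", "world", "Rust"]
// Limit the number of pieces
let s = "a,b,c,d";
let parts: Vec<&str> = s.splitn(2, ',').collect(); // ["a", "b,c,d"]
```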
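Three more splitting tools from the standard library fit common cases:
* `split_once` splits at the first match and returns both halves. It needs Rust 1.52 or later.
* `rsplitn` counts pieces from the end, which helps with things like file extensions.
* A `char` predicate closure can serve as the pattern when you need to split on several separators.

This is a sketch; the sample inputs are arbitrary.

```rust
fn main() {
    // split_once: first match only; None if the separator is absent
    assert_eq!("key=value=x".split_once('='), Some(("key", "value=x")));
    assert_eq!("no-separator".split_once('='), None);

    // rsplitn: at most 2 pieces, counted from the right
    let parts: Vec<&str> = "archive.tar.gz".rsplitn(2, '.').collect();
    assert_eq!(parts, ["gz", "archive.tar"]);

    // A closure pattern: split on either ',' or ';'
    let mixed: Vec<&str> = "a,b;c".split(|c: char| c == ',' || c == ';').collect();
    assert_eq!(mixed, ["a", "b", "c"]);
}
```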
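==== Trimming whitespace ====
```rust
let s = " hello world ";
// Trim both ends
let trimmed = s.trim(); // "hello world"
// Trim the start only
let trimmed = s.trim_start(); // "hello world "
// Trim the end only
let trimmed = s.trim_end(); // " hello world"
// Trim a specific char from both ends
let s = "xxxhello worldxxx";
let trimmed = s.trim_matches('x'); // "hello world"
```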
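The `trim_*_matches` methods remove a pattern as many times as it repeats. `strip_prefix` and `strip_suffix` remove it at most once and return `None` when it is absent. Another point: `trim()` uses Unicode whitespace, so it also removes the full-width space U+3000 that is common in Chinese text. A minimal sketch:

```rust
fn main() {
    let s = "xxhello";
    // Removes every leading 'x'
    assert_eq!(s.trim_start_matches('x'), "hello");
    // Removes one leading 'x' and reports whether it was there
    assert_eq!(s.strip_prefix('x'), Some("xhello"));
    assert_eq!(s.strip_prefix("abc"), None);
    // U+3000 (ideographic space) counts as whitespace for trim()
    assert_eq!("\u{3000}你好 ".trim(), "你好");
}
```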
==== Case conversion ====
```rust
let s = "Hello";
println!("{}", s.to_uppercase()); // "HELLO"
println!("{}", s.to_lowercase()); // "hello"
```
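`to_uppercase` and `to_lowercase` follow Unicode rules and return a new `String`, which can differ in length from the input. The `to_ascii_*` variants change ASCII letters only. For ASCII-only, case-insensitive comparison without allocation, there is `eq_ignore_ascii_case`. A short sketch:

```rust
fn main() {
    // German ß uppercases to two letters, "SS"
    assert_eq!("straße".to_uppercase(), "STRASSE");
    // The ASCII-only variant leaves ß untouched
    assert_eq!("straße".to_ascii_uppercase(), "STRAßE");
    // Case-insensitive comparison (ASCII letters only), no new String allocated
    assert!("Hello".eq_ignore_ascii_case("hELLO"));
}
```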
==== Checking contents ====
```rust
let s = "hello world";
// Starts with ...
assert!(s.starts_with("hello"));
// Ends with ...
assert!(s.ends_with("world"));
// Contains ...
assert!(s.contains("lo wo"));
// Count the (non-overlapping) matches of a pattern
assert_eq!(s.matches("l").count(), 3);
```
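`matches` yields only the matched text. To find out where each match occurs, use `match_indices`, which yields `(byte offset, matched text)` pairs. A closure can also act as the pattern. The vowel-counting use below is only an illustration.

```rust
fn main() {
    let s = "hello world";
    // (byte offset, matched text) for every match of 'l'
    let found: Vec<(usize, &str)> = s.match_indices('l').collect();
    assert_eq!(found, [(2, "l"), (3, "l"), (9, "l")]);

    // A closure pattern: count ASCII vowels (e, o, o)
    let vowels = s.matches(|c: char| "aeiou".contains(c)).count();
    assert_eq!(vowels, 3);
}
```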
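===== 9.7 Converting Between String and &str =====
==== String to &str ====
```rust
let s = String::from("hello");
// The & operator (deref coercion turns &String into &str)
let slice: &str = &s;
// as_str()
let slice = s.as_str();
// Automatic coercion at a call site
fn takes_str(_s: &str) {}
takes_str(&s); // &String is coerced to &str automatically
```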
==== &str to String ====
```rust
let slice = "hello";
// to_string()
let s = slice.to_string();
// String::from()
let s = String::from(slice);
// into(), with the target type annotated
let s: String = slice.into();
```
===== 9.8 Working with UTF-8 =====
==== Counting characters ====
```rust
let s = "中国人";
// Number of bytes
println!("bytes: {}", s.len()); // 9
// Number of chars (Unicode scalar values)
println!("chars: {}", s.chars().count()); // 3
```
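`chars().count()` counts Unicode scalar values, which is not always what a reader sees as "one character". An accented "é" can be stored precomposed as U+00E9, or as 'e' followed by the combining accent U+0301. The two forms look the same but give different `len()` and `chars().count()` results. The standard library does not segment user-perceived characters (grapheme clusters); the third-party `unicode-segmentation` crate is a common choice for that. A sketch:

```rust
fn main() {
    // 'e' + U+0301 COMBINING ACUTE ACCENT
    let decomposed = "e\u{301}";
    assert_eq!(decomposed.len(), 3); // 1 byte + 2 bytes
    assert_eq!(decomposed.chars().count(), 2); // two scalar values

    // Precomposed U+00E9: renders the same, but is a single scalar value
    let precomposed = "\u{e9}";
    assert_eq!(precomposed.len(), 2);
    assert_eq!(precomposed.chars().count(), 1);

    // Byte-wise comparison treats them as different strings
    assert_ne!(decomposed, precomposed);
}
```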
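==== Checking char boundaries ====
```rust
/// Returns s[start..end] only if the range is valid and both ends are char boundaries.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    // start <= end is also required: &s[3..0] panics even though 3 and 0 are boundaries.
    // is_char_boundary returns false for indices past the end of the string.
    if start <= end && s.is_char_boundary(start) && s.is_char_boundary(end) {
        Some(&s[start..end])
    } else {
        None
    }
}
let s = "中国人";
println!("{:?}", safe_slice(s, 0, 3)); // Some("中")
println!("{:?}", safe_slice(s, 0, 2)); // None
```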
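The standard library already provides this check through `str::get`. It returns `None` for a reversed range, an out-of-range index, or a bound that is not a char boundary. The sketch below expresses `safe_slice` with it. The name `safe_slice_get` is hypothetical, chosen to avoid clashing with the function above.

```rust
/// Same contract as safe_slice, delegated to str::get.
fn safe_slice_get(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s = "中国人";
    println!("{:?}", safe_slice_get(s, 0, 3)); // Some("中")
    println!("{:?}", safe_slice_get(s, 0, 2)); // None: 2 is inside '中'
    println!("{:?}", safe_slice_get(s, 3, 0)); // None: reversed range
}
```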
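===== 9.9 Strings and Ownership =====
==== Conversions that consume ownership ====
```rust
let s = String::from("hello");
// into_bytes() takes self by value and returns the underlying Vec<u8>
let bytes = s.into_bytes();
// s has been moved and can no longer be used here
```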
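The reverse conversion is `String::from_utf8`, which takes a `Vec<u8>` and returns `Result<String, FromUtf8Error>`. The bytes are validated before the `String` is built, and invalid UTF-8 becomes an error instead of a broken string. A round-trip sketch:

```rust
fn main() {
    let s = String::from("hello 中国");
    let bytes: Vec<u8> = s.into_bytes(); // s is moved; its buffer is handed over
    assert_eq!(bytes.len(), 12); // 6 ASCII bytes + 2 * 3 bytes

    // from_utf8 validates the bytes and takes ownership of the Vec
    let back = String::from_utf8(bytes).expect("valid UTF-8");
    assert_eq!(back, "hello 中国");

    // A truncated 3-byte sequence is rejected
    assert!(String::from_utf8(vec![0xE4, 0xB8]).is_err());
}
```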
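==== Avoiding unnecessary clones ====
```rust
// Less efficient: clone() makes a deep copy (a new heap allocation)
let s1 = String::from("hello");
let s2 = s1.clone();
let slice = &s2[0..2];
// Better: borrow a slice of the original
let s1 = String::from("hello");
let slice = &s1[0..2]; // "he", no allocation
```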
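The same principle applies to function signatures. A function that only reads text can take `&str` and return a borrowed slice, so callers can pass either a `String` or a literal without cloning. `first_word` is a hypothetical helper that illustrates this.

```rust
/// Returns the first whitespace-separated word as a borrowed slice (no allocation).
/// Returns "" if the input has no words.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owned = String::from("hello world");
    let w = first_word(&owned); // &String coerces to &str; nothing is copied
    assert_eq!(w, "hello");
    assert_eq!(first_word("   "), "");
}
```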
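===== Exercises =====
==== Exercise 9.1: Reverse a String ====
```rust
fn reverse(s: &str) -> String {
    // Reverse by Unicode scalar value, then collect into a new String
    s.chars().rev().collect()
}
fn main() {
    let s = "hello 中国";
    println!("original: {}", s);
    println!("reversed: {}", reverse(s)); // "国中 olleh"
}
```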
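One caveat applies to this solution: `chars().rev()` reverses scalar values, not user-perceived characters. A combining mark such as U+0301 therefore ends up attached to a different base letter. The assertions below make this visible. Keeping accents intact would require grapheme-aware reversal, which needs a crate such as `unicode-segmentation`. That is beyond this exercise.

```rust
fn reverse(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    // Works as expected for these chars
    assert_eq!(reverse("hello 中国"), "国中 olleh");
    // "e + U+0301, then x" becomes "x + U+0301, then e": the accent moves onto 'x'
    assert_eq!(reverse("e\u{301}x"), "x\u{301}e");
}
```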
==== Exercise 9.2: Palindrome Check ====
```rust
fn is_palindrome(s: &str) -> bool {
    // Keep letters and digits only, lowercased
    let s: String = s
        .chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(char::to_lowercase) // to_lowercase may yield more than one char
        .collect();
    s == s.chars().rev().collect::<String>()
}
fn main() {
    println!("{}", is_palindrome("A man, a plan, a canal: Panama")); // true
    println!("{}", is_palindrome("race a car")); // false
    println!("{}", is_palindrome("上海自来水来自海上")); // true
}
```
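The version above builds two `String`s: the normalized text and its reverse. An alternative sketch collects the normalized characters into a `Vec<char>` once. It then compares a forward iterator with a reversed one through `Iterator::eq`, so no second `String` is allocated. The normalization is the same as above.

```rust
fn is_palindrome(s: &str) -> bool {
    // Normalize once: letters/digits only, lowercased
    let chars: Vec<char> = s
        .chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(char::to_lowercase)
        .collect();
    // Compare front-to-back with back-to-front, element by element
    chars.iter().eq(chars.iter().rev())
}

fn main() {
    println!("{}", is_palindrome("A man, a plan, a canal: Panama")); // true
    println!("{}", is_palindrome("race a car")); // false
}
```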
==== Exercise 9.3: String Statistics ====
```rust
fn analyze(s: &str) {
    println!("string: {}", s);
    println!("bytes: {}", s.len());
    println!("chars: {}", s.chars().count());
    println!("words: {}", s.split_whitespace().count());
    println!("lines: {}", s.lines().count());
    // Count characters by category
    let mut letters = 0;
    let mut digits = 0;
    let mut spaces = 0;
    let mut others = 0;
    for c in s.chars() {
        if c.is_alphabetic() {
            // Unicode-aware: CJK characters count as letters too
            letters += 1;
        } else if c.is_numeric() {
            digits += 1;
        } else if c.is_whitespace() {
            spaces += 1;
        } else {
            others += 1;
        }
    }
    println!("letters: {}, digits: {}, whitespace: {}, other: {}",
        letters, digits, spaces, others);
}
fn main() {
    let text = "Hello, Rust 2024!\n这是第2行。";
    analyze(text);
}
```
==== Exercise 9.4: Simplified CSV Parsing ====
```rust
// Simplified: quoted fields and commas inside quotes are not handled
fn parse_csv_line(line: &str) -> Vec<&str> {
    line.split(',').map(|s| s.trim()).collect()
}
fn main() {
    let csv = "name, age, city\nAlice, 30, New York\nBob, 25, London";
    for line in csv.lines() {
        let fields = parse_csv_line(line);
        println!("{:?}", fields); // e.g. ["name", "age", "city"]
    }
}
```
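A natural extension is to treat the first line as a header and pair each field with its column name. The sketch below does this. `to_records` is a hypothetical helper, and the same simplification applies: no quoting support. `zip` stops at the shorter side, so a row with missing fields yields fewer pairs instead of an error. All returned slices borrow from the input, so nothing is copied.

```rust
fn parse_csv_line(line: &str) -> Vec<&str> {
    line.split(',').map(str::trim).collect()
}

/// Pairs every data field with its header name; returns no records for empty input.
fn to_records(csv: &str) -> Vec<Vec<(&str, &str)>> {
    let mut lines = csv.lines();
    let header = match lines.next() {
        Some(h) => parse_csv_line(h),
        None => return Vec::new(),
    };
    lines
        .map(|line| header.iter().copied().zip(parse_csv_line(line)).collect())
        .collect()
}

fn main() {
    let csv = "name, age, city\nAlice, 30, New York\nBob, 25, London";
    for record in to_records(csv) {
        println!("{:?}", record); // e.g. [("name", "Alice"), ("age", "30"), ("city", "New York")]
    }
}
```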
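===== Chapter Summary =====
This chapter covered string handling in Rust:
* **String**: a growable, owned string
* **&str**: a string slice, i.e. an immutable borrow
* **No integer indexing**: UTF-8 is a variable-length encoding, so Rust does not let you index a string by integer to get a character
* **Iteration**: chars(), bytes(), lines(), split_whitespace(), and so on
* **Common methods**: find, replace, split, trim, and others
* **UTF-8 handling**: slice only at character boundaries, which you can check with is_char_boundary
The key to working with strings in Rust is understanding the difference between String and &str, together with the properties of UTF-8.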