From 78e7b4c607d8fc3b35f90ed88614093bda437195 Mon Sep 17 00:00:00 2001
From: Teddy Wing
Date: Wed, 27 Apr 2016 05:50:40 -0400
Subject: Read aliases file as bytes and convert to string

Discovered that my Mutt aliases file uses the latin1 character
encoding. That caused a "stream did not contain valid UTF-8" error when
trying to read the file in the `Alias#find_in_file` function.

This error was ostensibly triggered by a `str::from_utf8` call in the
standard library
(https://github.com/rust-lang/rust/blob/2174bd97c1458d89a87eb2b614135d7ad68d6f18/src/libstd/io/mod.rs#L315-L338).

I ended up finding this Stack Overflow answer with an easy solution:
http://stackoverflow.com/questions/28169745/what-are-the-options-to-convert-iso-8859-1-latin-1-to-a-string-utf-8/28175593#28175593

    fn latin1_to_string(s: &[u8]) -> String {
        s.iter().map(|c| c as char).collect()
    }

Since the latin1 byte values map one-to-one onto the first 256 Unicode
code points, we can just read the bytes from the file and typecast them
to Rust chars (which are Unicode scalar values). Collecting those chars
into a `String` gives us UTF-8 text, an encoding that we can actually
work with in Rust.

At first I got frustrated because the suggestion didn't compile for me.
It was suggested in January 2015, before Rust 1.0, so perhaps that
factors into the error I was getting. Here it is:

    src/alias.rs:59:41: 59:45 error: mismatched types:
     expected `&[u8]`,
        found `core::result::Result` (expected &-ptr, found enum `core::result::Result`) [E0308]
    src/alias.rs:59             let line = latin1_to_string(line);
                                                            ^~~~
    src/alias.rs:59:41: 59:45 help: run `rustc --explain E0308` to see a detailed explanation
    src/alias.rs:99:22: 99:31 error: only `u8` can be cast as `char`, not `&u8`
    src/alias.rs:99     s.iter().map(|c| c as char).collect()
                                         ^~~~~~~~~
    error: aborting due to 2 previous errors

A recommendation from 'niconii' in Mozilla#rust-beginners was to use the
Encoding library in order to do the conversion
(https://github.com/lifthrasiir/rust-encoding).
That certainly seems more robust and would be a good idea to try if
this change doesn't work out in the long term. But the Stack Overflow
answer just seemed so short and sweet that I really didn't like the
idea of adding a dependency if I could get what I wanted with 3 lines
of code.

Finally took another look and reworked the suggested code to take a
vector (which is what `BufReader#split` gives us) and clone the u8
characters to clear the compiler error of not being able to cast an
&u8.
---
 src/alias.rs | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/alias.rs b/src/alias.rs
index 0fd05d2..2e7da81 100644
--- a/src/alias.rs
+++ b/src/alias.rs
@@ -54,8 +54,9 @@ impl Alias {
         let mut matches = Vec::new();
         let f = try!(File::open(file));
         let file = BufReader::new(&f);
-        for line in file.lines() {
+        for line in file.split(b'\n') {
             let line = try!(line);
+            let line = latin1_to_string(line);
             let split: Vec<&str> = line.split_whitespace().collect();
 
             if line.contains(&self.email) {
@@ -94,6 +95,10 @@ impl Alias {
     }
 }
 
+fn latin1_to_string(s: Vec<u8>) -> String {
+    s.iter().map(|c| c.clone() as char).collect()
+}
+
 #[derive(Debug)]
 pub enum AliasSearchError {
     NotFound,
-- 
cgit v1.2.3
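
The conversion described in the commit message can be tried on its own.
The following is a minimal standalone sketch, not part of the commit: it
reuses the `latin1_to_string` name from the patch but takes a slice
instead of a `Vec<u8>`, and the sample bytes are made up for
illustration. It shows the same input failing UTF-8 validation and then
decoding cleanly when each byte is cast to a `char`.

```rust
// Sketch: decode Latin-1 (ISO-8859-1) bytes into a Rust String.
// The function name mirrors the patch; the sample input is hypothetical.
fn latin1_to_string(bytes: &[u8]) -> String {
    // Every Latin-1 byte has the same numeric value as its Unicode code
    // point, so `u8 as char` is lossless. Collecting the chars into a
    // String stores them as UTF-8.
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // "café" in Latin-1: 0xE9 is 'é'.
    let raw: Vec<u8> = vec![b'c', b'a', b'f', 0xE9];

    // A lone 0xE9 is not valid UTF-8, which is why a UTF-8-only reader
    // rejects this input.
    assert!(String::from_utf8(raw.clone()).is_err());

    let text = latin1_to_string(&raw);
    assert_eq!(text, "café");

    // The single Latin-1 byte for 'é' becomes two bytes in UTF-8.
    assert_eq!(text.len(), 5);

    println!("{}", text);
}
```

Dereferencing in the closure pattern (`|&b|`) does the same job as the
`c.clone()` call in the patch, since `u8` is `Copy`.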