Solution to Spell Checker by Мирослав Лалев
Results
- 9 points from tests
- 0 bonus points
- 9 points total
- 7 passing test(s)
- 8 failing test(s)
Code
Execution log
Compiling solution v0.1.0 (/tmp/d20200114-2173579-1l4svpz/solution)
Finished test [unoptimized + debuginfo] target(s) in 7.06s
Running target/debug/deps/solution-a73e64ec87929bd0
running 7 tests
test tests::test_cleanline ... ok
test tests::test_spellcheck ... ok
test tests::test_spellcheck_edits1 ... ok
test tests::test_spellcheck_edits2 ... ok
test tests::test_spellcheck_unicode ... ok
test tests::test_spellchek_probability ... ok
test tests::test_wordcount ... ok
test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/debug/deps/solution_test-38971695424b36d5
running 15 tests
test solution_test::test_best_word_is_returned ... FAILED
test solution_test::test_clean_line_removes_punctuation ... ok
test solution_test::test_clean_line_trims_the_input ... ok
test solution_test::test_correction ... FAILED
test solution_test::test_correction_fails_to_produce_new_result ... FAILED
test solution_test::test_correction_normalizes_case ... FAILED
test solution_test::test_counting ... FAILED
test solution_test::test_display ... FAILED
test solution_test::test_edits1 ... FAILED
test solution_test::test_edits2 ... FAILED
test solution_test::test_empty_counter ... ok
test solution_test::test_from_empty_str ... ok
test solution_test::test_from_str ... ok
test solution_test::test_known_words ... ok
test solution_test::test_probability ... ok
failures:
---- solution_test::test_best_word_is_returned stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"pawns"`,
 right: `"own"`', tests/solution_test.rs:220:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
---- solution_test::test_correction stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"либоф"`,
 right: `"любов"`', tests/solution_test.rs:177:5
---- solution_test::test_correction_fails_to_produce_new_result stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"либоф"`,
 right: `"либофф"`', tests/solution_test.rs:206:5
---- solution_test::test_correction_normalizes_case stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"либоф"`,
 right: `"любов"`', tests/solution_test.rs:188:5
---- solution_test::test_counting stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `2`', tests/solution_test.rs:36:5
---- solution_test::test_display stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"WordCounter, total count: 2\none: 1\ntwo: 1\n"`,
 right: `"WordCounter, total count: 3\ntwo: 2\none: 1\n"`', tests/solution_test.rs:52:5
---- solution_test::test_edits1 stdout ----
thread 'main' panicked at 'assertion failed: edits.contains("тли")', tests/solution_test.rs:123:5
---- solution_test::test_edits2 stdout ----
thread 'main' panicked at 'assertion failed: !edits.contains("з")', tests/solution_test.rs:141:5
failures:
solution_test::test_best_word_is_returned
solution_test::test_correction
solution_test::test_correction_fails_to_produce_new_result
solution_test::test_correction_normalizes_case
solution_test::test_counting
solution_test::test_display
solution_test::test_edits1
solution_test::test_edits2
test result: FAILED. 7 passed; 8 failed; 0 ignored; 0 measured; 0 filtered out
error: test failed, to rerun pass '--test solution_test'
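The `test_counting` and `test_display` failures in the log are consistent with two details of the submitted `WordCounter`. This is a reading of the log and the code, since the test source is not shown here. First, `add` stores the new count under the normalized key but looks up the old count under the raw `item`, so input that differs in case or surrounding whitespace restarts at 1. Second, `Display` prints `self.words.len()` (the number of distinct words) where the expected output shows the total number of occurrences. A minimal sketch of one possible fix, using a hypothetical `Counter` type as a stand-in for `WordCounter`:

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical stand-in for the submission's `WordCounter`.
#[derive(Debug, Default)]
pub struct Counter {
    words: HashMap<String, u32>,
}

impl Counter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Normalize first, then read and update the count under the same key.
    pub fn add(&mut self, item: &str) {
        let key = item.trim().to_lowercase();
        *self.words.entry(key).or_insert(0) += 1;
    }

    pub fn get(&self, word: &str) -> u32 {
        self.words.get(word).copied().unwrap_or(0)
    }

    /// Sum of all occurrences, not the number of distinct words.
    pub fn total_count(&self) -> u32 {
        self.words.values().sum()
    }
}

impl fmt::Display for Counter {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "WordCounter, total count: {}", self.total_count())?;
        let mut entries: Vec<_> = self.words.iter().collect();
        // Most frequent first; ties are broken alphabetically.
        entries.sort_by(|(w1, c1), (w2, c2)| c2.cmp(c1).then(w1.cmp(w2)));
        for (word, count) in entries {
            writeln!(f, "{}: {}", word, count)?;
        }
        Ok(())
    }
}
```

The `entry` API keeps the lookup and the insert on one key, which rules out the mismatch between `key` and `item` by construction.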
History (2 versions and 1 comment)
Мирослав uploaded a solution on 13.01.2020 22:20 (over 5 years ago)
use std::collections::{HashMap, HashSet};
pub fn clean_line(input: &str) -> String {
input
.trim()
.chars()
.into_iter()
.filter(|&c| c.is_alphabetic() || c.is_whitespace() || c == '-' || c == '\'')
.collect()
}
#[derive(Debug, PartialEq)]
pub struct WordCounter {
words: HashMap<String, u32>,
}
impl WordCounter {
pub fn new() -> Self {
WordCounter {
words: HashMap::new(),
}
}
pub fn from_str(input: &str) -> Self {
let all_words = input
.lines()
.into_iter()
.map(clean_line)
.map(|line| {
line.split_whitespace()
.map(|c| String::from(c).to_lowercase())
.collect::<Vec<_>>()
})
.flatten()
.collect::<Vec<String>>();
let mut wc = WordCounter::new();
for word in all_words {
wc.add(word.as_ref());
}
wc
}
pub fn words(&self) -> Vec<&String> {
let mut res = self.words.keys().into_iter().collect::<Vec<_>>();
res.sort();
res
}
pub fn add(&mut self, item: &str) {
let key = String::from(item).trim().to_lowercase();
self.words
.insert(key, self.words.get(item).map_or(0, |&c| c) + 1);
}
pub fn get(&self, word: &str) -> u32 {
self.words.get(word).map_or(0, |&c| c)
}
pub fn total_count(&self) -> u32 {
self.words.values().sum()
}
}
impl std::fmt::Display for WordCounter {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
let mut res = vec![format!("WordCounter, total count: {}", self.words.len())];
let mut w = self.words.iter().collect::<Vec<_>>();
w.sort_by(|(w1, c1), (w2, c2)| c1.cmp(c2).reverse().then(w1.cmp(w2)));
res.extend(w.iter().map(|(word, count)| format!("{}: {}", word, count)));
let mut s = res.join("\n");
s.push('\n');
f.write_str(s.as_ref())
}
}
pub const ALPHABET_EN: &'static str = "abcdefghijklmnopqrstuvwxyz";
pub const ALPHABET_BG: &'static str = "абвгдежзийклмнопрстуфхцчшщъьюя";
pub struct SpellChecker<'a> {
wc: WordCounter,
alphabet: &'a str,
}
impl<'a> SpellChecker<'a> {
pub fn new(corpus: &str, alphabet: &'a str) -> Self {
SpellChecker {
wc: WordCounter::from_str(corpus),
alphabet,
}
}
pub fn correction(&self, word: &str) -> String {
let word_str = word.trim().to_lowercase();
let word = word_str.as_ref();
self.candidates(word)
.into_iter()
.max_by(|cand1, cand2| {
self.probability(cand1.as_ref())
.partial_cmp(&self.probability(cand2.as_ref()))
.unwrap()
})
.unwrap_or_else(|| word_str)
}
fn in_alphabet(&self, word: &str) -> bool {
word.chars().all(|c| self.alphabet.contains(c))
}
pub fn probability(&self, word: &str) -> f64 {
self.wc.get(word) as f64 / self.wc.total_count() as f64
}
pub fn known<'b>(&self, words: &'b HashSet<String>) -> Vec<&'b String> {
words.iter().filter(|&w| self.wc.get(w) != 0).collect()
}
pub fn candidates(&self, word: &str) -> Vec<String> {
if self.wc.get(word) > 0 {
return vec![String::from(word)];
}
if !self.in_alphabet(word) {
return vec![String::from(word)];
}
let e1 = self.edits1(word);
let k1 = self.known(&e1);
if k1.len() > 0 {
return k1.iter().map(|&w| String::from(w)).collect();
}
let e2 = self.edits2(word);
let k2 = self.known(&e2);
if k2.len() > 0 {
return k2.iter().map(|&w| String::from(w)).collect();
}
vec![String::from(word)]
}
pub fn edits1(&self, word: &str) -> HashSet<String> {
let mut res = HashSet::new();
- for i in 0..=word.len() {
- let (left, right) = split_str(word, i);
- let (r_first, r_rem) = split_str(right, 1);
- let (r_second, r_rem_rem) = split_str(r_rem, 1);
- res.insert(format!("{}{}", left, r_rem));
- res.insert(format!("{}{}{}{}", left, r_second, r_first, r_rem_rem));
+ let mut total_size = 0;
+ loop {
+ let (left, right) = word.split_at(total_size);
+ let mut r_chars = right.chars();
+ let r_first = to_str(r_chars.next());
+ let r_second = to_str(r_chars.next());
+ let r_rem = r_chars.collect::<String>();
+
+ res.insert(format!("{}{}{}", left, r_second, r_rem));
+ res.insert(format!("{}{}{}{}", left, r_second, r_first, r_rem));
+
for letter in self.alphabet.chars() {
res.insert(format!("{}{}{}", left, letter, r_rem));
res.insert(format!("{}{}{}", left, letter, right));
}
+
+ if total_size == word.len() {
+ break;
+ }
+ total_size += r_first.len();
}
res
}
pub fn edits2(&self, word: &str) -> HashSet<String> {
let mut res = HashSet::new();
for edited in self.edits1(word) {
res.extend(self.edits1(edited.as_ref()));
}
res
}
}
-fn split_str(s: &str, at: usize) -> (&str, &str) {
- let left = s.get(..at).map_or("", |s| s);
- let right = s.get(at..).map_or("", |s| s);
- (left, right)
+fn to_str(oc: Option<char>) -> String {
+ match oc {
+ Some(c) => c.to_string(),
+ None => String::from(""),
+ }
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_cleanline() {
assert_eq!(clean_line("\t^abcd, e-f g'h"), String::from("abcd e-f g'h"));
assert_eq!(clean_line("\t 天花乱坠"), String::from("天花乱坠"));
}
#[test]
fn test_wordcount() {
let mut wc = WordCounter::from_str("\tLL kk, JJ ");
assert_eq!(
wc.words(),
vec![
&String::from("jj"),
&String::from("kk"),
&String::from("ll")
]
);
wc.add(" aa \t");
wc.add("\t\tUU");
wc.add("ll");
assert_eq!(
wc.words(),
vec![
&String::from("aa"),
&String::from("jj"),
&String::from("kk"),
&String::from("ll"),
&String::from("uu"),
]
);
assert_eq!(wc.get("ll"), 2);
assert_eq!(wc.get("aa"), 1);
assert_eq!(wc.total_count(), 6);
}
#[test]
fn test_spellcheck() {
let spellchecker = SpellChecker::new("and and an apples at", ALPHABET_EN);
let mut cand = spellchecker.candidates("ad");
cand.sort();
assert_eq!(cand, vec!["an", "and", "at"]);
assert_eq!(spellchecker.correction("ad"), String::from("and"));
}
#[test]
fn test_spellcheck_unicode() {
let spellchecker = SpellChecker::new("ден два дни дай дек дело", ALPHABET_BG);
let mut cand = spellchecker.candidates("д");
cand.sort();
assert_eq!(cand, vec!["дай", "два", "дек", "ден", "дни"]);
let mut cand = spellchecker.candidates("де");
cand.sort();
assert_eq!(cand, vec!["дек", "ден"]);
}
#[test]
fn test_spellcheck_edits1() {
let spellchecker = SpellChecker::new("ab bc ca", "abc");
assert_eq!(spellchecker.edits1("a"), {
let mut res = HashSet::new();
res.extend(vec![
String::from(""),
String::from("aa"),
String::from("ba"),
String::from("ca"),
String::from("a"),
String::from("b"),
String::from("c"),
String::from("ab"),
String::from("ac"),
]);
res
})
}
#[test]
fn test_spellcheck_edits2() {
let spellchecker = SpellChecker::new("ab bc ca", "abc");
assert_eq!(spellchecker.edits2("a"), {
let mut res = HashSet::new();
let edits11 = vec![
String::from(""),
String::from("aa"),
String::from("ba"),
String::from("ca"),
String::from("a"),
String::from("b"),
String::from("c"),
String::from("ab"),
String::from("ac"),
]
.into_iter()
.map(|e1| spellchecker.edits1(e1.as_ref()))
.collect::<Vec<_>>();
for edit in edits11 {
res.extend(edit);
}
res
})
}
#[test]
fn test_spellchek_probability() {
let spellchecker = SpellChecker::new("hello hello the of", ALPHABET_EN);
assert_eq!(spellchecker.probability("hello"), 0.5);
assert_eq!(spellchecker.probability("the"), 0.25);
assert_eq!(spellchecker.probability("and"), 0.0);
}
}
Refactoring gone wrong :D
The replacement edit in `edits1` should have been
format!("{}{}{}{}", left, letter, r_second, r_rem)
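The comment concerns the replacement edit in `edits1`: the refactored line drops `r_second`, so replacing one letter also deletes the letter after it. Below is a minimal sketch of `edits1` with that edit corrected. It is written as a hypothetical free function that takes the alphabet as a parameter and indexes a `Vec<char>` instead of byte offsets; it illustrates the idea and is not the submitted method. Unlike the submission, it emits deletions, transpositions and replacements only when there are enough characters to the right of the split point, which is an assumption about the intended behavior.

```rust
use std::collections::HashSet;

/// Hypothetical free-function variant of `edits1`; works on chars, not bytes.
pub fn edits1(word: &str, alphabet: &str) -> HashSet<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut res = HashSet::new();

    for i in 0..=chars.len() {
        let left: String = chars[..i].iter().collect();
        let right = &chars[i..];

        // Deletion: drop the first char to the right of the split point.
        if !right.is_empty() {
            let rest: String = right[1..].iter().collect();
            res.insert(format!("{}{}", left, rest));
        }

        // Transposition: swap the first two chars to the right.
        if right.len() >= 2 {
            let rest: String = right[2..].iter().collect();
            res.insert(format!("{}{}{}{}", left, right[1], right[0], rest));
        }

        for letter in alphabet.chars() {
            // Replacement: substitute only the first char, keep everything after it.
            if !right.is_empty() {
                let rest: String = right[1..].iter().collect();
                res.insert(format!("{}{}{}", left, letter, rest));
            }

            // Insertion: put the letter at the split point.
            let all: String = right.iter().collect();
            res.insert(format!("{}{}{}", left, letter, all));
        }
    }

    res
}
```

With the Bulgarian alphabet, `edits1("три", ...)` now contains `тли` (replacement of `р` with `л`), the word that the failing `test_edits1` assertion looks for. The actual test input is not shown in the log, so `три` is only an illustrative guess.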