Parse the T9 problem (predictive text)
Input data:
The file input.txt, which describes the sequences of numbers that mimic user input:
48 26624637 843 476877 63 5388377 66 3224 74663 539 9484 2 3278 222377 3428466279 63 96737
48 56657 87 46 843 3428466279 255 96737 2677377663464 86 843 73783623 63 5397876537 263 673377 8436 29 373783629 63 873
2 26678837 47 2 776472662253 6224463 8428 73234837 46788 786737 263 62647852837 3282 263 77684337 688788 46 2 873385 367628
2 644486273 47 2 37326 8428 226 22873 2 787664 63428483 366846625 73776673 3766 843 7533737 897422559 3327 67 467767
746784263 47 26 22273842833 79626542 9748464 638463 8428 462732737 77333 67 2738489 63 9748464 27 26672733 86 2 667625 638463 63 9748464 2 52648243
843 7762377 63 9748464 46 746784263 47 225533 78366472749
843 8376 625664 47 622274662559 8733 86 73337 86 2 666 778273 732826453
The dictionary.txt file that contains a list of English words that is used to predict a given word from the sequence above.
Output:
The output.txt file, which contains all the possible suggestions.
Analysis of the problem
The first thing that comes to mind is to use prefix trees… Let’s do it. To do this, at the beginning, we need to determine the map of correspondence between letters and numbers in the keyboard layout of a push-button telephone:
Mapping
export const keyMap: Record<string, number> = {
a: 2,
b: 2,
c: 2,
d: 3,
e: 3,
f: 3,
g: 4,
h: 4,
i: 4,
j: 5,
k: 5,
l: 5,
m: 6,
n: 6,
o: 6,
p: 7,
q: 7,
r: 7,
s: 7,
t: 8,
u: 8,
v: 8,
w: 9,
x: 9,
y: 9,
z: 9,
};
Next, let’s start implementing the prefix tree directly. For these purposes, we will create a Trie class that implements two main methods insert and getPredictions… Here, insertion works by traversing the tree according to the string to be inserted, and then adding new nodes not contained in the tree to the string suffix:
Class code
import { keyMap, rootGenerator } from "./helpers";
import { Root, Prediction } from './interfaces';
class Trie {
root: Root;
constructor() {
this.root = rootGenerator();
}
insert(word: string): void {
if (word.length === 0) throw new Error("A word has to be specified");
let currentNode = this.root;
Array.from(word).forEach((letter, index) => {
const digit = keyMap[letter];
let isLastNode = false;
if (word.length === index + 1) isLastNode = true;
if (!digit) throw new Error("Not a valid digit");
if (!currentNode.children) currentNode.children = {};
if (!currentNode.children[digit]) currentNode.children[digit] = rootGenerator();
currentNode = currentNode.children[digit] as Root;
if (!isLastNode) currentNode.predictions.deep.push(word);
});
currentNode.predictions.current.push(word);
}
getPredictions(keyString: string): Prediction | undefined {
const state = rootGenerator().predictions;
let currentNode = this.root;
let predictions;
Array.from(keyString).forEach((nodeKey) => {
if (!currentNode.children || !currentNode.children[nodeKey]) {
predictions = state;
return;
}
currentNode = currentNode.children[nodeKey] as Root;
predictions = currentNode.predictions;
});
return predictions;
}
}
export default Trie;
Now let’s implement a class for filling the tree and parsing the dictionary. There are no unique moments here, so no comments.
Parsing
import { readFileSync } from "fs";
import Trie from "./trie";
import { rootGenerator } from "./helpers";
import { Prediction } from "./interfaces";
class TrieParser {
trie: Trie;
filePath: string;
constructor(filePath: string) {
this.trie = new Trie();
this.filePath = filePath;
}
createTrie(words?: string[]): void {
const wordArray = words || this.parseDictionary();
// Creating trie from words array
wordArray.forEach((word) => {
this.trie.insert(word);
});
}
parseDictionary(): string[] {
const filePath = this.filePath;
const data = readFileSync(filePath, {
encoding: "utf8",
});
const regex = /r?n/;
const array = data.toString().split(regex);
return array;
}
getPredictions(keyString: string): Prediction | undefined {
if (!keyString) return rootGenerator().predictions;
return this.trie.getPredictions(keyString);
}
}
export default TrieParser;
The matter is small – to decide how you can get lists of all available combinations. The classic approach to generating such lists is – heap algorithm, however, in our case, it does not fit, because we want to get only unique options, and therefore the most suitable option is cartesian product, the implementation of which will look like this:
export function cartesian<T>(...words: T[][]): T[][] {
// iteratively get the cartesian product
return words.reduce<T[][]>(
// part - cartesian product of the first few arrays from words
(part, array) =>
part.flatMap((cartesianPart) =>
//cartesianPart is an array-prefix of one of the elements of the cartesian product
array.map((element) => [...cartesianPart, element])
),
[[]]
);
}
This is where the hardest part ends, and all that remains is to call the implemented methods. The code with comments is below:
Final listing
import { readFileSync, createWriteStream, unlinkSync, existsSync } from "fs";
import { cartesian } from "./utils/cartesian";
import TrieParser from "./trie/trie-parser.js";
const dictionaryFilePath = "src/data/dictionary.txt";
const combinationsFilePath = "src/data/combinations.txt";
const outputFilePath = "src/data/output.txt";
const trieDictionaryInstance = new TrieParser(dictionaryFilePath);
trieDictionaryInstance.createTrie();
export function main() {
try {
if (existsSync(outputFilePath)) unlinkSync(outputFilePath);
// create file stream
const output = createWriteStream(outputFilePath, {
flags: "a",
});
const rawData = readFileSync(combinationsFilePath, { encoding: "utf-8" })
.toString()
.split(/r?n/);
for (const rawCombination of rawData) {
const combinations = rawCombination.split(" ");
const predictionArray: string[][] = combinations.map((combination) => {
// Get predictions for each combination in the string
const result = trieDictionaryInstance.getPredictions(combination);
const targetArray = result?.current.flatMap((item) =>
Array.isArray(item) ? item : [item]
);
return targetArray;
}) as string[][];
// Get cartesian products
const possibleProducts = (
[...cartesian(...(predictionArray as [string[]]))] as string[][]
).map((permutation) => permutation.join(" "));
for (const permutation of possibleProducts) {
output.write(permutation + "r");
}
}
output.end();
} catch (e) {
console.error(e);
}
}
main();
The task code is available at github… Thank you for your attention!