The most challenging and interesting aspect of designing an English-to-Assyrian dictionary is ensuring data integrity. What does that even mean? Here’s a straightforward example… When you search for something like the color “red”, you expect definitions to appear in a ranked order matching your intent. You expect to see results (Assyrian words) whose definition is the color “red”, followed by related terms like “reddish” or “red rose” in descending order of relevance. You don’t expect to see words like “shREDded”, which contain the letters “red” but don’t actually match your intent.
Here’s a glimpse into how this is made possible: string similarity algorithms, Lucene search indexing for scoring, a MongoDB text search index, and basic string heuristics to ‘guess’ at the most important words in a definition sentence.
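To make the ranking idea concrete, here is a minimal sketch in Python. It is not the dictionary's actual code: the `score` and `rank` functions, the `entry-*` ids, and the scoring formula are all assumptions for illustration, and the standard library's `difflib.SequenceMatcher` stands in for whatever string similarity algorithm and Lucene scoring the real site uses. The heuristic only counts words that start with the query, so “reddish” qualifies while “shredded” does not, and it favors shorter definitions on the guess that the matching word matters more in them.

```python
import re
from difflib import SequenceMatcher


def score(query: str, definition: str) -> float:
    """Return a relevance score in [0, 1]; 0.0 means no word starts with the query."""
    q = query.lower()
    words = re.findall(r"[a-z]+", definition.lower())
    # Only words that begin with the query count, so "shredded" never matches "red".
    candidates = [w for w in words if w.startswith(q)]
    if not candidates:
        return 0.0
    # Similarity of the closest word: 1.0 for "red", lower for "reddish".
    best = max(SequenceMatcher(None, q, w).ratio() for w in candidates)
    # Assumed heuristic: shorter definitions are more focused on the matched word.
    return best / len(words) ** 0.5


def rank(query: str, entries: dict[str, str]) -> list[tuple[str, float]]:
    """Return (entry id, score) pairs with a non-zero score, best match first."""
    scored = [(entry_id, score(query, text)) for entry_id, text in entries.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical entries: placeholder ids mapped to English definitions.
entries = {
    "entry-a": "red",
    "entry-b": "reddish",
    "entry-c": "red rose",
    "entry-d": "shredded paper",
}

for entry_id, value in rank("red", entries):
    print(entry_id, round(value, 2))
```

With these sample entries the exact match “red” ranks first, “red rose” and “reddish” follow, and “shredded paper” is dropped entirely.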
Another useful feature is the ability to suggest related terms. Maybe the user mistyped what they were looking for. MongoDB has a really nice text search feature that makes this possible.
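As a rough illustration of what such a query could look like, the sketch below builds the pieces of a MongoDB `$text` search sorted by relevance. The `$text`, `$search`, and `$meta: "textScore"` operators are part of MongoDB itself; the `build_text_search` helper, the `definition` field name, and the `collection` variable are assumptions, not the site's actual schema. `$text` matches on stemmed words, so a search tends to surface related forms of the term the user typed.

```python
from typing import Any


def build_text_search(term: str, limit: int = 10) -> dict[str, Any]:
    """Build the filter, projection, and sort for a relevance-ranked $text query."""
    score_meta = {"$meta": "textScore"}
    return {
        "filter": {"$text": {"$search": term}},  # requires a text index on the collection
        "projection": {"score": score_meta},     # expose the relevance score per result
        "sort": [("score", score_meta)],         # most relevant results first
        "limit": limit,
    }


query = build_text_search("red")
print(query["filter"])

# Possible use with PyMongo, assuming a collection with a hypothetical "definition" field:
#   collection.create_index([("definition", "text")])
#   cursor = (collection.find(query["filter"], query["projection"])
#             .sort(query["sort"])
#             .limit(query["limit"]))
```

Keeping the query construction in a plain function makes it easy to inspect or unit test without a running database.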