Jieba is an open source Chinese word segmentation program, originally written in Python [4]. To segment a sentence, it first builds a trie (prefix tree) from its dictionary and scans the sentence against it to find every dictionary word that begins at each character position. These candidate words form a directed acyclic graph (DAG) over the sentence. Using the word frequencies recorded in the dictionary, Jieba then applies dynamic programming to find the maximum-probability path through the DAG, that is, the most probable segmentation. Because every candidate word is enumerated before one combination is chosen, this belongs to the full segmentation family of methods. For unknown words (words that are not in the dictionary), Jieba uses a hidden Markov model, which estimates the probability of unobserved states from observable data. In other words, the position of each character within a word is inferred from the observed character sequence, and the candidate state sequences are compared to find the most suitable break positions, which yields both the best segmentation and the unknown words.
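The dictionary-based part of this process can be illustrated with a minimal sketch. It is not Jieba's actual code: the dictionary `FREQ` and its counts are made up for illustration, the function names `build_dag`, `best_route`, and `segment` are hypothetical, a plain dictionary lookup stands in for the trie, and the hidden Markov model step for unknown words is left out.

```python
import math

# Toy dictionary: word -> frequency. The counts are invented for illustration.
FREQ = {
    "我": 50, "来": 30, "到": 40, "来到": 60,
    "北": 5, "京": 5, "北京": 80,
    "大": 20, "学": 20, "大学": 70, "北京大学": 90,
}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """Map each start index to the end indices of dictionary words starting there.

    Assumption: a dict membership test replaces the trie lookup.
    """
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        # A character that starts no dictionary word becomes a one-character node.
        dag[i] = ends or [i]
    return dag


def best_route(sentence, dag):
    """Dynamic programming from right to left over the DAG.

    route[i] = (best log-probability of sentence[i:], end index of the first word).
    """
    n = len(sentence)
    route = {n: (0.0, 0)}
    log_total = math.log(TOTAL)
    for i in range(n - 1, -1, -1):
        route[i] = max(
            # Words missing from the dictionary get a frequency of 1.
            (math.log(FREQ.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    return route


def segment(sentence):
    """Follow the best route from the start of the sentence and collect the words."""
    route = best_route(sentence, build_dag(sentence))
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(segment("我来到北京大学"))  # ['我', '来到', '北京大学']
```

With these toy counts, the single word 北京大学 scores higher than the combination 北京 + 大学, so the dynamic programming step keeps it as one word. Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow on long sentences.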
