libthai 0.1.28
Thai word segmentation.
Functions
ThBrk * | th_brk_new (const char *dictpath)
Create a dictionary-based word breaker.
void | th_brk_delete (ThBrk *brk)
Delete a word breaker.
int | th_brk_find_breaks (ThBrk *brk, const thchar_t *s, int pos[], size_t pos_sz)
Find word break positions in Thai string.
int | th_brk_insert_breaks (ThBrk *brk, const thchar_t *in, thchar_t *out, size_t out_sz, const char *delim)
Insert word delimiters in given string.
int | th_brk (const thchar_t *s, int pos[], size_t pos_sz)
Find word break positions in Thai string.
int | th_brk_line (const thchar_t *in, thchar_t *out, size_t out_sz, const char *delim)
Insert word delimiters in given string.
int th_brk (const thchar_t *s, int pos[], size_t pos_sz)
Find word break positions in Thai string.
s | : the input string to be processed |
pos | : array to keep breaking positions |
pos_sz | : size of pos[] |
Finds word break positions in Thai string s and stores at most pos_sz breaking positions in pos[], from left to right. Uses the shared word breaker.
(This function is deprecated since version 0.1.25, in favor of th_brk_find_breaks(), which is more thread-safe.)
void th_brk_delete (ThBrk *brk)
Delete a word breaker.
brk | : the word breaker |
Frees memory associated with the word breaker.
(Available since version 0.1.25, libthai.so.0.3.0)
int th_brk_find_breaks (ThBrk *brk, const thchar_t *s, int pos[], size_t pos_sz)
Find word break positions in Thai string.
brk | : the word breaker |
s | : the input string to be processed |
pos | : array to keep breaking positions |
pos_sz | : size of pos[] |
Finds word break positions in Thai string s and stores at most pos_sz breaking positions in pos[], from left to right.
(Available since version 0.1.25, libthai.so.0.3.0)
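As an illustration of how the stored positions might be consumed, the sketch below extracts one word from a string given its break positions. It assumes each pos[i] is a byte offset into s at which a new word starts, and that the caller already knows how many entries of pos[] are valid. The helper copy_word() and the type tis_char are hypothetical names, not part of libthai; tis_char stands in for thchar_t so the block compiles without the library headers.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for libthai's thchar_t so the sketch compiles on its own
 * (assumption: one unsigned byte per character). */
typedef unsigned char tis_char;

/*
 * Hypothetical helper: copy word number word_idx (0-based) of s into buf,
 * given the n_breaks positions already stored in pos[].
 * Returns the number of bytes copied, not counting the terminating NUL.
 * Returns 0 if word_idx is out of range or the positions are inconsistent.
 */
size_t copy_word(const tis_char *s, const int pos[], size_t n_breaks,
                 size_t word_idx, tis_char *buf, size_t buf_sz)
{
    size_t len = strlen((const char *)s);
    size_t start, end, n;

    if (buf_sz == 0)
        return 0;
    buf[0] = '\0';
    if (word_idx > n_breaks)
        return 0;

    /* Word i spans [pos[i-1], pos[i]); the first word starts at 0 and
     * the last word ends at the end of the string. */
    start = (word_idx == 0) ? 0 : (size_t)pos[word_idx - 1];
    end = (word_idx == n_breaks) ? len : (size_t)pos[word_idx];
    if (start > end || end > len)
        return 0;

    /* Truncate to fit the caller's buffer, always NUL-terminating. */
    n = end - start;
    if (n > buf_sz - 1)
        n = buf_sz - 1;
    memcpy(buf, s + start, n);
    buf[n] = '\0';
    return n;
}
```

With libthai itself, pos[] would be filled by th_brk_find_breaks(brk, s, pos, pos_sz), and n_breaks would be taken from its return value, assuming that value is the number of positions stored.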
int th_brk_insert_breaks (ThBrk *brk, const thchar_t *in, thchar_t *out, size_t out_sz, const char *delim)
Insert word delimiters in given string.
brk | : the word breaker |
in | : the input string to be processed |
out | : the output buffer |
out_sz | : the size of out |
delim | : the word delimiter to insert |
Analyzes the input string and stores it in the output buffer with the given word delimiter inserted at every word boundary.
(Available since version 0.1.25, libthai.so.0.3.0)
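Since the text above does not say what happens when out is too small, one cautious approach is to size the buffer for the worst case. The sketch below computes such a bound, assuming at most one delimiter can be inserted per input character. The helper name insert_breaks_bound() is hypothetical and not part of libthai; plain unsigned char stands in for thchar_t so the block compiles without the library headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: conservative size, in bytes, for an output buffer
 * that can hold `in` with `delim` inserted at every word boundary,
 * including the terminating NUL. Assumes no more than one boundary per
 * input character. Returns 0 if the size would overflow size_t.
 */
size_t insert_breaks_bound(const unsigned char *in, const char *delim)
{
    size_t len = strlen((const char *)in);
    size_t unit = strlen(delim) + 1;  /* one character plus one delimiter */

    if (len != 0 && unit > (SIZE_MAX - 1) / len)
        return 0;
    return len * unit + 1;
}
```

The result can be passed to malloc() and then given as out_sz. The bound is deliberately loose; real text has far fewer boundaries than characters.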
int th_brk_line (const thchar_t *in, thchar_t *out, size_t out_sz, const char *delim)
Insert word delimiters in given string.
in | : the input string to be processed |
out | : the output buffer |
out_sz | : the size of out |
delim | : the word delimiter to insert |
Analyzes the input string and stores it in the output buffer with the given word delimiter inserted at every word boundary. Uses the shared word breaker.
(This function is deprecated since version 0.1.25, in favor of th_brk_insert_breaks(), which is more thread-safe.)
ThBrk * th_brk_new (const char *dictpath)
Create a dictionary-based word breaker.
dictpath | : the dictionary path, or NULL for default |
Loads the dictionary from the given file and returns the created word breaker. If dictpath is NULL, first searches in the directory given by the LIBTHAI_DICTDIR environment variable, then in the library installation directory. Returns NULL if the dictionary file is not found or cannot be loaded.
The returned ThBrk object should be destroyed with th_brk_delete() after use.
In multi-threaded environments, th_brk_new() and th_brk_delete() should be called inside critical sections (i.e. protected by a mutex) when creating and destroying a word breaker instance. The word breaker methods can then be safely called in parallel during its lifetime.
(Available since version 0.1.25, libthai.so.0.3.0)