Stanford Word Segmenter currently supports Arabic and Chinese that provided segmentation schemes have been found to work well for a variety of applications the system requires Java 1.8+ to be installed, it recommend at least 1G of memory for documents that contain long sentences. For files with shorter sentences (e.g., 20 tokens), decrease the memory requirement by changing the option java -mx1g in the run scripts.
Stanford Word Segmenter