Sentence Tokenized English
쩌비군
2020. 5. 29. 23:56
An English sentence can be tokenized with the NLTK library, as shown below.
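If NLTK has not been set up yet, the pretrained punkt tokenizer models need to be downloaded once before word_tokenize and sent_tokenize will run (a minimal setup sketch; skip it if the models are already installed):

import nltk
nltk.download('punkt')  # one-time download of the pretrained tokenizer models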
from nltk.tokenize import word_tokenize

# Split a sentence into word-level tokens.
sentence = "this is my iphone(6s)"
print("Input: " + sentence)

tokens = word_tokenize(sentence)
print("Split into a list: " + str(tokens))

# Join the tokens back together with spaces.
nltkTokenizedSentence = ' '.join(str(e) for e in tokens)
print("Joined with spaces: " + nltkTokenizedSentence)
print("\n------------------------------------------------\n")
from nltk.tokenize import sent_tokenize

# Split a block of text into individual sentences.
text = "this's a sent tokenize test. this is sent two. is this sent three? sent 4 is cool! Now it's your turn."
print("Input: " + text)

sent_tokenize_list = sent_tokenize(text)
print("Split into a list: " + str(sent_tokenize_list))