
Sentence Tokenization in English


 

English sentences can be tokenized with the NLTK library as shown below.

 

import nltk
from nltk.tokenize import word_tokenize

# The punkt tokenizer models must be available; download them once if needed
nltk.download('punkt')

sentence = "this is my iphone(6s)"
print("Input: " + sentence)

# Split the sentence into word-level tokens
tokens = word_tokenize(sentence)
print("Result split into a list: " + str(tokens))

# Join the tokens back into a single space-separated string
nltkTokenizedSentence = ' '.join(tokens)
print("Result joined with spaces: " + nltkTokenizedSentence)

print("\n------------------------------------------------\n")

from nltk.tokenize import sent_tokenize

text = "this's a sent tokenize test. this is sent two. is this sent three? sent 4 is cool! Now it's your turn."
print("Input: " + text)

# Split the text into a list of sentences
sent_tokenize_list = sent_tokenize(text)
print("Result split into a list: " + str(sent_tokenize_list))
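The two tokenizers can also be chained: first split the text into sentences with sent_tokenize, then split each sentence into word tokens with word_tokenize. Below is a minimal sketch, assuming the same NLTK setup (punkt models already downloaded) as above.

from nltk.tokenize import sent_tokenize, word_tokenize

text = "this's a sent tokenize test. this is sent two. is this sent three? sent 4 is cool! Now it's your turn."

# Tokenize each sentence into words after splitting the text into sentences
tokens_per_sentence = [word_tokenize(s) for s in sent_tokenize(text)]
for sent_tokens in tokens_per_sentence:
    print(sent_tokens)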