† #natural-language-processing #language

About this note

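Natural language processing (NLP) is the work of turning human language into structures a computer can handle. Steps such as corpus construction, morphological analysis, syntactic parsing, and named entity recognition form its foundation. It is a broad field of practice where linguistics, data, and the tool ecosystem meet.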

History

[2025-06-12 Thu 12:58] Tidied up
[2024-05-24 Fri 15:43] Old notes mixed in; kept private -> needs cleanup

BIBLIOGRAPHY

nlp nltk python

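Practically all NLP tooling is Python. But there are many tools, and it is easy to get stuck wondering which is which, because every book seems to cover a different one.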

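The key distinction is small scale versus large scale. If the goal is a simple local tool, NLTK is enough. Or rather, any similar setup will do. There is no need to go as far as machine learning.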
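As a rough illustration of what a "similar setup" without machine learning might look like, here is a minimal sketch that uses only the Python standard library. The function names `tokenize` and `top_words` are made up for this note, and the regex tokenizer is a stand-in for (not a copy of) what NLTK's tokenizers and `FreqDist` offer.

```python
import re
from collections import Counter

# Assumption: "a word" is any run of Unicode word characters.
# Real tokenizers (NLTK and others) handle many more cases.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(n)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The mat was warm."
    print(top_words(sample, 2))  # [('the', 3), ('mat', 2)]
```

For a small local tool, this kind of tokenizing and counting is often all that is needed; a heavier stack only starts to pay off at large scale.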

KEYWORDS

Text Corpus

url:: https://en.wikipedia.org/wiki/Text_corpus

In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated.

Annotated, they have been used in corpus linguistics for statistical hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory. In search technology, a corpus is the collection of documents which is being searched.

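To make the definition concrete, here is a toy sketch of the three uses mentioned above: counting occurrences in an annotated corpus, validating a simple linguistic rule, and treating a corpus as the set of documents being searched. The sentences, the tag names (`DET`, `NOUN`, `VERB`), and the helper functions are all invented for illustration and do not come from any real corpus or library.

```python
from collections import Counter

# Unannotated corpus: just raw documents.
raw_corpus = [
    "the dog barks",
    "the cat sleeps",
]

# Annotated corpus: the same documents with hypothetical part-of-speech tags.
annotated_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]


def tag_counts(corpus):
    """Count how often each annotation tag occurs in the corpus."""
    return Counter(tag for sentence in corpus for _, tag in sentence)


def rule_holds(corpus, first, second):
    """Check the rule: every `first` tag is immediately followed by `second`."""
    for sentence in corpus:
        tags = [tag for _, tag in sentence]
        for i, tag in enumerate(tags):
            if tag == first and (i + 1 >= len(tags) or tags[i + 1] != second):
                return False
    return True


def search(corpus, term):
    """Return the indices of raw documents that contain `term` as a word."""
    return [i for i, doc in enumerate(corpus) if term in doc.split()]


if __name__ == "__main__":
    print(tag_counts(annotated_corpus))
    print(rule_holds(annotated_corpus, "DET", "NOUN"))
    print(search(raw_corpus, "cat"))
```

The annotations are what make the first two functions possible; the search function only needs the raw text.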

junghanacs🧠
