[Hands-on] Checking how an analyzer splits text into tokens

author

JSCODE 박재성

✅ Checking how the standard analyzer splits text into tokens

Syntax (Analyze API)


// Method 1
GET /_analyze
{
  "text": "_________",
  "analyzer": "standard"
}


// Method 2 (spelling out the standard analyzer's components explicitly)
GET /_analyze 
{
  "text": "_________",
  "char_filter": [],
  "tokenizer": "standard",
  "filter": ["lowercase"]
}

Trying it out


// Method 1
GET /_analyze
{
  "text": "Apple 2025 맥북 에어 13 M4 10코어",
  "analyzer": "standard"
}

// Method 2
GET /_analyze
{
  "text": "Apple 2025 맥북 에어 13 M4 10코어",
  "char_filter": [],
  "tokenizer": "standard",
  "filter": ["lowercase"]
}

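Both requests behave exactly the same, because the standard analyzer is made up of no character filters, the standard tokenizer, and the lowercase token filter (it also contains a stop token filter, but that filter is disabled by default). In the upcoming exercises, though, we'll use the second form so we can swap each analyzer component in and out and see how it changes the tokens.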
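If you'd like to verify that claim outside Kibana, here is a minimal sketch that sends both request bodies to the Analyze API using only Python's standard library and returns the token strings. It assumes a local cluster at `http://localhost:9200` with security disabled (Elasticsearch 8 enables TLS and authentication by default, so adjust the URL and credentials for your setup), and `ES_URL`, `BODY_1`, `BODY_2`, and `analyze` are hypothetical names chosen for this illustration. It uses `POST`, which the Analyze API accepts just like `GET`.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster. Change this for your environment.
ES_URL = "http://localhost:9200"

TEXT = "Apple 2025 맥북 에어 13 M4 10코어"

# Method 1: refer to the built-in analyzer by name.
BODY_1 = {"text": TEXT, "analyzer": "standard"}

# Method 2: spell out the same components explicitly.
BODY_2 = {"text": TEXT, "char_filter": [], "tokenizer": "standard", "filter": ["lowercase"]}


def analyze(body: dict, base_url: str = ES_URL) -> list[str]:
    """POST a body to /_analyze and return just the token strings."""
    req = urllib.request.Request(
        f"{base_url}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]
```

With a cluster running, `analyze(BODY_1) == analyze(BODY_2)` should evaluate to `True`.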

Analyzing the response


{
  "tokens": [
    {
      "token": "apple",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "2025",
      "start_offset": 6,
      "end_offset": 10,
      "type": "<NUM>",
      "position": 1
    },
    {
      "token": "맥북",
      "start_offset": 11,
      "end_offset": 13,
      "type": "<HANGUL>",
      "position": 2
    },
    {
      "token": "에어",
      "start_offset": 14,
      "end_offset": 16,
      "type": "<HANGUL>",
      "position": 3
    },
    {
      "token": "13",
      "start_offset": 17,
      "end_offset": 19,
      "type": "<NUM>",
      "position": 4
    },
    {
      "token": "m4",
      "start_offset": 20,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "10코어",
      "start_offset": 23,
      "end_offset": 27,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
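To make the response fields concrete, here is a minimal Python sketch that parses the response above and lines each token up with the slice of the original text its offsets point to: `start_offset` is inclusive, `end_offset` is exclusive, and `position` is the token's index in the token stream. Python string indices line up with Elasticsearch's offsets here because every character in this text fits in a single UTF-16 unit. `RESPONSE`, `TEXT`, and `summarize` are hypothetical names for this illustration.

```python
import json

# The _analyze response from above, copied as a string (whitespace compacted).
RESPONSE = """
{"tokens": [
  {"token": "apple", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0},
  {"token": "2025", "start_offset": 6, "end_offset": 10, "type": "<NUM>", "position": 1},
  {"token": "맥북", "start_offset": 11, "end_offset": 13, "type": "<HANGUL>", "position": 2},
  {"token": "에어", "start_offset": 14, "end_offset": 16, "type": "<HANGUL>", "position": 3},
  {"token": "13", "start_offset": 17, "end_offset": 19, "type": "<NUM>", "position": 4},
  {"token": "m4", "start_offset": 20, "end_offset": 22, "type": "<ALPHANUM>", "position": 5},
  {"token": "10코어", "start_offset": 23, "end_offset": 27, "type": "<ALPHANUM>", "position": 6}
]}
"""

# The input text that was sent to _analyze.
TEXT = "Apple 2025 맥북 에어 13 M4 10코어"


def summarize(response_json: str, text: str) -> list[tuple[int, str, str, str]]:
    """Return (position, token, original slice, type) for every token."""
    rows = []
    for t in json.loads(response_json)["tokens"]:
        # start_offset is inclusive, end_offset is exclusive, like a Python slice.
        original = text[t["start_offset"]:t["end_offset"]]
        rows.append((t["position"], t["token"], original, t["type"]))
    return rows


for position, token, original, kind in summarize(RESPONSE, TEXT):
    print(f"{position}: {original!r} -> {token!r} ({kind})")
```

Each printed line pairs the original text (for example `'Apple'`) with the token that was actually indexed (`'apple'`).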

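Looking at the tokens, you can see they were produced by the standard tokenizer, which splits text at word boundaries following the Unicode Text Segmentation rules (UAX #29), so in practice it cuts at whitespace and at most punctuation marks such as ,, ., !, and ?, and then by the lowercase token filter, which converts each token to lowercase. For example, Apple became apple and M4 became m4, while 10코어 stayed a single token because there is no whitespace or punctuation between the digits and the Hangul.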
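These two steps can be roughly imitated in a few lines of Python. This is a sketch under stated assumptions, not Lucene's implementation: it assumes runs of Unicode word characters (`\w+`) are a good enough stand-in for UAX #29 word boundaries, and `classify` is a hypothetical heuristic that happens to reproduce the `<NUM>`, `<HANGUL>`, and `<ALPHANUM>` types in the response above. `approx_standard_analyze` is likewise a made-up name.

```python
import re

# Assumption: runs of Unicode word characters approximate the UAX #29 word
# boundaries that the real standard tokenizer uses. Not Lucene's implementation.
WORD = re.compile(r"\w+")


def classify(word: str) -> str:
    """Hypothetical heuristic that mimics the token types in the response."""
    if word.isdigit():
        return "<NUM>"
    if all("\uac00" <= ch <= "\ud7a3" for ch in word):  # Hangul syllables block
        return "<HANGUL>"
    return "<ALPHANUM>"


def approx_standard_analyze(text: str) -> list[dict]:
    """Approximate the standard tokenizer followed by the lowercase token filter."""
    tokens = []
    for position, match in enumerate(WORD.finditer(text)):
        word = match.group()
        tokens.append({
            "token": word.lower(),          # lowercase token filter
            "start_offset": match.start(),  # offsets point into the original text
            "end_offset": match.end(),
            "type": classify(word),
            "position": position,
        })
    return tokens


for t in approx_standard_analyze("Apple 2025 맥북 에어 13 M4 10코어"):
    print(t)
```

For this input the sketch yields the same tokens, offsets, types, and positions as the response above, and it drops punctuation the same way (`"Hello, World!"` becomes `hello`, `world`). It does diverge from the real tokenizer in other cases: for instance, the standard tokenizer keeps a decimal number like 3.14 together as one token, while this regex splits it into 3 and 14.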

👨🏻‍🏫

In the next lesson, we'll try out different analyzer components and look at the various ways they can split text into tokens.

category

Elasticsearch

createdAt

Dec 6, 2025 03:54 AM

series

실전에서 바로 써먹는 Elasticsearch 입문 (검색 최적화편)

📎

This post is part of the course materials for 실전에서 바로 써먹는 Elasticsearch 입문 (검색 최적화편).