Korean (한글) Text Is Not Searched Properly

author

JSCODE 박재성

Let's check whether data written in Korean can be searched properly.

Creating the index


// Delete the existing index (returns 404 if it does not exist yet)
DELETE /boards

// Create the index, define the mapping, and apply a custom analyzer
PUT /boards
{
  "settings": {
    "analysis": {
      "analyzer": {
        "boards_content_analyzer": {
          "char_filter": [],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "stemmer"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "boards_content_analyzer"
      }
    }
  }
}

// Verify that the index was created correctly
GET /boards

Inserting data


POST /boards/_doc
{
  "content": "백화점에서 쇼핑을 하다가 친구를 만났다."
}

Searching


GET /boards/_search
{
  "query": {
    "match": {
      "content": "백화점"
    }
  }
}

GET /boards/_search
{
  "query": {
    "match": {
      "content": "쇼핑"
    }
  }
}

GET /boards/_search
{
  "query": {
    "match": {
      "content": "친구"
    }
  }
}

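None of these queries returns any document, even though the inserted sentence ("I ran into a friend while shopping at the department store.") contains all three words: 백화점 (department store), 쇼핑 (shopping), and 친구 (friend). Let's use the Analyze API to find out why.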
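Before digging into the analyzer, it can help to rule out the simpler explanation that the document was never indexed. The request below is a minimal sketch, assuming the `boards` index and the single document inserted earlier. A `match_all` query does not depend on how the `content` field was analyzed, so if it returns the document, the problem lies in tokenization rather than in the insert.

```console
// Return every document in the index, regardless of how "content" was analyzed
GET /boards/_search
{
  "query": {
    "match_all": {}
  }
}
```

If this returns one hit while the three `match` queries return none, the data is present and only the term matching is failing.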

Debugging with the Analyze API

English separates words clearly with spaces, so the standard tokenizer (which splits on whitespace and on punctuation such as , . ! ?) handles it well.


GET /boards/_analyze
{
  "field": "content",
  "text": "I like bananas"
}

Response


{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "like",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "banana",
      "start_offset": 7,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
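Note that the response contains `banana`, not `bananas`. That change comes from the `stemmer` token filter, not from the tokenizer. To see what the `standard` tokenizer does on its own, the Analyze API also accepts an ad hoc `tokenizer` instead of a `field`. The request below is a sketch added for comparison and is not part of the original lesson.

```console
// Run only the standard tokenizer, with no token filters applied
POST /_analyze
{
  "tokenizer": "standard",
  "text": "I like bananas"
}
```

This should return the tokens `I`, `like`, and `bananas` unchanged. Adding `"filter": ["lowercase", "stop", "stemmer"]` to the request should reproduce the output of `boards_content_analyzer`.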

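Korean is different. Many words carry particles (such as -는 or -를) or verb endings (such as -다 or -해요) attached directly to them, and spacing between words is relatively flexible. As a result, the standard tokenizer cannot split Korean text into meaningful words.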


POST /boards/_analyze
{
  "field": "content",
  "text": "백화점에서 쇼핑을 하다가 친구를 만났다."
}

Response


{
  "tokens": [
    {
      "token": "백화점에서",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<HANGUL>",
      "position": 0
    },
    {
      "token": "쇼핑을",
      "start_offset": 6,
      "end_offset": 9,
      "type": "<HANGUL>",
      "position": 1
    },
    {
      "token": "하다가",
      "start_offset": 10,
      "end_offset": 13,
      "type": "<HANGUL>",
      "position": 2
    },
    {
      "token": "친구를",
      "start_offset": 14,
      "end_offset": 17,
      "type": "<HANGUL>",
      "position": 3
    },
    {
      "token": "만났다",
      "start_offset": 18,
      "end_offset": 21,
      "type": "<HANGUL>",
      "position": 4
    }
  ]
}

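Because the tokens are split this way, searching for 백화점, 쇼핑, or 친구 finds nothing: the index contains 백화점에서, 쇼핑을, and 친구를, and none of them is identical to the search term. To solve this, we need Nori (노리), an analyzer built specifically for Korean.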
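To make the mismatch concrete: by default, a `match` query runs the search text through the same analyzer as the field and then looks for identical terms in the index. The sketch below, which assumes the index and document created earlier, first analyzes the query string 백화점 and then searches for the full token 백화점에서 instead.

```console
// The query text is analyzed into the single token "백화점",
// which is not identical to the indexed token "백화점에서"
POST /boards/_analyze
{
  "field": "content",
  "text": "백화점"
}

// Searching for the exact indexed token should return the document
GET /boards/_search
{
  "query": {
    "match": {
      "content": "백화점에서"
    }
  }
}
```

Requiring users to type the particle along with the noun is not practical, which is why the field needs an analyzer that can separate the two.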

👨🏻‍🏫

In the next lecture, we'll look at how to apply the Nori Analyzer.

category

Elasticsearch

createdAt

Dec 6, 2025 03:54 AM

📎

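This post is part of the course materials for 실전에서 바로 써먹는 Elasticsearch 입문 (검색 최적화편) ("A Practical Introduction to Elasticsearch You Can Use Right Away: Search Optimization").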
