Spring Boot + MongoDB for High-Volume Traffic: Optimizing Bulk Insert Performance (ft. bson4jackson)

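Spring Data MongoDB makes life easier for developers by providing an abstraction that spares them from writing complex driver code. But once a service grows and traffic starts to pile up, or when hundreds of thousands of records have to be processed in a batch, that convenience can turn into a performance bottleneck. This post walks through how to speed up large bulk inserts by using the bson4jackson library together with RawBsonDocument.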

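How Spring Data MongoDB converts data

It is a common misconception that Spring Data MongoDB converts Java objects into JSON strings before writing them to the database. I believed this myself until recently. What actually happens is different.

Spring Data's MappingMongoConverter goes through the following stages.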

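Java POJO (DTO object)
Document (org.bson.Document, which is structured as a Java Map)
BSON (Binary JSON, encoded by the driver)

No text-based JSON conversion takes place, but an intermediate Document object is still created. The problem is that, when processing large volumes of data, the cost of creating these intermediate objects and the resulting GC pressure is far larger than one might expect.
Let's roughly count the objects created when a single Document is saved.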

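1 Document object
Field objects inside the Map (N, one per field)
Key objects: String objects for field names such as "name", "price", ... (N)
Value objects: one object per value, such as "product-1", 100.0, ... (N)

Each Document creates at least 2N + 1 objects (for example, 10 fields means roughly 20 to 30 small objects).
If a value is itself a class rather than a single value, the count grows further.
Now imagine saving 100,000 such Documents in one bulk operation. Far more objects are created than we would expect, and garbage collecting them adds a corresponding load. This leads directly to degraded performance.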
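
As a rough illustration of the count above, the following sketch multiplies it out for a bulk insert. The class and method names are made up for this example, and the 2N + 1 figure is the lower-bound estimate from this post rather than a JVM measurement; real numbers depend on the converter and the field types.

```java
// Back-of-the-envelope estimate, not a measurement: assumes each Document
// costs at least 1 Document object + N key objects + N value objects.
final class IntermediateObjectEstimate {

    // Lower bound on objects created for one Document with fieldCount fields.
    static long objectsPerDocument(int fieldCount) {
        return 2L * fieldCount + 1;
    }

    // Lower bound on objects created for a whole bulk insert.
    static long objectsPerBulk(int fieldCount, int documentCount) {
        return objectsPerDocument(fieldCount) * documentCount;
    }

    public static void main(String[] args) {
        int fields = 10;
        int documents = 100_000;
        System.out.println("per document: " + objectsPerDocument(fields));
        System.out.println("per bulk:     " + objectsPerBulk(fields, documents));
    }
}
```

With 10 fields and 100,000 documents this comes to at least 2,100,000 short-lived objects.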

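The cause of the bottleneck: a heavy intermediate step

Under ordinary traffic this is not a problem, but as seen above, things change when bulk inserting 100,000 or 1,000,000 records.

Limits of the DOM approach: like a DOM parser for XML, Spring Data lays out the entire object structure in memory as a Map.
Wasted CPU and memory: all we want is to move data into the DB, yet large objects are repeatedly built on the heap and torn down along the way.

What we want is to serialize the POJO directly to binary (BSON) and write it to the DB.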
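
To make "POJO straight to BSON bytes" concrete, here is a minimal hand-rolled sketch that uses only the JDK. It is not how bson4jackson or the MongoDB driver is implemented; `SimpleProduct` and `DirectBsonSketch` are hypothetical names, and only string and double fields are handled. It writes the fields one after another into a byte stream following the BSON layout (little-endian int32 total length, typed elements, trailing 0x00) without building a Map in between.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical two-field POJO used only for this illustration.
final class SimpleProduct {
    final String name;
    final double price;

    SimpleProduct(String name, double price) {
        this.name = name;
        this.price = price;
    }
}

final class DirectBsonSketch {

    // Serializes the POJO field by field; no intermediate Map/Document is built.
    static byte[] toBson(SimpleProduct p) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        writeString(body, "name", p.name);
        writeDouble(body, "price", p.price);
        body.write(0x00); // document terminator

        // Total size = 4-byte length prefix + elements + terminator.
        byte[] elements = body.toByteArray();
        return ByteBuffer.allocate(4 + elements.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(4 + elements.length)
                .put(elements)
                .array();
    }

    // BSON string element: 0x02, cstring key, int32 byte length (incl. NUL), UTF-8 bytes, NUL.
    private static void writeString(ByteArrayOutputStream out, String key, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        out.write(0x02);
        writeCString(out, key);
        writeInt32(out, utf8.length + 1);
        out.write(utf8, 0, utf8.length);
        out.write(0x00);
    }

    // BSON double element: 0x01, cstring key, 8-byte little-endian IEEE 754 value.
    private static void writeDouble(ByteArrayOutputStream out, String key, double value) {
        out.write(0x01);
        writeCString(out, key);
        byte[] raw = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(value).array();
        out.write(raw, 0, raw.length);
    }

    private static void writeCString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.write(utf8, 0, utf8.length);
        out.write(0x00);
    }

    private static void writeInt32(ByteArrayOutputStream out, int v) {
        byte[] raw = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
        out.write(raw, 0, raw.length);
    }

    public static void main(String[] args) {
        byte[] bson = toBson(new SimpleProduct("a", 1.5));
        System.out.println("BSON size in bytes: " + bson.length);
    }
}
```

For the sample object the encoded document is 32 bytes long. A library such as bson4jackson does this kind of field-by-field writing through Jackson's generator API, and a byte array of this shape is what ends up wrapped in a RawBsonDocument.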

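The solution: Jackson meets RawBsonDocument (ft. bson4jackson)

To solve this, we can combine the Jackson library's streaming serialization with the MongoDB driver's RawBsonDocument and skip the intermediate conversion to Document.

Bypassing the Spring Data converter: instead of the heavy Spring converter, use the fast Jackson together with the bson4jackson library.
RawBsonDocument: a wrapper class around data that has already been converted to bytes. When the driver receives this object, it sends the bytes as they are without re-parsing them.

The code that converts the POJO we want to store into a RawBsonDocument looks like this.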

public RawBsonDocument toRawBsonDocument( Product product ) throws IOException {
  ObjectMapper bsonObjectMapper = new ObjectMapper( new BsonFactory() );
  byte[] bytes = bsonObjectMapper.writeValueAsBytes( product );
  return new RawBsonDocument( bytes );
}

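Passing a BsonFactory instance to the ObjectMapper constructor makes it serialize and deserialize data as BSON (Binary JSON) rather than ordinary text-based JSON. Jackson is well abstracted, so the data format can be changed simply by swapping the internal JsonFactory. By injecting a BsonFactory, this mapper now produces and reads BSON binary data instead of text.

The code above serializes a Product POJO into BSON bytes and stores them directly in a RawBsonDocument.
Converting natively like this lets us serialize straight to BSON data, without the Document conversion that the Spring Data MongoDB flow performs in the middle.
The BsonFactory class requires the following dependency.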

implementation 'de.undercouch:bson4jackson:2.18.0'

This is the bson4jackson library; at the time of writing, the latest version is 2.18.0.
The code for inserting a large number of POJOs at once in bulk looks like this.

public void bulkInsert( List<Product> products ) {
    ObjectMapper bsonObjectMapper = new ObjectMapper( new BsonFactory() );
    
    // Convert POJO -> RawBsonDocument
    List<RawBsonDocument> rawDocs = products.stream()
            .map( product -> {
                byte[] bytes;
                try {
                    bytes = bsonObjectMapper.writeValueAsBytes( product );
                }
                catch ( JsonProcessingException e ) {
                    throw new RuntimeException( e );
                }
                return new RawBsonDocument( bytes );
            } )
            .toList();

    // Get the native collection and call insertMany
    MongoCollection<RawBsonDocument> mongoCollection =
            mongoTemplate.getCollection( mongoTemplate.getCollectionName( Product.class ) )
                    .withDocumentClass( RawBsonDocument.class );

    mongoCollection.insertMany( rawDocs );
}

A custom repository can also be defined so that this is available through a Spring Data MongoDB Repository.

public interface ProductRepositoryCustom {
  void bulkInsert( List<Product> products );
}

@RequiredArgsConstructor
public class ProductRepositoryCustomImpl implements ProductRepositoryCustom {
  private final MongoTemplate mongoTemplate;
  // A bean created in a configuration class as
  // ObjectMapper bsonObjectMapper = new ObjectMapper(new BsonFactory()).
  private final ObjectMapper bsonObjectMapper;

  @Override
  public void bulkInsert( List<Product> products ) {
      // Convert POJO -> RawBsonDocument
      List<RawBsonDocument> rawDocs = products.stream()
              .map( product -> {
                  byte[] bytes;
                  try {
                      bytes = bsonObjectMapper.writeValueAsBytes( product );
                  }
                  catch ( JsonProcessingException e ) {
                      throw new RuntimeException( e );
                  }
                  return new RawBsonDocument( bytes );
              } )
              .toList();

      // Get the native collection and call insertMany
      MongoCollection<RawBsonDocument> mongoCollection =
              mongoTemplate.getCollection( mongoTemplate.getCollectionName( Product.class ) )
                      .withDocumentClass( RawBsonDocument.class );

      mongoCollection.insertMany( rawDocs );
  }
}

// Can now be called as productRepository.bulkInsert(...).
public interface ProductRepository extends MongoRepository<Product, String>, ProductRepositoryCustom {
}

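Performance comparison

Using the bson4jackson library and RawBsonDocument to convert directly to BSON without the intermediate step brings not only a speed advantage but also memory savings and reduced GC pressure.
To check whether the speed advantage is real, let's run test code for four cases, each saving 100,000 records.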

Call Repository save in a loop, 100,000 times
Call Repository saveAll (Spring Data's default bulk save mode)
Call MongoTemplate's bulkOps (Spring Data's optimized feature)
bson4jackson + RawBsonDocument combination (native call)

To use MongoDB in test code there is no need to install MongoDB separately; Spring Boot Docker Compose Support makes it easy to set up the environment. The post below may be helpful.
Spring Boot Docker Compose Support: bringing up infrastructure locally for testing (Spring Boot 3.1)

The entity class used in the tests is as follows.

@Document( collection = "products" )
@NoArgsConstructor
@Getter
@Setter
public class Product {
    @Id
    private String id;
    private String name;
    private String category;
    private double price;
    private Map<String, String> attributes;
    private LocalDateTime createdAt;

    // Getters and setters are generated by Lombok (@Data could be used instead); this constructor fills in test values
    public Product(String name, double price) {
        this.name = name;
        this.price = price;
        this.category = "Test-Category";
        this.attributes = Map.of("color", "red", "size", "L");
        this.createdAt = LocalDateTime.now();
    }
}

The setup code that runs before each test is as follows.

private List<Product> testData;
private static final int DATA_SIZE = 100_000; 
    
@BeforeEach
void setup() {
    // Reset the DB
    mongoTemplate.dropCollection( Product.class );

    // Generate test data (built in memory up front to keep it out of the measurement)
    System.out.println( "Generating " + DATA_SIZE + " objects..." );
    testData = IntStream.range( 0, DATA_SIZE )
            .mapToObj( i -> new Product( "Product-" + i, i * 1.5 ) )
            .collect( Collectors.toList() );
}

This is the code that prints the result.

private void printResult( String method, long millis ) {
    System.out.printf( "--> [%-25s] : %5d ms (%.2f seconds)%n", method, millis, millis / 1000.0 );
}

Calling Repository save in a loop, 100,000 times

@Test
@DisplayName( "1. [Worst] Loop Save: save one at a time in a loop" )
void testLoopSave() {
    StopWatch sw = new StopWatch();
    sw.start();

    for ( Product product : testData ) {
        repository.save( product );
    }

    sw.stop();
    printResult( "Loop Save", sw.getTotalTimeMillis() );
}

Result (about 48 seconds)

--> [Loop Save                ] : 47959 ms (47.96 seconds)

Calling repository.save() 100,000 times does involve data conversion, but it also means 100,000 round trips to the MongoDB server, and that communication time likely accounts for a large share of the total.
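
Dividing the measured total by the number of calls gives a rough per-call cost for the loop. This is only arithmetic on the result above with an illustrative class name; the measurement cannot show how that cost splits between the network round trip and the conversion.

```java
import java.util.Locale;

// Simple arithmetic on the measured Loop Save result; names are illustrative.
final class LoopSaveCost {

    // Average wall-clock time per save() call in milliseconds.
    static double averageMillisPerCall(long totalMillis, int calls) {
        return (double) totalMillis / calls;
    }

    public static void main(String[] args) {
        double perCall = averageMillisPerCall(47_959, 100_000);
        System.out.printf(Locale.ROOT, "average per save(): %.3f ms%n", perCall);
    }
}
```

That works out to about 0.48 ms per save() call, paid 100,000 times in sequence.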

Calling Repository saveAll (Spring Data's default bulk save mode)

@Test
@DisplayName( "2. [Normal] Repository SaveAll: Spring Data default bulk" )
void testRepositorySaveAll() {
    StopWatch sw = new StopWatch();
    sw.start();

    repository.saveAll( testData );

    sw.stop();
    printResult( "Repository.saveAll()", sw.getTotalTimeMillis() );
}

Result

--> [Repository.saveAll()     ] :  1602 ms (1.60 seconds)

Calling MongoTemplate's bulkOps (Spring Data's optimized feature)

@Test
@DisplayName( "3. [Better] BulkOperations: Spring Data optimized feature" )
void testBulkOperations() {
    StopWatch sw = new StopWatch();
    sw.start();

    mongoTemplate.bulkOps( BulkOperations.BulkMode.UNORDERED, Product.class )
            .insert( testData )
            .execute();

    sw.stop();
    printResult( "BulkOperations", sw.getTotalTimeMillis() );
}

Result

--> [BulkOperations           ] :  1600 ms (1.60 seconds)

There is almost no difference compared to Repository.saveAll().

bson4jackson + RawBsonDocument combination (native call)

@Test
@DisplayName( "4. [Best] RawBsonDocument: Jackson serialization + native driver" )
void testRawBsonDocument() throws Exception {
    StopWatch sw = new StopWatch();
    sw.start();

    // 1. Convert POJO -> RawBsonDocument with Jackson (can be parallelized)
    List<RawBsonDocument> rawDocs = new ArrayList<>( DATA_SIZE );
    for ( Product product : testData ) {
        // There is no ID auto-generation logic here, so assign one at this point if needed or leave it to the DB
        // For this measurement, only the pure conversion cost is included
        byte[] bytes = bsonObjectMapper.writeValueAsBytes( product );
        rawDocs.add( new RawBsonDocument( bytes ) );
    }

    // 2. Get the native collection (typed as RawBsonDocument)
    MongoCollection<RawBsonDocument> collection = 
              mongoTemplate.getCollection( mongoTemplate.getCollectionName( Product.class ) )
                          .withDocumentClass( RawBsonDocument.class );

    // 3. Run insertMany at the driver level
    collection.insertMany( rawDocs );

    sw.stop();
    printResult( "RawBsonDocument + Jackson", sw.getTotalTimeMillis() );
}
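
The comment in step 1 notes that the conversion can be parallelized. A minimal sketch of that idea with a parallel stream follows. It uses only the JDK, the class name is hypothetical, and the `serializer` function is a stand-in for a call such as `bsonObjectMapper.writeValueAsBytes(...)`, which throws a checked exception and would need wrapping. Whether this helps in practice depends on document size and core count, so it should be measured rather than assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ParallelConversionSketch {

    // Applies the serializer to every item on the common fork-join pool.
    // Collecting an ordered parallel stream keeps the input order.
    static <T> List<byte[]> convertAll(List<T> items, Function<T, byte[]> serializer) {
        return items.parallelStream()
                .map(serializer)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in serializer: UTF-8 bytes of the string, not real BSON.
        List<byte[]> out = convertAll(
                List.of("Product-0", "Product-1", "Product-2"),
                s -> s.getBytes(StandardCharsets.UTF_8));
        System.out.println("converted: " + out.size());
    }
}
```

A fully configured ObjectMapper is safe to share between threads, so a single instance could serve as the serializer here.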

Result

--> [RawBsonDocument + Jackson] :   702 ms (0.70 seconds)

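Compared to saveAll and bulkOps, this is roughly twice as fast (about 1,600 ms versus 702 ms, a ratio of about 2.3). 700 ms may be the blink of an eye in everyday life, but in computing it is a very long time. The more complex the entity, the wider this gap is likely to become. On top of that, far fewer objects are created, which is also a clear advantage in terms of memory use and GC.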
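
For reference, the ratios between the four measured results can be computed as below. The class name is illustrative, and the inputs are the single-run timings reported in this post, so the ratios are indicative rather than exact.

```java
import java.util.Locale;

// Ratios computed from the four measured results in this post.
final class SpeedupRatios {

    // How many times slower otherMillis is than baselineMillis.
    static double ratio(long otherMillis, long baselineMillis) {
        return (double) otherMillis / baselineMillis;
    }

    public static void main(String[] args) {
        long raw = 702; // RawBsonDocument + Jackson
        System.out.printf(Locale.ROOT, "Loop Save      : %.2fx%n", ratio(47_959, raw));
        System.out.printf(Locale.ROOT, "saveAll        : %.2fx%n", ratio(1_602, raw));
        System.out.printf(Locale.ROOT, "BulkOperations : %.2fx%n", ratio(1_600, raw));
    }
}
```

The loop comes out at roughly 68 times slower than the RawBsonDocument path, and saveAll and bulkOps at roughly 2.3 times slower.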

There is no need to convert all code to use bson4jackson and RawBsonDocument. What matters is choosing the right approach for the situation.

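Ordinary business logic (single or small writes)
- Productivity and maintainability matter too, so simply using the Repository is the convenient choice.
- There is no reason to give up the useful features Spring Data provides.
Moderate batch jobs
- MongoTemplate's bulkOps or the Repository's saveAll should be fine.
Extreme volumes / log collection / very large batches
- This is when RawBsonDocument is worth considering.
- It is time to go native instead of relying on the Spring Data abstraction.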

References

https://www.baeldung.com/java-jackson-mongodb-pojo-mapping