6장

1. 컬렉터는 무엇인가?

컬렉터(Collector)는 최종 연산에서 스트림의 원소를 변환하고 처리하는 일련의 과정, 그리고 변환된 값을 의미한다. 기존 코드와 비교해서 컬렉터의 동작 방식을 알아보자. 다음은 자바 7에서 작성된 통화별로 트랜잭션을 그룹화한 코드이다.

Map<Currency, List<Transaction>> transactionsByCurrencies = new HashMap<>();

for (Transaction transaction : transactions) {
    Currency currency = transaction.getCurrency();
    List<Transaction> transactionsForcurrency = transactionsByCurrencies.get(currency);
    if (transactionsForCurrency == null) {
        transactionsForCurrency = new ArrayList<>();
        transactionsByCurrencies.put(currency, transactionsForCurrency);
    }
    transactionsForCurency.add(transaction);
}

collect는 다양한 요소 누적 방식을 인수로 받아 최종 결과로 도출한다. collect로 그룹화하는 코드는 다음과 같다.

Map<Currency, List<Transaction>> transactionsByCurrencies =
    transactions.stream()
                .collect(groupingBy(Transaction::getCurrency));

위 코드에서는 각 요소를 그룹화하기 위해 groupingBy를 사용했다. Collector 인터페이스의 구현은 스트림의 요소를 어떤 식으로 도출할지 지정한다.

1.2. 리듀싱 연산

이제 구체적으로 collect를 호출하면 어떤 일이 일어나는지 알아보자. 스트림에서 collect를 호출하면 내부적으로 리듀싱 연산이 일어난다. 컬렉터는 스트림의 각 원소를 방문하면서 변환 작업을 처리한다. 변환된 원소는 최종 결과를 저장하는 자료구조에 누적된다. 다음 그림은 통화별로 트랜잭션을 그룹화할 때 내부에서 일어나는 일을 요약한 것이다.

Collector 인터페이스의 메서드를 어떻게 구현하느냐에 따라 스트림이 어떤 리듀싱 연산을 수행할지 결정된다.

1.3. 팩토리 메서드

미리 정의된 컬렉터, groupingBy와 같이 Collectors 클래스에서는 팩토리 메서드를 제공하며 그 기능은 다음과 같다.

스트림 원소를 하나의 값으로 리듀스하고 요약
원소 그룹화
원소 분할

2. 리듀싱과 요약

counting() 팩토리 메서드는 원소 개수를 계산하고 반환한다.

// counting 활용
long howManyDishes = menu.stream().collect(Collectors.counting());

// count 활용
long howManyDishes = menu.stream().count();

2.1. 스트림의 최댓값과 최솟값

maxBy, minBy 두 팩토리 메서드를 활용해서 스트림의 최댓값과 최솟값을 계산할 수 있다.

Comparator<Dish> dishCaloriesComparator =
    Comparator.comparingInt(Dish::getCalories);

Optional<Dish> mostCalorieDish = 
    menu.stream()
        .collect(mayBy(dishCaloriesComparator));

menu가 비어있는 상황을 예상하고 Optional을 사용했음을 참고하자.

2.2. 요약 연산

summingInt, averagingInt, summarizingInt 등 요약 팩토리 메서드를 활용해서 객체의 숫자 필드 합계나 평균 등을 반환하는 요약(summarization) 연산을 사용할 수 있다. 요약 연산에는 리듀싱 기능이 제공된다.

int totalCalories = menu.stream().collect(summingInt(Dish::getCalories));

summingInt는 객체를 int로 매핑하는 함수를 인수로 받는다. 그리고 summingInt의 인수로 전달된 함수는 객체를 int로 매핑한 컬렉터를 반환한다. 그리고 summingInt가 collect 메서드로 전달되면 요약 연산을 수행한다.

2.3. 문자열 연결

joining 팩토리 메서드를 활용해서 추출한 문자열을 연결해서 반환할 수 있다.

// joining 활용
String shortMenu = menu.stream().map(Dish::getName).collect(joining());

// joining에 구분자 활용
String shortMenu = menu.stream().map(Dish::getName).collect(joining(", "));

실행결과는 다음과 같다.

# joining 활용
porkbeefchickenfrench friesriceseason fruitpizzaprawnssalmon

# joining에 구분자 활용
pork, beef, chicken, french fries, rice, season fruit, pizza, prawns, salmon

joining 메서드는 내부적으로 StringBuilder를 활용해서 문자열을 하나로 만든다.

2.4. 범용 리듀싱 요약 연산

지금까지 구현한 예제들은 reducing 팩토리 메서드로도 정의할 수 있다. 그러나 특화된 컬렉션을 사용할 수 있다면 사용하는 게 편의성과 가독성 측면에서 바람직하다.

// reducing 활용(인수 3개)
int totalCalories = 
    menu.stream()
        .collect(
            reducing(
                0, // 초깃값
                Dish::getCalories, // 합계 함수
                Integer::sum // 변환 함수, (i, j) -> i + j 람다 표현식으로 표현 가능
            )
        );

// reducing 활용(인수 1개)
Optional<Dish> mostCalorieDish =
    menu.stream()
        .collect(
            reducing((d1, d2) -> d1.getCalories() > d2.getCalories() ? d1 : d2)
        );

reducing의 인자가 3개일 때는 연산의 시작값과 BinaryOperator를 지정할 수 있었다. 그러나 reducing 인자가 1개일 때는 기본값을 지정할 수 없다. 따라서 스트림의 시작 원소를 첫 번째 인수로 받아 자신을 그대로 반환하는 항등 함수(identity function)을 두 번재 인수로 받는다. 시작 앖이 없어 빈 스트림이 넘겨졌을 때 시작값이 설정되지 않으므로 Optional 객체를 반환한다.

3. 그룹화

팩토리 메서드 Collectors.groupingBy를 활용하면 쉽게 메뉴를 그룹화할 수 있다.

// 람다 표현식 활용
public enum CaloricLevel { IDET, NORMAL, FAT }

Map<CaloricLevel, List<Dish>> dishesByCaloricLevel =
    menu.stream()
        .collect(
            groupingBy(dish -> {
                if (dish.getCalories() <= 400) return CaloricLevel.DIET;
                else if (dish.getCalories() <= 700) return CaloricLevel.NORMAL;
                else return CaloricLevel.FAT;
            })
        );

// 메서드 참조 활용
Map<Dish.Type, List<Dish>> dishesByType =
    menu.stream()
        .collect(groupingBy(Dish::getType));

위 코드의 실행 결과는 다음과 같다.

{
    FISH=[prawns, salmon], 
    OTHER=[french fries, rice, season fruit, pizza], 
    MEAT=[pork, beef, chicken]
}

스트림은 분류 함수(classification function)를 기준으로 그룹화된다. 그룹화 연산의 결과로 그룹화 함수가 반환하는 키 그리고 각 키에 대응하는 스트림의 모든 항목 리스트를 값으로 갖는 맵이 반환된다.

3.1. 그룹화된 요소 조작

요소를 그룹화 한 다음에는 각 결과 그룹의 요소를 조작할 수 있다.

Map<Dish.Type, List<Dish>> caloricDishesByType =
    menu.stream()
        .filter(dish -> dish.getCalories() > 500)
        .collect(groupingBy(Dish::getType));

만일 그룹화를 하기 전에 필터를 적용하면 다음과 같은 결과가 나올 수도 있다.

{
    OTHER=[french fries, pizza], 
    MEAT=[pork, beef]
}

필터를 만족하는 원소가 없는 경우 결과 맵에서 해당 키 자체가 사라진다. 이 문제를 해결하기 위해 Collector 클래스는 분류 함수에 Collector 형식의 두 번째 인수를 갖도록 groupingBy 팩토리 메서드를 오버로드한다.

// filtering 활용
Map<Dish.Type, List<Dish>> caloricDishesByType =
    menu.stream()
        .collect(groupingBy(
            Dish::getType,
            filtering(dish -> dish.getCalories() > 500, toList())
        ));

위와 같이 필터링하면 결과 목록에서 비어있는 FISH 항목이 추가된다.

{
    OTHER=[french fries, pizza], 
    MEAT=[pork, beef],
    FISH=[]
}

mapping과 flatMapping을 활용해서 그룹화한 요소를 조작할 수 있다.

// mapping 활용
Map<Dish.Type, List<String>> dishNamesByType =
    menu.stream()
        .collect(groupingBy(
            Dish::getType,
            mapping(Dish::getName, toList())
        ));

// flatMapping 활용
Map<Dish.Type, Set<String>> dishNamesByType =
    menu.stream()
        .collect(groupingBy(
            Dish::getType,
            flatMapping(dish -> dishTags.get(dish.getName()).stream(), toSet())
        ));

3.2. 다수준 그룹화

Collectors.groupingBy를 이용해서 항목을 다수준으로 그룹화할 수 있다. groupingBy는 분류 함수와 컬렉터를 인수로 받으므로 두 수준으로 스트림 항목을 그룹화할 수 있다.

Map<Dish.Type, Map<CaloricLevel, List<Dish>>> dishesByTypeCaloricLevel =
    menu.stream()
        .collect(groupingBy(Dish::getType,
            groupingBy(dish -> {
                if (dish.getCalories() <= 400) return CaloricLevel.DIET;
                else if (dish getCalories() <= 700) return CaloricLevel.NORMAL;
                else return CaloricLevel.FAT;
            })
        ));

코드의 실행 결과는 다음과 같다.

{
    MEAT={
            DIET=[chicken], 
            NORMAL=[beef], 
            FAT=[pork]
    },
    FISH={
        DIET=[prawns],
        NORMAL=[salmon]
    },
    OTHER={
        DIET=[rice, seasonal fruit],
        NORMAL=[french fries, pizza]
    }
}

3.3. 서브 그룹으로 데이터 수집

같은 그룹으로 분류된 모든 요소에 리듀싱 작업을 수행할 때는 groupingBy의 두 번째 인수로 전달한 컬렉터를 사용한다.

// 요리의 수를 종류별로 계산
Map<Dish.Type, Long> typesCount =
    menu.stream()
        .collect(groupingBy(
            Dish::getType,
            counting() 
        ));

// 실행 결과
// {MEAT=3, FISH=2, OTHER=4}

// 가장 높은 칼로리를 가지는 요리를 찾기
Map<Dish.Type, Optional<Dish>> mostCaloricByType =
    menu.stream()
        .collect(groupingBy(
            Dish::getType,
            maxBy(comparingInt(Dish::getCalories))
        ));

// 실행 결과
// {FISH=Optional[salmon], OTHER=Optional[pizza], MEAT=Optional[pork]}

// 가장 높은 칼로리를 가지는 요리를 찾기(Optional 삭제)
Map<Dish.Type, Dish> mostCaloricByType =
    menu.stream()
        .collect(groupingBy(
            Dish::getType,
            collectingAndThen(
                maxBy(comparingInt(Dish::getCalories)),
                Optional::get
            )
        ));

// 실행 결과
// {FISH=salmon, OTHER=pizza, MEAT=pork}

// 모든 요리의 칼로리 합계
Map<Dish.Type, Integer> totalCaloriesByType =
    menu.stream()
        .collect(groupingBy(
            Dish::getType,
            summingInt(Dish::getCalories)
        ));

// 각 요리 형식에 존재하는 모든 칼로리 레벨의 값
Map<Dish.Type, Set<CaloricLevel>> caloricLevelsByType = 
    menu.stream()
        .collect(groupingBy(
            Dish::getType,
            mapping(dish -> {
                if (dish.getCalories() <= 400) return CaloricLevel.DIET;
                else if (dish.getCalories() <= 700) return CaloricLevel.NORMAL;
                else return CaloricLevel.FAT;
            },
            toSet())
        ));

// 실행 결과
// {OTHER=[DIET, NORMAL], MEAT=[DIET, NORMAL, FAT], FISH=[DIET, NORMAL]}

// 메서드 참조를 toCollection에 전달
Map<Dish.Type, Set<CaloricLevel>> caloricLevelIsByType =
    menu.stream()
        .collect(groupingBy(
            Dish::getType,
            mapping(dish -> {
                if (dish.getCalories() <= 400) return CaloricLevel.DIET;
                else if (dish.getCalories() <= 700) return CaloricLevel.NORMAL;
                else return CaloricLevel.FAT;                
            },
            toCollection(HashSet::new))
        ));

4. 분할

분할은 분할 함수(partitioning function)라 불리는 프레디케이트를 분류 함수로 사용하는 특수한 그룹화 기능이다. 분할 함수는 boolean을 반환하므로 맵의 키 형식은 Boolean이다. 그룹화 맵은 최대 두 개의 그룹으로 분류된다.

// partitioningBy 활용
Map<Boolean, List<Dish>> partitionedMenu =
    menu.stream()
        .collect(partitioningBy(Dish::isVegetarian));
List<Dish> vegetarianDishes = partitionedMenu.get(true);

// 프레디케이트로 필터링 후 결과 수집
List<Dish> vegetarianDishes =
    menu.stream()
        .filter(Dish::isVegetarian)
        .collect(toList());

코드의 실행 결과는 다음과 같다.

{
    false=[pork, beef, chicken, parwns, salmon],
    true=[french fries, rice, season fruit, pizza]
}

참, 거짓 두 가지 요소의 스트림 리스트를 모두 유지한다는 것이 분할의 장점이다.

// 채식 요리의 스트림과 채식이 아닌 요리의 스트림을 요리 종류로 그룹화
Map<Boolean, Map<Dish.Type, List<Dish>>> vegetarianDishesByType =
    menu.stream()
        .collect(partitioningBy(
            Dish::isVegetarian,
            groupingBy(Dish::getType)
        ));

// 채식 요리와 채식이 아닌 요리 중 가장 칼로리가 높은 요리
Map<Boolean, Dish> mostCaloricPartitionedByVegetarian =
    menu.stream()
        .collect(partitioningBy(
            Dish::isVegetarian,
            collectingAndThen(
                maxBy(comparingInt(Dish::getCalories)),
                Optional::get
            )
        ));

다음은 코드의 실행 결과이다.

# 채식 요리의 스트림과 채식이 아닌 요리의 스트림을 요리 종류로 그룹화
{
    false={
        FISH=[prawns, salmon], 
        MEAT=[pork, beef, chicken]
    },
    true={
        OTHER=[french fries, rice, season fruit, pizza]
    }
}

# 채식 요리와 채식이 아닌 요리 중 가장 칼로리가 높은 요리
{
    false=pork,
    true=pizza
}

5. Collector 인터페이스

Collector 인터페이스의 시그니처와 다섯 개의 메서드 정의는 다음과 같다.

public interface Collector<T, A, R> {
    Supplier<A> supplier();
    BiConsumer<A, T> accumulator();
    Function<A, R> finisher();
    BinaryOperator<A> combiner();
    Set<Characteristics> characteristics();
}

위 코드는 다음처럼 설명할 수 있다.

T는 수집될 스트림 항목의 제네릭 형식
A는 누적자, 수집 과정에서 중간 결과를 누적하는 객체의 형식
R은 수집 연산 결과 객체의 형식(대개 컬렉션 형식)

예를 들어 Stream의 모든 요소를 List로 수집하는 ToListcollector 클래스를 구현할 수 있다.

public class ToListCollector<T> implements Collector<T, List<T>, List<T>>

5.1. supplier 메서드

supplier 메서드는 빈 결과로 이루어진 Supplier를 반환한다.

// 람다 표현식 활용
public Supplier<List<T>> supplier() {
    return () -> new ArrayList<T>();
}

// 생성자 참조 활용
public Supplier<List<T>> supplier() {
    return ArrayList::new;
}

5.2. accumulator 메서드

accumulator 메서드는 리듀싱 연산을 수행하는 함수를 반환한다.

// 람다 표현식 활용
public BiConsumer<List<T>, T> accumulator() {
    return (list, item) -> list.add(item);
}

// 메서드 참조 활용
public BiConsumer<List<T>, T> accumulator() {
    return List::add;
}

5.3. finisher 메서드

finisher 메서드는 스트림 탐색을 끝내고 누적자 객체를 최종 결과로 변환하면서 누적 과정을 끝낼 때 호출할 함수를 반환한다. 누적자 객체가 최종 결과인 경우 변환 과정이 필요없으므로 finished 메서드가 항등 함수를 반환한다.

public Function<List<T>, List<T>> finisher() {
    return Function.identity();
}

지금까지 살펴본 메서드로 순차적 스트림 리듀싱 기능을 수행할 수 있다. 순차적 리듀싱 순서는 아래 그림과 같다.

5.4. combiner 메서드

combiner 메서드는 리듀싱 연산에서 사용할 함수를 반환한다. combiner는 스트림의 서로 다른 서브파트를 병렬로 처리할 때 누적자가 이 결과를 어떻게 처리할지 정의한다.

public BinaryOperator<List<T>> combiner() {
    return (list1, list2) -> {
        list1.addAll(list2);
        return list1;
    };
}

combiner 메서드의 논리적 순서는 아래 그림과 같다.

5.5. Characteristic 메서드

characteristics 메서드는 컬렉터의 연산을 정의하는 Characteristics 형식의 불변 집합을 반환한다. Characteristics는 다음 세 항목을 포함하는 열거형이다.

UNORDERED: 리듀싱 결과는 스트림 요소의 방문 순서나 누적 순서에 영향을 받지 않음
CONCURRENT: 다중 스레드에서 accumulator 함수를 동시에 호출할 수 있으며 병렬 리듀싱을 수행할 수 있음
IDENTITY_FINISH: 메서드가 반환하는 함수는 identity를 적용할 뿐이므로 생략 가능

참고 자료

모던 자바 인 액션 - 한빛미디어

Previous5장 Next7장

Last updated 11 months ago