Exhaustive Search
Evaluate every possible combination of variables
e.g., with 3 variables, 7 (= 2^3 - 1) combinations are considered
x1, x2, x3 → y=f(x1), y=f(x2), ..., y=f(x1,x2,x3)
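The enumeration above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `score` is a hypothetical callable (not part of the lecture material) that takes a subset of variable names and returns a number where higher is better, for example the adjusted R^2 of a model fitted on that subset.

```python
from itertools import combinations


def all_subsets(features):
    """Yield every non-empty subset of `features` (2^d - 1 subsets in total)."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield subset


def exhaustive_search(features, score):
    """Return the subset with the highest score.

    `score` is an assumed, user-supplied callable: subset -> float.
    """
    return max(all_subsets(features), key=score)


# With 3 variables there are 2^3 - 1 = 7 candidate models.
print(len(list(all_subsets(["x1", "x2", "x3"]))))
```

Because the number of subsets doubles with every extra variable, this approach is only practical when the number of variables is small.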
Forward Selection
Starting from a model with no variables, significant variables are added one at a time
Once a variable is selected, it is never removed
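A rough sketch of this greedy procedure follows. It assumes a hypothetical `score(subset)` callable where higher is better, and it uses "the score stops improving" as the stopping rule; a significance test on the newly added variable (for example a p-value threshold) is a common alternative that is not shown here.

```python
def forward_selection(features, score):
    """Greedy forward selection.

    `score` is an assumed callable: list of variable names -> float
    (higher is better). It is not defined by the source material.
    """
    selected = []
    remaining = list(features)
    best = float("-inf")
    while remaining:
        # Try adding each remaining variable and keep the best candidate.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break  # no remaining variable improves the model
        selected.append(candidate)  # once added, it is never removed
        remaining.remove(candidate)
        best = candidate_score
    return selected
```

With d variables this evaluates at most d + (d - 1) + ... + 1 models, far fewer than the 2^d - 1 of an exhaustive search.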
Backward Elimination
Starting from a model with all variables, irrelevant variables are removed one at a time
Once a variable is removed, it is never selected again
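The mirror-image procedure might look like the sketch below. As before, `score(subset)` is a hypothetical callable where higher is better, and the stopping rule (stop when every removal makes the score worse) is an assumption chosen for illustration.

```python
def backward_elimination(features, score):
    """Greedy backward elimination.

    `score` is an assumed callable: list of variable names -> float
    (higher is better).
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Find the variable whose removal leaves the best-scoring model.
        candidate = max(
            selected,
            key=lambda f: score([g for g in selected if g != f]),
        )
        candidate_score = score([g for g in selected if g != candidate])
        if candidate_score < best:
            break  # every removal hurts the model
        selected.remove(candidate)  # once removed, it is never re-added
        best = candidate_score
    return selected
```

Ties are resolved in favor of removal here, so the smaller model is preferred when two models score the same.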
Stepwise Selection
Starting from a model with no variables, forward selection and backward elimination are performed alternately
Takes longer than forward selection or backward elimination alone, but has a better chance of finding the optimal set of variables
Variables that were previously selected or removed can be reconsidered for removal or selection
The number of variables increases in the early iterations, but afterwards it can either increase or decrease
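One way to alternate the two steps is sketched below. The `score(subset)` callable (higher is better) and the `max_iter` safety cap are assumptions made for this example; a change is accepted only when it strictly improves the score, which guarantees that the loop terminates.

```python
def stepwise_selection(features, score, max_iter=100):
    """Alternate a forward step and a backward step until nothing changes.

    `score` is an assumed callable: list of variable names -> float
    (higher is better). `max_iter` is a safety cap for this sketch.
    """
    selected = []
    best = float("-inf")
    for _ in range(max_iter):
        changed = False
        # Forward step: add the variable that improves the score the most.
        remaining = [f for f in features if f not in selected]
        if remaining:
            add = max(remaining, key=lambda f: score(selected + [f]))
            add_score = score(selected + [add])
            if add_score > best:
                selected.append(add)
                best = add_score
                changed = True
        # Backward step: drop a variable if the model improves without it.
        if len(selected) > 1:
            drop = max(
                selected,
                key=lambda f: score([g for g in selected if g != f]),
            )
            drop_score = score([g for g in selected if g != drop])
            if drop_score > best:
                selected.remove(drop)
                best = drop_score
                changed = True
        if not changed:
            break
    return selected
```

Unlike plain forward selection, a variable that looked useful early on can be dropped later once better variables have entered the model.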
Genetic Algorithm
Find superior solutions and preserve them by repeating the reproduction process
- Selection: Select a superior solution to improve the quality
- Crossover: Search various alternatives based on the current solutions
- Mutation: Give a chance to escape the local optima
Initialization (Encoding Chromosomes)
A chromosome is a d-dimensional binary vector
A gene is each individual entry in a chromosome; it is either 0 or 1
0 means the variable is not used in modeling; 1 means it is used
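The encoding can be illustrated with a short sketch. The function names `init_population` and `decode` and the `seed` parameter are hypothetical, introduced only for this example.

```python
import random


def init_population(pop_size, d, seed=None):
    """Create `pop_size` random chromosomes, each a d-dimensional 0/1 list."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]


def decode(chromosome, features):
    """Return the variables whose gene is 1 (the ones used in modeling)."""
    return [f for f, gene in zip(features, chromosome) if gene == 1]


# The chromosome [1, 0, 1] means: use x1 and x3, skip x2.
print(decode([1, 0, 1], ["x1", "x2", "x3"]))
```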
Fitness Evaluation
Evaluate the performance of each chromosome
Fitness function: $R^{2}$, AIC, BIC
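As a rough illustration, the three criteria can be computed from a model's predictions as follows. The AIC and BIC expressions are the usual forms for a linear model with Gaussian errors, written up to an additive constant; `k` (the number of estimated parameters) and the function names are assumptions made for this sketch, and the residual sum of squares must be positive.

```python
import math


def _rss(y, y_hat):
    """Residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_y = sum(y) / len(y)
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1 - _rss(y, y_hat) / tss


def aic(y, y_hat, k):
    """AIC for a Gaussian linear model, up to an additive constant."""
    n = len(y)
    return n * math.log(_rss(y, y_hat) / n) + 2 * k


def bic(y, y_hat, k):
    """BIC for a Gaussian linear model, up to an additive constant."""
    n = len(y)
    return n * math.log(_rss(y, y_hat) / n) + k * math.log(n)
```

Note that a higher $R^{2}$ is better while a lower AIC or BIC is better, so AIC and BIC would need to be negated (or otherwise inverted) before being used as a fitness value to maximize.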
Selection
Select the superior chromosomes
- Deterministic selection
Select the top N% of chromosomes by rank
- Probabilistic selection
Use each chromosome's weight as its probability of being selected
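Both selection schemes can be sketched as below. The function names, the `top_ratio` and `seed` parameters, and the choice of using the fitness values directly as weights are assumptions for illustration; the probabilistic version requires non-negative fitness values with a positive sum.

```python
import random


def deterministic_selection(population, fitness, top_ratio=0.5):
    """Keep the top `top_ratio` fraction of chromosomes ranked by fitness."""
    ranked = sorted(
        zip(population, fitness), key=lambda pair: pair[1], reverse=True
    )
    n_keep = max(1, int(len(population) * top_ratio))
    return [chromosome for chromosome, _ in ranked[:n_keep]]


def probabilistic_selection(population, fitness, n_select, seed=None):
    """Sample chromosomes with probability proportional to their fitness.

    Sampling is with replacement, so a strong chromosome can be picked
    more than once.
    """
    rng = random.Random(seed)
    return rng.choices(population, weights=fitness, k=n_select)
```

Deterministic selection never lets a low-ranked chromosome through, whereas probabilistic selection gives weaker chromosomes a small chance of surviving, which keeps more diversity in the population.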
Crossover
Mix the information of the parent chromosomes to create child chromosomes
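A minimal sketch of one common variant, one-point crossover, is shown below. The source does not say which crossover scheme is used, so the single cut point is an assumption; multi-point crossover works the same way with several cuts. Chromosomes are assumed to be lists of length at least 2.

```python
import random


def one_point_crossover(parent_a, parent_b, seed=None):
    """Swap the tails of two parents at a random cut point.

    The cut point lies strictly inside the vector, so each child takes
    at least one gene from each parent.
    """
    rng = random.Random(seed)
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```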
Mutation
Create mutations to maintain diversity in the next generation
Mutation makes it possible to escape local optima
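Bit-flip mutation can be sketched as follows. The function name and the default `rate` of 0.01 are assumptions for this example; the rate is usually kept small so that good chromosomes are not destroyed.

```python
import random


def mutate(chromosome, rate=0.01, seed=None):
    """Flip each gene (0 <-> 1) independently with probability `rate`."""
    rng = random.Random(seed)
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]
```

Flipping a gene adds or removes one variable from the subset, which lets the search reach combinations that selection and crossover alone could not produce from the current population.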
Performance and computation time of each feature selection algorithm
Sources
- 01-2: Dimensionality Reduction - Supervised Selection
https://youtu.be/A69fxxdU0mk
- 01-3: Dimensionality Reduction - Genetic Algorithm
https://youtu.be/yUW8yg4_j6w