Business Analytics

1-2. Dimensionality Reduction - Feature Selection

by yu901 2022. 6. 26.

Exhaustive Search

Search all possible combinations of variables

ex) With 3 variables, 7 (= 2^3 - 1) combinations are considered

      x1, x2, x3 → y=f(x1), y=f(x2), ..., y=f(x1, x2, x3)
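A minimal sketch of exhaustive search, assuming a generic `score` function that rates a subset of variables (higher is better). The function name `exhaustive_search` and the `TOY_SCORES` table are hypothetical: the table is made-up numbers standing in for a real model-quality measure such as adjusted R² from a fitted model.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple

# A score function maps a subset of variable names to a fitness value (higher is better).
Score = Callable[[FrozenSet[str]], float]


def exhaustive_search(
    variables: Sequence[str], score: Score
) -> Tuple[FrozenSet[str], float, int]:
    """Evaluate every non-empty subset (2^d - 1 of them) and return the best one."""
    best_subset: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    n_evaluated = 0
    for k in range(1, len(variables) + 1):        # subset sizes 1..d
        for combo in combinations(variables, k):  # every subset of size k
            subset = frozenset(combo)
            s = score(subset)
            n_evaluated += 1
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score, n_evaluated


# Hypothetical toy scores (made up for illustration, not from a real model).
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.5,
    frozenset({"x2"}): 0.4,
    frozenset({"x3"}): 0.6,
    frozenset({"x1", "x3"}): 0.7,
    frozenset({"x2", "x3"}): 0.65,
    frozenset({"x1", "x2"}): 0.9,
    frozenset({"x1", "x2", "x3"}): 0.85,
}

best, value, n = exhaustive_search(["x1", "x2", "x3"], TOY_SCORES.__getitem__)
print(sorted(best), value, n)  # ['x1', 'x2'] 0.9 7
```

The guarantee of finding the best subset comes at a cost that grows as 2^d - 1: with d = 20 variables there are already 1,048,575 subsets to fit and evaluate.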

 

Forward Selection

Starting from a model with no variables, significant variables are added one at a time

Once a variable is selected, it will never be removed
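A minimal sketch of forward selection under the same assumptions: a generic `score` function, and "significant" approximated by a hypothetical minimum score gain `min_gain` (statistical tools often use p-values or AIC for this instead). `forward_selection` and the `TOY_SCORES` table are illustrative, not from the original.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

# A score function maps a subset of variable names to a fitness value (higher is better).
Score = Callable[[FrozenSet[str]], float]


def forward_selection(
    variables: Sequence[str], score: Score, min_gain: float = 0.0
) -> Tuple[FrozenSet[str], float]:
    """Greedily add the best remaining variable until no addition gains more than min_gain."""
    selected: FrozenSet[str] = frozenset()
    current = score(selected)
    while True:
        remaining = [v for v in variables if v not in selected]
        if not remaining:
            break
        # Evaluate adding each remaining variable and pick the best one.
        best_var = max(remaining, key=lambda v: score(selected | {v}))
        best_score = score(selected | {best_var})
        if best_score - current <= min_gain:
            break  # no remaining variable improves the score enough; stop
        # Once added, a variable is never removed.
        selected, current = selected | {best_var}, best_score
    return selected, current


# Hypothetical toy scores (made up for illustration, not from a real model).
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.5,
    frozenset({"x2"}): 0.4,
    frozenset({"x3"}): 0.6,
    frozenset({"x1", "x3"}): 0.7,
    frozenset({"x2", "x3"}): 0.65,
    frozenset({"x1", "x2"}): 0.9,
    frozenset({"x1", "x2", "x3"}): 0.85,
}

selected, current = forward_selection(["x1", "x2", "x3"], TOY_SCORES.__getitem__)
print(sorted(selected), current)  # ['x1', 'x2', 'x3'] 0.85
```

On this toy table, forward selection picks x3 first and can never drop it, so it ends at all three variables (0.85) even though {x1, x2} scores 0.9.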

 

Backward Elimination

Starting from a model with all variables, irrelevant variables are removed one at a time

Once a variable is removed, it is never selected again
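A minimal sketch of backward elimination, again assuming a generic `score` function and a hypothetical `min_gain` threshold in place of a formal significance test. The function name and the `TOY_SCORES` table are illustrative assumptions.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

# A score function maps a subset of variable names to a fitness value (higher is better).
Score = Callable[[FrozenSet[str]], float]


def backward_elimination(
    variables: Sequence[str], score: Score, min_gain: float = 0.0
) -> Tuple[FrozenSet[str], float]:
    """Greedily drop the variable whose removal helps most, until no removal gains more than min_gain."""
    selected: FrozenSet[str] = frozenset(variables)
    current = score(selected)
    while selected:
        # Evaluate dropping each selected variable; sorted() keeps tie-breaking deterministic.
        drop_var = max(sorted(selected), key=lambda v: score(selected - {v}))
        new_score = score(selected - {drop_var})
        if new_score - current <= min_gain:
            break  # every remaining variable is worth keeping; stop
        # Once removed, a variable is never selected again.
        selected, current = selected - {drop_var}, new_score
    return selected, current


# Hypothetical toy scores (made up for illustration, not from a real model).
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.5,
    frozenset({"x2"}): 0.4,
    frozenset({"x3"}): 0.6,
    frozenset({"x1", "x3"}): 0.7,
    frozenset({"x2", "x3"}): 0.65,
    frozenset({"x1", "x2"}): 0.9,
    frozenset({"x1", "x2", "x3"}): 0.85,
}

selected, current = backward_elimination(["x1", "x2", "x3"], TOY_SCORES.__getitem__)
print(sorted(selected), current)  # ['x1', 'x2'] 0.9
```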

 

Stepwise Selection

Starting from a model with no variables, forward selection and backward elimination steps are performed alternately

Takes longer than forward selection or backward elimination, but is more likely to find the optimal set of variables

A variable that has already been selected or removed can be reconsidered for removal or selection later

The number of selected variables increases in the early iterations, but afterwards it can either increase or decrease
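A minimal sketch of stepwise selection: each iteration tries one forward step and then one backward step, so a variable added earlier can be dropped later. The `score` function, the `min_gain` threshold, the `max_steps` guard, and the `TOY_SCORES` table are all assumptions for illustration.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

# A score function maps a subset of variable names to a fitness value (higher is better).
Score = Callable[[FrozenSet[str]], float]


def stepwise_selection(
    variables: Sequence[str], score: Score, min_gain: float = 0.0, max_steps: int = 100
) -> Tuple[FrozenSet[str], float]:
    """Alternate one forward step and one backward step until neither changes the subset."""
    selected: FrozenSet[str] = frozenset()
    current = score(selected)
    for _ in range(max_steps):  # guard against endless add/remove cycles
        changed = False
        # Forward step: add the best remaining variable if it improves the score enough.
        remaining = [v for v in variables if v not in selected]
        if remaining:
            add = max(remaining, key=lambda v: score(selected | {v}))
            if score(selected | {add}) - current > min_gain:
                selected = selected | {add}
                current = score(selected)
                changed = True
        # Backward step: drop the variable whose removal helps most, if it helps enough.
        if selected:
            drop = max(sorted(selected), key=lambda v: score(selected - {v}))
            if score(selected - {drop}) - current > min_gain:
                selected = selected - {drop}
                current = score(selected)
                changed = True
        if not changed:
            break
    return selected, current


# Hypothetical toy scores (made up for illustration, not from a real model).
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.5,
    frozenset({"x2"}): 0.4,
    frozenset({"x3"}): 0.6,
    frozenset({"x1", "x3"}): 0.7,
    frozenset({"x2", "x3"}): 0.65,
    frozenset({"x1", "x2"}): 0.9,
    frozenset({"x1", "x2", "x3"}): 0.85,
}

selected, current = stepwise_selection(["x1", "x2", "x3"], TOY_SCORES.__getitem__)
print(sorted(selected), current)  # ['x1', 'x2'] 0.9
```

On this toy table the subset grows {x3} → {x1, x3} → {x1, x2, x3} and then shrinks to {x1, x2} when the backward step drops x3, which a pure forward search could not do.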

 

 

Genetic Algorithm

Find superior solutions and preserve them by repeating the reproduction process

- Selection: Select superior solutions to improve solution quality

- Crossover: Search various alternatives based on the current solutions

- Mutation: Give a chance to escape local optima

 

 

Initialization (Encoding Chromosomes)

A chromosome is a d-dimensional binary vector

A gene is each element of a chromosome, either 0 or 1

0 means the variable is not used for modeling; 1 means it is used
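A minimal sketch of the encoding: a chromosome is a list of d genes, and decoding keeps the variables whose gene is 1. Initializing each gene to 0 or 1 with equal probability is an assumption, as are the helper names `init_population` and `decode`.

```python
import random
from typing import List, Sequence

Chromosome = List[int]  # d-dimensional binary vector: 1 = use the variable, 0 = skip it


def init_population(d: int, pop_size: int, rng: random.Random) -> List[Chromosome]:
    """Create pop_size random chromosomes; each gene is 0 or 1 with probability 0.5 (assumed)."""
    return [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]


def decode(chromosome: Chromosome, variables: Sequence[str]) -> List[str]:
    """Return the variables whose gene is 1, i.e. the ones used for modeling."""
    return [v for v, gene in zip(variables, chromosome) if gene == 1]


rng = random.Random(0)
population = init_population(d=5, pop_size=4, rng=rng)
chosen = decode([1, 0, 1, 0, 0], ["x1", "x2", "x3", "x4", "x5"])
print(chosen)  # ['x1', 'x3']
```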

 

Fitness Evaluation

Evaluate the performance of each chromosome, i.e., of the model built from the variables it selects

Fitness function: e.g., $ R^{2} $, AIC, BIC
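For reference, these are the usual definitions of the fitness measures named above for a linear regression fitted on the variables a chromosome selects, with $n$ observations, $p$ selected variables plus an intercept, $\mathrm{SSE}$ the residual sum of squares and $\mathrm{SST}$ the total sum of squares. The AIC and BIC forms assume Gaussian errors and drop additive constants that do not affect comparisons. Plain $R^{2}$ never decreases when a variable is added, so adjusted $R^{2}$ is often the safer choice; and since lower AIC/BIC is better, they would be negated (or otherwise inverted) before being used as a fitness to maximize.

$$
\begin{aligned}
R^{2} &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} \\
R^{2}_{\text{adj}} &= 1 - (1 - R^{2})\,\frac{n - 1}{n - p - 1} \\
\mathrm{AIC} &= n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2(p + 1) \\
\mathrm{BIC} &= n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + (p + 1)\ln n
\end{aligned}
$$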

 

Selection

Select superior chromosomes

 

  • Deterministic selection
    Select the top N% of chromosomes by rank

 

  • Probabilistic selection
    Use each chromosome's weight (e.g., its fitness divided by the total fitness) as its probability of being selected
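A minimal sketch of both selection schemes. Deterministic selection ranks by fitness and keeps the top fraction; probabilistic selection is written here as roulette-wheel selection, where the weight is the fitness itself so the probability is fitness / total fitness (an assumption about how the weight is defined). The function names and the sample fitness values are hypothetical.

```python
import random
from typing import List, Sequence

Chromosome = List[int]


def deterministic_selection(
    population: Sequence[Chromosome], fitness: Sequence[float], top_ratio: float
) -> List[Chromosome]:
    """Keep the top `top_ratio` fraction of chromosomes ranked by fitness (at least one)."""
    n_keep = max(1, int(len(population) * top_ratio))
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    return [population[i] for i in order[:n_keep]]


def probabilistic_selection(
    population: Sequence[Chromosome], fitness: Sequence[float], k: int, rng: random.Random
) -> List[Chromosome]:
    """Roulette-wheel selection: P(select i) = fitness[i] / sum(fitness), drawn with replacement.

    Assumes every fitness value is non-negative and at least one is positive.
    """
    return rng.choices(population, weights=fitness, k=k)


population = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
fitness = [0.9, 0.2, 0.5, 0.7]  # hypothetical fitness values

top_half = deterministic_selection(population, fitness, top_ratio=0.5)
print(top_half)  # [[1, 0, 1], [0, 0, 1]]
picks = probabilistic_selection(population, fitness, k=1000, rng=random.Random(42))
```

Deterministic selection always discards the weaker chromosomes, while the probabilistic version still gives them a small chance, which helps keep the population diverse.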

 

Crossover 

Create child chromosomes by mixing the information of the parent chromosomes
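A minimal sketch of one common way to mix parent information, one-point crossover: cut both parents at the same random position and swap the tails. Other schemes such as uniform crossover also exist; which one the lecture uses is not stated, so this choice and the function name are assumptions.

```python
import random
from typing import List, Tuple

Chromosome = List[int]


def one_point_crossover(
    p1: Chromosome, p2: Chromosome, rng: random.Random
) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two equal-length parents after a random cut point."""
    if len(p1) != len(p2) or len(p1) < 2:
        raise ValueError("parents must have the same length of at least 2")
    point = rng.randint(1, len(p1) - 1)  # cut strictly between genes, never at the ends
    child1 = p1[:point] + p2[point:]
    child2 = p2[:point] + p1[point:]
    return child1, child2


parent1 = [1, 1, 1, 1, 1]
parent2 = [0, 0, 0, 0, 0]
c1, c2 = one_point_crossover(parent1, parent2, random.Random(0))
print(c1, c2)
```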

 

Mutation

Introduce mutations to keep the next generation diverse

With mutation, the search can escape local optima
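A minimal sketch of bit-flip mutation: each gene is flipped independently with a small probability `rate`. Bit-flip is a natural fit for binary chromosomes, but the function name and any particular rate value are assumptions.

```python
import random
from typing import List

Chromosome = List[int]


def bit_flip_mutation(chromosome: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Return a new chromosome where each gene is flipped (0 <-> 1) with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


rng = random.Random(0)
original = [1, 0, 1, 1]
unchanged = bit_flip_mutation(original, rate=0.0, rng=rng)   # never flips
all_flipped = bit_flip_mutation(original, rate=1.0, rng=rng)  # always flips
mutated = bit_flip_mutation(original, rate=0.05, rng=rng)     # small rate: occasional flips
print(unchanged, all_flipped, mutated)
```

A small rate keeps most of a good chromosome intact while occasionally switching a variable in or out, which is what lets the search leave a local optimum.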

 

 


 

[Figure: Performance and elapsed time of each feature selection algorithm]

 

 



