์—ญํ•  dbt๊ฐ€ ์ž˜ํ•จ Python์ด ๋” ๋‚ซ๋‹ค
์ •ํ˜• ๋ฐ์ดํ„ฐ ์ •์ œ (staging) โœ… ๊ฐ€๋Šฅ์€ ํ•˜์ง€๋งŒ ๋ถˆํŽธํ•จ
๋งˆํŠธ ํ…Œ์ด๋ธ” ๊ตฌ์กฐ ์„ค๊ณ„ โœ… ๊ฐ€๋Šฅ์€ ํ•จ
์‚ฌ์šฉ์ž๋ณ„๋กœ ๋‹ฌ๋ผ์ง€๋Š” ๊ณ„์‚ฐ โŒ ๋ถˆํŽธํ•จ โœ… ๋งค์šฐ ์œ ์—ฐํ•จ
์ ์ˆ˜ํ™”, ์กฐ๊ฑด ๋งค์นญ, if-else ๋กœ์ง โŒ ๋งค์šฐ ๋ฒˆ๊ฑฐ๋กœ์›€ โœ… ์ ํ•ฉ
์‚ฌ์šฉ์ž ์ž…๋ ฅ ๊ธฐ๋ฐ˜ ํ•„ํ„ฐ๋ง โŒ ๋ถˆ๊ฐ€๋Šฅ โœ… ํ•ต์‹ฌ ๊ธฐ๋Šฅ
์ถ”์ฒœ ์ด์œ  ์„ค๋ช…, ๋กœ์ง ํŠœ๋‹ โŒ โœ… ์™„์ „ ๋งž์ถคํ˜• ๊ตฌํ˜„ ๊ฐ€๋Šฅ

์˜ˆ๋ฅผ ๋“ค์–ด

-- dbt์—์„œ๋Š” ์ด๋Ÿฐ ๋กœ์ง์ด ์•„์ฃผ ํž˜๋“ค๋‹ค...
SELECT
  CASE 
    WHEN user.age BETWEEN policy.min_age AND policy.max_age THEN 30
    ELSE 0
  END +
  CASE 
    WHEN user.income < policy.income_ceiling THEN 25
    ELSE 0
  END + ...
  • dbt์—์„œ๋Š” โ€œuserโ€๋ž€ ์กด์žฌ ์ž์ฒด๊ฐ€ ์—†์Œ
  • dbt๋Š” โ€œ๋ชจ๋“  ์‚ฌ์šฉ์ž์—๊ฒŒ ๋™์ผํ•˜๊ฒŒ ์ ์šฉ๋˜๋Š” ๋ชจ๋ธโ€์„ ์„ค๊ณ„ํ•˜๋Š” ๋„๊ตฌ
  • ๋ฐ˜๋ฉด Python์—์„œ๋Š” ์‚ฌ์šฉ์ž๊ฐ€ ์ž…๋ ฅํ•  ๋•Œ๋งˆ๋‹ค ์ถ”์ฒœ ๊ฒฐ๊ณผ๊ฐ€ ๋‹ฌ๋ผ์ง€๊ฒŒ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Œ

๐Ÿ‘‰ dbt๋Š” ์ •์ (Static) ๋ชจ๋ธ๋ง์— ์ ํ•ฉํ•˜์ง€๋งŒ ์‚ฌ์šฉ์ž ์ž…๋ ฅ ๊ธฐ๋ฐ˜์˜ ๋™์ (Dynamic) ์ถ”์ฒœ ์‹œ์Šคํ…œ์€ Python์ด ๋” ๋‚ฌ๋‹ค.

์ƒํ™ฉ ๊ฐ€์ •

API๋กœ ๋ฐ›์€ ์ •์ฑ… ๋ฐ์ดํ„ฐ ์˜ˆ์‹œ (JSON or CSV๋กœ ์ €์žฅ๋จ)

{
  "policy_name": "์ฒญ๋…„ ์›”์„ธ ์ง€์›",
  "eligibility": "๋งŒ 19์„ธ ์ด์ƒ 34์„ธ ์ดํ•˜ / ๋ฌด์ง ๋˜๋Š” ์žฌ์ง / ๋…๋ฆฝ๊ฐ€๊ตฌ",
  "target_region": "์ „๊ตญ",
  "link": "https://example.com"
}

์กฐ๊ฑด์ด ํ…์ŠคํŠธ๋กœ ๋˜์–ด์žˆ๋‹ค.
์ถ”์ฒœ ์ ์ˆ˜ ๊ณ„์‚ฐ์„ ํ•˜๋ ค๋ฉด ์ˆซ์ž๋‚˜ ๊ฐ’์œผ๋กœ ๋œ ์ปฌ๋Ÿผ์ด ํ•„์š”ํ•˜๋‹ค.

๊ทธ๋ž˜์„œ ํ•„์š”ํ•œ ๊ฒŒ ๋ฐ”๋กœ dbt ๊ตฌ์กฐํ™”!

  • “๋งŒ 19์„ธ ์ด์ƒ 34์„ธ ์ดํ•˜” โ†’ min_age = 19, max_age = 34
  • “๋ฌด์ง ๋˜๋Š” ์žฌ์ง” โ†’ job_status = ‘๋ฌด์ง,์žฌ์ง’
  • “๋…๋ฆฝ๊ฐ€๊ตฌ” โ†’ household_type = ‘๋…๋ฆฝ๊ฐ€๊ตฌ’
  • ๊ทธ๋ฆฌ๊ณ  target_region, income_ceiling ๋“ฑ๋“ฑ๋„ ์ปฌ๋Ÿผํ™”

dbt ๊ตฌ์กฐํ™” ํ๋ฆ„

1. raw_policies ํ…Œ์ด๋ธ” (API ๋ฐ›์•„์„œ ์ €์žฅํ•œ ๊ทธ๋Œ€๋กœ)

SELECT * FROM {{ source('raw', 'policies') }}

2. stg_policies.sql (์ •์ œ ๋‹จ๊ณ„)

SELECT
  policy_name,
  REGEXP_EXTRACT(eligibility, '๋งŒ (\d{2})์„ธ ์ด์ƒ')::INT AS min_age,
  REGEXP_EXTRACT(eligibility, '(\d{2})์„ธ ์ดํ•˜')::INT AS max_age,
  CASE
    WHEN eligibility LIKE '%๋ฌด์ง%' AND eligibility LIKE '%์žฌ์ง%' THEN '๋ฌด์ง,์žฌ์ง'
    WHEN eligibility LIKE '%๋ฌด์ง%' THEN '๋ฌด์ง'
    WHEN eligibility LIKE '%์žฌ์ง%' THEN '์žฌ์ง'
    ELSE '๋ฌด๊ด€'
  END AS job_status,
  CASE
    WHEN eligibility LIKE '%๋…๋ฆฝ๊ฐ€๊ตฌ%' THEN '๋…๋ฆฝ๊ฐ€๊ตฌ'
    ELSE '๋ฌด๊ด€'
  END AS household_type,
  target_region,
  link
FROM {{ source('raw', 'policies') }}

3. mart_policies.sql (์ถ”์ฒœ ์—”์ง„์ด ์‚ฌ์šฉํ•  ์ตœ์ข… ํ…Œ์ด๋ธ”)

SELECT
  policy_name,
  min_age,
  max_age,
  job_status,
  household_type,
  target_region,
  link
FROM {{ ref('stg_policies') }}
WHERE min_age IS NOT NULL AND max_age IS NOT NULL

๊ฒฐ๋ก 

  • API๋กœ ๋ฐ›์€ ํ…์ŠคํŠธ ๊ธฐ๋ฐ˜ ์ •์ฑ… ์กฐ๊ฑด์„ โ†’ ์ •์ œ๋œ ์ˆ˜์น˜/๊ฐ’ ๊ธฐ๋ฐ˜ ํ…Œ์ด๋ธ”๋กœ ๋งŒ๋“œ๋Š” ๊ฒŒ dbt ๊ตฌ์กฐํ™”
  • ๊ทธ๋ž˜์•ผ Python ์ถ”์ฒœ ๋กœ์ง์—์„œ ์กฐ๊ฑด ํ•„ํ„ฐ๋ง/์ ์ˆ˜ ๊ณ„์‚ฐ์„ ์ž๋™ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค.