While building a real-time streaming pipeline with Spark, Kafka, and Docker, I ran into a Spark error about Kerberos authentication, even though I was not using Kerberos at all.

org.apache.hadoop.security.KerberosAuthException: failure to login: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name

❓ What caused the problem?

  • I was using the official apache/spark:3.5.0 Docker image.
  • Spark inside the container tried to resolve Hadoop's default authentication mechanism.
  • Hadoop tried to look up the current OS user via:
UnixPrincipal(name)
  • Inside the Docker container, the app was running under a UID/GID that had no username mapping.
  • That produced the following error:
invalid null input: name

This happens because UnixPrincipal() received null as the name.
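To confirm that the container really runs under a UID with no name, the lookup can be reproduced outside the JVM. The sketch below is a minimal check, assuming Python 3 is available inside the container; the helper name `username_for_uid` is made up for illustration and is not part of Spark or Hadoop.

```python
import os
import pwd


def username_for_uid(uid, lookup=pwd.getpwuid):
    """Return the passwd username for uid, or None if no entry exists."""
    try:
        return lookup(uid).pw_name
    except KeyError:
        # pwd.getpwuid raises KeyError when the UID has no passwd entry.
        return None


if __name__ == "__main__":
    uid = os.getuid()
    name = username_for_uid(uid)
    if name is None:
        print(f"UID {uid} has no username mapping; the null-name login failure is likely")
    else:
        print(f"UID {uid} maps to user {name!r}")
```

If this reports no mapping for the UID the Spark process runs as, the null name in the error message has a plausible explanation.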

🔎 Root cause

  • Spark uses Hadoop's UserGroupInformation internally.
  • When no user is explicitly configured, Hadoop falls back to the system user.
  • But a Docker container does not always have a valid /etc/passwd entry for a dynamically assigned user.
  • No valid username → Hadoop's login fails → Kerberos exception.

⚠️ Note: this happens even if you are not using Kerberos! The exception name is misleading; Hadoop simply cannot find the current user.
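Because the exception class points in the wrong direction, it can help to key on the message rather than the class name when triaging logs. The following is a small sketch of that idea; the function `diagnose` and its return labels are hypothetical and only illustrate the reading order, they are not an official classification.

```python
NULL_NAME_MARKER = "invalid null input: name"
KERBEROS_MARKER = "KerberosAuthException"


def diagnose(log_line):
    """Classify a login failure line from a Spark log (illustrative labels)."""
    # Check the innermost message first: it names the actual failure.
    if NULL_NAME_MARKER in log_line:
        return "missing OS username"
    if KERBEROS_MARKER in log_line:
        return "possible Kerberos issue"
    return "unrelated"


error = (
    "org.apache.hadoop.security.KerberosAuthException: failure to login: "
    "javax.security.auth.login.LoginException: "
    "java.lang.NullPointerException: invalid null input: name"
)
print(diagnose(error))
```

The error from this post contains both markers, and checking the null-name message first is what keeps it from being filed as a Kerberos problem.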

🔧 Solution: set a valid username inside Docker

Instead of patching Spark or Hadoop configuration, I fixed it cleanly as follows:

1️⃣ Explicitly create a valid user in the Dockerfile

FROM apache/spark:3.5.0
USER root
# Create the airflow user (or any other valid user)
RUN useradd --create-home --shell /bin/bash airflow
USER airflow
WORKDIR /opt/spark-app

2️⃣ Set HOME and HADOOP_USER_NAME in docker-compose.yml

environment:
  - HOME=/home/airflow
  - HADOOP_USER_NAME=airflow

3️⃣ ✅ Hadoop's UserGroupInformation can now resolve the user identity inside the container.
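One way to verify the result is to compare the environment variables against the passwd entry of the running user. This is a minimal sketch, assuming Python 3 is available in the container; `check_identity` is a hypothetical helper, and the expectation that HOME and HADOOP_USER_NAME agree with the passwd entry follows this post's setup rather than any Hadoop requirement.

```python
import os
import pwd


def check_identity(env, entry):
    """Return a list of mismatches between env vars and a passwd entry."""
    problems = []
    if env.get("HADOOP_USER_NAME") != entry.pw_name:
        problems.append("HADOOP_USER_NAME does not match the OS username")
    if env.get("HOME") != entry.pw_dir:
        problems.append("HOME does not match the passwd home directory")
    return problems


if __name__ == "__main__":
    try:
        entry = pwd.getpwuid(os.getuid())
    except KeyError:
        # The original failure mode: the UID still has no passwd entry.
        print("current UID has no passwd entry")
    else:
        for problem in check_identity(os.environ, entry):
            print("WARNING:", problem)
```

Run inside the container as the user that launches Spark; no output means the user, HOME, and HADOOP_USER_NAME are consistent.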

🔬 Why this fix works

  • Hadoop does not care about the specific UID or GID.
  • It only needs a valid username string.
  • Creating a named user inside Docker satisfies that requirement.
  • There is no need to set up full Kerberos, kinit, or a security ticket infrastructure.

✅ Key takeaways

  • Hadoop inside Spark depends on a valid OS user.
  • Docker sometimes runs a container under an anonymous UID → always create a user explicitly.
  • Set HOME and HADOOP_USER_NAME so they match.
  • Avoid wildcard settings in the style of SPARK_CLASSPATH=* and prefer explicit JAR mounts.
  • Debug systematically: read the logs from top to bottom. The root cause is usually much earlier than you expect.

This small Dockerfile tweak saved me hours of painful debugging.