๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋กœ์„œ ๋ณต์žกํ•œ ๋ถ„์‚ฐ ์‹œ์Šคํ…œ์„ ๊ตฌ์ถ•ํ•˜๋ฉด์„œ ๋งˆ์ง€๋ง‰ ์žฅ์• ๋ฌผ์„ ๋„˜์—ˆ์„ ๋•Œ ๋А๋ผ๋Š” ์„ฑ์ทจ๊ฐ์€ ์ •๋ง ํŠน๋ณ„ํ•ฉ๋‹ˆ๋‹ค.

์˜ค๋Š˜์€ Apache Spark Structured Streaming + Kafka๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ ๊ต‰์žฅํžˆ ๋‹ต๋‹ตํ•œ ๋ฌธ์ œ๋ฅผ ๊ณต์œ ํ•˜๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค:

๐Ÿ‘‰ ๋ฐ”๋กœ ์•…๋ช… ๋†’์€ Failed to find data source: kafka ์—๋Ÿฌ์ž…๋‹ˆ๋‹ค.

๐Ÿงจ ๋ฌธ์ œ: ๋ชจ๋“  ๊ฒŒ ์ •์ƒ์ธ๋ฐ Kafka๋งŒ ์•ˆ ๋œ๋‹ค

์ƒํ™ฉ์„ ๋– ์˜ฌ๋ ค๋ด…์‹œ๋‹ค:

  • Spark ํด๋Ÿฌ์Šคํ„ฐ ์ •์ƒ ๊ตฌ๋™
  • Postgres ์—ฐ๊ฒฐ๋„ ๋ฌธ์ œ ์—†์Œ
  • Kafka์—์„œ ์ด๋ฒคํŠธ๋„ ์ž˜ ๋ฐœํ–‰๋จ
  • ์ฝ”๋“œ์—์„œ .readStream.format("kafka") ํ˜ธ์ถœ

๊ทธ๋Ÿฐ๋ฐ ๊ฐ‘์ž๊ธฐ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒ:

pyspark.errors.exceptions.captured.AnalysisException: Failed to find data source: kafka.

๊ทธ๋ž˜์„œ ๋ชจ๋“  ๊ฑธ ๋‹ค์‹œ ํ™•์ธํ•ด ๋ด…๋‹ˆ๋‹ค:

  • โœ… JAR ํŒŒ์ผ ์ž˜ ๋งˆ์šดํŠธ๋จ
  • โœ… classpath๋„ ์ •์ƒ
  • โœ… Kafka์™€ Spark ๋ฒ„์ „ ํ˜ธํ™˜

๊ทธ๋Ÿฐ๋ฐ๋„ Spark๋Š” ๊ณ„์† ๋ถˆํ‰ํ•ฉ๋‹ˆ๋‹ค. ์ด๊ฒŒ ๋ฐ”๋กœ Spark์—์„œ ์ž์ฃผ ๊ฒช๋Š” ์ˆจ๊ฒจ์ง„ ํ•จ์ •์ž…๋‹ˆ๋‹ค.

๐Ÿงช ๊ทผ๋ณธ ์›์ธ: ๋‹จ์ˆœ JAR ๋กœ๋”ฉ vs. ํ”Œ๋Ÿฌ๊ทธ์ธ ๋“ฑ๋ก

๋ฌธ์ œ๋Š” JAR ํŒŒ์ผ์ด ์—†๋Š” ๊ฒŒ ์•„๋‹™๋‹ˆ๋‹ค. Spark๊ฐ€ ํ”Œ๋Ÿฌ๊ทธ์ธ์„ ๋‚ด๋ถ€์ ์œผ๋กœ ์–ด๋–ป๊ฒŒ ๋“ฑ๋กํ•˜๋Š”๊ฐ€์˜ ์ฐจ์ด์ž…๋‹ˆ๋‹ค.

์กฐ๊ธˆ ๋” ์ž์„ธํžˆ ์‚ดํŽด๋ณผ๊ฒŒ์š”:

JDBC: ์‰ฌ์šด ์ผ€์ด์Šค

JDBC๋ฅผ ์‚ฌ์šฉํ•  ๋•:

1๏ธโƒฃ df.write.jdbc() ํ˜ธ์ถœ 2๏ธโƒฃ Spark๊ฐ€ classpath์—์„œ JDBC ๋“œ๋ผ์ด๋ฒ„ ๊ฒ€์ƒ‰ 3๏ธโƒฃ ๋ โ€” --jars ๋งŒ์œผ๋กœ ์ถฉ๋ถ„

Kafka: ํŠน๋ณ„ํ•œ ์ผ€์ด์Šค

Kafka (Structured Streaming)๋Š” ๋‹ค๋ฆ…๋‹ˆ๋‹ค:

1๏ธโƒฃ df.readStream.format("kafka") ํ˜ธ์ถœ 2๏ธโƒฃ Spark๊ฐ€ Kafka๋ฅผ DataSourceV2 ํ”Œ๋Ÿฌ๊ทธ์ธ์œผ๋กœ ๋“ฑ๋ก ์‹œ๋„ 3๏ธโƒฃ Spark ์ดˆ๊ธฐํ™” ๊ณผ์ •์—์„œ ํ”Œ๋Ÿฌ๊ทธ์ธ ๋“ฑ๋ก ์ง„ํ–‰ 4๏ธโƒฃ ๋‹จ์ˆœํžˆ classpath์— JAR ์ถ”๊ฐ€ํ•˜๋Š” ๊ฒƒ๋งŒ์œผ๋กœ๋Š” ๋“ฑ๋ก์ด ์™„๋ฃŒ๋˜์ง€ ์•Š์Œ

๊ทธ๋ž˜์„œ Kafka์—์„œ๋Š” --jars๋งŒ์œผ๋กœ ์•ˆ ๋˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค.

๐Ÿ”ง ํ•ด๊ฒฐ์ฑ…: ์‹ค์ œ๋กœ ๋™์ž‘ํ•˜๋Š” ๋‘ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•

โœ… ๋ฐฉ๋ฒ• 1: ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๋ฐฉ์‹ (ํ”„๋กœ๋•์…˜ ์ถ”์ฒœ)

Kafka ์ปค๋„ฅํ„ฐ๋Š” --packages๋กœ, ๋‚˜๋จธ์ง€ ๋“œ๋ผ์ด๋ฒ„๋Š” ๋กœ์ปฌ JAR๋กœ ๊ด€๋ฆฌ:

/opt/spark/bin/spark-submit \
  --master spark://spark-master:7077 \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 \
  --jars /opt/spark/extra-jars/postgresql-42.6.0.jar,/opt/spark/extra-jars/kafka-clients-3.5.0.jar \
  --conf spark.driver.extraClassPath="/opt/spark/extra-jars/*" \
  --conf spark.executor.extraClassPath="/opt/spark/extra-jars/*" \
  transaction_streaming.py

์™œ ์ด๊ฒŒ ์ž˜ ๋™์ž‘ํ•˜๋‚˜? --packages๋Š” ๋‹จ์ˆœํžˆ JAR๋งŒ ์ถ”๊ฐ€ํ•˜๋Š” ๊ฒŒ ์•„๋‹ˆ๋ผ, ํ”Œ๋Ÿฌ๊ทธ์ธ ๋“ฑ๋ก๊นŒ์ง€ ๋‚ด๋ถ€์ ์œผ๋กœ ์ฒ˜๋ฆฌํ•ด ์ค๋‹ˆ๋‹ค.

โœ… ๋ฐฉ๋ฒ• 2: ์ˆœ์ˆ˜ ๋กœ์ปฌ JAR ๋ฐฉ์‹ (ํ’€ ๋„์ปค ํ™˜๊ฒฝ์—์„œ ์„ ํ˜ธ)

์ „์ฒด ์ œ์–ด๊ถŒ์„ ์œ ์ง€ํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด:

/opt/spark/bin/spark-submit \
  --master spark://spark-master:7077 \
  --jars /opt/spark/extra-jars/postgresql-42.6.0.jar,/opt/spark/extra-jars/spark-sql-kafka-0-10_2.12-3.5.0.jar \
  --conf spark.sql.streaming.kafka.allowAutoTopicCreation=true \
  --conf spark.driver.extraClassPath="/opt/spark/extra-jars/*" \
  --conf spark.executor.extraClassPath="/opt/spark/extra-jars/*" \
  transaction_streaming.py

โš ๏ธ ์ฃผ์˜: Kafka์˜ ๋ชจ๋“  ์ข…์† JAR (transitive dependencies) ๋ฅผ ๋กœ์ปฌ์— ์ง์ ‘ ์ค€๋น„ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

๐Ÿงญ ์ด ์ฐจ์ด๊ฐ€ ์ค‘์š”ํ•œ ์ด์œ 

  • ๐Ÿ” ๋””๋ฒ„๊น… โ€” ํ”Œ๋Ÿฌ๊ทธ์ธ ๋“ฑ๋ก ๋ฐฉ์‹์„ ์•Œ๋ฉด ๋์—†๋Š” ์‚ฝ์งˆ์„ ์ค„์ผ ์ˆ˜ ์žˆ์Œ
  • ๐Ÿšข ๋ฐฐํฌ ์„ ํƒ์ง€ โ€” Docker ํ™˜๊ฒฝ์€ ๋กœ์ปฌ JAR ์„ ํ˜ธ, ํด๋ผ์šฐ๋“œ ๋„ค์ดํ‹ฐ๋ธŒ๋Š” --packages๊ฐ€ ๋” ๊ฐ„ํŽธ
  • โฑ ์„ฑ๋Šฅ โ€” --packages๋Š” ์‹คํ–‰ ์‹œ ๋‹ค์šด๋กœ๋“œ โ†’ ์ดˆ๊ธฐ ๋ถ€ํŒ… ์•ฝ๊ฐ„ ๋А๋ฆฌ์ง€๋งŒ ๋ฒ„์ „ ํ˜ธํ™˜ ๋ณด์žฅ
  • ๐Ÿ‘ฅ ํŒ€ ํ˜‘์—… โ€” ๋ช…ํ™•ํ•œ ๋ฌธ์„œํ™”๊ฐ€ “๋‚ด ์ปดํ“จํ„ฐ์—์„  ๋˜๋Š”๋ฐ?” ์ƒํ™ฉ ๋ฐฉ์ง€

๐Ÿš€ ํ”„๋กœ๋•์…˜์—์„œ ๋‚ด๊ฐ€ ๋”ฐ๋ฅด๋Š” ๋ฃฐ

์ปดํฌ๋„ŒํŠธ ์ถ”์ฒœ ๋ฐฉ์‹
Kafka ์ปค๋„ฅํ„ฐ --packages ์‚ฌ์šฉ
DB ๋“œ๋ผ์ด๋ฒ„ ๋กœ์ปฌ --jars ์‚ฌ์šฉ
๋‚ด๋ถ€ ์ปค์Šคํ…€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ๋กœ์ปฌ --jars ์‚ฌ์šฉ
ํŒ€ ํ™˜๊ฒฝ ์‚ฌ์šฉ ๋ฐฉ์‹์„ ๋ฐ˜๋“œ์‹œ ๋ฌธ์„œํ™”

๐Ÿงต ํ•œ ๋‹จ๊ณ„ ๋” ๊นŠ์ด

์ด ๋ฌธ์ œ๋Š” Spark ๋‚ด๋ถ€ ๋™์ž‘์„ ์ดํ•ดํ•˜๋Š” ๊ฒƒ์ด ์™œ ์ค‘์š”ํ•œ์ง€๋ฅผ ์ž˜ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

classpath์— JAR์ด ์žˆ๋А๋ƒ vs. data source๊ฐ€ ํ”Œ๋Ÿฌ๊ทธ์ธ์œผ๋กœ ๋“ฑ๋ก๋๋А๋ƒ โ€” ์ด ์ฐจ์ด๊ฐ€ ์ŠคํŠธ๋ฆฌ๋ฐ ํŒŒ์ดํ”„๋ผ์ธ์ด ์ž˜ ๋„๋Š”์ง€, ๋””๋ฒ„๊น… ์ง€์˜ฅ์— ๋น ์ง€๋Š”์ง€๋ฅผ ๊ฒฐ์ •ํ•ฉ๋‹ˆ๋‹ค.

๋‹ค์Œ๋ฒˆ์— ์•„๋ž˜ ์—๋Ÿฌ๋ฅผ ๋งŒ๋‚˜๋ฉด:

Failed to find data source: kafka

โ†’ JAR์ด ์žˆ๋Š”์ง€๋งŒ ๋ณด์ง€ ๋ง๊ณ , โ†’ Spark๊ฐ€ ์–ด๋–ป๊ฒŒ ํ”Œ๋Ÿฌ๊ทธ์ธ์„ ๋“ฑ๋กํ•˜๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”.

๋งˆ์ง€๋ง‰ ์žฅ์• ๋ฌผ์€ ์‚ฌ์‹ค ์–ด๋ ค์šด ๊ฒŒ ์•„๋‹ˆ๋ผ ๊ทธ๋ƒฅ ์˜ˆ์ƒ ๋ฐ–์ผ ๋ฟ์ž…๋‹ˆ๋‹ค.