Using the Hive Server and the Hive Python Client

Contents

1 Abstraction
2 Hive
3 Apache Thrift
4 Hive Server
5 Hive Python Client
6 References



1 Abstraction #

This page shows how to use the Hive Thrift server together with a Python client.

2 Hive #

  • Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files.
  • http://hadoop.apache.org/hive/

3 Apache Thrift #

  • Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
  • http://incubator.apache.org/thrift/

4 Hive Server #

The Hive server runs as a Thrift server.

Starting the server (run in the background; it listens on port 10000)
$ hive --service hiveserver &
[1] 9818
$ Starting Hive Thrift Server
09/12/17 16:59:39 INFO service.HiveServer: Starting hive server on port 10000
.
.
$
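
Before running a client, it can help to confirm that something is actually listening on the port shown in the log. The following is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper name, and it only checks that a TCP connection succeeds, not that the listener is really a Hive Thrift server.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError if the connection is refused or times out.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Port 10000 is the port reported in the server log above.
    print(is_port_open("localhost", 10000))
```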

5 Hive Python Client #

Verifying the Hadoop and Hive installation (Hadoop 0.20.1 and Hive 0.4.0)
$ rpm -qa | grep hadoop-0.20
hadoop-0.20-jobtracker-0.20.1+133-1
hadoop-0.20-libhdfs-0.20.1+133-1
hadoop-0.20-tasktracker-0.20.1+133-1
hadoop-0.20-0.20.1+133-1
hadoop-0.20-datanode-0.20.1+133-1
hadoop-0.20-secondarynamenode-0.20.1+133-1
hadoop-0.20-conf-pseudo-0.20.1+133-1
hadoop-0.20-pipes-0.20.1+133-1
hadoop-0.20-namenode-0.20.1+133-1
hadoop-0.20-native-0.20.1+133-1
hadoop-0.20-docs-0.20.1+133-1
$
$ rpm -qa | grep hive
hadoop-hive-webinterface-0.4.0+14-1
hadoop-hive-0.4.0+14-1

Sample data
$ cat /tmp/r.txt
a       1       1.0
b       2       2.0
c       3       3.0
$
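
The sample file can also be produced from Python. This sketch assumes the three columns are separated by tab characters, which matches the `FIELDS TERMINATED BY '\t'` clause used when the table is created later on this page; `write_sample` and `SAMPLE_ROWS` are hypothetical names introduced for illustration.

```python
def write_sample(path, rows):
    """Write rows as tab-separated lines, one row per line."""
    with open(path, "w") as f:
        for row in rows:
            f.write("\t".join(str(v) for v in row) + "\n")


# Same three rows as the sample file shown above.
SAMPLE_ROWS = [("a", 1, 1.0), ("b", 2, 2.0), ("c", 3, 3.0)]
```

Calling `write_sample("/tmp/r.txt", SAMPLE_ROWS)` would recreate the file at the path used on this page.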

Setting PYTHONPATH (Hive Python libraries)
$ export PYTHONPATH="/usr/lib/hive/lib/py"
$ env | grep PYTHONPATH
PYTHONPATH=/usr/lib/hive/lib/py
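
Where changing the shell environment is inconvenient, a similar effect can be had from inside the script by extending `sys.path` before the Hive modules are imported. This is only a sketch: the directory is the one exported above, and `add_hive_lib` is a hypothetical helper name.

```python
import sys

# Path taken from the export above; adjust it if Hive is installed elsewhere.
HIVE_PY_LIB = "/usr/lib/hive/lib/py"


def add_hive_lib(path=HIVE_PY_LIB):
    """Put the Hive Python library directory on sys.path, at most once."""
    if path not in sys.path:
        sys.path.insert(0, path)


add_hive_lib()
```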

Code
import sys

from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
    # Connect to the Hive Thrift server on localhost, port 10000.
    transport = TSocket.TSocket('localhost', 10000)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)

    client = ThriftHive.Client(protocol)
    transport.open()

    # Create a tab-delimited table and load the sample file into it.
    client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' STORED AS TEXTFILE")
    client.execute("LOAD DATA LOCAL INPATH '/tmp/r.txt' OVERWRITE INTO TABLE r")

    # Run a query and print every returned row.
    client.execute("SELECT * FROM r")
    for row in client.fetchAll():
        print(row)

    transport.close()

# "except ... as" and print(...) work on Python 2.6+ as well as Python 3.
except Thrift.TException as tx:
    print('%s' % (tx.message))
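
In the script above, `transport.close()` is reached only if every call succeeds, so a failing query leaves the connection open. One way to tighten this is a small context manager that always closes the transport. This is a sketch, not part of the Hive or Thrift libraries: `opened` is a hypothetical helper, and it assumes only that the transport object has `open()` and `close()` methods, as the one in the script does.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open a transport and close it on exit, even if the body raises."""
    transport.open()
    try:
        yield transport
    finally:
        transport.close()


# Hypothetical usage with the objects from the script above:
#     with opened(transport):
#         client.execute("SELECT * FROM r")
#         for row in client.fetchAll():
#             print(row)
```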

Running it
$ python hive_py.py
a       1       1.0
b       2       2.0
c       3       3.0
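
The output suggests that each row comes back as a single string with the columns separated by tabs. Under that assumption, the rows can be converted to typed values as sketched below. `parse_row` is a hypothetical helper matching the `r(a STRING, b INT, c DOUBLE)` schema; it does not handle NULL values or tabs embedded in a column.

```python
def parse_row(line):
    """Split one tab-separated row into (a, b, c) typed as (str, int, float)."""
    a, b, c = line.split("\t")
    return a, int(b), float(c)


# Stand-in for the list returned by client.fetchAll().
rows = ["a\t1\t1.0", "b\t2\t2.0", "c\t3\t3.0"]
parsed = [parse_row(r) for r in rows]
```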

6 References #

  • Hive Wiki
  • Apache Thrift
Last modified 2018-04-13 23:12:52