Build Apache Drill v1.0

<In-Progress>

Pre-requisites

  • OpenJDK8
  • Zookeeper
  • git
  • maven@v3.3.9

Install OpenJDK

$ sudo apt-get install openjdk-8-jdk


Make sure you have the right OpenJDK version 

$ java -version

It should report a 1.8.0 version, for example 1.8.0_111.

Set JAVA_HOME

$ export JAVA_HOME=`readlink -f /usr/bin/java | sed "s:jre/bin/java::"`
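
The sed expression above only strips a jre/bin/java suffix, which matches the OpenJDK 8 package layout. As a minimal sketch, assuming you also want to cope with a plain bin/java layout, the same idea can be wrapped in a small helper. The function name java_home_from_path is hypothetical and not part of any tool.

```shell
# java_home_from_path PATH: print a JAVA_HOME candidate for a resolved java binary path.
# Hypothetical helper for illustration only.
java_home_from_path() {
    # Strip a trailing "/jre/bin/java" first, then a trailing "/bin/java".
    printf '%s\n' "$1" | sed -e 's:/jre/bin/java$::' -e 's:/bin/java$::'
}
```

It could then be used as: export JAVA_HOME="$(java_home_from_path "$(readlink -f /usr/bin/java)")"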


Building Apache Zookeeper


Some distributions, such as Ubuntu and Debian, ship a recent ZooKeeper package, so you can simply install it with "sudo apt-get install zookeeper". If your distribution does not package ZooKeeper, download the latest release from the official Apache archive and extract it on every machine that will be part of the ZooKeeper quorum.

Edit the /etc/hosts file on all nodes and add the IP address and hostname (node name) of each node. If the hostnames are wrong, correct them in /etc/hosts:



192.168.1.102 node1
192.168.1.103 node2
192.168.1.105 node3
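
Because the same entries have to be present on every node, a quick check can catch a missing or commented-out line. The following is only a sketch; hosts_has_entry is a hypothetical helper name, and it assumes the usual hosts-file format of an address followed by one or more names.

```shell
# hosts_has_entry FILE NAME: succeed if FILE maps NAME on a line that is not commented out.
# Hypothetical helper for illustration only.
hosts_has_entry() {
    awk -v name="$2" '
        {
            sub(/#.*/, "")                   # drop comments, including whole commented lines
            for (i = 2; i <= NF; i++)        # field 1 is the address, the rest are names
                if ($i == name) found = 1
        }
        END { exit !found }
    ' "$1"
}
```

For example, running hosts_has_entry /etc/hosts node2 on each machine returns a non-zero status where the node2 entry is missing.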


Create zookeeper user

$ sudo adduser zookeeper


Configure zookeeper

To form an ensemble with a master-slave (leader-follower) architecture, use an odd number of ZooKeeper servers, i.e. 1, 3, 5, 7, and so on.
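
The reason for preferring an odd number is that ZooKeeper needs a majority of the servers to be up. The arithmetic can be sketched as follows; the function names are made up for this illustration.

```shell
# Majority (quorum) needed by an ensemble of N servers.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of server failures an ensemble of N servers can survive.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 4 5; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Three and four servers both tolerate a single failure, so the fourth server adds cost without adding fault tolerance.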

Now create a zookeeper directory under /var/lib to serve as the ZooKeeper data directory, and another zookeeper directory under /var/log where all ZooKeeper logs will be written. Change the ownership of both directories to the zookeeper user.

$ sudo mkdir /var/lib/zookeeper

$ cd /var/lib

$ sudo chown zookeeper:zookeeper zookeeper/

$ sudo mkdir /var/log/zookeeper

$ cd /var/log

$ sudo chown zookeeper:zookeeper zookeeper/


Note: If you get a message like the one below while starting ZooKeeper, check and correct the permissions of the files under /var/lib/zookeeper and /var/log/zookeeper.

Since I was logged in as linaro when running ZooKeeper, I changed the ownership to the linaro user.


linaro@node1:~/drill-setup/zookeeper-3.4.12$ ./bin/zkServer.sh start

ZooKeeper JMX enabled by default

Using config: /home/linaro/drill-setup/zookeeper-3.4.12/bin/../conf/zoo.cfg

Starting zookeeper ... ./bin/zkServer.sh: line 149: /var/lib/zookeeper/zookeeper_server.pid: Permission denied

FAILED TO WRITE PID


Edit the .bashrc of the zookeeper user and add the following ZooKeeper environment variable.

$ export ZOO_LOG_DIR=/var/log/zookeeper


Source the .bashrc in current login session:

$ source ~/.bashrc


Create the server ID for the ensemble. Each ZooKeeper server must have a number between 1 and 255 in its myid file, and the number must be unique within the ensemble.

In Node1

$ sudo sh -c "echo '1' > /var/lib/zookeeper/myid"


In Node2

$ sudo sh -c "echo '2' > /var/lib/zookeeper/myid"


In Node3

$ sudo sh -c "echo '3' > /var/lib/zookeeper/myid"
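
Before writing the file, the ID can be checked against the 1 to 255 range stated above. This is a minimal sketch; valid_myid is a hypothetical helper name.

```shell
# valid_myid VALUE: succeed if VALUE is an integer between 1 and 255.
# Hypothetical helper for illustration only.
valid_myid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric values
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 255 ]
}
```

For example: valid_myid 3 && sudo sh -c "echo '3' > /var/lib/zookeeper/myid"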


Now go to the conf folder under the ZooKeeper home directory (the location where the ZooKeeper archive was extracted).

$ cd /home/zookeeper/zookeeper-3.4.12/conf/


By default, a sample configuration file named zoo_sample.cfg is present in the conf directory. Make a copy of it named zoo.cfg as shown below, and edit the new zoo.cfg as described on all nodes.



$ cp zoo_sample.cfg zoo.cfg


Edit zoo.cfg and add the lines below:


$ vi zoo.cfg


dataDir=/var/lib/zookeeper
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
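
The server.N lines follow a regular pattern, so for larger ensembles they can be generated from the list of node names. This is only a sketch: zoo_server_lines is a hypothetical helper, and it assumes every node uses ports 2888 and 3888 as in the lines above.

```shell
# zoo_server_lines HOST...: print one server.N line per host, numbering from 1.
# Hypothetical helper for illustration only.
zoo_server_lines() {
    id=1
    for host in "$@"; do
        echo "server.${id}=${host}:2888:3888"
        id=$((id + 1))
    done
}

zoo_server_lines node1 node2 node3
```

The number N in each server.N line has to match the value in the myid file on that host.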


Now make the following changes in the log4j.properties file.

$ vi log4j.properties


zookeeper.log.dir=/var/log/zookeeper 
zookeeper.tracelog.dir=/var/log/zookeeper 
log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE

After zoo.cfg has been configured on all three nodes, start ZooKeeper on the nodes one by one using the following command:

$ /home/zookeeper/zookeeper-3.4.12/bin/zkServer.sh start


Zookeeper Service Start on all the Nodes.

ZooKeeper JMX enabled by default
Using config: /home/ganesh/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED


A log file named zookeeper.log is created in /var/log/zookeeper. Tail the file to check for errors.

$ tail -f /var/log/zookeeper/zookeeper.log


Verify the Zookeeper Cluster and Ensemble

In a three-server ZooKeeper ensemble, one server is in leader mode and the other two are in follower mode. You can check the status by running the following command.

$ /home/zookeeper/zookeeper-3.4.12/bin/zkServer.sh status


Zookeeper Service Status Check.

The reported mode depends on the ensemble size: with three nodes, one reports leader and the other two report follower; with a single node, it reports standalone.

With three nodes:

node1

ZooKeeper JMX enabled by default
Using config: /home/zookeeper/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: leader

node2

ZooKeeper JMX enabled by default
Using config: /home/zookeeper/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: follower

node3

ZooKeeper JMX enabled by default
Using config: /home/zookeeper/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: follower

standalone

ZooKeeper JMX enabled by default
Using config: /home/zookeeper/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: standalone
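
When checking several nodes from a script, only the Mode line of the status output shown above is of interest. A minimal sketch for extracting it follows; zk_mode is a hypothetical helper name.

```shell
# zk_mode: read zkServer.sh status output on stdin and print only the mode.
# Hypothetical helper for illustration only.
zk_mode() {
    sed -n 's/^Mode: //p'
}
```

Piping the output of zkServer.sh status into zk_mode prints leader, follower, or standalone.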


$ echo stat | nc node1 2181


Lists brief details for the server and connected clients.



$ echo mntr | nc node1 2181


Zookeeper list of variables for cluster health monitoring.
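
For monitoring scripts it is often enough to pull a single variable out of the mntr output. The sketch below assumes that each output line holds a key and a value separated by whitespace; mntr_value is a hypothetical helper name.

```shell
# mntr_value KEY: read mntr output on stdin and print the value of KEY.
# Hypothetical helper; assumes "key<whitespace>value" lines.
mntr_value() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}
```

For example: echo mntr | nc node1 2181 | mntr_value zk_server_state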



$ echo srvr | nc localhost 2181


Lists full details for the Zookeeper server.



To inspect the znodes, connect with the command below on any of the ZooKeeper nodes:

$ /home/zookeeper/zookeeper-3.4.12/bin/zkCli.sh -server `hostname -f`:2181


Connects to the ZooKeeper data node and lists the contents.



ent:user.name=root
2019-02-18 02:26:36,822 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2019-02-18 02:26:36,822 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/ganesh
2019-02-18 02:26:36,823 [myid:] - INFO [main:ZooKeeper@441] - Initiating client connection, connectString=:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@4b9af9a9
Welcome to ZooKeeper!
2019-02-18 02:26:36,846 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1028] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2019-02-18 02:26:36,927 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@878] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2019-02-18 02:26:36,948 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1302] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1000001cfe00002, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: :2181(CONNECTED) 0]


Install Pre-requisites for Build

$ sudo apt-get install git

Setup environment

Add the following environment variables to your profile file (~/.bashrc):

# setup environments
export LANG="en_US.UTF-8"
export PATH=${HOME}/gradle/bin:$PATH
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-arm64
export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8"
 

$ source ~/.bashrc


Hooking up upstream Maven 3.6.0 (for Debian Jessie only)

$ wget http://mirrors.gigenet.com/apache/maven/maven-3/3.6.0/binaries/apache-maven-3.6.0-bin.tar.gz

$ tar xvf apache-maven-3.6.0-bin.tar.gz 

$ cd apache-maven-3.6.0/bin 

$ export PATH=$PWD:$PATH

$ mvn --version # should list the version as 3.6.0
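
If the version check is part of a script, the version number can be extracted from the command output. The sketch below assumes the output contains a line starting with "Apache Maven" followed by the version; maven_version is a hypothetical helper name.

```shell
# maven_version: read "mvn --version" output on stdin and print the version number.
# Hypothetical helper; assumes a line of the form "Apache Maven <version> ...".
maven_version() {
    awk '/^Apache Maven/ { print $3; exit }'
}
```

For example: [ "$(mvn --version | maven_version)" = "3.6.0" ] || echo "unexpected Maven version"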

Clone and Build Apache Drill

$ git clone https://gitbox.apache.org/repos/asf/drill.git

$ cd drill

$ git branch v1.15.0 origin/1.15.0

$ git checkout v1.15.0


To build .deb package 

$ mvn clean -X package -Pdeb -DskipTests


To build .rpm package 

$ mvn clean -X package -Prpm -DskipTests


After a successful build, change to the distribution directory:

$ cd distribution/target/apache-drill-1.15.0/apache-drill-1.15.0

Then edit the /etc/hosts file on your machine, comment out the loopback entries, and map the host names to your host's <IP-address> instead, e.g.:

#127.0.0.1 localhost

#127.0.1.1 ubuntu


<IP-address> ubuntu
<IP-address> localhost


This is needed because in distributed mode Drill cannot bind to the loopback IP 127.0.1.1. Reference: https://stackoverflow.com/questions/40506221/how-to-start-drillbit-locally-in-distributed-mode

Next, edit conf/drill-override.conf and set the cluster ID and the ZooKeeper connection string, e.g. as below:

drill.exec: {
  cluster-id: "1",
  zk.connect: "<IP-address>:2181"
}
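
With a multi-node ZooKeeper ensemble, zk.connect takes a comma-separated list of host:port pairs. A minimal sketch for building that string follows; zk_connect_string is a hypothetical helper, and it assumes every node listens on client port 2181.

```shell
# zk_connect_string HOST...: print "host1:2181,host2:2181,..." for zk.connect.
# Hypothetical helper for illustration only.
zk_connect_string() {
    result=""
    for host in "$@"; do
        # Prepend a comma only when result already holds an entry.
        result="${result:+$result,}${host}:2181"
    done
    echo "$result"
}

zk_connect_string node1 node2 node3
```

The printed string can be pasted as the value of zk.connect on every drillbit that should join the same cluster.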


Now you can run the drillbit and watch the log. To explore more drillbit options, refer to the drill-override-example.conf file.

$ apache-drill-1.15.0$ ./bin/drillbit.sh help
Usage: drillbit.sh [--config|--site <site-dir>] (start|stop|status|restart|run|graceful_stop) [args]


In one terminal, follow the log with the tail command:

$ apache-drill-1.15.0$ tail -f log/drillbit.log

$ apache-drill-1.15.0$ ./bin/drillbit.sh start


Starting drillbit, logging to /mnt/nvme0n1p3/Projects/Apache-Components-Build/drill/distribution/target/apache-drill-1.15.0/apache-drill-1.15.0/log/drillbit.out

$ apache-drill-1.15.0$ ./bin/drillbit.sh status

drillbit is running.


$ apache-drill-1.15.0$ ./bin/drillbit.sh graceful_stop
Stopping drillbit
...


You can either stop or do a graceful stop. The same steps can be repeated on more than one machine (node).

I was able to run Drill, access the web UI at http://IP-Address:8047, and run a sample query in distributed mode. To run across a cluster, I just need to do a similar setup on multiple machines (nodes). Reference - https://drill.apache.org/docs/starting-the-web-ui/


If you are using CentOS 7, be careful: connection errors may be caused by the firewall. I used the commands below to stop the firewall and open the ZooKeeper port (2181).

$ sudo systemctl stop firewalld

$ sudo firewall-cmd --zone=public --add-port=2181/udp --add-port=2181/tcp --permanent
[sudo] password for centos:
success

$ sudo firewall-cmd --reload
success

$ zkServer.sh restart
ZooKeeper JMX enabled by default
Using config: /home/centos/zookeeper-3.4.12/bin/../conf/zoo.cfg
ZooKeeper JMX enabled by default
Using config: /home/centos/zookeeper-3.4.12/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
ZooKeeper JMX enabled by default
Using config: /home/centos/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED


REFERENCE:

https://stackoverflow.com/questions/13316776/zookeeper-connection-error

https://www.tutorialspoint.com/zookeeper/index.htm

https://blog.redbranch.net/2018/04/19/zookeeper-install-on-centos-7/

https://drill.apache.org/docs/distributed-mode-prerequisites/