Friday, March 13, 2015

App Timeline Server Not Starting, or Downgrading Ambari

This can also be used as a downgrade guide from Ambari 1.7.0 to 1.6.1.

If you are using HDP 2.1.2 with Ambari 1.7.0 and your App Timeline Server does not start, you have come to the right place.
Symptoms: starting the ATS from Ambari throws:
Fail: Execution of 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1' returned 1.
All other services work fine. I had set the recommended configuration for HDP 2.1.2:
yarn.timeline-service.store-class = org.apache.hadoop.yarn.server.applicationhistoryservice.timeline.LeveldbTimelineStore

The History Server is running fine. 

Not the ideal solution, but it worked for me. Since the issue seemed tied to the Ambari version, I reverted to 1.6.1. Steps (a rough command sketch follows the list):
1. Stopped and removed the Ambari server and all agents
2. Deleted the Ambari repo file and any leftover Ambari directories
3. Downloaded and installed Ambari 1.6.1
4. Re-configured/installed the cluster, as the HDP version remained the same
5. Formatted the NameNode and HBase
6. Changed the config:


yarn.timeline-service.store-class = org.apache.hadoop.yarn.server.applicationhistoryservice.timeline.LeveldbTimelineStore
7. Started ATS; it failed. Checked the history server logs; error:

Permission denied on /hadoop/yarn/timeline/leveldb.timeline-store.ldb/LOCK
8. Deleted the leveldb-timeline-store.ldb directory
9. Restarted ATS; it worked fine!
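For reference, a rough sketch of the shell commands behind steps 1-3 and 8, run as root on CentOS 6. The package names and the 1.6.1 repo URL are from memory and may have moved, so treat this as an outline rather than a script:

# Step 1: stop and remove the Ambari server (on the Ambari host) and the agent (on every node)
ambari-server stop
ambari-agent stop
yum remove -y ambari-server ambari-agent

# Step 2: delete the Ambari repo file and leftover directories
rm -f /etc/yum.repos.d/ambari.repo
rm -rf /etc/ambari-server /etc/ambari-agent /var/lib/ambari-server /var/lib/ambari-agent

# Step 3: install the Ambari 1.6.1 repo and packages, then set up and start the server
wget -O /etc/yum.repos.d/ambari.repo http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.6.1/ambari.repo
yum install -y ambari-server ambari-agent
ambari-server setup -s
ambari-server start

# Step 8: remove the leveldb store that caused the permission error
rm -rf /hadoop/yarn/timeline/leveldb-timeline-store.ldb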

For what it's worth, I never hit this issue on other cluster installs using HDP 2.2 with Ambari 1.7.0.

Changing Storage in Hadoop

After I installed an HDP 2.1.2 cluster, I noticed that the nodes were not using the drive partition planned for storage. The Linux boxes had an OS partition and a data partition, assigned during OS install: one for the OS and the other for data storage.

The data partition was not available during cluster installation, most probably because it was not mounted. Following are the steps I performed to change the HDFS storage location, along with the drive configuration needed.

First, format and optimize the partition or drive.
mkfs -t ext4 -m 1 -O dir_index,extent,sparse_super /dev/sdb

Create a mount directory
mkdir -p /disk/sdb1

Mount with optimized settings
mount -o noatime,nodiratime /dev/sdb /disk/sdb1

Append to fstab file so that the partition is mounted on boot (very critical)
echo "/dev/sdb /disk/sdb1 ext4 defaults,noatime,nodiratime 1 2" >> /etc/fstab

Add folder for hdfs data
mkdir -p /disk/sdb1/data

Location to store Namenode data
mkdir -p /disk/sdb1/hdfs/namenode

Location to store Secondary Namenode
mkdir -p /disk/sdb1/hdfs/namesecondary

Set these in hdfs-site.xml or through Ambari
dfs.namenode.name.dir = /disk/sdb1/hdfs/namenode
dfs.namenode.checkpoint.dir = /disk/sdb1/hdfs/namesecondary
dfs.datanode.data.dir = /disk/sdb1/data

Set permissions
sudo chown -R hdfs:hadoop /disk/sdb1/data
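The NameNode and Secondary NameNode directories created above presumably need the same ownership; this is my addition, not part of the original steps:

sudo chown -R hdfs:hadoop /disk/sdb1/hdfs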

Format the NameNode (only for a fresh filesystem; this wipes existing HDFS metadata)
hadoop namenode -format

Start the NameNode through Ambari, or from the CLI as the hdfs user
hadoop-daemon.sh start namenode

Start all the remaining services. The new drive should now be listed as DataNode storage.
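Not from the original post, but a quick way to confirm HDFS picked up the new location (paths as above):

df -h /disk/sdb1                      # the partition is mounted
ls /disk/sdb1/data                    # DataNode block directories appear after startup
sudo -u hdfs hdfs dfsadmin -report    # configured capacity should reflect the new drive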

References:
http://www.slideshare.net/leonsp/best-practices-for-deploying-hadoop-biginsights-in-the-cloud


Thursday, December 11, 2014

RStudio setup on Hortonworks Hadoop 2.1 Cluster

Here is a complete set of steps I performed to set up R and RStudio on a small cluster.

Installing R and RStudio
-- R should be installed on the node that has the Hive server
-- RStudio can be installed anywhere (I installed it on the edge node)

sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo yum -y install git wget R
ls /etc/default
sudo ln -s /etc/default/hadoop /etc/profile.d/hadoop.sh
cat /etc/profile.d/hadoop.sh | sed 's/export //g' > ~/.Renviron

Check the latest version of RStudio Server at
http://www.rstudio.com/products/rstudio/download-server/
(it has installation steps; follow those).
Listing them here for completeness, with the release that was current at the time:
$ sudo yum install openssl098e # Required only for RedHat/CentOS 6 and 7
$ wget http://download2.rstudio.org/rstudio-server-0.98.1091-x86_64.rpm
$ sudo yum install --nogpgcheck rstudio-server-0.98.1091-x86_64.rpm
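Not in the RStudio instructions quoted above, but the server ships a self-check that is worth running after the install:

sudo rstudio-server verify-installation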

Create a new system user and set a password
sudo useradd rstudio
sudo passwd rstudio
>> hadoop

Login to RStudio at http://hostname:8787

Install the required packages, either from RStudio >> Tools >> Install Packages,
or from the R console:
install.packages( c('RJSONIO', 'itertools', 'digest', 'Rcpp', 'functional', 'plyr', 'stringr'), repos='http://cran.revolutionanalytics.com')
install.packages( c('bitops', 'reshape2'), repos='http://cran.revolutionanalytics.com')
install.packages( c('RHive'), repos='http://cran.revolutionanalytics.com')

Download the latest rmr2 package from:
https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads
Copy the tar.gz file to the server (e.g. with WinSCP) and install it through RStudio (sketch below).
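A sketch of installing the uploaded archive from the R console; the path and version number are placeholders for whatever rmr2 build you downloaded:

# Install the local rmr2 archive (path and version are placeholders)
install.packages("/home/rstudio/rmr2_3.3.0.tar.gz", repos = NULL, type = "source")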

### Need to run every time RStudio is initialized or restarted
Set environment variables in RStudio
Sys.setenv(HADOOP_HOME="your hadoop installation directory here e.g. /usr/lib/hadoop")
Sys.setenv(HIVE_HOME="your hive installation directory here e.g. /usr/lib/hive")
## Do NOT execute this one: Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop/conf/")

Sys.setenv("RHIVE_FS_HOME"="your RHive installation directory here e.g. /home/rhive")
This needs to be a local directory on the node where Hive is installed; create it if it doesn't exist. The rstudio user created earlier must own it (chown -R on the directory).
If not, this is the error you get:
Error: java.io.IOException: Mkdirs failed to create file:/home/rhive/lib/2.0-0.2

library(RHive)
rhive.init()
rhive.connect(host="IP ADDRESS/Hostname", port=10000, hiveServer2=TRUE)

If you get this error:
Error: java.sql.SQLException: Error while processing statement: file:///rhive/lib/2.0-0.2/rhive_udf.jar does not exist.
check that the jar file is in that directory and that the rstudio user has permission on it.
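Once rhive.connect() succeeds, a quick sanity check I would run (assumes the default Hive database is reachable and has at least one table):

rhive.query("show tables")   # should return a data frame of table names
rhive.close()                # close the connection when done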

Hope it helps.

Cheers!

References and Thanks:
http://jsolderitsch.wordpress.com/hortonworks-sandbox-r-and-rstudio-install/

Monday, December 9, 2013

Excel + SQL Server Linked Server Troubleshooting

Following is a great article for troubleshooting errors thrown by OLE DB when connecting to an Excel file through an OPENROWSET query:

OLE DB provider "MICROSOFT.JET.OLEDB.4.0" for linked server "(null)" returned message "Unspecified error".

It gives step-by-step resolutions to multiple errors, such as:
- Cannot initialize the data source object of OLE DB provider "MICROSOFT.JET.OLEDB.4.0" for linked server "(null)".
- Cannot get the column information from OLE DB provider "MICROSOFT.JET.OLEDB.4.0" for linked server "(null)".
- SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', see "Surface Area Configuration" in SQL Server Books Online. (A sketch for enabling this follows.)
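For the last error, enabling the component with sp_configure is straightforward; the OPENROWSET query below is only a hypothetical example with a placeholder file path and sheet name:

-- Enable ad hoc distributed queries (the component named in the error)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;

-- Hypothetical query against an Excel file via the Jet provider
SELECT *
FROM OPENROWSET(
    'Microsoft.Jet.OLEDB.4.0',
    'Excel 8.0;Database=C:\Data\Sample.xls;HDR=YES',
    'SELECT * FROM [Sheet1$]');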

On client deployments, my recurring issue was granting the service accounts access to the temp folders.

Tuesday, November 26, 2013

Build Hierarchy From Delimited String

In this post we create a parent-child hierarchy (n levels deep) from a delimited string like [Parent_Child_LeafChild].

What we are looking for is taking an n-part string like "Stage1_Stage2_Stage3" and moving it into a table to build this:
Stage1
--Stage2
----Stage3
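
The script that follows inserts into a Hierarchy table whose definition is not shown here; judging by the columns it uses and its reliance on an identity value, a minimal version could look like this:

-- Assumed minimal definition of the Hierarchy table used below
CREATE TABLE dbo.Hierarchy
(
    ID             INT IDENTITY(1,1) PRIMARY KEY,
    StageLevelName VARCHAR(255) NOT NULL,
    StageLevel     INT          NOT NULL,   -- depth of the node (1 = root)
    ParentID       INT          NULL        -- NULL for the root node
);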

DECLARE @separator_position INT    -- used to locate each separator character
DECLARE @array_value VARCHAR(1000) -- holds each value as it is split out
DECLARE @separator CHAR(1)         -- used in the PATINDEX pattern

DECLARE @StageLevel VARCHAR(255) = 'Stage1_Stage2_Stage3'
DECLARE @holder VARCHAR(255) = @StageLevel
SET @separator = '_'                        -- separator a.k.a. delimiter
SET @StageLevel = @StageLevel + @separator  -- append the separator at the end
-- SELECT PATINDEX('%[' + @separator + ']%', @StageLevel)
DECLARE @Level INT = 0
DECLARE @ParentId INT

WHILE PATINDEX('%[' + @separator + ']%', @StageLevel) <> 0
BEGIN -- PATINDEX matches a pattern against a string
    SELECT @separator_position = PATINDEX('%[' + @separator + ']%', @StageLevel)
    SELECT @array_value = LEFT(@StageLevel, @separator_position - 1)
    SET @Level = @Level + 1

    -- SELECT @array_value, @Level
    IF @Level = 1 -- this is the root (parent) node
    BEGIN
        INSERT INTO Hierarchy (StageLevelName, StageLevel)
        VALUES (@array_value, @Level)
    END
    ELSE
    BEGIN
        INSERT INTO Hierarchy (StageLevelName, StageLevel, ParentID)
        VALUES (@array_value, @Level, @ParentId)
    END
    SET @ParentId = SCOPE_IDENTITY() -- the row just inserted becomes the parent of the next level
    -- SELECT @ParentId

    -- move past the value just processed
    SELECT @StageLevel = STUFF(@StageLevel, 1, @separator_position, '')
END

The end result is a table where each row references its parent, forming a recursive hierarchy, with level information for depth.
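
To get back the indented view shown at the top of the post, a recursive CTE over the (assumed) table above works; this is an illustration, not part of the original script:

-- Walk the hierarchy from the root down and indent by level
WITH Tree AS
(
    SELECT ID, StageLevelName, StageLevel, ParentID
    FROM dbo.Hierarchy
    WHERE ParentID IS NULL
    UNION ALL
    SELECT h.ID, h.StageLevelName, h.StageLevel, h.ParentID
    FROM dbo.Hierarchy h
    JOIN Tree t ON h.ParentID = t.ID
)
SELECT REPLICATE('--', StageLevel - 1) + StageLevelName AS DisplayName
FROM Tree
ORDER BY StageLevel;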

Wednesday, October 16, 2013

Moving Closer To MCSA: SQL Server 2012



Last week I passed:
Exam 70-457: Transition Your MCTS on SQL Server 2008 to MCSA: SQL Server 2012, Part 1

This will help me qualify for the Microsoft Certified Solutions Associate (MCSA): SQL Server 2012 certification. Now I am preparing for 70-458 (Part 2).


The exam was moderately hard, and I had doubts about a couple of questions. Anyway, working through the course material helped me a lot in getting to know SQL Server 2012.


Cheers!

Wednesday, June 20, 2012

Content Organizer Rule Manager for SharePoint 2010


I am proud to release the beta version of Content Organizer Rule Manager (codename CORMa) for SharePoint 2010 today.

Background:
Working on a SharePoint 2010 enterprise project, we had tons of content types across different sites. Creating and tracking content organizer rules from the SharePoint interface was tedious, going back and forth between pages. There is no option in SharePoint Designer to manage content organizer rules. Lastly, there was a need to bulk-create rules, as some of our libraries had multiple content types associated with them.

CORMa to the rescue!
  • List and create content organizer rules in bulk.
  • Delete rules.
  • Works for all field types, including Taxonomy fields.
  • Export all site content types, with their associated lists, to a text file.
  • Checks SharePoint's inherent rules during rule creation.
Download, test and give feedback...