<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.0">Jekyll</generator><link href="https://risdenk.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://risdenk.github.io/" rel="alternate" type="text/html" /><updated>2022-01-07T13:10:44-06:00</updated><id>https://risdenk.github.io/feed.xml</id><title type="html">Kevin Risden’s Blog</title><subtitle>Kevin Risden's Blog</subtitle><entry><title type="html">My Development Environment 2018</title><link href="https://risdenk.github.io/2018/12/06/my-development-environment-2018.html" rel="alternate" type="text/html" title="My Development Environment 2018" /><published>2018-12-06T08:00:00-06:00</published><updated>2018-12-06T08:00:00-06:00</updated><id>https://risdenk.github.io/2018/12/06/my-development-environment-2018</id><content type="html" xml:base="https://risdenk.github.io/2018/12/06/my-development-environment-2018.html">&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;I was asked the other day about what my development environment looked like since I was able to test a lot of different configurations quickly. I am writing this post to capture some of the stuff I do to be able to iterate quickly. First some background on why it has historically been important for me to be able to change test environments quickly.&lt;/p&gt;

&lt;h4 id=&quot;background&quot;&gt;Background&lt;/h4&gt;
&lt;p&gt;I previously worked as a software consultant with &lt;a href=&quot;https://www.avalonconsult.com/&quot;&gt;Avalon Consulting, LLC&lt;/a&gt;. We worked on a variety of projects for a number of different clients. Some of the projects were long and others were shorter. I focused primarily on big data and search. &lt;a href=&quot;https://hadoop.apache.org/&quot;&gt;Apache Hadoop&lt;/a&gt; with security has a lot of different configurations. It wasn’t practical to spin up cloud environments (hotel wifi sucks) for each little test. This meant I needed to find a way to test things on my 8GB MacBook Pro.&lt;/p&gt;

&lt;h3 id=&quot;development-laptop&quot;&gt;Development Laptop&lt;/h3&gt;
&lt;p&gt;I currently have two laptops for development: a 2012 MacBook Pro with 8GB of RAM that is starting to show its age but was worth every penny, and a work laptop that I won’t go into in much detail. Both laptops are configured very similarly. Key software includes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.iterm2.com/&quot;&gt;iTerm2&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://brew.sh/&quot;&gt;Homebrew&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.zsh.org/&quot;&gt;Zsh&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/robbyrussell/oh-my-zsh&quot;&gt;oh-my-zsh&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://git-scm.com/&quot;&gt;git&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.docker.com/docker-for-mac/&quot;&gt;Docker for Mac&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.virtualbox.org/&quot;&gt;VirtualBox&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.vagrantup.com/&quot;&gt;Vagrant&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.jetbrains.com/idea/&quot;&gt;IntelliJ IDEA Ultimate&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.google.com/chrome/&quot;&gt;Chrome&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I use my terminal quite a bit for git, ssh, docker, vagrant, etc. I typically leave my terminal up at all times since I am usually running something. I jump between Docker and Vagrant/VirtualBox frequently. There are lots of security and distributed computing setups where proper hostnames and DNS resolution work better with full virtual machines. There are fewer gotchas if you know you are working with “real” machines instead of fighting with Docker networking and DNS.&lt;/p&gt;

&lt;p&gt;I owe a big shoutout to &lt;a href=&quot;https://travis-ci.com/&quot;&gt;Travis CI&lt;/a&gt; since I use them a lot for my open source projects. I typically push a git branch to my Github fork and let Travis CI go to work. This allows me to work on multiple things at once when tests take 10s of minutes.&lt;/p&gt;

&lt;h3 id=&quot;intel-nuc-server&quot;&gt;Intel NUC Server&lt;/h3&gt;
&lt;p&gt;I recently added an &lt;a href=&quot;https://simplynuc.com/8i5beh-kit/&quot;&gt;Intel NUC&lt;/a&gt; to my development setup to help with offloading some of the long-running tests from my laptop. It also has more RAM and CPU power, which allows me to run continuous integration jobs as well as more Vagrant VMs. Some of the software I have running on my Intel NUC (mostly as Docker containers):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Dnsmasq&quot;&gt;Dnsmasq&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://jenkins.io/&quot;&gt;Jenkins&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.sonatype.com/download-oss-sonatype&quot;&gt;Nexus 3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://gogs.io/&quot;&gt;Gogs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.sonarqube.org/&quot;&gt;SonarQube&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dnsmasq ensures that I can get a consistent DNS both on my Intel NUC and within my private network. Jenkins runs most of my continuous integration builds. It helps keep track of logs and allows me to spin up jobs for different purposes (like repeatedly testing a feature branch). Jenkins spins up separate Docker containers for each build so I don’t have to worry about dependency conflicts. Nexus allows me to cache Maven repositories, Docker images, static files, and more. This ensures that I don’t need to wait to redownload the same dependencies over and over again. Gogs is a standalone Git server that painlessly lets me mirror repos internally. This avoids having to repeatedly pull big repos from the internet. SonarQube enables me to run additional static analysis checks against some of the Jenkins builds.&lt;/p&gt;
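&lt;p&gt;As an illustration, a minimal Dnsmasq setup for this kind of private network might look like the following. The domain and IP address here are hypothetical placeholders, not my actual configuration:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# /etc/dnsmasq.conf - consistent DNS for a private dev network
# forward anything not known locally to an upstream resolver
server=1.1.1.1
# resolve every host under the (hypothetical) dev.local domain
# to the NUC so VMs and containers see consistent hostnames
address=/dev.local/192.168.1.50
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;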

&lt;h3 id=&quot;yubikey&quot;&gt;Yubikey&lt;/h3&gt;
&lt;p&gt;I want to talk a little bit about my use of a Yubikey. I had been thinking about getting one for a few years and finally got one when Yubikey 5 came out. I use it all the time now for GPG and SSH. This means I don’t have to store any private keys on new devices, and I can even SSH from a Chromebook back to my server if necessary. I configured my Yubikey to handle GPG for both signing and authentication. This allows me to use GPG with SSH as well. The GPG agent takes a little configuring, but once set up you can easily use it for both GPG and SSH. I wish more websites supported U2F instead of OATH/Authenticator codes. I like the simplicity and would recommend it for most developers.&lt;/p&gt;
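&lt;p&gt;For reference, the gpg-agent configuration I am describing is roughly the following sketch. Paths and shell setup vary by OS and GnuPG version:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# ~/.gnupg/gpg-agent.conf - let gpg-agent also act as the SSH agent
enable-ssh-support

# ~/.zshrc - point SSH at the gpg-agent socket
export SSH_AUTH_SOCK=&quot;$(gpgconf --list-dirs agent-ssh-socket)&quot;
gpgconf --launch gpg-agent
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;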

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;My setup hasn’t changed too much over the past 5 years when it comes to development laptops. I have started to use more cloud-based automated testing like Travis CI. I added the Intel NUC to be able to do more testing internally across bigger VMs. I will say that I have learned more trying to fit a distributed system on an 8GB RAM laptop than anything else. (Who else can say they have run Hadoop on 3 Linux VMs and 1 Windows AD VM on 8GB of RAM?) Who knows what is to come in 2019, but I am happy and productive with what I have in 2018.&lt;/p&gt;</content><author><name></name></author><category term="development" /><category term="environment" /><category term="2018" /><summary type="html">Overview I was asked the other day about what my development environment looked like since I was able to test a lot of different configurations quickly. I am writing this post to capture some of the stuff I do to be able to iterate quickly. First some background on why it has historically been important for me to be able to change test environments quickly.</summary></entry><entry><title type="html">Apache Hadoop YARN - “Vulnerability” FUD</title><link href="https://risdenk.github.io/2018/12/04/apache-yarn-vulnerability-fud.html" rel="alternate" type="text/html" title="Apache Hadoop YARN - “Vulnerability” FUD" /><published>2018-12-04T08:00:00-06:00</published><updated>2018-12-04T08:00:00-06:00</updated><id>https://risdenk.github.io/2018/12/04/apache-yarn-vulnerability-fud</id><content type="html" xml:base="https://risdenk.github.io/2018/12/04/apache-yarn-vulnerability-fud.html">&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;There are reports of an &lt;a href=&quot;http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html&quot;&gt;Apache Hadoop YARN&lt;/a&gt; “vulnerability” but I want to share some details that are missing from the few articles I’ve come across. Here are a few of the articles/links:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.extrahop.com/company/blog/2018/detect-demonbot-exploiting-hadoop-yarn-remote-code-execution/&quot;&gt;https://www.extrahop.com/company/blog/2018/detect-demonbot-exploiting-hadoop-yarn-remote-code-execution/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/vulhub/vulhub/blob/master/hadoop/unauthorized-yarn/exploit.py&quot;&gt;https://github.com/vulhub/vulhub/blob/master/hadoop/unauthorized-yarn/exploit.py&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;demonbot-vulnerability-requires-an-unsecure-cluster&quot;&gt;Demonbot vulnerability requires an &lt;strong&gt;unsecure&lt;/strong&gt; cluster&lt;/h3&gt;
&lt;p&gt;The key point I want to make is that the report misleads the reader into assuming that all Apache Hadoop YARN environments are insecure. This is &lt;strong&gt;false&lt;/strong&gt;. The clusters described have no security and are akin to having your front door unlocked. Kerberized clusters are secure since they require a valid user account to be usable. Furthermore, clusters should not be exposed to the internet for most use cases (especially not endpoints that allow for remote job submission).&lt;/p&gt;

&lt;h3 id=&quot;explain-the-vulnerability-like-im-five&quot;&gt;Explain the “vulnerability” like I’m five&lt;/h3&gt;
&lt;p&gt;Imagine that one day you get home and find a whole bunch of extra lamps plugged into your outlets. You are annoyed because the lamps are using your electricity. You remember that you forgot to lock your door when you went on vacation. Instead of someone stealing stuff from your home, they decided to plug in lamps.&lt;/p&gt;

&lt;p&gt;Now you might be thinking, it is expected that something bad would happen if you left your door unlocked when you went on vacation. This is the exact same thing as an unsecure Apache Hadoop YARN cluster. No one should leave their cluster unsecured and exposed to the outside world.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;There have been multiple reports of “big data” endpoints being exposed to the internet and not being secured. This has affected Elasticsearch, MongoDB, and others. There is no reason to expose a cluster to the internet without security. Cloudera wrote a blog post that covers the same topic as well &lt;a href=&quot;https://blog.cloudera.com/blog/2018/11/protecting-hadoop-clusters-from-malware-attacks/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;</content><author><name></name></author><category term="bigdata" /><category term="apache" /><category term="hadoop" /><category term="yarn" /><category term="security" /><category term="vulnerability" /><category term="fud" /><summary type="html">Overview There are reports of an Apache Hadoop YARN “vulnerability” but I want to share some details that are missing from the few articles I’ve come across. Here are a few of the articles/links:</summary></entry><entry><title type="html">Apache Solr - Hide/Redact Sensitive Properties</title><link href="https://risdenk.github.io/2018/11/27/apache-solr-hide-redact-sensitive-properties.html" rel="alternate" type="text/html" title="Apache Solr - Hide/Redact Sensitive Properties" /><published>2018-11-27T08:00:00-06:00</published><updated>2018-11-27T08:00:00-06:00</updated><id>https://risdenk.github.io/2018/11/27/apache-solr-hide-redact-sensitive-properties</id><content type="html" xml:base="https://risdenk.github.io/2018/11/27/apache-solr-hide-redact-sensitive-properties.html">&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://lucene.apache.org/solr&quot;&gt;Apache Solr&lt;/a&gt; is a full text search engine that is built on &lt;a href=&quot;https://lucene.apache.org/core/&quot;&gt;Apache Lucene&lt;/a&gt;. One of the common questions on the &lt;a href=&quot;http://lucene.apache.org/solr/community.html#mailing-lists-irc&quot;&gt;solr-user&lt;/a&gt; mailing list (e.g. &lt;a href=&quot;http://lucene.472066.n3.nabble.com/Disabling-jvm-properties-from-ui-td4413066.html&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;http://lucene.472066.n3.nabble.com/jira-Commented-SOLR-11369-Zookeeper-credentials-are-showed-up-on-the-Solr-Admin-GUI-td4405383.html&quot;&gt;here&lt;/a&gt;) is how to hide sensitive values from the &lt;a href=&quot;https://lucene.apache.org/solr/guide/7_5/overview-of-the-solr-admin-ui.html&quot;&gt;Solr UI&lt;/a&gt;. There is a little-known setting that enables hiding these sensitive values.&lt;/p&gt;

&lt;h3 id=&quot;apache-solr-and-hiding-sensitive-properties&quot;&gt;Apache Solr and Hiding Sensitive Properties&lt;/h3&gt;
&lt;p&gt;Apache Solr has a few places where sensitive values can be seen on the Solr UI. The keystore and truststore passwords are two examples that came up as part of &lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-10076&quot;&gt;SOLR-10076&lt;/a&gt;. Starting in Solr 6.6 and 7.0, Solr will hide any property in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/admin/info/system&lt;/code&gt; API that contains the word &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;password&lt;/code&gt; when the system property &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;solr.redaction.system.enabled&lt;/code&gt; is set to true. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/admin/info/system&lt;/code&gt; API is used to power the Solr UI. This works well for most cases, but the implementation is more generic, enabling it to hide any custom properties.&lt;/p&gt;

&lt;p&gt;The property &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;solr.redaction.system.pattern&lt;/code&gt; is a system property that takes a regular expression. If the regular expression matches the property name then the system property value will be redacted. This can enable hiding sensitive values for custom libraries or other use cases.&lt;/p&gt;

&lt;p&gt;The table below lays out the two properties that can be configured in Solr 6.6 or later.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Property&lt;/th&gt;
      &lt;th&gt;Default Value&lt;/th&gt;
      &lt;th&gt;Purpose&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;solr.redaction.system.enabled&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt; in Solr 6.6; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; in Solr 7.0&lt;/td&gt;
      &lt;td&gt;Enables or disables the redaction&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;solr.redaction.system.pattern&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.*password.*&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Regex for the properties to redact&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
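&lt;p&gt;For example, both properties can be passed as system properties when starting Solr. The pattern below is only an illustration of extending the default:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;bin/solr start -Dsolr.redaction.system.enabled=true \
  -Dsolr.redaction.system.pattern=&quot;.*(password|secret).*&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;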

&lt;h3 id=&quot;apache-solr-and-hiding-metrics-properties&quot;&gt;Apache Solr and Hiding Metrics Properties&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&quot;https://lucene.apache.org/solr/guide/7_5/metrics-reporting.html&quot;&gt;Solr Metrics API&lt;/a&gt; can leak sensitive information as well. There is a &lt;a href=&quot;https://lucene.apache.org/solr/guide/7_5/metrics-reporting.html#the-metrics-hiddensysprops-element&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hiddenSysProps&lt;/code&gt; configuration&lt;/a&gt; that can prevent certain properties from being exposed via the metrics API. If additional properties need to be hidden, they must be configured in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hiddenSysProps&lt;/code&gt; section.&lt;/p&gt;
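&lt;p&gt;As a sketch, a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hiddenSysProps&lt;/code&gt; section in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;solr.xml&lt;/code&gt; could look like the following. The property names are examples; check the metrics reporting documentation for the actual defaults:&lt;/p&gt;

&lt;div class=&quot;language-xml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;metrics&amp;gt;
  &amp;lt;hiddenSysProps&amp;gt;
    &amp;lt;str&amp;gt;javax.net.ssl.keyStorePassword&amp;lt;/str&amp;gt;
    &amp;lt;str&amp;gt;javax.net.ssl.trustStorePassword&amp;lt;/str&amp;gt;
    &amp;lt;str&amp;gt;zkDigestPassword&amp;lt;/str&amp;gt;
  &amp;lt;/hiddenSysProps&amp;gt;
&amp;lt;/metrics&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;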

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Currently, there is limited documentation about the available options for hiding sensitive values. It is frustrating to have to configure hiding sensitive values in two places, but there is hope for improvement. &lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-12976&quot;&gt;SOLR-12976&lt;/a&gt; was created earlier this month to try to address the duplication and documentation.&lt;/p&gt;</content><author><name></name></author><category term="bigdata" /><category term="apache" /><category term="solr" /><category term="security" /><category term="hide" /><category term="redact" /><category term="sensitive" /><summary type="html">Overview Apache Solr is a full text search engine that is built on Apache Lucene. One of the common questions on the solr-user mailing list (ie: here and here) is how to hide sensitive values from the Solr UI. There is a little known setting that enables hiding these sensitive values.</summary></entry><entry><title type="html">Apache Solr - Hadoop Authentication Plugin - LDAP</title><link href="https://risdenk.github.io/2018/11/20/apache-solr-hadoop-authentication-plugin-ldap.html" rel="alternate" type="text/html" title="Apache Solr - Hadoop Authentication Plugin - LDAP" /><published>2018-11-20T08:00:00-06:00</published><updated>2018-11-20T08:00:00-06:00</updated><id>https://risdenk.github.io/2018/11/20/apache-solr-hadoop-authentication-plugin-ldap</id><content type="html" xml:base="https://risdenk.github.io/2018/11/20/apache-solr-hadoop-authentication-plugin-ldap.html">&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://lucene.apache.org/solr&quot;&gt;Apache Solr&lt;/a&gt; is a full text search engine that is built on &lt;a href=&quot;https://lucene.apache.org/core/&quot;&gt;Apache Lucene&lt;/a&gt;. One of the questions I’ve been asked about in the past is LDAP support for Apache Solr authentication. While there are commercial offerings that add LDAP support like &lt;a href=&quot;https://lucidworks.com/products/fusion-server/&quot;&gt;Lucidworks Fusion&lt;/a&gt;, Apache Solr doesn’t have an LDAP authentication plugin out of the box. Let’s explore what the current state of authentication is with Apache Solr.&lt;/p&gt;

&lt;h3 id=&quot;apache-solr-and-authentication&quot;&gt;Apache Solr and Authentication&lt;/h3&gt;
&lt;p&gt;Apache Solr 5.2 was released with a pluggable authentication module from &lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-7274&quot;&gt;SOLR-7274&lt;/a&gt;. This paved the way for future authentication implementations such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BasicAuth&lt;/code&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-7692&quot;&gt;SOLR-7692&lt;/a&gt;) and Kerberos (&lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-7468&quot;&gt;SOLR-7468&lt;/a&gt;). In Apache Solr 6.1, delegation token support (&lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-9200&quot;&gt;SOLR-9200&lt;/a&gt;) was added to the Kerberos authentication plugin. Apache Solr 6.4 added a significant feature for hooking the &lt;a href=&quot;https://hadoop.apache.org/docs/current/hadoop-auth/Configuration.html&quot;&gt;Hadoop authentication framework&lt;/a&gt; directly into Solr as an authentication plugin (&lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-9513&quot;&gt;SOLR-9513&lt;/a&gt;). There hasn’t been much more work on authentication plugins lately. Work is currently underway to add a JWT authentication plugin (&lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-12121&quot;&gt;SOLR-12121&lt;/a&gt;). Each Solr authentication plugin provides additional capabilities for authenticating to Solr.&lt;/p&gt;

&lt;h3 id=&quot;hadoop-authentication-ldap-and-apache-solr&quot;&gt;Hadoop Authentication, LDAP, and Apache Solr&lt;/h3&gt;
&lt;h4 id=&quot;hadoop-authentication-framework-overview&quot;&gt;Hadoop Authentication Framework Overview&lt;/h4&gt;
&lt;p&gt;The &lt;a href=&quot;https://hadoop.apache.org/docs/current/hadoop-auth/Configuration.html&quot;&gt;Hadoop authentication framework&lt;/a&gt; provides additional capabilities through its pluggable backends. The backends currently include Kerberos, AltKerberos, LDAP, SignerSecretProvider, and Multi-scheme. Each can be configured to support varying needs for authentication.&lt;/p&gt;

&lt;h4 id=&quot;apache-solr-and-hadoop-authentication-framework&quot;&gt;Apache Solr and Hadoop Authentication Framework&lt;/h4&gt;
&lt;p&gt;Apache Solr 6.4+ supports the Hadoop authentication framework due to the work of &lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-9513&quot;&gt;SOLR-9513&lt;/a&gt;. The &lt;a href=&quot;https://lucene.apache.org/solr/guide/7_5/hadoop-authentication-plugin.html&quot;&gt;Apache Solr reference guide&lt;/a&gt; provides guidance on how to use the Hadoop Authentication Plugin. All the necessary configuration parameters can be passed down to the Hadoop authentication framework. As more backends are added to the Hadoop authentication framework, Apache Solr just needs to upgrade the Hadoop dependency to gain support.&lt;/p&gt;

&lt;h4 id=&quot;apache-solr-75-and-ldap&quot;&gt;Apache Solr 7.5 and LDAP&lt;/h4&gt;
&lt;p&gt;LDAP support for the Hadoop authentication framework was added in Hadoop 2.8.0 (&lt;a href=&quot;https://issues.apache.org/jira/browse/HADOOP-12082&quot;&gt;HADOOP-12082&lt;/a&gt;). Sadly, the Hadoop dependency for Apache Solr 7.5 is only on &lt;a href=&quot;https://github.com/apache/lucene-solr/blob/branch_7_5/lucene/ivy-versions.properties#L156&quot;&gt;2.7.4&lt;/a&gt;. This means that when you try to configure the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HadoopAuthenticationPlugin&lt;/code&gt; with LDAP, you will get the following error:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Error initializing org.apache.solr.security.HadoopAuthPlugin: 
javax.servlet.ServletException: java.lang.ClassNotFoundException: ldap
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;manually-upgrading-the-apache-solr-hadoop-dependency&quot;&gt;Manually Upgrading the Apache Solr Hadoop Dependency&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; I don’t recommend doing this outside of experimenting and seeing what is possible.&lt;/p&gt;

&lt;p&gt;I put together a &lt;a href=&quot;https://github.com/risdenk/test-solr-hadoopauthenticationplugin-ldap&quot;&gt;simple test project&lt;/a&gt; that “manually” replaces the Hadoop 2.7.4 jars with 2.9.1 jars. This was designed to test if it is possible to configure the Solr &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HadoopAuthenticationPlugin&lt;/code&gt; with LDAP. I was able to configure Solr using the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;security.json&lt;/code&gt; file to use the Hadoop 2.9.1 LDAP backend.&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;authentication&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;class&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;solr.HadoopAuthPlugin&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;sysPropPrefix&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;solr.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;type&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ldap&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;authConfigs&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ldap.providerurl&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ldap.basedn&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ldap.enablestarttls&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;defaultConfigs&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;ldap.providerurl&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ldap://ldap&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;ldap.basedn&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;dc=example,dc=org&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;ldap.enablestarttls&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;false&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With this configuration and the Hadoop 2.9.1 jars, Apache Solr was protected by LDAP. There should be more testing done to see how this plays with multiple nodes and other required integrations. The Hadoop authentication framework has limited support for LDAP, but it should be usable for some use cases.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Apache Solr, as of 7.5, is currently limited in its support for the Hadoop authentication framework. This is due to the dependency on Apache Hadoop 2.7.4. When the Hadoop dependency is updated (&lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-9515&quot;&gt;SOLR-9515&lt;/a&gt;) in Apache Solr, there will be at least some initial support for LDAP integration out of the box with Solr.&lt;/p&gt;

&lt;h4 id=&quot;references&quot;&gt;References&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://lucene.apache.org/solr/guide/7_5/securing-solr.html&quot;&gt;https://lucene.apache.org/solr/guide/7_5/securing-solr.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://lucene.apache.org/solr/guide/7_5/hadoop-authentication-plugin.html&quot;&gt;https://lucene.apache.org/solr/guide/7_5/hadoop-authentication-plugin.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-9513&quot;&gt;https://issues.apache.org/jira/browse/SOLR-9513&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://stackoverflow.com/questions/50647431/ldap-integration-with-solr&quot;&gt;https://stackoverflow.com/questions/50647431/ldap-integration-with-solr&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://community.hortonworks.com/questions/130989/solr-ldap-integration.html&quot;&gt;https://community.hortonworks.com/questions/130989/solr-ldap-integration.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/lucene-solr/blob/branch_7_5/lucene/ivy-versions.properties#L156&quot;&gt;https://github.com/apache/lucene-solr/blob/branch_7_5/lucene/ivy-versions.properties#L156&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/HADOOP-12082&quot;&gt;https://issues.apache.org/jira/browse/HADOOP-12082&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="bigdata" /><category term="apache" /><category term="solr" /><category term="hadoop" /><category term="authentication" /><category term="security" /><category term="ldap" /><category term="configuration" /><summary type="html">Overview Apache Solr is a full text search engine that is built on Apache Lucene. One of the questions I’ve been asked about in the past is LDAP support for Apache Solr authentication. While there are commercial offerings that add LDAP support like Lucidworks Fusion, Apache Solr doesn’t have an LDAP authentication plugin out of the box. Let’s explore what the current state of authentication is with Apache Solr.</summary></entry><entry><title type="html">Apache Hadoop - TLS and SSL Notes</title><link href="https://risdenk.github.io/2018/11/15/apache-hadoop-tls-ssl-notes.html" rel="alternate" type="text/html" title="Apache Hadoop - TLS and SSL Notes" /><published>2018-11-15T08:00:00-06:00</published><updated>2018-11-15T08:00:00-06:00</updated><id>https://risdenk.github.io/2018/11/15/apache-hadoop-tls-ssl-notes</id><content type="html" xml:base="https://risdenk.github.io/2018/11/15/apache-hadoop-tls-ssl-notes.html">&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;I’ve collected notes on &lt;a href=&quot;https://en.wikipedia.org/wiki/Transport_Layer_Security&quot;&gt;TLS/SSL&lt;/a&gt; for a number of years now. Most of them are related to &lt;a href=&quot;https://hadoop.apache.org/&quot;&gt;Apache Hadoop&lt;/a&gt;, but others are more general. I was consulting when the &lt;a href=&quot;https://en.wikipedia.org/wiki/POODLE&quot;&gt;POODLE&lt;/a&gt; and &lt;a href=&quot;https://en.wikipedia.org/wiki/Heartbleed&quot;&gt;Heartbleed&lt;/a&gt; vulnerabilities were released. Below is a collection of TLS/SSL related references. No guarantee they are up to date but it helps to have references in one place.&lt;/p&gt;

&lt;h3 id=&quot;tlsssl-general&quot;&gt;TLS/SSL General&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Great explanation of TLS/SSL: &lt;a href=&quot;http://www.zytrax.com/tech/survival/ssl.html&quot;&gt;http://www.zytrax.com/tech/survival/ssl.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;SSL Linux certificate location: &lt;a href=&quot;http://serverfault.com/questions/62496/ssl­certificate­location­on­unix­linux&quot;&gt;http://serverfault.com/questions/62496/ssl­certificate­location­on­unix­linux&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;SSL vs TLS: &lt;a href=&quot;http://security.stackexchange.com/questions/5126/whats­the­difference­between­ssl­tls­and­https&quot;&gt;http://security.stackexchange.com/questions/5126/whats­the­difference­between­ssl­tls­and­https&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;certificate-types&quot;&gt;Certificate Types&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://unmitigatedrisk.com/?p=381&quot;&gt;http://unmitigatedrisk.com/?p=381&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_sg_guide_ssl_certs.html&quot;&gt;http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_sg_guide_ssl_certs.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;generating-certificates&quot;&gt;Generating Certificates&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.sslshopper.com/article-most-common-openssl-commands.html&quot;&gt;https://www.sslshopper.com/article-most-common-openssl-commands.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://support.ssl.com/Knowledgebase/Article/View/19/0/der-vs-crt-vs-cer-vs-pem-certificates-and-how-to-convert-them&quot;&gt;https://support.ssl.com/Knowledgebase/Article/View/19/0/der-vs-crt-vs-cer-vs-pem-certificates-and-how-to-convert-them&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;existing-certificate-and-key-to-jks&quot;&gt;Existing Certificate and Key to JKS&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://stackoverflow.com/questions/11952274/how-can-i-create-keystore-from-an-existing-certificate-abc-crt-and-abc-key-fil&quot;&gt;http://stackoverflow.com/questions/11952274/how-can-i-create-keystore-from-an-existing-certificate-abc-crt-and-abc-key-fil&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;openssl pkcs12 -export -in abc.crt -inkey abc.key -out abc.p12
keytool -importkeystore -srckeystore abc.p12 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
        -srcstoretype PKCS12 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
        -destkeystore abc.jks &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
        -deststoretype JKS
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
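
&lt;p&gt;A quick sanity check after the conversion is to list the intermediate PKCS12 file and the resulting keystore. This is a sketch assuming the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;abc.p12&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;abc.jks&lt;/code&gt; files from above:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Show the certificates in the PKCS12 file
openssl pkcs12 -info -in abc.p12 -nokeys
# List the entries in the converted JKS keystore
keytool -list -v -keystore abc.jks
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;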

&lt;h3 id=&quot;trusting-ca-certificates&quot;&gt;Trusting CA Certificates&lt;/h3&gt;
&lt;h4 id=&quot;openssl&quot;&gt;OpenSSL&lt;/h4&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;update-ca-trust force-enable
cp CERT.pem /etc/pki/ca-trust/source/anchors/
update-ca-trust extract
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;openldap&quot;&gt;OpenLDAP&lt;/h4&gt;
&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vi /etc/openldap/ldap.conf&lt;/code&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;...
TLS_CACERT /etc/pki/
# Comment out TLS_CACERTDIR
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
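
&lt;p&gt;The LDAP TLS configuration can then be checked end to end. The commands below are a sketch: HOST is a placeholder for the LDAP server, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ldapsearch&lt;/code&gt; comes from the openldap-clients package:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Inspect the certificate chain presented by the LDAP server
openssl s_client -connect HOST:636 -showcerts
# Anonymous base search over ldaps to confirm the CA is trusted
ldapsearch -H ldaps://HOST -x -b &quot;&quot; -s base
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;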

&lt;h4 id=&quot;java&quot;&gt;Java&lt;/h4&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/usr/java/JAVA_VERSION/jre/lib/security/cacerts
/etc/pki/ca-trust/extracted/java/cacerts
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=1056224&quot;&gt;https://bugzilla.redhat.com/show_bug.cgi?id=1056224&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
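
&lt;p&gt;Individual CA certificates can also be imported into the Java truststore directly with keytool. This is a sketch: the cacerts path and alias are examples, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;changeit&lt;/code&gt; is only the default truststore password:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Import a CA certificate into the JVM truststore
keytool -importcert -alias example-ca -file CERT.pem -keystore /usr/java/JAVA_VERSION/jre/lib/security/cacerts -storepass changeit
# Confirm the certificate was added
keytool -list -keystore /usr/java/JAVA_VERSION/jre/lib/security/cacerts -storepass changeit -alias example-ca
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;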

&lt;h3 id=&quot;poodle--sslv3&quot;&gt;POODLE - SSLv3&lt;/h3&gt;
&lt;h4 id=&quot;what-is-poodle&quot;&gt;What is POODLE?&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://poodle.io/servers.html&quot;&gt;https://poodle.io/servers.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.openssl.org/docs/apps/ciphers.html#SSL-v3.0-cipher-suites&quot;&gt;https://www.openssl.org/docs/apps/ciphers.html#SSL-v3.0-cipher-suites&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;testing-for-poodle&quot;&gt;Testing for POODLE&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://chrisburgess.com.au/how-to-test-for-the-sslv3-poodle-vulnerability/&quot;&gt;https://chrisburgess.com.au/how-to-test-for-the-sslv3-poodle-vulnerability/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Requires a relatively recent version of openssl installed
openssl s_client -connect HOST:PORT -ssl3
# -tls1 -tls1_1 -tls1_2
curl -v3 -i -X HEAD https://HOST:PORT
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
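
&lt;p&gt;Another way to see exactly which protocols and cipher suites a server will negotiate is nmap's ssl-enum-ciphers script (assuming a reasonably recent nmap is installed):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Enumerate supported TLS/SSL protocols and cipher suites
nmap --script ssl-enum-ciphers -p PORT HOST
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;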

&lt;h3 id=&quot;configuring-hadoop-for-cipher-suites-and-protocols&quot;&gt;Configuring Hadoop for Cipher Suites and Protocols&lt;/h3&gt;
&lt;p&gt;Each Hadoop component must be configured or have the proper version to disable certain SSL protocols and versions.&lt;/p&gt;

&lt;h4 id=&quot;ambari&quot;&gt;Ambari&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/configuring-advanced-security-options-for-ambari/content/ambari_sec_optional_configure_ciphers_and_protocols_for_ambari_server.html&quot;&gt;https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/configuring-advanced-security-options-for-ambari/content/ambari_sec_optional_configure_ciphers_and_protocols_for_ambari_server.html&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;security.server.disabled.ciphers=TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;security.server.disabled.protocols=SSL|SSLv2|SSLv3&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;hadoop&quot;&gt;Hadoop&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/HADOOP-11243&quot;&gt;https://issues.apache.org/jira/browse/HADOOP-11243&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Hadoop 2.5.2 + 2.6 - Patches SSLFactory for TLSv1&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hadoop.ssl.enabled.protocols=TLSv1&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;(JDK6 can use TLSv1, JDK7+ can use TLSv1,TLSv1.1,TLSv1.2)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/HADOOP-11218&quot;&gt;https://issues.apache.org/jira/browse/HADOOP-11218&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Hadoop 2.8 - Patches SSLFactory for TLSv1.1 and TLSv1.2&lt;/li&gt;
      &lt;li&gt;Java 6 doesn’t support TLSv1.1+. Requires Java 7.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/HADOOP-11260&quot;&gt;https://issues.apache.org/jira/browse/HADOOP-11260&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Hadoop 2.5.2 + 2.6 - Patches Jetty to disable SSLv3&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;httpfs&quot;&gt;HTTPFS&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/HDFS-7274&quot;&gt;https://issues.apache.org/jira/browse/HDFS-7274&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Hadoop 2.5.2 + 2.6 - Disables SSLv3 in HTTPFS&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;hive&quot;&gt;Hive&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/HIVE-8675&quot;&gt;https://issues.apache.org/jira/browse/HIVE-8675&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Hive 0.14 - Removes SSLv3 from supported protocols&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.ssl.protocol.blacklist&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/HIVE-8827&quot;&gt;https://issues.apache.org/jira/browse/HIVE-8827&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Hive 1.0 - Adds &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SSLv2Hello&lt;/code&gt; back to supported protocols&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hive.ssl.protocol.blacklist=SSLv2,SSLv3&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;oozie&quot;&gt;Oozie&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/OOZIE-2034&quot;&gt;https://issues.apache.org/jira/browse/OOZIE-2034&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Oozie 4.1.0 - Disable SSLv3&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/OOZIE-2037&quot;&gt;https://issues.apache.org/jira/browse/OOZIE-2037&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Add support for TLSv1.1 and TLSv1.2&lt;/li&gt;
      &lt;li&gt;Java 6 doesn’t support TLSv1.1+. Requires Java 7. Depends on OOZIE-2036&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;flume&quot;&gt;Flume&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLUME-2520&quot;&gt;https://issues.apache.org/jira/browse/FLUME-2520&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Flume 1.5.1 - HTTPSource disable SSLv3&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;hue&quot;&gt;Hue&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.cloudera.org/browse/HUE-2438&quot;&gt;https://issues.cloudera.org/browse/HUE-2438&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Hue 3.8 - Disable SSLv3&lt;/li&gt;
      &lt;li&gt;line 1670 of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/usr/lib/hue/desktop/core/src/desktop/lib/wsgiserver.py&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ctx.set_options(SSL.OP_NO_SSLv2 | SSL.OP_NO_SSLv3)&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ssl_cipher_list = &quot;DEFAULT:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2&quot;&lt;/code&gt; (default)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;ranger&quot;&gt;Ranger&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/RANGER-158&quot;&gt;https://issues.apache.org/jira/browse/RANGER-158&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Ranger 0.4.0 - Ranger Admin and User Authentication disable SSLv3&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;knox&quot;&gt;Knox&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-455&quot;&gt;https://issues.apache.org/jira/browse/KNOX-455&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Knox 0.5.0 - Disable SSLv3&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ssl.exclude.protocols&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;storm&quot;&gt;Storm&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/STORM-640&quot;&gt;https://issues.apache.org/jira/browse/STORM-640&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Storm 0.10.0 - Disable SSLv3&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;resources&quot;&gt;Resources&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://sysadvent.blogspot.co.uk/2010/12/day-3-debugging-ssltls-with-openssl1.html&quot;&gt;http://sysadvent.blogspot.co.uk/2010/12/day-3-debugging-ssltls-with-openssl1.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://gist.github.com/jankronquist/6412839&quot;&gt;https://gist.github.com/jankronquist/6412839&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="bigdata" /><category term="apache" /><category term="hadoop" /><category term="tls" /><category term="ssl" /><category term="security" /><category term="openssl" /><category term="certificate" /><category term="poodle" /><category term="heartbleed" /><summary type="html">Overview I’ve collected notes on TLS/SSL for a number of years now. Most of them are related to Apache Hadoop, but others are more general. I was consulting when the POODLE and Heartbleed vulnerabilities were released. Below is a collection of TLS/SSL related references. No guarantee they are up to date but it helps to have references in one place.</summary></entry><entry><title type="html">Apache Knox - Performance Improvements</title><link href="https://risdenk.github.io/2018/11/13/apache-knox-performance-improvements.html" rel="alternate" type="text/html" title="Apache Knox - Performance Improvements" /><published>2018-11-13T08:00:00-06:00</published><updated>2018-11-13T08:00:00-06:00</updated><id>https://risdenk.github.io/2018/11/13/apache-knox-performance-improvements</id><content type="html" xml:base="https://risdenk.github.io/2018/11/13/apache-knox-performance-improvements.html">&lt;h3 id=&quot;tldr&quot;&gt;TL;DR&lt;/h3&gt;
&lt;p&gt;Apache Knox 1.2.0 should significantly improve:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html&quot;&gt;Apache Hadoop WebHDFS&lt;/a&gt; write performance due to &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1521&quot;&gt;KNOX-1521&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://hive.apache.org/&quot;&gt;Apache Hive&lt;/a&gt; and GZip performance due to &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1530&quot;&gt;KNOX-1530&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are using Java for TLS, then you should read &lt;a href=&quot;#java---tlsssl-performance&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://knox.apache.org/&quot;&gt;Apache Knox&lt;/a&gt; is a reverse proxy that simplifies security in front of a Kerberos secured &lt;a href=&quot;https://hadoop.apache.org/&quot;&gt;Apache Hadoop&lt;/a&gt; cluster and other related components. On the &lt;a href=&quot;https://mail-archives.apache.org/mod_mbox/knox-user/201809.mbox/%3CCACEuXj475wey-AzxO%2Bqf162Qe7ChEB8oNj1Hd6O1E4VNd8cH7g%40mail.gmail.com%3E&quot;&gt;knox-user mailing list&lt;/a&gt; and &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1221&quot;&gt;Knox Jira&lt;/a&gt;, there have been reports about Apache Knox not performing as expected. Two of the reported cases focused on &lt;a href=&quot;https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html&quot;&gt;Apache Hadoop WebHDFS&lt;/a&gt; performance specifically. I was able to reproduce the slow downs with Apache Knox although the findings were surprising. This blog details the performance findings as well as improvements that will be in Apache Knox 1.2.0.&lt;/p&gt;

&lt;h3 id=&quot;reproducing-the-performance-issues&quot;&gt;Reproducing the performance issues&lt;/h3&gt;
&lt;h4 id=&quot;apache-hadoop---webhdfs&quot;&gt;Apache Hadoop - WebHDFS&lt;/h4&gt;
&lt;p&gt;I started looking into the two reported WebHDFS performance issues (&lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1221&quot;&gt;KNOX-1221&lt;/a&gt; and &lt;a href=&quot;https://mail-archives.apache.org/mod_mbox/knox-user/201809.mbox/%3CCACEuXj475wey-AzxO%2Bqf162Qe7ChEB8oNj1Hd6O1E4VNd8cH7g%40mail.gmail.com%3E&quot;&gt;knox-user post&lt;/a&gt;). I found that the issue reproduced easily on a VM on my laptop. I tested read and write performance of WebHDFS natively with curl as well as going through Apache Knox. The results as posted to &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1221&quot;&gt;KNOX-1221&lt;/a&gt; were as follows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WebHDFS Read Performance - 1GB file&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Test Case&lt;/th&gt;
      &lt;th&gt;Transfer Speed&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Native WebHDFS&lt;/td&gt;
      &lt;td&gt;252 MB/s&lt;/td&gt;
      &lt;td&gt;3.8s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Knox w/o TLS&lt;/td&gt;
      &lt;td&gt;264 MB/s&lt;/td&gt;
      &lt;td&gt;3.6s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Knox w/ TLS&lt;/td&gt;
      &lt;td&gt;54 MB/s&lt;/td&gt;
      &lt;td&gt;20s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Parallel Knox w/ TLS&lt;/td&gt;
      &lt;td&gt;2 at ~48MB/s&lt;/td&gt;
      &lt;td&gt;22s&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;WebHDFS Write Performance - 1GB file&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Test Case&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Native WebHDFS&lt;/td&gt;
      &lt;td&gt;2.6s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Knox w/o TLS&lt;/td&gt;
      &lt;td&gt;29s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Knox w/ TLS&lt;/td&gt;
      &lt;td&gt;50s&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The results were surprising since the numbers varied widely. What was consistent was that Knox read performance was poor with TLS and write performance was poor regardless of TLS. Another interesting finding was that parallel reads through Knox did not slow each other down; instead, each connection was limited independently. Details of the analysis are found below &lt;a href=&quot;#apache-hadoop---webhdfs-1&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
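
&lt;p&gt;For reference, a WebHDFS read like the ones above can be timed with curl. This is only a sketch: the hosts, ports, topology name, credentials, and file path are placeholders:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Native WebHDFS read (follows the NameNode redirect to a DataNode)
curl -L -o /dev/null -w &quot;%{speed_download} %{time_total}\n&quot; &quot;http://NAMENODE:50070/webhdfs/v1/tmp/testfile?op=OPEN&amp;amp;user.name=USER&quot;
# Same read through the Knox gateway
curl -k -u USER:PASSWORD -L -o /dev/null -w &quot;%{speed_download} %{time_total}\n&quot; &quot;https://KNOXHOST:8443/gateway/TOPOLOGY/webhdfs/v1/tmp/testfile?op=OPEN&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;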

&lt;h4 id=&quot;apache-hbase---hbase-rest&quot;&gt;Apache HBase - HBase Rest&lt;/h4&gt;
&lt;p&gt;After analyzing WebHDFS performance, I decided to look into other services to see if the same slowdowns existed. I looked at Apache HBase Rest as part of &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1525&quot;&gt;KNOX-1525&lt;/a&gt;. I tested Knox without TLS since a TLS slowdown had already been identified with WebHDFS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scan Performance for 100 thousand rows&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Test Case&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;HBase shell&lt;/td&gt;
      &lt;td&gt;13.9s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HBase Rest - native&lt;/td&gt;
      &lt;td&gt;3.4s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HBase Rest - Knox&lt;/td&gt;
      &lt;td&gt;3.7s&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The results were not too surprising. More details of the analysis are found below &lt;a href=&quot;#apache-hbase---hbase-rest-1&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;apache-hive---hiveserver2&quot;&gt;Apache Hive - HiveServer2&lt;/h4&gt;
&lt;p&gt;I also looked into HiveServer2 performance with and without Apache Knox as part of &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1524&quot;&gt;KNOX-1524&lt;/a&gt;. The testing below is again without TLS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Select * performance for 200 thousand rows&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Test Case&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;hdfs dfs -text&lt;/td&gt;
      &lt;td&gt;2.4s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline binary fetchSize=1000&lt;/td&gt;
      &lt;td&gt;6.2s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline http fetchSize=1000&lt;/td&gt;
      &lt;td&gt;7.5s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline http Knox fetchSize=1000&lt;/td&gt;
      &lt;td&gt;9.9s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline binary fetchSize=10000&lt;/td&gt;
      &lt;td&gt;7.3s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline http fetchSize=10000&lt;/td&gt;
      &lt;td&gt;7.9s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline http Knox fetchSize=10000&lt;/td&gt;
      &lt;td&gt;8.5s&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This showed there was room for improvement for Hive with Knox as well. Details of the analysis are found below &lt;a href=&quot;#apache-hive---hiveserver2-1&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;performance-analysis&quot;&gt;Performance Analysis&lt;/h3&gt;
&lt;h4 id=&quot;apache-hadoop---webhdfs-1&quot;&gt;Apache Hadoop - WebHDFS&lt;/h4&gt;
&lt;p&gt;While looking at the WebHDFS results, I found that disabling TLS resulted in a big performance gain. Since changing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ssl.enabled&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gateway-site.xml&lt;/code&gt; was the only change, TLS had to be the only factor in the read performance differences. I looked into Jetty performance with TLS and found there were known performance issues with the JDK. For more details, see below &lt;a href=&quot;#java---tlsssl-performance&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The WebHDFS write performance difference could not be attributed to TLS performance since Knox without TLS was also ~20 seconds slower than native. I experimented with different buffer sizes and with upgrading httpclient before finding the root cause: an issue with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UrlRewriteRequestStream&lt;/code&gt; in Apache Knox. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;InputStream&lt;/code&gt; has multiple read methods, and the bulk ones were not implemented. For the fix details, see below &lt;a href=&quot;#knox---webhdfs-write-performance&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;apache-hbase---hbase-rest-1&quot;&gt;Apache HBase - HBase Rest&lt;/h4&gt;
&lt;p&gt;The &lt;a href=&quot;https://hbase.apache.org/book.html#shell&quot;&gt;HBase shell&lt;/a&gt; slowness is to be expected since it is written in &lt;a href=&quot;https://www.jruby.org/&quot;&gt;JRuby&lt;/a&gt; and not the best tool for working with HBase. Typically the &lt;a href=&quot;https://hbase.apache.org/book.html#hbase_apis&quot;&gt;HBase Java API&lt;/a&gt; is used. While looking at the results, there were no big bottlenecks that jumped out from the performance test. There is some overhead due to Apache Knox but much of this is due to the extra hops.&lt;/p&gt;

&lt;h4 id=&quot;apache-hive---hiveserver2-1&quot;&gt;Apache Hive - HiveServer2&lt;/h4&gt;
&lt;p&gt;It took me a few tries to create a test framework that would allow me to test the changes easily. One of the big findings was that Hive is significantly slower than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hdfs dfs -text&lt;/code&gt; for the same file, so there is room for performance improvements in HiveServer2 itself. Another finding was that HiveServer2 binary and http modes differed significantly with the default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fetchSize&lt;/code&gt; of 1000. My guess is that when HTTP compression was added in &lt;a href=&quot;https://issues.apache.org/jira/browse/HIVE-17194&quot;&gt;HIVE-17194&lt;/a&gt;, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fetchSize&lt;/code&gt; parameter should have been increased to improve over-the-wire efficiency. Even ignoring binary mode, there was still a difference between HiveServer2 http mode with and without Apache Knox. Details on the performance improvements can be found &lt;a href=&quot;#knox---gzip-handling&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;performance-improvements&quot;&gt;Performance Improvements&lt;/h3&gt;
&lt;h4 id=&quot;java---tlsssl-performance&quot;&gt;Java - TLS/SSL Performance&lt;/h4&gt;
&lt;p&gt;There are some performance issues when using the default JDK TLS implementation. I found a few references about the JDK and Jetty.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://nbsoftsolutions.com/blog/the-cost-of-tls-in-java-and-solutions&quot;&gt;https://nbsoftsolutions.com/blog/the-cost-of-tls-in-java-and-solutions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://nbsoftsolutions.com/blog/dropwizard-1-3-upcoming-tls-improvements&quot;&gt;https://nbsoftsolutions.com/blog/dropwizard-1-3-upcoming-tls-improvements&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://webtide.com/conscrypting-native-ssl-for-jetty/&quot;&gt;https://webtide.com/conscrypting-native-ssl-for-jetty/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I was able to test with &lt;a href=&quot;https://github.com/google/conscrypt/&quot;&gt;Conscrypt&lt;/a&gt; and found that the performance slowdowns for TLS reads and writes went away. I also tested disabling GCM since there are references that GCM can cause performance issues with JDK 8.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.wowza.com/docs/how-to-improve-ssl-performance-with-java-8&quot;&gt;https://www.wowza.com/docs/how-to-improve-ssl-performance-with-java-8&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://stackoverflow.com/questions/25992131/slow-aes-gcm-encryption-and-decryption-with-java-8u20&quot;&gt;https://stackoverflow.com/questions/25992131/slow-aes-gcm-encryption-and-decryption-with-java-8u20&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The results of testing different TLS implementations are below:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Test Case&lt;/th&gt;
      &lt;th&gt;Transfer Speed&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Native WebHDFS&lt;/td&gt;
      &lt;td&gt;252MB/s&lt;/td&gt;
      &lt;td&gt;3.8s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Knox w/o TLS&lt;/td&gt;
      &lt;td&gt;264MB/s&lt;/td&gt;
      &lt;td&gt;3.6s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Knox w/ Conscrypt TLS&lt;/td&gt;
      &lt;td&gt;245MB/s&lt;/td&gt;
      &lt;td&gt;4.2s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Knox w/ TLS no GCM&lt;/td&gt;
      &lt;td&gt;125MB/s&lt;/td&gt;
      &lt;td&gt;8.7s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Knox w/ TLS&lt;/td&gt;
      &lt;td&gt;54.3MB/s&lt;/td&gt;
      &lt;td&gt;20s&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Switching to a different TLS implementation provider for the JDK can significantly help performance. This applies to any TLS handling with Java, not just Knox. Another option is to terminate TLS connections at a non-Java load balancer. Finally, turning off TLS may be acceptable for isolated, performance-sensitive use cases. These options should be considered when using TLS with Java.&lt;/p&gt;

&lt;h4 id=&quot;knox---webhdfs-write-performance&quot;&gt;Knox - WebHDFS Write Performance&lt;/h4&gt;
&lt;p&gt;I created &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1521&quot;&gt;KNOX-1521&lt;/a&gt; to add the missing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read&lt;/code&gt; methods on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UrlRewriteRequestStream&lt;/code&gt; class. This allows the underlying stream to be read in bulk rather than 1 byte at a time. With the changes from &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1521&quot;&gt;KNOX-1521&lt;/a&gt;, WebHDFS write performance is now much closer to native WebHDFS. The updated write performance results after &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1521&quot;&gt;KNOX-1521&lt;/a&gt; are below:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WebHDFS Write Performance - 1GB file - KNOX-1521&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Test Case&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Native WebHDFS&lt;/td&gt;
      &lt;td&gt;3.3s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Knox w/o TLS&lt;/td&gt;
      &lt;td&gt;29s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Knox w/o TLS w/ KNOX-1521&lt;/td&gt;
      &lt;td&gt;4.2s&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 id=&quot;knox---gzip-handling&quot;&gt;Knox - GZip Handling&lt;/h4&gt;
&lt;p&gt;I found that Apache Knox had a few issues when it came to handling GZip compressed data. I opened &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1530&quot;&gt;KNOX-1530&lt;/a&gt; to address the underlying issues. The big improvement is that Knox after &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1530&quot;&gt;KNOX-1530&lt;/a&gt; will not decompress data that doesn’t need to be rewritten. This removes a lot of processing and should improve Knox performance for other use cases like reading compressed files from WebHDFS and handling compressed JS/CSS files for UIs. After &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1530&quot;&gt;KNOX-1530&lt;/a&gt; was addressed, the &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1524?focusedCommentId=16673639&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16673639&quot;&gt;performance for Apache Hive HiveServer2 in http mode with and without Apache Knox&lt;/a&gt; was about the same.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Select * performance for 200 thousand rows with &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1530&quot;&gt;KNOX-1530&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Test Case&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;hdfs dfs -text&lt;/td&gt;
      &lt;td&gt;2.1s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline binary fetchSize=1000&lt;/td&gt;
      &lt;td&gt;5.4s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline http fetchSize=1000&lt;/td&gt;
      &lt;td&gt;6.8s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline http Knox fetchSize=1000&lt;/td&gt;
      &lt;td&gt;7.7s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline binary fetchSize=10000&lt;/td&gt;
      &lt;td&gt;6.8s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline http fetchSize=10000&lt;/td&gt;
      &lt;td&gt;7.7s&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;beeline http Knox fetchSize=10000&lt;/td&gt;
      &lt;td&gt;7.8s&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fetchSize&lt;/code&gt; of 1000 slows down HTTP mode since there needs to be repeated requests to get more results.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Reproducing the WebHDFS performance bottleneck showed that Knox performance could be improved. WebHDFS write performance in Apache Knox 1.2.0 should be significantly faster due to the &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1521&quot;&gt;KNOX-1521&lt;/a&gt; changes. Hive performance should also be better in Apache Knox 1.2.0 due to the improved GZip handling from &lt;a href=&quot;https://issues.apache.org/jira/browse/KNOX-1530&quot;&gt;KNOX-1530&lt;/a&gt;. Apache Knox 1.2.0 should be released soon with these performance improvements and more.&lt;/p&gt;

&lt;p&gt;I posted the performance tests I used &lt;a href=&quot;https://github.com/risdenk/knox-performance-tests&quot;&gt;here&lt;/a&gt; so they can be used to find other performance bottlenecks. The performance benchmarks should be reproducible, and I will use them for more performance testing soon.&lt;/p&gt;

&lt;p&gt;The performance testing done so far is for comparison against the native endpoint and not to show the absolute best performance numbers. This type of testing found some bottlenecks that have been addressed for Apache Knox 1.2.0. All of the tests done so far were run without Kerberos authentication for the backend. There could be additional performance bottlenecks when Kerberos authentication is used, and that will be another area I’ll be looking into.&lt;/p&gt;</content><author><name></name></author><category term="bigdata" /><category term="apache" /><category term="knox" /><category term="security" /><category term="performance" /><category term="improvement" /><category term="hadoop" /><category term="hdfs" /><category term="webhdfs" /><category term="hbase" /><category term="hive" /><summary type="html">TL;DR Apache Knox 1.2.0 should significantly improve: Apache Hadoop WebHDFS write performance due to KNOX-1521 Apache Hive and GZip performance due to KNOX-1530</summary></entry><entry><title type="html">Apache HBase - Thrift 1 Server SPNEGO Improvements</title><link href="https://risdenk.github.io/2018/11/08/apache-hbase-thrift-1-server-spnego-improvements.html" rel="alternate" type="text/html" title="Apache HBase - Thrift 1 Server SPNEGO Improvements" /><published>2018-11-08T08:00:00-06:00</published><updated>2018-11-08T08:00:00-06:00</updated><id>https://risdenk.github.io/2018/11/08/apache-hbase-thrift-1-server-spnego-improvements</id><content type="html" xml:base="https://risdenk.github.io/2018/11/08/apache-hbase-thrift-1-server-spnego-improvements.html">&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://hbase.apache.org/&quot;&gt;Apache HBase&lt;/a&gt; provides the ability to perform realtime random read/write access to large datasets. HBase is built on top of &lt;a href=&quot;https://hadoop.apache.org/&quot;&gt;Apache Hadoop&lt;/a&gt; and can scale to billions of rows and millions of columns. One of the capabilities of Apache HBase is a &lt;a href=&quot;https://hbase.apache.org/book.html#thrift&quot;&gt;thrift server&lt;/a&gt; that provides the ability to interact with HBase from any language that supports &lt;a href=&quot;https://thrift.apache.org/&quot;&gt;Thrift&lt;/a&gt;. There are two different versions of the HBase Thrift server, v1 and v2. This blog post focuses on v1 since that is the version that integrates with &lt;a href=&quot;https://gethue.com/&quot;&gt;Hue&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;apache-hbase-and-hue&quot;&gt;Apache HBase and Hue&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://gethue.com/the-web-ui-for-hbase-hbase-browser/&quot;&gt;Hue has support for Apache HBase&lt;/a&gt; through the v1 thrift server. The Hue UI allows for easily interacting with HBase for both querying and inserting. It is a quick and easy way to get started with HBase. The downside is that the HBase Thrift v1 server historically had limited support for Kerberos.&lt;/p&gt;

&lt;h3 id=&quot;hbase-thrift-v1-and-kerberos&quot;&gt;HBase Thrift V1 and Kerberos&lt;/h3&gt;
&lt;p&gt;There have been a few &lt;a href=&quot;http://grokbase.com/p/cloudera/cdh-user/133pgawryt/hbase-thrift-with-kerberos-appears-to-ignore-keytab&quot;&gt;posts&lt;/a&gt; about getting the HBase Thrift V1 server to work properly with Kerberos. In many cases, the solution was to merge keytabs for the HTTP principal and the HBase server principal. The other solution was to add the HTTP principal as a proxy user. Both of these solutions require extra work that isn’t necessary. The HTTP principal should only be used for authenticating SPNEGO. The HBase server principal should be used to authenticate with the rest of HBase. I found this out after comparing the Apache Hive HiveServer2 thrift implementation with the HBase thrift server implementation.&lt;/p&gt;

&lt;h3 id=&quot;improving-the-hbase-thrift-v1-implementation&quot;&gt;Improving the HBase Thrift V1 Implementation&lt;/h3&gt;
&lt;p&gt;I emailed the &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/hbase-user/201801.mbox/%3CCAJU9nmh5YtZ%2BmAQSLo91yKm8pRVzAPNLBU9vdVMCcxHRtRqgoA%40mail.gmail.com%3E&quot;&gt;hbase-user mailing list&lt;/a&gt; to see if my findings were plausible or if I was missing something. Josh Elser reviewed it and said that this change would be useful. I opened &lt;a href=&quot;https://issues.apache.org/jira/browse/HBASE-19852&quot;&gt;HBASE-19852&lt;/a&gt; and put together a working patch over the next few months. It turns out the quick patch for our environment took some effort to contribute back to Apache HBase proper. The patch accomplished the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Avoid the existing 401 try/catch block by checking the authorization header up front before checking for Kerberos credentials&lt;/li&gt;
  &lt;li&gt;Add &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hbase.thrift.spnego.principal&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hbase.thrift.spnego.keytab.file&lt;/code&gt; to allow configuring the SPNEGO principal specifically for the Thrift server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first change prevents the logs from being filled with messages about failed Kerberos authentication when the authorization header is empty. The second change allows the SPNEGO principal to be configured in the hbase-site.xml file. The thrift server will then be configured to use the SPNEGO principal and keytab for HTTP authentication. This removes the need to merge keytabs and allows an administrator to use existing SPNEGO principals and keytabs that are on the host (like one set up by Ambari).&lt;/p&gt;
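&lt;p&gt;As a sketch, the two new properties can be set in hbase-site.xml alongside the existing Thrift server configuration. The principal and keytab path below are placeholder values:&lt;/p&gt;

```xml
&lt;!-- Placeholder values; point these at the SPNEGO principal/keytab on the host --&gt;
&lt;property&gt;
  &lt;name&gt;hbase.thrift.spnego.principal&lt;/name&gt;
  &lt;value&gt;HTTP/_HOST@EXAMPLE.COM&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;hbase.thrift.spnego.keytab.file&lt;/name&gt;
  &lt;value&gt;/etc/security/keytabs/spnego.service.keytab&lt;/value&gt;
&lt;/property&gt;
```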

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/HBASE-19852&quot;&gt;HBASE-19852&lt;/a&gt; was reviewed and merged in June 2018. It is a part of HBase 2.1.0 and greater. The Apache HBase community was great to work with since they were patient while I worked on the patch over a few months. The new configuration options allow the HBase Thrift V1 server to work seamlessly with Kerberos and Hue. There is no longer a need to merge keytabs or perform other workarounds. This change has been in use for over a year now with success using the Hue HBase Browser with HBase and Kerberos.&lt;/p&gt;</content><author><name></name></author><category term="bigdata" /><category term="apache" /><category term="hbase" /><category term="thrift" /><category term="spnego" /><category term="kerberos" /><category term="security" /><category term="hue" /><summary type="html">Overview Apache HBase provides the ability to perform realtime random read/write access to large datasets. HBase is built on top of Apache Hadoop and can scale to billions of rows and millions of columns. One of the capabilities of Apache HBase is a thrift server that provides the ability to interact with HBase from any language that supports Thrift. There are two different versions of the HBase Thrift server v1 and v2. This blog post focuses on v1 since that is the version that integrates with Hue.</summary></entry><entry><title type="html">Apache HBase - Snappy Compression</title><link href="https://risdenk.github.io/2018/11/06/apache-hbase-snappy-compression.html" rel="alternate" type="text/html" title="Apache HBase - Snappy Compression" /><published>2018-11-06T08:00:00-06:00</published><updated>2018-11-06T08:00:00-06:00</updated><id>https://risdenk.github.io/2018/11/06/apache-hbase-snappy-compression</id><content type="html" xml:base="https://risdenk.github.io/2018/11/06/apache-hbase-snappy-compression.html">&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://hbase.apache.org/&quot;&gt;Apache HBase&lt;/a&gt; provides the ability to perform realtime random read/write access to large datasets. HBase is built on top of &lt;a href=&quot;https://hadoop.apache.org/&quot;&gt;Apache Hadoop&lt;/a&gt; and can scale to billions of rows and millions of columns. One of the features of HBase is to enable &lt;a href=&quot;https://hbase.apache.org/book.html#compression&quot;&gt;different types of compression&lt;/a&gt; for a column family. It is recommended that testing be done for your use case, but this blog shows how &lt;a href=&quot;https://en.wikipedia.org/wiki/Snappy_(compression)&quot;&gt;Snappy compression&lt;/a&gt; can reduce storage needs while keeping the same query performance.&lt;/p&gt;

&lt;h3 id=&quot;evidence&quot;&gt;Evidence&lt;/h3&gt;
&lt;p&gt;Below are some images from some clusters where testing was done with Snappy compression. The charts show a variety of metrics from storage size to system metrics.&lt;/p&gt;

&lt;p style=&quot;text-align:center&quot;&gt;&lt;img width=&quot;800&quot; src=&quot;/images/posts/2018-11-06/dev_grafana_hbase_get_mutate_latencies.png&quot; /&gt;&lt;/p&gt;
&lt;p style=&quot;text-align:center&quot;&gt;&lt;img width=&quot;800&quot; src=&quot;/images/posts/2018-11-06/dev_grafana_hbase_size.png&quot; /&gt;&lt;/p&gt;

&lt;p style=&quot;text-align:center&quot;&gt;&lt;img width=&quot;800&quot; src=&quot;/images/posts/2018-11-06/test_grafana_hbase_get_mutate_latencies.png&quot; /&gt;&lt;/p&gt;
&lt;p style=&quot;text-align:center&quot;&gt;&lt;img width=&quot;800&quot; src=&quot;/images/posts/2018-11-06/test_grafana_hbase_size.png&quot; /&gt;&lt;/p&gt;

&lt;p style=&quot;text-align:center&quot;&gt;&lt;img width=&quot;800&quot; src=&quot;/images/posts/2018-11-06/test_grafana_system_disk_io.png&quot; /&gt;&lt;/p&gt;
&lt;p style=&quot;text-align:center&quot;&gt;&lt;img width=&quot;800&quot; src=&quot;/images/posts/2018-11-06/test_grafana_system_iowait.png&quot; /&gt;&lt;/p&gt;
&lt;p style=&quot;text-align:center&quot;&gt;&lt;img width=&quot;800&quot; src=&quot;/images/posts/2018-11-06/test_grafana_system_user.png&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;The charts above show &amp;gt;80% storage savings while only seeing a slight bump in mutate latencies. The clusters this was tested on were loaded with simulated data and load. The production data matched this when deployed as well. The storage savings also helped backups and disaster recovery since we didn’t need to move as much data across the wire. References for implementing this yourself, with more options for testing, are below.&lt;/p&gt;

&lt;h4 id=&quot;references&quot;&gt;References&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://community.hortonworks.com/articles/54761/compression-in-hbase.html&quot;&gt;https://community.hortonworks.com/articles/54761/compression-in-hbase.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://hadoop-hbase.blogspot.com/2016/02/hbase-compression-vs-blockencoding_17.html&quot;&gt;http://hadoop-hbase.blogspot.com/2016/02/hbase-compression-vs-blockencoding_17.html&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blogs.apache.org/hbase/entry/the_effect_of_columnfamily_rowkey&quot;&gt;https://blogs.apache.org/hbase/entry/the_effect_of_columnfamily_rowkey&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines&quot;&gt;https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of&quot;&gt;http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://hbase.apache.org/book.html#compression&quot;&gt;https://hbase.apache.org/book.html#compression&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://hbase.apache.org/book.html#data.block.encoding.enable&quot;&gt;https://hbase.apache.org/book.html#data.block.encoding.enable&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="bigdata" /><category term="apache" /><category term="hbase" /><category term="snappy" /><category term="compression" /><category term="performance" /><category term="storage" /><summary type="html">Overview Apache HBase provides the ability to perform realtime random read/write access to large datasets. HBase is built on top of Apache Hadoop and can scale to billions of rows and millions of columns. One of the features of HBase is to enable different types of compression for a column family. It is recommended that testing be done for your use case, but this blog shows how Snappy compression can reduce storage needs while keeping the same query performance.</summary></entry><entry><title type="html">Apache Storm - Slow Topology Upload</title><link href="https://risdenk.github.io/2018/11/01/apache-storm-slow-topology-upload.html" rel="alternate" type="text/html" title="Apache Storm - Slow Topology Upload" /><published>2018-11-01T09:00:00-05:00</published><updated>2018-11-01T09:00:00-05:00</updated><id>https://risdenk.github.io/2018/11/01/apache-storm-slow-topology-upload</id><content type="html" xml:base="https://risdenk.github.io/2018/11/01/apache-storm-slow-topology-upload.html">&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This is an old post from notes. This may not be applicable anymore but sharing in case it helps someone.&lt;/p&gt;

&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://storm.apache.org/&quot;&gt;Apache Storm&lt;/a&gt; after HDP 2.2 seems to have a hard time with large topology jars and takes a while to upload them. There have been a &lt;a href=&quot;https://mail-archives.apache.org/mod_mbox/storm-user/201603.mbox/%3CCAPC1M2i3OpKhC3n_+oTJke45Efuxq2PxMVurx71oEU-=Nqd9gQ@mail.gmail.com%3E&quot;&gt;few&lt;/a&gt; &lt;a href=&quot;https://community.hortonworks.com/questions/24517/topology­code­distribution­takes­too­much­time.html&quot;&gt;reports&lt;/a&gt; of Storm topology jars uploading slowly. I ran into this a few years ago. The fix is to increase the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nimbus.thrift.max_buffer_size&lt;/code&gt; setting.&lt;/p&gt;

&lt;h3 id=&quot;fix&quot;&gt;Fix&lt;/h3&gt;
&lt;p&gt;Increase &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nimbus.thrift.max_buffer_size&lt;/code&gt; from the default of 1048576 to 20485760.&lt;/p&gt;
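&lt;p&gt;As a sketch, the setting goes in storm.yaml (or the equivalent Ambari configuration) on the Nimbus host, followed by a Nimbus restart:&lt;/p&gt;

```yaml
# Raise the maximum Thrift frame size so larger topology jars can be uploaded.
# Default is 1048576 bytes (1 MB); 20485760 bytes is roughly 20 MB.
nimbus.thrift.max_buffer_size: 20485760
```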

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://mail-archives.apache.org/mod_mbox/storm-user/201403.mbox/%3CFC98EE12-4AED-4D06-9917-C449B96EB08A@gmail.com%3E&quot;&gt;https://mail-archives.apache.org/mod_mbox/storm-user/201403.mbox/%3CFC98EE12-4AED-4D06-9917-C449B96EB08A@gmail.com%3E&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://stackoverflow.com/questions/27092653/storm-supervisor-connectivity-error-downloading-the-jar-from-nimbus&quot;&gt;http://stackoverflow.com/questions/27092653/storm-supervisor-connectivity-error-downloading-the-jar-from-nimbus&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://qnalist.com/questions/4768442/nimbus-fails-after-uploading-topology-reading-too-large-of-frame-size&quot;&gt;https://qnalist.com/questions/4768442/nimbus-fails-after-uploading-topology-reading-too-large-of-frame-size&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="bigdata" /><category term="apache" /><category term="storm" /><category term="topology" /><category term="upload" /><category term="performance" /><summary type="html">Note: This is an old post from notes. This may not be applicable anymore but sharing in case it helps someone.</summary></entry><entry><title type="html">Apache Solr - Apache Calcite Avatica Integration</title><link href="https://risdenk.github.io/2018/10/30/apache-solr-apache-calcite-avatica-integration.html" rel="alternate" type="text/html" title="Apache Solr - Apache Calcite Avatica Integration" /><published>2018-10-30T09:00:00-05:00</published><updated>2018-10-30T09:00:00-05:00</updated><id>https://risdenk.github.io/2018/10/30/apache-solr-apache-calcite-avatica-integration</id><content type="html" xml:base="https://risdenk.github.io/2018/10/30/apache-solr-apache-calcite-avatica-integration.html">&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://lucene.apache.org/solr&quot;&gt;Apache Solr&lt;/a&gt; is a full text search engine that is built on &lt;a href=&quot;https://lucene.apache.org/&quot;&gt;Apache Lucene&lt;/a&gt;. One of the capabilities of Apache Solr is to handle SQL-like statements. This was introduced in Solr 6.0 and refined in subsequent releases. Initially the SQL support used the &lt;a href=&quot;https://github.com/prestodb/presto/blob/master/presto-parser/src/main/java/com/facebook/presto/sql/parser/SqlParser.java&quot;&gt;Presto SQL parser&lt;/a&gt;. This was replaced by &lt;a href=&quot;https://calcite.apache.org/&quot;&gt;Apache Calcite&lt;/a&gt; because the Presto parser did not include an optimizer. Calcite provides the ability to push down execution of SQL to Apache Solr.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://calcite.apache.org/avatica/&quot;&gt;Apache Calcite Avatica&lt;/a&gt; is a subproject of Apache Calcite and provides a JDBC driver as well as JDBC server. The Avatica architecture diagram displays how this fits together.&lt;/p&gt;

&lt;p style=&quot;text-align:center&quot;&gt;&lt;img width=&quot;60%&quot; src=&quot;https://raw.githubusercontent.com/julianhyde/share/master/slides/avatica-architecture.png&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;apache-solr-and-apache-calcite-avatica&quot;&gt;Apache Solr and Apache Calcite Avatica&lt;/h3&gt;
&lt;p&gt;Apache Solr has historically built its own JDBC driver implementation. This takes quite a bit of effort since the JDBC specification has a lot of methods that need to be implemented. &lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-9963&quot;&gt;SOLR-9963&lt;/a&gt; was created to try to integrate Apache Calcite Avatica into Solr. This would provide an endpoint for the Avatica JDBC driver and remove the need for a separate Apache Solr JDBC driver implementation.&lt;/p&gt;

&lt;h3 id=&quot;integrating-apache-calcite-avatica-as-an-apache-solr-handler&quot;&gt;Integrating Apache Calcite Avatica as an Apache Solr Handler&lt;/h3&gt;
&lt;p&gt;Since Apache Calcite Avatica is implemented in Jetty just like Apache Solr, I had the idea to add Avatica as just another handler in Solr. This would expose all the features of Avatica without changing any internals of Solr. The Avatica handler could then use the existing Calcite engine within Apache Solr to handle the queries.&lt;/p&gt;

&lt;p&gt;I created &lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-9963&quot;&gt;SOLR-9963&lt;/a&gt; and by early February 2017 I had a working example of the integration of Avatica and Solr. I was able to use the existing Avatica JDBC driver directly with Apache Solr without any issues. Sadly I haven’t had time to finish merging this change yet.&lt;/p&gt;

&lt;h3 id=&quot;testing-apache-solr-with-apache-calcite-avatica-handler&quot;&gt;Testing Apache Solr with Apache Calcite Avatica Handler&lt;/h3&gt;
&lt;p&gt;One of the cool features of Apache Calcite Avatica is that you can interact with it over pure REST with a JSON payload. I created a simple test script to show how this was possible even with Apache Solr.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;./test_avatica_solr.sh &quot;http://localhost:8983/solr/test/avatica&quot; &quot;select * from test limit 10&quot;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;test_avatica_solr.sh&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;#!/usr/bin/env bash&lt;/span&gt;

&lt;span class=&quot;nb&quot;&gt;set&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-u&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#set -x&lt;/span&gt;

&lt;span class=&quot;nv&quot;&gt;AVATICA&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$1&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;SQL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$2&lt;/span&gt;

&lt;span class=&quot;nv&quot;&gt;CONNECTION_ID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;conn-&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;whoami&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;date&lt;/span&gt; +%s&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;MAX_ROW_COUNT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;100
&lt;span class=&quot;nv&quot;&gt;NUM_ROWS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;2
&lt;span class=&quot;nv&quot;&gt;OFFSET&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;0

&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Open connection&quot;&lt;/span&gt;
curl &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-w&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$AVATICA&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--data&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;{&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;openConnection&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;connectionId&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;CONNECTION_ID&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;}&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Example of how to set connection properties with info key&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;#curl -i &quot;$AVATICA&quot; -H &quot;Content-Type: application/json&quot; --data &quot;{\&quot;request\&quot;: \&quot;openConnection\&quot;,\&quot;connectionId\&quot;: \&quot;${CONNECTION_ID}\&quot;,\&quot;info\&quot;: {\&quot;zk\&quot;: \&quot;$ZK\&quot;,\&quot;lex\&quot;: \&quot;MYSQL\&quot;}}&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;echo

echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Create statement&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;STATEMENTRSP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;curl &lt;span class=&quot;nt&quot;&gt;-s&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$AVATICA&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--data&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;{&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;createStatement&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;connectionId&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;CONNECTION_ID&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;}&quot;&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;STATEMENTID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$STATEMENTRSP&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; | jq .statementId&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;echo

echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;PrepareAndExecuteRequest&quot;&lt;/span&gt;
curl &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-w&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$AVATICA&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--data&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;{&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;prepareAndExecute&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;connectionId&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;CONNECTION_ID&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;statementId&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$STATEMENTID&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span 
class=&quot;s2&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$SQL&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;maxRowCount&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;MAX_ROW_COUNT&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;, &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;maxRowsInFirstFrame&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;NUM_ROWS&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;}&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Loop through all the results&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;ISDONE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;false
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$ISDONE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do
  &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;OFFSET&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;$((&lt;/span&gt;OFFSET &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; NUM_ROWS&lt;span class=&quot;k&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;FetchRequest - Offset=&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$OFFSET&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
  &lt;span class=&quot;nv&quot;&gt;FETCHRSP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;curl &lt;span class=&quot;nt&quot;&gt;-s&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$AVATICA&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--data&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;{&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;fetch&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;connectionId&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;CONNECTION_ID&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;statementId&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$STATEMENTID&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;offset&lt;/span&gt;&lt;span 
class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;OFFSET&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;fetchMaxRowCount&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;NUM_ROWS&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;}&quot;&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$FETCHRSP&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
  &lt;span class=&quot;nv&quot;&gt;ISDONE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$FETCHRSP&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; | jq .frame.done&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;echo
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;done

&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Close statement&quot;&lt;/span&gt;
curl &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-w&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$AVATICA&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--data&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;{&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;closeStatement&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;connectionId&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;CONNECTION_ID&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;statementId&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$STATEMENTID&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;}&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;echo

echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Close connection&quot;&lt;/span&gt;
curl &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-w&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$AVATICA&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--data&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;{&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;closeConnection&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;connectionId&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;: &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;CONNECTION_ID&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;}&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
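&lt;p&gt;The fetch loop's core logic is easier to audit without the inline quote escaping. The sketch below is illustrative only: it builds the fetch payload with &lt;code class=&quot;highlighter-rouge&quot;&gt;printf&lt;/code&gt; and uses canned responses in place of a live Avatica server, so the connection and statement ids are made up. In the real script, the payload would be posted with &lt;code class=&quot;highlighter-rouge&quot;&gt;curl -s &quot;$AVATICA&quot; --data &quot;...&quot;&lt;/code&gt; as shown above.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch of the fetch loop above, with canned JSON replies standing in
# for a live Avatica server. CONNECTION_ID and STATEMENTID are
# illustrative placeholders, not values from a real connection.
CONNECTION_ID="conn-1"
STATEMENTID=1
NUM_ROWS=2
OFFSET=0

# Build the fetch payload with printf instead of backslash-escaped
# quotes, which is easier to read and to modify.
build_fetch_request() {
  printf '{"request": "fetch", "connectionId": "%s", "statementId": %s, "offset": %s, "fetchMaxRowCount": %s}' \
    "$CONNECTION_ID" "$STATEMENTID" "$OFFSET" "$NUM_ROWS"
}

# Hypothetical pages a server might return for successive offsets.
RESPONSES=(
  '{"frame":{"offset":0,"done":false,"rows":[[1],[2]]}}'
  '{"frame":{"offset":2,"done":true,"rows":[[3]]}}'
)

ISDONE="false"
PAGE=0
while [ "$ISDONE" != "true" ]; do
  echo "FetchRequest - Offset=$OFFSET"
  build_fetch_request   # real script: curl -s "$AVATICA" -H "Content-Type: application/json" --data "$(build_fetch_request)"
  echo
  FETCHRSP="${RESPONSES[$PAGE]}"
  # Check frame.done without jq: true iff the reply contains "done":true
  case "$FETCHRSP" in
    *'"done":true'*) ISDONE="true" ;;
    *)               ISDONE="false" ;;
  esac
  OFFSET=$((OFFSET + NUM_ROWS))
  PAGE=$((PAGE + 1))
done
echo "All frames fetched after $PAGE request(s)"
```

&lt;p&gt;The loop advances the offset by the requested page size and stops once a frame reports &lt;code class=&quot;highlighter-rouge&quot;&gt;done: true&lt;/code&gt;, which is exactly the termination condition the &lt;code class=&quot;highlighter-rouge&quot;&gt;jq .frame.done&lt;/code&gt; check implements above.&lt;/p&gt;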

&lt;h3 id=&quot;what-is-next&quot;&gt;What is next?&lt;/h3&gt;
&lt;p&gt;If this feature looks interesting, please add your thoughts to &lt;a href=&quot;https://issues.apache.org/jira/browse/SOLR-9963&quot;&gt;SOLR-9963&lt;/a&gt;. If there is enough interest, we can work towards getting SOLR-9963 merged. The Apache Solr JDBC driver would then need to switch to wrapping an Avatica JDBC driver. Overall, this should improve the SQL experience that ships with Apache Solr.&lt;/p&gt;</content><author><name></name></author><category term="bigdata" /><category term="apache" /><category term="solr" /><category term="calcite" /><category term="avatica" /><category term="integration" /><summary type="html">Overview Apache Solr is a full text search engine that is built on Apache Lucene. One of the capabilities of Apache Solr is to handle SQL like statements. This was introduced in Solr 6.0 and refined in subsequent releases. Initially the SQL support used the Presto SQL parser. This was replaced by Apache Calcite due to Presto not having an optimizer. Calcite provides the ability to push down execution of SQL to Apache Solr.</summary></entry></feed>