Details
- Type: Improvement
- Status: Closed
- Priority: Critical
- Resolution: Fixed
Description
While looking at KNOX-1524, I found that requesting compressed results can cause performance impacts. Knox currently does the following:
- Apache HttpClient transparently decompresses each response
This led to recompressing some streams based on MIME types (KNOX-732, KNOX-855, KNOX-856). Even if we call disableContentCompression, KNOX-565 added the following, which should only come into play when HttpClient's transparent decompression is disabled (or for multipart gzip files, KNOX-1518):
- Try to decompress the stream
  - Currently uses try/catch
- Run any rewrite filter rules
- If decompressed, recompress the stream
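The steps above can be sketched roughly as follows. This is a minimal illustration using only `java.util.zip`, not the actual Knox code: the class `RecompressSketch`, the method `process`, and the `UnaryOperator<String>` standing in for the rewrite filter rules are all hypothetical names chosen for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipException;

/** Hypothetical sketch of the try-decompress / rewrite / recompress path; not Knox code. */
final class RecompressSketch {

  /** Reads a stream to the end and returns its bytes. */
  static byte[] readAll(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    for (int n; (n = in.read(buf)) != -1; ) {
      out.write(buf, 0, n);
    }
    return out.toByteArray();
  }

  /** Gzip-compresses a byte array. */
  static byte[] gzip(byte[] plain) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
        gz.write(plain);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Inflates a gzip-compressed byte array. */
  static byte[] gunzip(byte[] compressed) {
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return readAll(gz);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Tries to decompress, applies the (stand-in) rewrite, recompresses if the input was gzip. */
  static byte[] process(byte[] body, UnaryOperator<String> rewrite) {
    try {
      BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(body));
      in.mark(8192); // remember the start so we can rewind if this is not gzip
      InputStream source;
      boolean wasGzip;
      try {
        source = new GZIPInputStream(in); // the constructor reads and validates the gzip header
        wasGzip = true;
      } catch (ZipException | EOFException notGzip) {
        in.reset(); // not gzip: fall back to the raw bytes
        source = in;
        wasGzip = false;
      }
      String rewritten = rewrite.apply(new String(readAll(source), StandardCharsets.UTF_8));
      byte[] plain = rewritten.getBytes(StandardCharsets.UTF_8);
      return wasGzip ? gzip(plain) : plain;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Note that every gzip body pays for a full inflate and deflate here, even when the rewrite turns out to be a no-op.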
For many use cases, there is no reason to decompress and recompress the same stream. This is because there are no rewrite rules that apply. One example of this is Hive where beeline requests compression and HiveServer2 added support for returning compressed results with HIVE-17194. Another is with WebHDFS where we don't want to change the content going back to the client.
I am planning to address this in a few pieces:
- Determine if any rewrite rules apply before decompressing
  - If rewrite rules apply, then decompress and recompress as before
  - If rewrite rules do not apply, then copy stream as is
- Remove gzip filter added by KNOX-732
  - Figure out if there is another code path where decompress/recompress should happen
  - We should not have to rely on Jetty to recompress content
- Disable httpclient content compression
  - Need to make sure we handle decompress/recompress where necessary
With all 3 improvements in place we should end up with:
- One place where gzip decompress/recompress happens
- Only decompress/recompress if rewrite rules match
- Performance increases due to skipping unnecessary decompress/recompress
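As a rough illustration of that end state, the sketch below checks whether any rewrite rules match before touching the stream, and copies the gzip bytes through unchanged when none do. It is an assumption-laden example, not the Knox implementation: `PassThroughSketch`, `relay`, `relayBytes`, and the list of `UnaryOperator<String>` standing in for matching rewrite rules are hypothetical, and the backend body is assumed to be gzip-encoded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of "check rules first, then decide"; not Knox code. */
final class PassThroughSketch {

  /** Copies every byte from in to out without interpreting it. */
  static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[8192];
    for (int n; (n = in.read(buf)) != -1; ) {
      out.write(buf, 0, n);
    }
  }

  /**
   * Relays a gzip-encoded backend body to the client.
   * Returns true only if the body had to be decompressed and recompressed.
   */
  static boolean relay(InputStream backend, OutputStream client,
                       List<UnaryOperator<String>> matchingRules) throws IOException {
    if (matchingRules.isEmpty()) {
      copy(backend, client); // no rule applies: compressed bytes pass through untouched
      return false;
    }
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (GZIPInputStream gz = new GZIPInputStream(backend)) {
      copy(gz, plain);
    }
    String text = new String(plain.toByteArray(), StandardCharsets.UTF_8);
    for (UnaryOperator<String> rule : matchingRules) {
      text = rule.apply(text);
    }
    GZIPOutputStream gzOut = new GZIPOutputStream(client);
    gzOut.write(text.getBytes(StandardCharsets.UTF_8));
    gzOut.finish(); // write the gzip trailer without closing the client stream
    return true;
  }

  /** Byte-array convenience wrapper with unchecked exceptions, for quick experiments. */
  static byte[] relayBytes(byte[] gzippedBody, List<UnaryOperator<String>> matchingRules) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      relay(new ByteArrayInputStream(gzippedBody), out, matchingRules);
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Gzip-compresses a UTF-8 string. */
  static byte[] gzip(String text) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
        gz.write(text.getBytes(StandardCharsets.UTF_8));
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Inflates a gzip body back into a UTF-8 string. */
  static String gunzip(byte[] body) {
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(body))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      copy(gz, out);
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With an empty rule list (the Hive and WebHDFS cases described earlier), the client receives exactly the bytes the backend sent, so the cost is a plain stream copy.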
Issue Links
- is related to
  - KNOX-165 Stress testing (Open)
- relates to
  - KNOX-732 Knox does not recompress javascript resources (Closed)
  - KNOX-855 Add "application/x-javascript" mime type to the list of compressed resources (Closed)
  - KNOX-1221 WebHDFS read/write performance limitations (Closed)
  - KNOX-856 Document gzip compression options (Closed)
  - HTTPCLIENT-834 Transparent Content Coding support (Closed)
  - KNOX-565 Supporting All the Quick Links on Ambari Dashboard to Go Through Knox (Closed)
  - HIVE-17194 JDBC: Implement Gzip compression for HTTP mode (Closed)
  - KNOX-1518 Large HDFS file downloads are incomplete when content is gzipped (Closed)
  - KNOX-1524 Hive "select *" performance evaluation (Closed)
  - KNOX-1525 HBase "scan" performance evaluation (Closed)