Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.2, 0.20.3
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Labels: None

    Description

      Right now, with tables that have large region counts, the write buffer is not efficient. This is because we issue potentially N RPCs, where N is the number of regions in the table. When N gets large (let's say 1200+), things become very slow.

      Instead, if we batch things up using a different RPC and use thread pools, we could see much higher performance!

      This requires an RPC change...
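The batching idea above can be sketched in plain Java: instead of one RPC per region, group the buffered puts by the server that hosts them and issue one batched call per server, in parallel. This is a minimal illustration only; the types and names below are hypothetical stand-ins, not the actual HBase client API.

```java
import java.util.*;
import java.util.concurrent.*;

public class BatchPutSketch {
    // Stand-in for a buffered Put: just a row key and the server owning its region.
    record Put(String row, String server) {}

    // Sketch of a flush: group puts by destination server, then submit one
    // batched "RPC" per server to a thread pool and wait for all of them.
    static Map<String, Integer> flushCommits(List<Put> buffer) throws Exception {
        Map<String, List<Put>> byServer = new HashMap<>();
        for (Put p : buffer) {
            byServer.computeIfAbsent(p.server(), k -> new ArrayList<>()).add(p);
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, byServer.size()));
        try {
            Map<String, Future<Integer>> futures = new HashMap<>();
            for (Map.Entry<String, List<Put>> e : byServer.entrySet()) {
                // Pretend RPC: returns the number of puts acknowledged by this server.
                futures.put(e.getKey(), pool.submit(() -> e.getValue().size()));
            }
            Map<String, Integer> acked = new HashMap<>();
            for (Map.Entry<String, Future<Integer>> e : futures.entrySet()) {
                acked.put(e.getKey(), e.getValue().get());
            }
            return acked;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Put> buffer = List.of(
            new Put("row1", "rs1"), new Put("row2", "rs1"), new Put("row3", "rs2"));
        System.out.println(flushCommits(buffer));
    }
}
```

The key point is that the number of round trips becomes proportional to the number of servers, not the number of regions.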

      Attachments

        1. TestBatchPut.java
          2 kB
          Cosmin Lehene
        2. HBASE-2066-v2.patch
          25 kB
          ryan rawson
        3. HBASE-2066-branch.patch
          22 kB
          ryan rawson
        4. HBASE-2066-3.patch
          31 kB
          ryan rawson
        5. HBASE-2066-20-branch.txt
          32 kB
          ryan rawson

        Issue Links

          Activity

            ryanobjc ryan rawson added a comment -

            include rpc version bump

            stack Michael Stack added a comment -

            Patch looks great.

            We can't do a version bump in the 0.20 branch. Adding a new method to the interface without version bumping doesn't work, I suppose. How about a version in 0.20 that doesn't pass an ExecutorService and a timeout, and whose method is named processBatchOfRows rather than processBatchOfPuts?

            Any chance of some tests?

            Will this fix help in 0.20 branch?

            @@ -845,8 +855,9 @@ public class HConnectionManager implements HConstants {
             
                       // by nature of the map, we know that the start key has to be < 
                       // otherwise it wouldn't be in the headMap. 
            -          if (KeyValue.getRowComparator(tableName).compareRows(endKey, 0, endKey.length,
            -              row, 0, row.length) <= 0) {
            +          if (Bytes.equals(endKey, HConstants.EMPTY_END_ROW) ||
            +              KeyValue.getRowComparator(tableName).compareRows(endKey, 0, endKey.length,
            +              row, 0, row.length) > 0) {
                         // delete any matching entry
                         HRegionLocation rl =
                           tableLocations.remove(matchingRegions.lastKey());
            

            Do you want to change these:

            +            LOG.debug("Failed all from " + request.address + " due to ExecutionException");
            

            ... so they are instead:

            +            LOG.debug("Failed all from " + request.address, e);
            

            Is this done once, getCurrentNrHRS, in the HTable constructor?

            looks really good


            hammer Jeff Hammerbacher added a comment -

            How does this relate to HBASE-1845?
            ryanobjc ryan rawson added a comment -

            This is much less ambitious than HBASE-1845 and seeks to optimize the Put case only.

            One of the problems with the original HBASE-1845 patch is that it requires a new API to take advantage of it, and thus requires porting code. Furthermore, HTable has handy things like write buffering, write buffer size settings, etc. I started with the 1845 patch and realized we also needed a way to parallelize puts in the normal API. This is much simpler than 1845 because we don't have to line up return codes (there are no return codes for puts, just exceptions due to temporary issues).

            In short: this is a drop-in replacement and makes things go fast now. HBASE-1845 requires a new API.

            ryanobjc ryan rawson added a comment -

            This is a trunk version with a test.

            stack Michael Stack added a comment -

            Patch looks good. Make sure all licenses are 2010 on commit and add some class comment to new classes saying what they do on commit. You don't up the RPC version? Otherwise it looks great RR.

            stack Michael Stack added a comment -

            Hey man, commit already!

            clehene Cosmin Lehene added a comment -

            Patch fails to apply on trunk.
            After manually applying the chunks, I got these exceptions while doing puts:

            EXCEPTION 1

            java.lang.NullPointerException
            at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.deleteCachedLocation(HConnectionManager.java:889)
            at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfPuts(HConnectionManager.java:1413)
            at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586)
            at org.apache.hadoop.hbase.client.HTable.put(HTable.java:471)
            at TestBatchPut$MyThread.run(TestBatchPut.java:65)

            EXCEPTION 2

            java.lang.NullPointerException
            at java.util.TreeMap.rotateRight(TreeMap.java:2057)
            at java.util.TreeMap.fixAfterDeletion(TreeMap.java:2217)
            at java.util.TreeMap.deleteEntry(TreeMap.java:2151)
            at java.util.TreeMap.remove(TreeMap.java:585)
            at org.apache.hadoop.hbase.util.SoftValueSortedMap.remove(SoftValueSortedMap.java:104)
            at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.deleteCachedLocation(HConnectionManager.java:897)
            at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfPuts(HConnectionManager.java:1413)
            at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586)
            at org.apache.hadoop.hbase.client.HTable.put(HTable.java:471)
            at TestBatchPut$MyThread.run(TestBatchPut.java:65)

            Also, the throughput went down and the max seconds for a put went up (this could also be from the HBase restart).

            I'll attach the piece of code I'm using to benchmark it.

            clehene Cosmin Lehene added a comment -

            run TestBatchPut nr_of_threads nr_of_puts_per_call

            ryanobjc ryan rawson added a comment -

            Looks like a basic thread concurrency problem here.

            As for the performance issues: the current code uses ONE thread pool for everyone, currently set to a static 10 threads. The original code used a thread pool per HTable and sized it to the number of regionservers - that is impossible to do in HCM because of chicken-and-egg bootstrap problems (the call we'd use calls HCM.<init> which calls ...).

            Maybe the thread pool should move back into HTable to support parallelism better? With 10 worker threads serving far more than 10 client threads, yeah, put performance is going to nosedive.
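The per-HTable pool idea discussed above can be sketched as follows. This is a hypothetical illustration of the trade-off, not the committed code: each client handle gets its own pool sized to the cluster, instead of every client thread queuing behind one static 10-thread pool.

```java
import java.util.concurrent.*;

// Hypothetical per-handle worker pool, sized to the number of region
// servers the client may talk to, so concurrent flushes from many client
// threads do not serialize behind a small shared pool.
public class PerTablePool {
    private final ExecutorService pool;

    PerTablePool(int numRegionServers) {
        this.pool = Executors.newFixedThreadPool(Math.max(1, numRegionServers));
    }

    // Submit one batched call (e.g. all puts destined for one server).
    <T> Future<T> submitBatch(Callable<T> rpc) {
        return pool.submit(rpc);
    }

    void close() {
        pool.shutdown();
    }
}
```

The design question the comment raises is exactly where this pool lives: creating it in HTable avoids the HCM bootstrap cycle, at the cost of one pool per table handle.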

            ryanobjc ryan rawson added a comment -

            Here is the much-awaited new version. I'll run some tests on it and then commit if things look good.

            ryanobjc ryan rawson added a comment -

            I ran TestBatchPut for a while and inserted 3.3 GB of data without problems. Ended up with about 4 table splits. No more concurrent exceptions, no major slowdown... the threads got slower as my machine bogged down, but it wasn't the kind of crazy exponential slowdown originally reported.

            If there are no complaints, I'm going to commit this as-is.

            ryanobjc ryan rawson added a comment -

            Committed to trunk.

            ryanobjc ryan rawson added a comment -

            This will go into the 0.20 branch, since we now have HBASE-2219 in there.

            ryanobjc ryan rawson added a comment -

            Adding to 0.20.4.

            ryanobjc ryan rawson added a comment -

            For the branch.


            apurtell Andrew Kyle Purtell added a comment -

            Since HBASE-2066 was committed on the 0.20 branch, o.a.h.h.client.TestGetRowVersions is hanging.

            Before:

            test-core:
                [mkdir] Created dir: /home/apurtell/src/Hadoop/hbase.git/build/test/logs
                [junit] Running org.apache.hadoop.hbase.client.TestGetRowVersions
                [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 36.048 sec
            

            Now:

            test-core:
                [mkdir] Created dir: /home/apurtell/src/Hadoop/hbase.git/build/test/logs
                [junit] Running org.apache.hadoop.hbase.client.TestGetRowVersions
                [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
                [junit] Test org.apache.hadoop.hbase.client.TestGetRowVersions FAILED (timeout)
            

            TestGetRowVersions shuts down and restarts the minicluster mid test. Maybe it could just force flush instead?

            Prior to 2066 this test would exit, but I think only by luck. Now, according to jstack, main() is joined to the regionserver thread, which is trying again and again to report for duty to a master thread that has gone away. Neither main nor the regionserver threads are daemon threads, so the JVM does not exit.
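One common remedy for the kind of hang described above is to mark worker threads as daemon threads, so a stuck retry loop cannot keep the JVM alive after main() returns. The sketch below is an illustration of that general technique, not the actual fix that was committed.

```java
import java.util.concurrent.*;

public class DaemonPool {
    // Build a fixed-size pool whose worker threads are daemons: the JVM is
    // free to exit even while these threads are still running (or retrying).
    static ExecutorService newDaemonPool(int threads) {
        return Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
    }
}
```

With non-daemon threads, by contrast, every worker must terminate before the JVM can exit, which is exactly why the joined regionserver thread kept the test process alive.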


            apurtell Andrew Kyle Purtell added a comment -

            Ryan committed a fix to SVN which makes the testcase work again.

            ryanobjc ryan rawson added a comment -

            Committed to branch now.

            stack Michael Stack added a comment -

            Marking these as fixed against 0.21.0 rather than against 0.20.5.

            larsfrancke Lars Francke added a comment -

            This issue was closed as part of a bulk closing operation on 2015-11-20. All issues that have been resolved and where all fixVersions have been released have been closed (following discussions on the mailing list).


            People

              Assignee: ryanobjc ryan rawson
              Reporter: ryanobjc ryan rawson
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: