Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
Edge (in tez-api) need one more field for edge name and one more method for creating named edge. Client side DAG validation should also be updated: the DAG should have either all edges named or all edges unamed; in-edges of one vertex should get unique names; out-edges of one vertex should get unique names.
Attachments
Attachments
- TEZ-3511.6.patch
- 28 kB
- Zhiyuan Yang
- TEZ-3511.5.patch
- 27 kB
- Zhiyuan Yang
- TEZ-3511.4.patch
- 24 kB
- Zhiyuan Yang
- TEZ-3511.3.patch
- 16 kB
- Zhiyuan Yang
- TEZ-3511.2.patch
- 11 kB
- Zhiyuan Yang
- TEZ-3511.1.patch
- 10 kB
- Zhiyuan Yang
Activity
darion Thanks for your interest on this!
when the edge name don't supported we could use input and output vertex name as edge name such as input_name+""output_name"" . ? I don't think null is a good idea
At client side, we use null for unnamed edge so that AM side could separate the unnamed edge case from named edge case. If you assign any other default name, user maybe use same name for a named edge and AM will be confused.
"DAG should have either all edges named or all edges unamed" I'm not sure is it necessary ?
Whether this could work need more investigation. Unless we are 100% percent sure it could work, we shouldn't allow hybrid case. Given the size of the whole change is huge, leaving it in second step may be better.
Btw, if you look at the my POC patch, this jira is already done except DAG validation for named edge case. Would you mind assigning this jira to me? If you are interested in DAG validation part, feel free to create a jira and work on it.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12837829/TEZ-3511.1.patch
against master revision ad68f73.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
-1 release audit. The applied patch generated 1 release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2095//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2095//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2095//console
This message is automatically generated.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12837841/TEZ-3511.2.patch
against master revision ad68f73.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2096//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2096//console
This message is automatically generated.
Comments:
- no javadocs on the new API?
- "Edges in DAG shouldn't be partially named" - this seems a bit confusing. What is the real error that the user needs to understand and fix?
- Useful to use "other" as var name in equals() function.
- Patch does not match design doc in terms of providing clarity on whether named edges can be mixed with unnamed edges. Improved javadocs will help. Likewise for unit tests to ensure that users cannot create a hybrid. Also, for edge names - are they meant to be unique across the dag, within an vertex pair, what about when working with vertex groups?
- Also are there are limitations on naming conventions? It might be good to restrict edge names to a defined char set and length.
Comments are addressed in new patch. Regarding to naming, why not just make it normal string without restriction, like vertex name?
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12838103/TEZ-3511.3.patch
against master revision eb6fb67.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in :
org.apache.tez.test.TestFaultTolerance
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2098//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2098//console
This message is automatically generated.
Comments:
- allEdgesNamed should not be a member var. It should be only needed when the dag is built. This does imply that a mix of edge types will not be detected until the dag is finally built but that should be ok.
- removing such state is to ensure that re-submitting a dag to a different tez client will work
- typo in "(of null if edge has no name)" - should be or
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12838865/TEZ-3511.4.patch
against master revision b71ea4a.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2106//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2106//console
This message is automatically generated.
Comments:
- firstNonNull is marked as deprecated in newer versions of guava. Please do not use this api. Also,
* DAG must have all edges named or all edges unamed, not mixed. In-edges or out-edges of same 280 * vertex must have unique names, but in-edges could have same name as out-edges.
- the above can be re-worded to the following ( feel free to re-edit as needed ):
- All edges within a DAG must either be named ( created via <method ref> ) or unnamed ( created via ... ).
- All inbound edges to a vertex should have unique names. Likewise for outbound edges.
- A vertex can have an inbound edge that uses the same name as that used by an outbound edge.
if (edgeNamed && groupInputEdges.size() > 0) { 593 throw new IllegalStateException("DAG contains both named edge and group edge"); 594 } 595 if (edgeNamed && e.getName() == null) { 596 throw new IllegalStateException("DAG contains both named edges, but edge " + 597 e + " is unnamed"); 598 } else if (!edgeNamed && e.getName() != null) { 599 throw new IllegalStateException("DAG contains both unnamed edges, but edge " + 600 e + " is named"); 601 }
- Why not loop through all edges, make a count of named and unnamed and throw an error at the end with a summary?
- use groupInputEdges.isEmpty()
- I am not sure about the error messages - why do we differentiate between "DAG contains both named edges, but edge" and "DAG contains both unnamed edges, but edge" - isnt it simpler to just state that DAG contains both named and unnamed edges? Now the bigger question is whether you want to list all named and unnamed separately? This could be done if debug log level is enabled.
"return name + " " + inputVertex + " -> " + outputVertex + " (" + edgeProperty + ")";"
- without any separators, this may be confusing to read and parse for a user who does not know that the first string is name and second string is vertex name. Maybe follow the same logic as the javadoc i.e. src vertex – [edgeName] --> destVertex ?
try { 109 dag.addEdge(edge2); 110 Assert.fail(); 111 } catch (Exception ignored) {} 112 }
- Would be good to check the exception message and verify correctness.
- good practice to add a message to Assert.fail()
- Would be good to cover the case where the edge name is same as vertex name and that it works correctly for named edges.
Comments are addressed in new patch.
Why not loop through all edges, make a count of named and unnamed and throw an error at the end with a summary?
I am not sure about the error messages - why do we differentiate between "DAG contains both named edges, but edge" and "DAG contains both unnamed edges, but edge" - isnt it simpler to just state that DAG contains both named and unnamed edges? Now the bigger question is whether you want to list all named and unnamed separately? This could be done if debug log level is enabled.
In new patch conflicting edges(not all of them) are printed out in case user don't know which edges are causing problem.
without any separators, this may be confusing to read and parse for a user who does not know that the first string is name and second string is vertex name. Maybe follow the same logic as the javadoc i.e. src vertex – [edgeName] --> destVertex ?
Now the edge will be printed like this:
{e2 : [v1 : Processor] -> [v2 : Processor] (SCATTER_GATHER : input >> PERSISTED >> output >> NullEdgeManager)}+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12839955/TEZ-3511.5.patch
against master revision 501a351.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2116//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2116//console
This message is automatically generated.
Comments - minor nits:
- why does EdgeProperty::toString need to change to use "()" instead of "{}"?
- Exception messages are not consistent. In some cases it is incoming or outgoing edge. In others, it is in-edges or out-edges. Please fix.
throw new IllegalStateException("DAG shouldn't contains both named edge " + namedEdge 601 + " and group edge " + groupInputEdges.iterator().next());
- why not simply say that there are "N" named edges and "N" grouped edges instead of printing out the first grouped edge info? Not sure what usefulness comes from printing out details of a single grouped Edge if there are more than one grouped edges?
Assert.assertTrue(e.getMessage().contains("already defined"));
- this needs a better check such as looking for "Edge1 already defined"? Likewise for some other assert checks.
Might be good to break testHashAndEquals() into different unit tests. One for a basic check. Others to test variations of mismatches.
{e2 : [v1 : Processor] -> [v2 : Processor] (SCATTER_GATHER : input >> PERSISTED >> output >> NullEdgeManager)}why does EdgeProperty::toString need to change to use "()" instead of "{}"?
is better than
(e2 : [v1 : Processor] -> [v2 : Processor]
)
because of the convention that curly brace has larger scope than bracket.
why not simply say that there are "N" named edges and "N" grouped edges instead of printing out the first grouped edge info? Not sure what usefulness comes from printing out details of a single grouped Edge if there are more than one grouped edges?
If we just tell user how many edges have problem, user may have trouble figuring out which edge has problem especially when DAG is constructed programmatically. In this scenario, printing detailed edge info will be helpful for debugging. Even there are multiple edge having problem, user can still finally figure out all the issue.
Other comments are addressed in new patch.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12841146/TEZ-3511.6.patch
against master revision 8079919.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in :
org.apache.tez.mapreduce.TestMRRJobsDAGApi
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2134//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2134//console
This message is automatically generated.
Hi Zhiyuan Yang when the edge name don't supported we could use input and output vertex name as edge name such as input_name+""output_name"" . ? I don't think null is a good idea
"DAG should have either all edges named or all edges unamed" I'm not sure is it necessary ?