Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-1824

Support import modules in Jython UDF

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.8.0, 0.9.0
    • 0.10.0
    • None
    • None
    • Reviewed
    • Hide
      module import state is determined before and after user code is executed. The resolved modules are inspected and added to the pigContext, then they are added to the job jar.

      this patch addresses the following import modes:
      - import re, which will (if configured) find re on the filesystem in the jython install root
      - import foo (which can import bar), this works now provided bar is resolvable JYTHON_HOME, JYTHONPATH, curdir, etc.
      - from pkg import *, which works when the cachedir is writable
      - import non.jvm.class, which works when the cachedir is writable
      - the directly imported module may use schema decorators, but recursively imported modules cannot until PIG-1943 is addressed
      Show
      module import state is determined before and after user code is executed. The resolved modules are inspected and added to the pigContext, then they are added to the job jar. this patch addresses the following import modes: - import re, which will (if configured) find re on the filesystem in the jython install root - import foo (which can import bar), this works now provided bar is resolvable JYTHON_HOME, JYTHONPATH, curdir, etc. - from pkg import *, which works when the cachedir is writable - import non.jvm.class, which works when the cachedir is writable - the directly imported module may use schema decorators, but recursively imported modules cannot until PIG-1943 is addressed
    • jython, import

    Description

      Currently, Jython UDF script doesn't support Jython import statement as in the following example:

      #!/usr/bin/python
      
      import re
      @outputSchema("word:chararray")
      def resplit(content, regex, index):
              return re.compile(regex).split(content)[index]
      

      Can Pig automatically locate the Jython module file and ship it to the backend? Or should we add a ship clause to let user explicitly specify the module to ship?

      Attachments

        1. 1824.patch
          24 kB
          Woody Anderson
        2. 1824a.patch
          23 kB
          Woody Anderson
        3. 1824b.patch
          28 kB
          Woody Anderson
        4. 1824c.patch
          28 kB
          Woody Anderson
        5. 1824d.patch
          28 kB
          Woody Anderson
        6. 1824x.patch
          30 kB
          Woody Anderson
        7. TEST-org.apache.pig.test.TestGrunt.txt
          1.09 MB
          Alan Gates
        8. TEST-org.apache.pig.test.TestScriptLanguage.txt
          898 kB
          Alan Gates
        9. TEST-org.apache.pig.test.TestScriptUDF.txt
          192 kB
          Alan Gates
        10. 1824_final.patch
          30 kB
          Woody Anderson

        Issue Links

          Activity

            Unless, there is a java api provided by jython interpreter which lists all the dependencies of a jython script, trying to figure out all the module dependencies yourself will be close to writing a linker, isn't it? I think it will be easier to let user specify and ship his modules in the meanwhile.

            ashutoshc Ashutosh Chauhan added a comment - Unless, there is a java api provided by jython interpreter which lists all the dependencies of a jython script, trying to figure out all the module dependencies yourself will be close to writing a linker, isn't it? I think it will be easier to let user specify and ship his modules in the meanwhile.
            gates Alan Gates added a comment -

            +1 to Ashutosh's comment. Also, this won't port well as we add UDFs in new languages.

            gates Alan Gates added a comment - +1 to Ashutosh's comment. Also, this won't port well as we add UDFs in new languages.

            I don't think it is appropriate to just leave this up to the end user to figure this stuff out.

            Especially when the errors won't be discovered until the user attempts to run the code on the grid
            then must decipher the errors
            then must track down the individual dependency files
            then must try to figure out how to ship the necessary files
            then must try to track down why it still doesn't work because the import files contained dependencies on imported files
            then must track down the subsequent dependencies
            then ...

            If jython itself does not provide hooks to enumerate all dependencies after parsing, would it be possible to build a tool which recurses the imports and then provides information to the end user on how to package all the dependencies for ship (or better just does it).

            Couldn't this be a requirement for all language bindings to provide a method or script for enumerating all dependent files, even if the interpreter implementation in Java doesn't provide this functionality natively?

            ciemo David Ciemiewicz added a comment - I don't think it is appropriate to just leave this up to the end user to figure this stuff out. Especially when the errors won't be discovered until the user attempts to run the code on the grid then must decipher the errors then must track down the individual dependency files then must try to figure out how to ship the necessary files then must try to track down why it still doesn't work because the import files contained dependencies on imported files then must track down the subsequent dependencies then ... If jython itself does not provide hooks to enumerate all dependencies after parsing, would it be possible to build a tool which recurses the imports and then provides information to the end user on how to package all the dependencies for ship (or better just does it). Couldn't this be a requirement for all language bindings to provide a method or script for enumerating all dependent files, even if the interpreter implementation in Java doesn't provide this functionality natively?
            woody.anderson@gmail.com Woody Anderson added a comment -

            this code originally written cannot work:

            import re
            @outputSchema("y:bag{t:tuple(word:chararray)}")
            def strsplittobag(content,regex):
                    return re.compile(regex).split(content
            

            the reason is that split returns a list of strings, not a list of tuples, and jythonfunction casting will fail. i've created a ticket for these kinds of 'obvious' type coercions: https://issues.apache.org/jira/browse/PIG-1942

            and, as such, i am going to change the code for this ticket to something that will work when 'import re' works.

            woody.anderson@gmail.com Woody Anderson added a comment - this code originally written cannot work: import re @outputSchema( "y:bag{t:tuple(word:chararray)}" ) def strsplittobag(content,regex): return re.compile(regex).split(content the reason is that split returns a list of strings, not a list of tuples, and jythonfunction casting will fail. i've created a ticket for these kinds of 'obvious' type coercions: https://issues.apache.org/jira/browse/PIG-1942 and, as such, i am going to change the code for this ticket to something that will work when 'import re' works.
            woody.anderson@gmail.com Woody Anderson added a comment -

            This patch was developed against 0.8 trunk, i've tested in local mode and on the deployed grid.

            I deprecated PigContext.scriptFiles because it's odd to have two ways to add script files to the jar, and i wanted to have the patch work with 0.8 and any code that might be written against the public member variable 'scriptFiles'. That code will work with this patch, it'll just produce a deprecation warning.

            The name/file pair is needed to register files into the job jar with an appropriate relative path to allow jython to resolve it in the jar.

            woody.anderson@gmail.com Woody Anderson added a comment - This patch was developed against 0.8 trunk, i've tested in local mode and on the deployed grid. I deprecated PigContext.scriptFiles because it's odd to have two ways to add script files to the jar, and i wanted to have the patch work with 0.8 and any code that might be written against the public member variable 'scriptFiles'. That code will work with this patch, it'll just produce a deprecation warning. The name/file pair is needed to register files into the job jar with an appropriate relative path to allow jython to resolve it in the jar.
            woody.anderson@gmail.com Woody Anderson added a comment -

            here's the patch file.

            woody.anderson@gmail.com Woody Anderson added a comment - here's the patch file.
            woody.anderson@gmail.com Woody Anderson added a comment -

            The following may not be immediately self evident to all developers:

            import statements that execute from within runtime function calls will not work (unless the dependency has already been satisfied statically), eg:

            def resplit(content, regex, index):
             import re
             return re.compile(regex).split(content)[index]
            

            will not work b/c the import is not attempted until after the job has been defined, built, and deployed.
            This import practice is frowned upon and is used very rarely. If you happen to be doing it (i'll assume you have a good reason), then you probably know how to fix it. If you're using someone else's code that is written like this, you can satisfy the dependency by explicitly importing the module up front, this will cause it to be added to the jar, and subsequent uses will succeed.

            woody.anderson@gmail.com Woody Anderson added a comment - The following may not be immediately self evident to all developers: import statements that execute from within runtime function calls will not work (unless the dependency has already been satisfied statically), eg: def resplit(content, regex, index): import re return re.compile(regex).split(content)[index] will not work b/c the import is not attempted until after the job has been defined, built, and deployed. This import practice is frowned upon and is used very rarely. If you happen to be doing it (i'll assume you have a good reason), then you probably know how to fix it. If you're using someone else's code that is written like this, you can satisfy the dependency by explicitly importing the module up front, this will cause it to be added to the jar, and subsequent uses will succeed.
            woody.anderson@gmail.com Woody Anderson added a comment -

            This altered patch removes the explicit 'import re' test, as it relies on having a jython 2.5.0 install on disk and configured as visible to the runtime.

            test nested accomplishes the test of the mechanism in use by 'import re', so removing the explicit test is simply more portable.

            woody.anderson@gmail.com Woody Anderson added a comment - This altered patch removes the explicit 'import re' test, as it relies on having a jython 2.5.0 install on disk and configured as visible to the runtime. test nested accomplishes the test of the mechanism in use by 'import re', so removing the explicit test is simply more portable.
            gates Alan Gates added a comment -

            A couple of questions:

            1. Based on my analysis the static init block that this patch adds to JythonScriptingEngine will only get invoked once we know we have Jython in the mix. Is that correct? We don't want to be invoking this when Python UDFs or a Python control flow.
            2. Right now the code to do this is part of the init of the JythonScriptingEngine. Should we make this a separate method in ScriptEngine so that other languages can also add this kind of functionality? I would not make it abstract, since some languages may not be able to do this. But it seems like it makes for a cleaner interface.
            gates Alan Gates added a comment - A couple of questions: Based on my analysis the static init block that this patch adds to JythonScriptingEngine will only get invoked once we know we have Jython in the mix. Is that correct? We don't want to be invoking this when Python UDFs or a Python control flow. Right now the code to do this is part of the init of the JythonScriptingEngine. Should we make this a separate method in ScriptEngine so that other languages can also add this kind of functionality? I would not make it abstract, since some languages may not be able to do this. But it seems like it makes for a cleaner interface.
            woody.anderson@gmail.com Woody Anderson added a comment -

            1. i could re-work the initialization into the static block of the inner class "Interpreter", it simply needs to be done before the interpreter is allocated. I'm not sure what you mean by not wanting a cache dir when using python udfs or control flow? can you clarify?
            2. separate the logic out of init into what? I think it should, in general, be the contract of any script environment to handle resource inclusion (if possible). Are you imagining some scenario where init(file,..) would not actually parse/internalize the code inside init()? I don't much care where the code is parsed and added to a ScriptEngine, but when it is, it should handle all other evaluated resources that are necessary to succeed. In the current API, a user provided script file is given to init(), so that's where it must do this. There is really no other place to evaluate resource inclusions, and i think i might not be understanding your suggestion. As for other ScriptEngines that may not be able to support this concept, are you suggesting a "supportsFeature()" method that we use to test various SE's to determine if they can support this (or other) features? I'm not sure what we'd do with this knowledge.

            woody.anderson@gmail.com Woody Anderson added a comment - 1. i could re-work the initialization into the static block of the inner class "Interpreter", it simply needs to be done before the interpreter is allocated. I'm not sure what you mean by not wanting a cache dir when using python udfs or control flow? can you clarify? 2. separate the logic out of init into what? I think it should, in general, be the contract of any script environment to handle resource inclusion (if possible). Are you imagining some scenario where init(file,..) would not actually parse/internalize the code inside init()? I don't much care where the code is parsed and added to a ScriptEngine, but when it is, it should handle all other evaluated resources that are necessary to succeed. In the current API, a user provided script file is given to init(), so that's where it must do this. There is really no other place to evaluate resource inclusions, and i think i might not be understanding your suggestion. As for other ScriptEngines that may not be able to support this concept, are you suggesting a "supportsFeature()" method that we use to test various SE's to determine if they can support this (or other) features? I'm not sure what we'd do with this knowledge.
            gates Alan Gates added a comment -

            On the issue of the static block, I dislike static initialization blocks because you're never sure when they are going to be called. Someone adding "import o.a.p.s.j.JythonScriptingEngine" somewhere in the code will result in changing when this is executed, including possibly when it does not need to be executed. Just moving it into the Interpreter class as a static block won't change that I don't think. It can't be in Interpreter's constructor?

            On the second point, what I meant was, should there be a separate method ScriptEngine.includeResources()? This would make clear to developers of future scripting engines that this is something they need to do. The contract would then be that before Pig called ScriptingEngine.registerFunction it would call includeResources(). I agree with you that, when possible, all scripting engine implementations should include their resources. I was not suggesting a supportsFeature() method. For situations where it cannot be supported includeResources would be a NOP.

            gates Alan Gates added a comment - On the issue of the static block, I dislike static initialization blocks because you're never sure when they are going to be called. Someone adding "import o.a.p.s.j.JythonScriptingEngine" somewhere in the code will result in changing when this is executed, including possibly when it does not need to be executed. Just moving it into the Interpreter class as a static block won't change that I don't think. It can't be in Interpreter's constructor? On the second point, what I meant was, should there be a separate method ScriptEngine.includeResources()? This would make clear to developers of future scripting engines that this is something they need to do. The contract would then be that before Pig called ScriptingEngine.registerFunction it would call includeResources(). I agree with you that, when possible, all scripting engine implementations should include their resources. I was not suggesting a supportsFeature() method. For situations where it cannot be supported includeResources would be a NOP.
            woody.anderson@gmail.com Woody Anderson added a comment -

            ok. i understand your thoughts on static, and mostly i have them too, but the PythonInterpreter is a static member of the Interperter class, and the code i wrote must run BEFORE that interpreter is constructed.

            Interpeter is a private inner class, so it cannot be caused to load before normal use patterns. So, moving the static block into the static block for Interpreter addresses your concerns.

            import will not cause the static block to be executed btw, it's the first executed reference to the class. However, i take the point that some code could have been:

            Class = JythonScriptEngine.class;
            

            or something like that to cause the class to be loaded. Still, as i said: Interpreter static block addresses this, and the ctor is out b/c of the static nature of Interpreter.interpreter.

            on second point:
            i dont' see the point of a includeResources() method, if it can be done, it can be done in init(), if not it won't be done. Why add a new method?

            woody.anderson@gmail.com Woody Anderson added a comment - ok. i understand your thoughts on static, and mostly i have them too, but the PythonInterpreter is a static member of the Interperter class, and the code i wrote must run BEFORE that interpreter is constructed. Interpeter is a private inner class, so it cannot be caused to load before normal use patterns. So, moving the static block into the static block for Interpreter addresses your concerns. import will not cause the static block to be executed btw, it's the first executed reference to the class. However, i take the point that some code could have been: Class = JythonScriptEngine.class; or something like that to cause the class to be loaded. Still, as i said: Interpreter static block addresses this, and the ctor is out b/c of the static nature of Interpreter.interpreter. on second point: i dont' see the point of a includeResources() method, if it can be done, it can be done in init(), if not it won't be done. Why add a new method?
            woody.anderson@gmail.com Woody Anderson added a comment -

            static block moved to Interpreter

            woody.anderson@gmail.com Woody Anderson added a comment - static block moved to Interpreter
            julienledem Julien Le Dem added a comment -

            Hi Woody,
            This is a great feature.
            I agree with the static block comments, but I don't see how you could do it differently without a major refactoring of the existing code.
            Here are comments/questions about some details of the implementation.

            in JythonScriptEngine.Interpreter static block:

            • If PYTHON_CACHEDIR is provided, we will delete it on exit. Shouldn't we delete it only if it has been created by Pig? it is dangerous to delete something that we have not created. The user could shoot himself in the foot by providing something he cares about as the PYTHON_CACHEDIR.
            • Also, if we can't write to the provided PYTHON_CACHEDIR we create another one. Can the user pre-populate the cache dir? If yes we should throw an exception here.

            in JythonScriptEngine.Interpreter.init():

            • Something should fail if is is null.
              InputStream is = getScriptAsStream(path);
               if (is != null) {
              
            julienledem Julien Le Dem added a comment - Hi Woody, This is a great feature. I agree with the static block comments, but I don't see how you could do it differently without a major refactoring of the existing code. Here are comments/questions about some details of the implementation. in JythonScriptEngine.Interpreter static block: If PYTHON_CACHEDIR is provided, we will delete it on exit. Shouldn't we delete it only if it has been created by Pig? it is dangerous to delete something that we have not created. The user could shoot himself in the foot by providing something he cares about as the PYTHON_CACHEDIR . Also, if we can't write to the provided PYTHON_CACHEDIR we create another one. Can the user pre-populate the cache dir? If yes we should throw an exception here. in JythonScriptEngine.Interpreter.init(): Something should fail if is is null. InputStream is = getScriptAsStream(path); if (is != null ) {
            woody.anderson@gmail.com Woody Anderson added a comment -

            agree:

            inre: PYTHON_CACHEDIR: the code behaves as you wish, in that it only deletes the dir if it (pig) created it.
            sorry for not being being clear in comments about that, but if you read the code you'll see it.

            if we can't write, i (pig) was creating an alternate directory. It may be possible to pre-populate this, and i understand (and had) the desire to have an error instead of a new directory, but I was initially experiencing this error:

            *sys-package-mgr*: can't create package cache dir, '/grid/0/Releases/pig-0.8.0..1103222002-20110401-000/share/pig-0.8.0..1103222002/lib/cachedir/packages'
            

            which is why i added the 'is writable' check, but after reviewing (per your comment), it seems that cachedir is not set on the grid (at least at the point when the static block runs). If left as null, it seems to default to some grid location that is not writable (and thus doesn't work), but if i set it to a writable tmp first, it works.
            so.. i can safely agree that an error if the dir isn't writable is both desirable and works.

            as for the getScriptAsStream():
            i followed the existing code convention on that one, though i didn't like it either.
            again, if you read down a bit you'll see that the impl of getScriptAsStream() is:

            ..
                    if (is == null) {
                        throw new IllegalStateException(
                                "Could not initialize interpreter (from file system or classpath) with " + scriptPath);
                    }      
                    return is;
            

            so, the null check is superfluous but does quiet the "not null check" warnings.
            i didn't add an additional throw statement in this case b/c essentially, my code wouldn't add any new errors that the existing code didn't already exhibit if somehow the impl of getScriptAsStream changed and could return null.

            anyway, ill upload a new patch to address the writable issue, if you think it's a big deal we can add an 'else throw' statement around getScriptAsStream

            woody.anderson@gmail.com Woody Anderson added a comment - agree: inre: PYTHON_CACHEDIR: the code behaves as you wish, in that it only deletes the dir if it (pig) created it. sorry for not being being clear in comments about that, but if you read the code you'll see it. if we can't write, i (pig) was creating an alternate directory. It may be possible to pre-populate this, and i understand (and had) the desire to have an error instead of a new directory, but I was initially experiencing this error: *sys- package -mgr*: can 't create package cache dir, ' /grid/0/Releases/pig-0.8.0..1103222002-20110401-000/share/pig-0.8.0..1103222002/lib/cachedir/packages' which is why i added the 'is writable' check, but after reviewing (per your comment), it seems that cachedir is not set on the grid (at least at the point when the static block runs). If left as null, it seems to default to some grid location that is not writable (and thus doesn't work), but if i set it to a writable tmp first, it works. so.. i can safely agree that an error if the dir isn't writable is both desirable and works. as for the getScriptAsStream(): i followed the existing code convention on that one, though i didn't like it either. again, if you read down a bit you'll see that the impl of getScriptAsStream() is: .. if (is == null ) { throw new IllegalStateException( "Could not initialize interpreter (from file system or classpath) with " + scriptPath); } return is; so, the null check is superfluous but does quiet the "not null check" warnings. i didn't add an additional throw statement in this case b/c essentially, my code wouldn't add any new errors that the existing code didn't already exhibit if somehow the impl of getScriptAsStream changed and could return null. anyway, ill upload a new patch to address the writable issue, if you think it's a big deal we can add an 'else throw' statement around getScriptAsStream
            julienledem Julien Le Dem added a comment -

            Hi Woody,
            I had misread the code about automatic deletion. You're right it deletes only if it was created by Pig.

            I understand the superfluous null check and the warning being somewhat incorrect.
            To me there should be either no null check in that case or throw some exception if null. This is about debug-ability of the code. If someone changes the behavior of getScriptAsStream() there should be an exception in your code at that point. Not somewhere else. It also helps with understanding the code so that the reader does not wonder why it does nothing when the stream is null (because it's never null. But then why do we check ? etc)

            otherwise it looks good. Thanks!

            julienledem Julien Le Dem added a comment - Hi Woody, I had misread the code about automatic deletion. You're right it deletes only if it was created by Pig. I understand the superfluous null check and the warning being somewhat incorrect. To me there should be either no null check in that case or throw some exception if null. This is about debug-ability of the code. If someone changes the behavior of getScriptAsStream() there should be an exception in your code at that point. Not somewhere else. It also helps with understanding the code so that the reader does not wonder why it does nothing when the stream is null (because it's never null. But then why do we check ? etc) otherwise it looks good. Thanks!
            woody.anderson@gmail.com Woody Anderson added a comment -

            understood.
            adding that null check/throw etc. is just a change that is unrelated to this bug. I can bundle it up as all the related lines of code are being changed by this bug anyway, but that's why i didn't do it originally.

            I'll add a throw similar to current impl of getScriptAsStream

            woody.anderson@gmail.com Woody Anderson added a comment - understood. adding that null check/throw etc. is just a change that is unrelated to this bug. I can bundle it up as all the related lines of code are being changed by this bug anyway, but that's why i didn't do it originally. I'll add a throw similar to current impl of getScriptAsStream
            olgan Olga Natkovich added a comment -

            Given that we would like to release 0.9 pretty soon, lets delay this one till the follow up release

            olgan Olga Natkovich added a comment - Given that we would like to release 0.9 pretty soon, lets delay this one till the follow up release
            woody.anderson@gmail.com Woody Anderson added a comment -

            patch includes throw new IllegalStateException if the stream is null.

            woody.anderson@gmail.com Woody Anderson added a comment - patch includes throw new IllegalStateException if the stream is null.
            woody.anderson@gmail.com Woody Anderson added a comment -

            i'm not sure what's really left to keep this out of the next release, given we've been going back an forth over issues that don't even affect functionality.
            but, there are other jython related bugs in the pipe for 0.10 anyway, so perhaps having them all in the same release is a good idea for a feature grouping perspective.

            woody.anderson@gmail.com Woody Anderson added a comment - i'm not sure what's really left to keep this out of the next release, given we've been going back an forth over issues that don't even affect functionality. but, there are other jython related bugs in the pipe for 0.10 anyway, so perhaps having them all in the same release is a good idea for a feature grouping perspective.
            julienledem Julien Le Dem added a comment -

            +1 for inclusion for me.
            Thanks for including the comments Woody.

            julienledem Julien Le Dem added a comment - +1 for inclusion for me. Thanks for including the comments Woody.
            gates Alan Gates added a comment -

            I'll start running the tests and such. I also want to add some end to end tests.

            gates Alan Gates added a comment - I'll start running the tests and such. I also want to add some end to end tests.
            gates Alan Gates added a comment -

            Woody,

            This patch now conflicts with the changes that were checked in as part of PIG-2056. I don't understand how to resolve the conflicts. You could upload a new patch or just tell me how to do the resolution so I can continue testing.

            gates Alan Gates added a comment - Woody, This patch now conflicts with the changes that were checked in as part of PIG-2056 . I don't understand how to resolve the conflicts. You could upload a new patch or just tell me how to do the resolution so I can continue testing.
            woody.anderson@gmail.com Woody Anderson added a comment -

            patch for trunk

            woody.anderson@gmail.com Woody Anderson added a comment - patch for trunk
            gates Alan Gates added a comment -

            I ran the unit tests and saw issues with most of the python oriented tests. I'll attach the logs from the failing tests.

            gates Alan Gates added a comment - I ran the unit tests and saw issues with most of the python oriented tests. I'll attach the logs from the failing tests.
            woody.anderson@gmail.com Woody Anderson added a comment -

            hmm.. i ran each of those tests via:

            ant -noclasspath test -Dtestcase=org.apache.pig.test.TestScriptUDF
            etc. and they all passed.

            is your environment clean?
            % printenv | grep YTHON
            (should be empty)

            is there anything else i should be doing to try to mirror your test framework (while not having to run all tests for the 18 hours that that requires)?

            woody.anderson@gmail.com Woody Anderson added a comment - hmm.. i ran each of those tests via: ant -noclasspath test -Dtestcase=org.apache.pig.test.TestScriptUDF etc. and they all passed. is your environment clean? % printenv | grep YTHON (should be empty) is there anything else i should be doing to try to mirror your test framework (while not having to run all tests for the 18 hours that that requires)?
            woody.anderson@gmail.com Woody Anderson added a comment -

            ok. my bad!

            testcase=full.package.path doesn't even run the test, so tho i claimed that the tests were passing, it was in fact simply that junit could run.

            Here's a new patch:
            there was an extra line that i mistakenly didn't delete when creating the re-trunked code.

            this patch will pass the tests

            woody.anderson@gmail.com Woody Anderson added a comment - ok. my bad! testcase=full.package.path doesn't even run the test, so tho i claimed that the tests were passing, it was in fact simply that junit could run. Here's a new patch: there was an extra line that i mistakenly didn't delete when creating the re-trunked code. this patch will pass the tests
            rding Richard Ding added a comment -

            The new patch fixed the unit test errors reported earlier. I have one (different) failed test in TestGrunt, not sure if it's related to the patch.

            rding Richard Ding added a comment - The new patch fixed the unit test errors reported earlier. I have one (different) failed test in TestGrunt, not sure if it's related to the patch.
            woody.anderson@gmail.com Woody Anderson added a comment -

            cool. can we get this into trunk so i don't have to keep fixing the patches?

            woody.anderson@gmail.com Woody Anderson added a comment - cool. can we get this into trunk so i don't have to keep fixing the patches?
            olgan Olga Natkovich added a comment -

            I believe that Richard is running some additional tests. Once he is done, he is planning to commit the patch

            olgan Olga Natkovich added a comment - I believe that Richard is running some additional tests. Once he is done, he is planning to commit the patch
            rding Richard Ding added a comment -

            Patch passed e2e python tests.

            rding Richard Ding added a comment - Patch passed e2e python tests.
            olgan Olga Natkovich added a comment -

            Lets get it committed! Thanks, Woody for contributing!

            olgan Olga Natkovich added a comment - Lets get it committed! Thanks, Woody for contributing!
            rding Richard Ding added a comment -

            patch committed to trunk. Thanks Woody!

            rding Richard Ding added a comment - patch committed to trunk. Thanks Woody!
            russell.jurney Russell Jurney added a comment -

            This does not actually work for me, in either Pig 0.10 or Pig 0.10.1. I can't include the 're' module via 'import re', or I get this error:

            Caused by: Traceback (most recent call last):
            File "udfs.py", line 20, in <module>
            import re
            ImportError: No module named re

            russell.jurney Russell Jurney added a comment - This does not actually work for me, in either Pig 0.10 or Pig 0.10.1. I can't include the 're' module via 'import re', or I get this error: Caused by: Traceback (most recent call last): File "udfs.py", line 20, in <module> import re ImportError: No module named re
            martin.gerlach Martin Gerlach added a comment -

            Doesn't work for me, either (with codecs module). Pig version is 0.10.0-cdh4.1.2

            martin.gerlach Martin Gerlach added a comment - Doesn't work for me, either (with codecs module). Pig version is 0.10.0-cdh4.1.2

            You need to have jython/Lib directory in the classpath. We bundle it with our deployment. Else need to have jython-standalone.jar instead of jython.jar as in Pig 0.11.

            rohini Rohini Palaniswamy added a comment - You need to have jython/Lib directory in the classpath. We bundle it with our deployment. Else need to have jython-standalone.jar instead of jython.jar as in Pig 0.11.

            to use a jython install, the Lib dir must be in the jython search path

            • via env variable JYTHON_HOME=jy_home or JYTHON_PATH=jy_home/Lib:... or
            • jython-standalone.jar should be in the classpath
            rohini Rohini Palaniswamy added a comment - to use a jython install, the Lib dir must be in the jython search path via env variable JYTHON_HOME=jy_home or JYTHON_PATH=jy_home/Lib:... or jython-standalone.jar should be in the classpath

            People

              woody.anderson@gmail.com Woody Anderson
              rding Richard Ding
              Votes:
              1 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: