Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
ManifoldCF 0.1, ManifoldCF 0.2, ManifoldCF 0.3, ManifoldCF 0.4, ManifoldCF 0.5
-
None
Description
If you create a job for the web connector and enter an invalid URL into the seeds list, any value is accepted. An error message should be returned to the user in order to prevent invalid seeds.
Attachments
- CONNECTORS-430.patch
- 3 kB
- Karl Wright
- CONNECTORS-430.patch
- 3 kB
- Erlend Garåsen
Activity
Some comments.
First, it is not a good idea to include quotation marks in translated messages. The quotation marks belong as part of the Javascript.
Second, the following Javascript constructs are unsupported by the browser simulator:
- typeof() - it looks like you could just remove this, however.
- toString() - once again, looks like this could be just removed.
- split() - easy to add.
- the "+=" operator - can be added, or you might more readily just rephrase the expression to not use it
If we shouldn't include quotation marks in translated messages, we should change similar occurrences as well, for instance:
editjob.ExpirationIntervalMustBeAValidIntegerOrNull="Expiration interval must be a valid integer or null"
Do you think we should create another ticket about these quotation marks as well?
Is it difficult to add more functionality to the browser simulator or is it a better approach to follow some strict rules when writing JavaScript? If there are so many restrictions, I think it is a good idea to provide a list of safe JavaScript functions to use. My apologies if such a list exists.
It would be great if the browser simulator could support the split function. Otherwise the JavaScript code will be harder to read due to a lot more lines of code.
If we shouldn't include quotation marks in translated messages, we should change similar occurrences as well, for instance:
editjob.ExpirationIntervalMustBeAValidIntegerOrNull="Expiration interval must be a valid integer or null"
Do you think we should create another ticket about these quotation marks as well?
Yes, I thought I'd fixed all of those, but I must have missed some.
Is it difficult to add more functionality to the browser simulator or is it a better approach to follow some strict rules when writing JavaScript? If there are so many restrictions, I think it is a good idea to provide a list of safe JavaScript functions to use. My apologies if such a list exists.
I haven't created a list, because one adds things to the simulator as one needs them. If you have time you can certainly look at it. In general it is not difficult to add certain things (methods of already existing objects), but it is harder to add others (such as whole new operations).
It would be great if the browser simulator could support the split function. Otherwise the JavaScript code will be harder to read due to a lot more lines of code.
I can try adding that this evening.
I can try adding that this evening.
Great! I can rewrite everything in my JavaScript code except the split call, so this will make things much easier and the code more readable.
Good news; I added it a while ago and forgot about it. So you should be all set.
Adds an error message with a list of invalid URLs in the seeds list, based on JavaScript.
I have created a patch which adds URL validation to the seeds list. The regular expression for URL validation is not perfect, but IP addresses and localhost are accepted in addition to domains. If it looks OK, I can commit my changes together with a Japanese translation of the error message.
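As a rough illustration of the kind of check described here, the sketch below uses a deliberately loose pattern that accepts domain names, IP addresses and localhost as hosts. The pattern and the function name isPlausibleSeedUrl are assumptions made for this example; they are not taken from the attached patch.

```javascript
// Hypothetical helper, not the code from CONNECTORS-430.patch.
// Loose pattern: http or https scheme, a host made of letters, digits,
// dots and hyphens (covers domains, IPv4 addresses and "localhost"),
// an optional port of 2 to 5 digits, and an optional path starting with "/".
var seedUrlPattern = /^https?:\/\/[a-z0-9.-]+(:[0-9]{2,5})?(\/.*)?$/i;

function isPlausibleSeedUrl(url) {
  // Returns true when the string looks like an http(s) URL.
  return seedUrlPattern.test(url);
}
```

Like the check described above, this is not perfect: it accepts some strings that are not valid URLs (for example a host made only of dots) and rejects a query string that directly follows the host without a slash.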
Hi Erlend,
All looks good except the check for editjob.seeds == undefined. That's not supported. The whole conditional should be unneeded because there is always an "editjobs.seeds.value" available, either as a hidden field or as a text box.
The whole conditional should be unneeded because there is always an "editjobs.seeds.value" available, either as a hidden field or as a text box.
I'm afraid not, and that's the reason why I added this check. Sorry, I thought it was the typeof function that wasn't supported by the browser simulator. Anyway, I'll double-check the requirement for this check later. If it's required, I'll add the necessary hidden element instead.
I'm afraid not...
All tabs are coded so that either the displayed form element is output, or instead a corresponding hidden is output. The web connector's tabs are no different. The code is:
if (tabName.equals(Messages.getString(locale,"WebcrawlerConnector.Seeds")))
{
  out.print(
    "<table class=\"displaytable\">\n"+
    " <tr><td class=\"separator\" colspan=\"2\"><hr/></td></tr>\n"+
    " <tr>\n"+
    " <td class=\"value\" colspan=\"2\">\n"+
    " <textarea rows=\"25\" cols=\"80\" name=\"seeds\">"+org.apache.manifoldcf.ui.util.Encoder.bodyEscape(seeds)+"</textarea>\n"+
    " </td>\n"+
    " </tr>\n"+
    "</table>\n"
  );
}
else
{
  out.print(
    "<input type=\"hidden\" name=\"seeds\" value=\""+org.apache.manifoldcf.ui.util.Encoder.attributeEscape(seeds)+"\"/>\n"
  );
}
If I comment out the if check, I get the following error in my Error console (Firefox):
Timestamp: 22.03.12 19.25.43
Error: editjob.seeds is undefined
Source File: http://localhost:8345/mcf-crawler-ui/editjob.jsp
Line: 243
This happens when I press the Connection tab just after I have named my new job. I think it is due to a missing hidden element, something I will figure out tomorrow.
Ah, I see the problem - you are putting your check in the main UI javascript, not in the javascript that the web connector is supposed to provide.
You should be editing the method WebCrawlerConnector.outputSpecificationHeader(), not editing the UI JSP.
So please don't commit this change in its current form.
Thanks for clarifying. This was also the reason why I created a patch in the first place.
Is the replace function supported by the browser simulator? I found a problem while testing my changes. Leading and trailing spaces should be removed in order to avoid URL-formatted whitespace in the hidden seeds field. Otherwise, an annoying alert box pops up if you first enter a blank line in the seeds list, select another tab and then try to select another tab again. The following seems to work inside my function:
editjob.seeds.value = editjob.seeds.value.replace(/^\s*/, "").replace(/\s*$/, "");
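To make the effect of that line concrete, here is a small stand-alone sketch of the same two replace calls wrapped in a function. The name trimSeeds is made up for this example; the line above applies the expression directly to editjob.seeds.value.

```javascript
// Hypothetical wrapper around the two replace() calls quoted above.
function trimSeeds(value) {
  // Strip leading whitespace, then trailing whitespace.
  // \s also matches newlines, so a field holding only blank lines becomes "".
  // Newlines between two seed URLs are left untouched.
  return value.replace(/^\s*/, "").replace(/\s*$/, "");
}
```

With this, a value such as " http://www.uio.no/" loses its leading space, and a seeds field that contains nothing but a blank line ends up empty, which is the case that triggered the alert box.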
It looks like replace is supported:
def get_value( self, member_name ):
    if member_name == "indexOf":
        return JSIndexOf( self )
    if member_name == "charAt":
        return JSCharAt( self )
    if member_name == "length":
        return JSNumber( len(self.value) )
    if member_name == "search":
        return JSSearch( self )
    if member_name == "replace":
        return JSReplace( self )
    if member_name == "split":
        return JSSplit( self )
    if member_name == "substring":
        return JSSubstring( self )
    return JSObject.get_value( self, member_name )
Can you attach the trace you are seeing?
You mean the content of the hidden seeds field if it contains white spaces?
<input type="hidden" name="seeds" value=" http://www.uio.no/"/>
You mean the content of the hidden seeds field if it contains white spaces?
No, I think I misunderstood you when you said:
I found a problem while testing my changes.
I thought by "problem" you meant that the UI test failed. Clearly you meant that you wanted to use replace, but hadn't yet.
Clearly you meant that you wanted to use replace, but hadn't yet.
Exactly. I will probably commit my changes within a few hours unless I find another problem. Then I will reassign the ticket to Hitoshi in order to add a Japanese translation of the error message.
Before you commit, can you try:
ant run-webcrawler-UI-tests-derby
... to make sure the UI tests continue to pass? You will first need to install the latest version of python 2.x on your system of course.
Thanks!
The UI tests actually failed. I will take a closer look tomorrow in order to find the reason. BTW, I already have Python 2.6 installed. Do you think it's necessary to upgrade to version 2.7?
Everything seems to work fine by using Firefox, Opera and Safari.
[junit] File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 714, in send
[junit] TypeError: fromfile() takes exactly 2 arguments (1 given)
[junit] 2012-03-26 21:49:26.260:INFO::Stopped SocketConnector@0.0.0.0:8346
[junit] ------------- ---------------- ---------------
[junit] Testcase: createConnectionsAndJob(org.apache.manifoldcf.webcrawler_tests.NavigationDerbyUI): Caused an ERROR
[junit] UI test failed; error code: 1
[junit] java.lang.Exception: UI test failed; error code: 1
[junit] at org.apache.manifoldcf.core.tests.HTMLTester.executeTest(HTMLTester.java:183)
[junit] at org.apache.manifoldcf.webcrawler_tests.NavigationDerbyUI.createConnectionsAndJob(NavigationDerbyUI.java:282)
I have never seen this before; the tester was originally developed on python 2.3 and has worked ever since. I use 2.7 at the moment.
The same error returns if I check out a new version from trunk and run the UI test, so the failure is not caused by my changes. I will try to upgrade to Python 2.7 later this evening and/or try to run the same test on Linux instead of OS X.
I think it is possible that you have more than one version of python rattling around and it is picking up the wrong one. Try "which python" to see if you are getting the one you think.
You can also attach a new patch and I can see if the UI tests complete here on my system.
The UI test ran successfully on Linux with version 2.4.3 of Python.
I have attached a new patch. If it looks OK and the UI test passes on your computer, I suggest that I commit my changes and later try to find out why these tests do not run on my Mac. I don't think I have several versions of Python installed.
erlend-garasens-macbook-pro:~ erlendfg$ which python
/usr/bin/python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
For me it fails:
[junit] Multipart posting url 'http://localhost:8346/mcf-crawler-ui/execute.jsp' with parameters 'outputname=MyOutputConnection&index=&recrawlinterval=1440&description=MyJob&startmethod=2&expirationinterval=&jobid=1332885945254&priority=5&reseedinterval=60&tabname=Connection&connectionname=MyRepositoryConnection&schedulerecords=0&scheduletype=1&type=job&op=Continue' and 0 files...
[junit] Traceback (most recent call last):
[junit] File "test.py", line 166, in <module>
[junit] var124.click()
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 120, in click
[junit] self.get_form( ).execute_javascript_expression( self.onclick )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 672, in execute_javascript_expression
[junit] return self.window_instance.execute_javascript_expression( javascript )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 920, in execute_javascript_expression
[junit] return tokenstream.evaluate_expr( self.jscontext, "HTML" )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1085, in evaluate_expr
[junit] rval = self.evaluate_expr1( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1120, in evaluate_expr1
[junit] rval = self.evaluate_expr2( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1164, in evaluate_expr2
[junit] return self.evaluate_expr3( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1176, in evaluate_expr3
[junit] rval = self.evaluate_expr4( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1264, in evaluate_expr4
[junit] rval = self.evaluate_expr5( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1299, in evaluate_expr5
[junit] rval = self.evaluate_expr6( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1341, in evaluate_expr6
[junit] return self.evaluate_expr7( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1438, in evaluate_expr7
[junit] return reference_object.call( arguments, context )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 554, in call
[junit] return self.get_referenced_object().call(argset,context)
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 186, in call
[junit] response = ts.evaluate_statement( context, "method %s" % self.name )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 752, in evaluate_statement
[junit] result = self.evaluate_statement( newscope, place )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 988, in evaluate_statement
[junit] if self.evaluate_expr( context, place ) == None:
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1085, in evaluate_expr
[junit] rval = self.evaluate_expr1( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1120, in evaluate_expr1
[junit] rval = self.evaluate_expr2( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1164, in evaluate_expr2
[junit] return self.evaluate_expr3( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1176, in evaluate_expr3
[junit] rval = self.evaluate_expr4( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1264, in evaluate_expr4
[junit] rval = self.evaluate_expr5( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1299, in evaluate_expr5
[junit] rval = self.evaluate_expr6( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1341, in evaluate_expr6
[junit] return self.evaluate_expr7( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1438, in evaluate_expr7
[junit] return reference_object.call( arguments, context )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 554, in call
[junit] return self.get_referenced_object().call(argset,context)
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 186, in call
[junit] response = ts.evaluate_statement( context, "method %s" % self.name )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 752, in evaluate_statement
[junit] result = self.evaluate_statement( newscope, place )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 810, in evaluate_statement
[junit] rval = self.evaluate_statement( context, place )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 752, in evaluate_statement
[junit] result = self.evaluate_statement( newscope, place )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 988, in evaluate_statement
[junit] if self.evaluate_expr( context, place ) == None:
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1085, in evaluate_expr
[junit] rval = self.evaluate_expr1( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1120, in evaluate_expr1
[junit] rval = self.evaluate_expr2( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1164, in evaluate_expr2
[junit] return self.evaluate_expr3( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1176, in evaluate_expr3
[junit] rval = self.evaluate_expr4( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1264, in evaluate_expr4
[junit] rval = self.evaluate_expr5( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1299, in evaluate_expr5
[junit] rval = self.evaluate_expr6( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1341, in evaluate_expr6
[junit] return self.evaluate_expr7( context, place, parse_only )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1438, in evaluate_expr7
[junit] return reference_object.call( arguments, context )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 554, in call
[junit] return self.get_referenced_object().call(argset,context)
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 1291, in call
[junit] self.form_instance.submit( )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 706, in submit
[junit] self.window_instance.execute_action( self.method, variables, files, self.action_url )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 928, in execute_action
[junit] return self.browser_instance.execute_action( self.window_name, method, parameters, files, self.resolve( url ) )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 1069, in execute_action
[junit] self.reload_window( window_name, window_data, url )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 1032, in reload_window
[junit] self.build_window( window_name, window_data, old_window.get_parent_window( ), full_url, old_window.get_dialog_answers( ) )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 1038, in build_window
[junit] new_window = VirtualWindow( self, window_name, window_data, parent_window, current_url )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 785, in __init__
[junit] parser.feed( data )
[junit] File "/usr/lib/python2.7/HTMLParser.py", line 109, in feed
[junit] self.goahead(0)
[junit] File "/usr/lib/python2.7/HTMLParser.py", line 153, in goahead
[junit] k = self.parse_endtag(i)
[junit] File "/usr/lib/python2.7/HTMLParser.py", line 327, in parse_endtag
[junit] self.handle_endtag(tag.lower())
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 1497, in handle_endtag
[junit] self.end_script( )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 1823, in end_script
[junit] self.window_instance.accept_javascript( javascript_text )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 965, in accept_javascript
[junit] jstokens.evaluate_statement_list( self.jscontext )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 720, in evaluate_statement_list
[junit] self.evaluate_statement( context, place )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 979, in evaluate_statement
[junit] self.skip_statement( )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1017, in skip_statement
[junit] self.skip_statement( )
[junit] File "/home/kwright/wip/trunk/tests/webcrawler/test-derby-output/Javascript.py", line 1053, in skip_statement
[junit] raise Exception("Unexpected end of statement; need semicolon")
[junit] Exception: Unexpected end of statement; need semicolon
[junit] 2012-03-27 18:05:50.226:INFO::Stopped SocketConnector@0.0.0.0:8346
[junit] ------------- ---------------- ---------------
[junit] Testcase: createConnectionsAndJob(org.apache.manifoldcf.webcrawler_tests.NavigationDerbyUI): Caused an ERROR
[junit] UI test failed; error code: 1
[junit] java.lang.Exception: UI test failed; error code: 1
[junit] at org.apache.manifoldcf.core.tests.HTMLTester.executeTest(HTMLTester.java:183)
[junit] at org.apache.manifoldcf.webcrawler_tests.NavigationDerbyUI.createConnectionsAndJob(NavigationDerbyUI.java:282)
[junit]
[junit] BUILD FAILED
This looks like a Javascript syntax issue - there's a missing semicolon.
Further research shows that the problem is due to the tester and is twofold:
(1) Only while loops are supported
(2) The ++ operator is not supported
So if you change the for loop to a while, and use i = i+1 instead of i++, the test should pass.
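The loop rewrite suggested above might look like the following sketch. It collects invalid seed lines using only a while loop, i = i + 1 and plain string concatenation, avoiding for, ++, += and typeof. The function name findInvalidSeeds and the simplified pattern are assumptions for illustration, not the contents of the patch, and whether the browser simulator accepts every other construct used here has not been verified.

```javascript
// Hypothetical sketch, not the code from CONNECTORS-430.patch.
// Returns the invalid lines of a seeds list, one per line, or "" if all are valid.
function findInvalidSeeds(seedsText) {
  // Deliberately simple pattern: anything that starts with http:// or https://.
  var urlPattern = /^http(s)?:\/\/.+/;
  var lines = seedsText.split("\n");
  var invalid = "";
  var i = 0;
  // while loop instead of for, and i = i + 1 instead of i++.
  while (i < lines.length) {
    // Trim the line so that surrounding whitespace does not cause a false alarm.
    var line = lines[i].replace(/^\s*/, "").replace(/\s*$/, "");
    // Blank lines are skipped; anything else must match the pattern.
    if (line.length > 0 && !urlPattern.test(line)) {
      // Plain concatenation instead of the += operator.
      invalid = invalid + line + "\n";
    }
    i = i + 1;
  }
  return invalid;
}
```

A caller could show an alert listing the returned lines when the result is non-empty and submit the form otherwise.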
After fixing the above problems, I'm still getting errors. I improved the error output and now I see this:
[junit] Exception: Unexpected end of statement; unknown tokens: '['Symbol: var', 'Symbol: regexp', 'Punctuation: =', 'Regexp: http(s)?:\\/\\/([a-z0-9+!*(),;?&=\\$_.-]+(\\:[a-z0-9+!*(),;?&=\\$_.-]+)?@)?[a-z0-9+\\$_-]+(\\.[a-z0-9+\\$_-]+)*(\\:[0-9]{2,5})?(\\/([a-z0-9+\\$_-]\\.?)+)*\\/?(\\?[a-z+&\\$_.-][a-z0-9;:@()', 'Punctuation: &', 'Punctuation: %', 'Punctuation: =', 'Punctuation: +', 'Punctuation: \\', 'Punctuation: $', 'Symbol: _', 'Punctuation: .', 'Punctuation: -', 'Punctuation: ]', 'Punctuation: *', 'Punctuation: )', 'Punctuation: ?', 'Punctuation: (', 'Punctuation: #', 'Punctuation: [', 'Symbol: a', 'Punctuation: -', 'Symbol: z_', 'Punctuation: .', 'Punctuation: -', 'Punctuation: ]', 'Punctuation: [', 'Symbol: a', 'Punctuation: -', 'Symbol: z0', 'Punctuation: -', 'Int: 9', 'Punctuation: +', 'Punctuation: \\', 'Punctuation: $', 'Symbol: _', 'Punctuation: .', 'Punctuation: -', 'Punctuation: ]', 'Punctuation: *', 'Punctuation: )', 'Punctuation: ?', 'Regexp: ;\n var lines = editjob.seeds.value.split("\\n");\n var ...
It looks like the escaping of the large regular expression is incorrect, or the parsing of the regular expression is incorrect in the tester. Looking further...
Fixed a problem in the tester. Now I get this, which could well be just a test error:
[junit] ALERT: Invalid URLs in seeds list:n.comn
[junit] FOCUS: On field 'seeds'
It's not a test error. The test line uses a full URL, which is not appropriately flagged:
textarea.setValue(testerInstance.createStringDescription("http://www.cnn.com"));
I'm going to attach the patch as I currently have it and debug some more tomorrow.
Thanks, I will also try to debug some more today if I get sufficient time. By the way, might the === operator be unsupported as well? I can change it to == if that helps.
I changed the === to == already in the updated patch. The remaining problem is not parsing related, but rather related to the regular expression, I think. Either the tester is not processing the regular expression properly or the actual regular expression is incorrect. But in any case I won't be able to look at this again until tonight.
I have got rid of the annoying error after I upgraded to version 2.7.2 of Python. Now I get a different error message, but I'm afraid that it is not related to an invalid regular expression. I simplified the regexp just in case, but it still fails.
var regexp = /http(s)?:\/\/.*/;
I think the problem is related to the variable declaration above, i.e. the browser simulator thinks it is an invalid variable declaration.
[junit] File "/Users/erlendfg/tmp/mcf_2012/tests/webcrawler/test-derby-output/Javascript.py", line 790, in evaluate_statement
[junit] raise Exception("Didn't find expected ';' at end of var statement, saw %s, in %s" % (unicode(token),place))
[junit] Exception: Didn't find expected ';' at end of var statement, saw Punctuation: ., in method check_seedsList
[junit] 2012-03-30 15:40:22.530:INFO::Stopped SocketConnector@0.0.0.0:8346
My intention is to solve the problem before I fly to Azerbaijan tomorrow morning.
Did you svn update? I fixed several issues in the simulator a while back, and this was one of them.
I have now run svn up and noticed a few things. If I keep the simplified regexp mentioned above, I get the following error:
[junit] File "/Users/erlendfg/tmp/mcf_2012/tests/webcrawler/test-derby-output/VirtualBrowser.py", line 870, in find_button
[junit] raise Exception("Can't find button %s on page %s" % (alt,self.current_url))
[junit] Exception: Can't find button Add url regexp on page http://localhost:8346/mcf-cr ...
If I remove the regexp var and do the following:
if (! /http(s)?:\/\/.*/.test(line))
I get:
[junit] File "/Users/erlendfg/tmp/mcf_2012/tests/webcrawler/test-derby-output/Javascript.py", line 72, in get_referenced_object
[junit] raise Exception("Object %s has no legal object reference" % unicode(self))
[junit] Exception: Object <Javascript.JSRegexp instance at 0x1005a03b0> has no legal object reference
[junit] 2012-03-30 18:12:52.429:INFO::Stopped SocketConnector@0.0.0.0:8346
If I comment out the regexp check (two lines), the test passes. I guess it is something about the test() function which does not play well with the browser simulator.
The first error is the same as what I get, but if you go back further in the output of the test you will see this:
[junit] ALERT: Invalid URLs in seeds list:n.comn
[junit] FOCUS: On field 'seeds'
That's the smoking gun of what is actually going wrong: the page is not reloading because there's an alert (which outputs that busted message) and there's a subsequent focus(), but no submit occurs. So the problem is that the regexp is not matching.
About the only way to chase this down will be to dump the regexp as the browser simulator sees it during the "test" method. The appropriate place to look at the code is in framework/core/src/test/resources/org/apache/manifoldcf/core/tests/Javascript.py.
I just added another fix to Javascript.py that should allow your second syntactical construction above to work properly.
After yet a third fix to the virtual browser, the test passed, so I committed the code.
r1309631
Great! Perhaps you should reassign the issue to Hitoshi in order to add the Japanese error message before we close it.
This can be fixed with JavaScript that performs URL validation. I have entered the following in editjobs.jsp (crawler-ui), which seems to work. This code is not finished, since some URLs (localhost) are not accepted.