Description
In the Common Crawl WARC files I've observed that the If-Modified-Since header is sent with varying time zones:
If-Modified-Since: Tue, 25 Feb 2020 03:33:21 MSK
If-Modified-Since: Sun, 22 Sep 2019 04:41:48 GMT
If-Modified-Since: Mon, 18 Nov 2019 12:06:19 KRAT
If-Modified-Since: Tue, 21 Jan 2020 02:10:22 UTC
If-Modified-Since: Fri, 18 Oct 2019 20:23:57 BST
If-Modified-Since: Sun, 20 Oct 2019 08:39:26 CEST
If-Modified-Since: Fri, 15 Nov 2019 12:56:38 EST
If-Modified-Since: Mon, 30 Mar 2020 09:10:33 GMT
If-Modified-Since: Mon, 30 Mar 2020 05:18:36 GMT
If-Modified-Since: Fri, 28 Feb 2020 03:09:16 PST
If-Modified-Since: Thu, 21 Nov 2019 10:16:19 YEKT
If-Modified-Since: Thu, 14 Nov 2019 18:01:05 EET
If-Modified-Since: Thu, 14 Nov 2019 16:46:43 UTC
If-Modified-Since: Sun, 17 Nov 2019 13:14:28 UTC
If-Modified-Since: Tue, 25 Feb 2020 21:46:10 GMT
If-Modified-Since: Wed, 16 Oct 2019 19:03:31 UTC
If-Modified-Since: Thu, 14 Nov 2019 09:07:13 EST
If-Modified-Since: Thu, 09 Apr 2020 12:21:53 EEST
If-Modified-Since: Sat, 28 Mar 2020 19:08:52 CET
If-Modified-Since: Sun, 23 Feb 2020 12:22:46 CET
If-Modified-Since: Mon, 21 Oct 2019 03:18:16 PDT
If-Modified-Since: Fri, 15 Nov 2019 05:41:44 UTC
If-Modified-Since: Thu, 09 Apr 2020 21:01:32 CEST
If-Modified-Since: Wed, 11 Dec 2019 11:18:28 KRAT
If-Modified-Since: Tue, 22 Oct 2019 18:55:54 GMT
This happens because the time zone of HttpDateFormat's internal SimpleDateFormat may change when a date is parsed. The next call that formats a date then uses the time zone of the last parsed date.
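The effect can be reproduced outside of Nutch with a plain java.text.SimpleDateFormat. The following is a minimal sketch, not the actual HttpDateFormat code: the class HttpDateZoneDemo and its methods are hypothetical names, and the pattern string is assumed to resemble the one used for HTTP dates. It parses a date carrying a non-GMT zone and then formats with the same instance, once unguarded and once with the zone reset after parsing. The unguarded output depends on the JDK's time zone data, so no expected value is given for it.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical demo class; not part of Nutch. */
final class HttpDateZoneDemo {
    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    /** Builds a formatter for the HTTP date layout, initially set to GMT. */
    static SimpleDateFormat newHttpFormat() {
        SimpleDateFormat format =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        format.setTimeZone(GMT);
        return format;
    }

    /** Parses with a shared formatter, then resets its zone to GMT. */
    static Date parseAndRestoreZone(SimpleDateFormat format, String text) {
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an HTTP date: " + text, e);
        } finally {
            // parse() may have replaced the formatter's zone with the parsed one.
            format.setTimeZone(GMT);
        }
    }

    public static void main(String[] args) throws ParseException {
        Date epoch = new Date(0L);

        // Unguarded: parsing a non-GMT date may switch the formatter's zone.
        SimpleDateFormat unguarded = newHttpFormat();
        unguarded.parse("Fri, 15 Nov 2019 12:56:38 EST");
        System.out.println("zone after parse: " + unguarded.getTimeZone().getID());
        System.out.println("unguarded format: " + unguarded.format(epoch));

        // Guarded: the zone is reset after every parse, so output stays in GMT.
        SimpleDateFormat guarded = newHttpFormat();
        parseAndRestoreZone(guarded, "Fri, 15 Nov 2019 12:56:38 EST");
        System.out.println("guarded format:   " + guarded.format(epoch));
    }
}
```

Resetting the zone after each parse is only one possible remedy. SimpleDateFormat is also not thread-safe, so a shared instance would additionally need synchronization, or separate instances for parsing and formatting.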
The use of "GMT" as the time zone is specified in sec. 7.1.1.1 of RFC 7231.
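For comparison, a formatter that always emits "GMT" can be sketched with java.time, whose DateTimeFormatter is immutable and therefore cannot pick up a zone from an earlier parse. This is an illustration under assumptions, not the fix applied in Nutch: the class name ImfFixdate and its format method are hypothetical, and the literal 'GMT' in the pattern is chosen to match the fixed-length date layout of RFC 7231.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper; not part of Nutch. */
final class ImfFixdate {
    // "GMT" is a quoted literal, and the zone override pins all output to UTC.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    /** Formats an instant as an HTTP date that always ends in GMT. */
    static String format(Instant instant) {
        return FORMAT.format(instant);
    }

    public static void main(String[] args) {
        Instant modified = Instant.parse("2019-09-22T04:41:48Z");
        // prints "If-Modified-Since: Sun, 22 Sep 2019 04:41:48 GMT"
        System.out.println("If-Modified-Since: " + format(modified));
    }
}
```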