Details
- Type: Bug
- Status: Resolved
- Priority: P1
- Resolution: Fixed
Description
Summary
An existing virtualenv directory should be cleared before creating a new one.
Problem Description
The virtualenv directory name for Python tasks is generated from a hash of the project path, so any tasks that share the same project path share the same virtualenv directory. The problem is that when the setupVirtualenv task initializes a new virtualenv directory, it doesn't overwrite existing data. This can cause a subtle bug that is very hard to debug. See the following example:
❯ ./gradlew :sdks:python:wordCount -PpythonVersion=3.8 Configuration on demand is an incubating feature. > Task :sdks:python:setupVirtualenv > Task :sdks:python:sdist > Task :sdks:python:installGcpTest Successfully installed apache-beam-2.37.0.dev0 atomicwrites-1.4.0 attrs-21.4.0 azure-core-1.21.1 azure-storage-blob-12.9.0 boto3-1.20.41 botocore-1.23.41 cachetools-4.2.4 certifi-2021.10.8 cffi-1.15.0 charset-normalizer-2.0.10 cloudpickle-2.0.0 crcmod-1.7 cryptography-36.0.1 deprecation-2.1.0 dill-0.3.1.1 docker-5.0.3 docopt-0.6.2 execnet-1.9.0 fastavro-1.4.9 fasteners-0.17.2 freezegun-1.1.0 google-api-core-1.31.5 google-apitools-0.5.31 google-auth-1.35.0 google-cloud-bigquery-2.32.0 google-cloud-bigquery-storage-2.11.0 google-cloud-bigtable-1.7.0 google-cloud-core-1.7.2 google-cloud-datastore-1.15.3 google-cloud-dlp-3.5.0 google-cloud-language-1.3.0 google-cloud-pubsub-2.9.0 google-cloud-pubsublite-1.3.0 google-cloud-recommendations-ai-0.2.0 google-cloud-spanner-1.19.1 google-cloud-videointelligence-1.16.1 google-cloud-vision-1.0.0 google-crc32c-1.3.0 google-resumable-media-2.1.0 googleapis-common-protos-1.54.0 greenlet-1.1.2 grpc-google-iam-v1-0.12.3 grpcio-gcp-0.2.2 grpcio-status-1.43.0 hdfs-2.6.0 httplib2-0.19.1 idna-3.3 isodate-0.6.1 jmespath-0.10.0 libcst-0.4.0 mock-2.0.0 more-itertools-8.12.0 msrest-0.6.21 mypy-extensions-0.4.3 numpy-1.21.5 oauth2client-4.1.3 oauthlib-3.1.1 orjson-3.6.5 overrides-6.1.0 pandas-1.3.5 parameterized-0.7.5 pbr-5.8.0 pluggy-0.13.1 proto-plus-1.19.8 psycopg2-binary-2.9.3 pyarrow-6.0.1 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 pydot-1.4.2 pyhamcrest-1.10.1 pymongo-3.12.3 pyparsing-2.4.7 pytest-4.6.11 pytest-forked-1.4.0 pytest-timeout-1.4.2 pytest-xdist-1.34.0 python-dateutil-2.8.2 pytz-2021.3 pyyaml-6.0 requests-2.27.1 requests-mock-1.9.3 requests-oauthlib-1.3.0 rsa-4.8 s3transfer-0.5.0 sqlalchemy-1.4.31 tenacity-5.1.5 testcontainers-3.4.2 typing-extensions-3.10.0.2 typing-inspect-0.7.1 typing-utils-0.1.0 urllib3-1.26.8 
wcwidth-0.2.5 websocket-client-1.2.3 wrapt-1.13.3 > Task :sdks:python:wordCount INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds. INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds. INFO:oauth2client.transport:Attempting refresh to obtain initial access_token INFO:oauth2client.client:Refreshing access_token WARNING:root:Make sure that locally built Python SDK docker image has Python 3.8 interpreter. INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.37.0.dev INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function annotate_downstream_side_inputs at 0x122f479d0> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function fix_side_input_pcoll_coders at 0x122f47af0> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function pack_combiners at 0x122f48040> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function lift_combiners at 0x122f480d0> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function expand_sdf at 0x122f48280> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function expand_gbk at 0x122f48310> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function sink_flattens at 0x122f48430> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function greedily_fuse at 0x122f484c0> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function read_to_impulse at 0x122f48550> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function 
impulse_to_input at 0x122f485e0> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function sort_stages at 0x122f48820> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function setup_timer_mapping at 0x122f48790> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function populate_data_channel_coders at 0x122f488b0> ==================== INFO:apache_beam.runners.worker.statecache:Creating state cache with size 100 INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Created Worker handler <apache_beam.runners.portability.fn_api_runner.worker_handlers.EmbeddedWorkerHandler object at 0x122fdeca0> for environment ref_Environment_default_environment_1 (beam:env:embedded_python:v1, b'') INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running (((((ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-Impulse_19)+(ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-FlatMap-lambda-at-core-py-3228-_20))+(ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-Map-decode-_22))+(ref_AppliedPTransform_Write-Write-WriteImpl-InitializeWrite_23))+(ref_PCollection_PCollection_11/Write))+(ref_PCollection_PCollection_12/Write) INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running ((((ref_AppliedPTransform_Read-Read-Impulse_4)+(ref_AppliedPTransform_Read-Read-Map-lambda-at-iobase-py-898-_5))+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/PairWithRestriction))+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction))+(ref_PCollection_PCollection_2_split/Write) INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
(((((ref_PCollection_PCollection_2_split/Read)+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/Process))+(ref_AppliedPTransform_Split_8))+(ref_AppliedPTransform_PairWIthOne_9))+(GroupAndSum/Precombine))+(GroupAndSum/Group/Write) INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running (((((((GroupAndSum/Group/Read)+(GroupAndSum/Merge))+(GroupAndSum/ExtractOutputs))+(ref_AppliedPTransform_Format_14))+(ref_AppliedPTransform_Write-Write-WriteImpl-WindowInto-WindowIntoFn-_24))+(ref_AppliedPTransform_Write-Write-WriteImpl-WriteBundles_25))+(ref_AppliedPTransform_Write-Write-WriteImpl-Pair_26))+(Write/Write/WriteImpl/GroupByKey/Write) INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running ((Write/Write/WriteImpl/GroupByKey/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-Extract_28))+(ref_PCollection_PCollection_17/Write) INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running ((ref_PCollection_PCollection_11/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-PreFinalize_29))+(ref_PCollection_PCollection_18/Write) WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path matching: -*-of-%(num_shards)05d INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running (ref_PCollection_PCollection_11/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-FinalizeWrite_30) INFO:apache_beam.io.filebasedsink:Starting finalize_write threads with num_shards: 1 (skipped: 0), batches: 1, num_threads: 1 INFO:apache_beam.io.filebasedsink:Renamed 1 shards in 0.02 seconds. Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0. You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins. 
See https://docs.gradle.org/7.3.2/userguide/command_line_interface.html#sec:command_line_warnings BUILD SUCCESSFUL in 1m 14s 14 actionable tasks: 4 executed, 10 up-to-date ❯ ./gradlew :sdks:python:wordCount -PpythonVersion=3.6 Configuration on demand is an incubating feature. > Task :sdks:python:setupVirtualenv > Task :sdks:python:installGcpTest > Task :sdks:python:wordCount INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds. INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds. INFO:oauth2client.transport:Attempting refresh to obtain initial access_token INFO:oauth2client.client:Refreshing access_token WARNING:root:Make sure that locally built Python SDK docker image has Python 3.8 interpreter. INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.37.0.dev INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function annotate_downstream_side_inputs at 0x124afa9d0> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function fix_side_input_pcoll_coders at 0x124afaaf0> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function pack_combiners at 0x124afb040> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function lift_combiners at 0x124afb0d0> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function expand_sdf at 0x124afb280> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function expand_gbk at 0x124afb310> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function sink_flattens at 0x124afb430> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function 
greedily_fuse at 0x124afb4c0> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function read_to_impulse at 0x124afb550> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function impulse_to_input at 0x124afb5e0> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function sort_stages at 0x124afb820> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function setup_timer_mapping at 0x124afb790> ==================== INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function populate_data_channel_coders at 0x124afb8b0> ==================== INFO:apache_beam.runners.worker.statecache:Creating state cache with size 100 INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Created Worker handler <apache_beam.runners.portability.fn_api_runner.worker_handlers.EmbeddedWorkerHandler object at 0x124bd6f70> for environment ref_Environment_default_environment_1 (beam:env:embedded_python:v1, b'') INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running (((((ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-Impulse_19)+(ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-FlatMap-lambda-at-core-py-3228-_20))+(ref_AppliedPTransform_Write-Write-WriteImpl-DoOnce-Map-decode-_22))+(ref_AppliedPTransform_Write-Write-WriteImpl-InitializeWrite_23))+(ref_PCollection_PCollection_11/Write))+(ref_PCollection_PCollection_12/Write) INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running 
((((ref_AppliedPTransform_Read-Read-Impulse_4)+(ref_AppliedPTransform_Read-Read-Map-lambda-at-iobase-py-898-_5))+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/PairWithRestriction))+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction))+(ref_PCollection_PCollection_2_split/Write) INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running (((((ref_PCollection_PCollection_2_split/Read)+(Read/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/Process))+(ref_AppliedPTransform_Split_8))+(ref_AppliedPTransform_PairWIthOne_9))+(GroupAndSum/Precombine))+(GroupAndSum/Group/Write) INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running (((((((GroupAndSum/Group/Read)+(GroupAndSum/Merge))+(GroupAndSum/ExtractOutputs))+(ref_AppliedPTransform_Format_14))+(ref_AppliedPTransform_Write-Write-WriteImpl-WindowInto-WindowIntoFn-_24))+(ref_AppliedPTransform_Write-Write-WriteImpl-WriteBundles_25))+(ref_AppliedPTransform_Write-Write-WriteImpl-Pair_26))+(Write/Write/WriteImpl/GroupByKey/Write) INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running ((Write/Write/WriteImpl/GroupByKey/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-Extract_28))+(ref_PCollection_PCollection_17/Write) INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running ((ref_PCollection_PCollection_11/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-PreFinalize_29))+(ref_PCollection_PCollection_18/Write) WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path matching: -*-of-%(num_shards)05d INFO:apache_beam.runners.portability.fn_api_runner.fn_runner:Running (ref_PCollection_PCollection_11/Read)+(ref_AppliedPTransform_Write-Write-WriteImpl-FinalizeWrite_30) INFO:apache_beam.io.filebasedsink:Starting finalize_write threads with num_shards: 1 (skipped: 0), batches: 1, num_threads: 1 INFO:apache_beam.io.filebasedsink:Renamed 1 shards in 0.02 seconds. 
Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0. You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins. See https://docs.gradle.org/7.3.2/userguide/command_line_interface.html#sec:command_line_warnings BUILD SUCCESSFUL in 1m 8s 14 actionable tasks: 3 executed, 11 up-to-date
Note that the second Gradle command specified Python 3.6, but the executed test used Python 3.8. The first Python version used right after the clean task pins the virtualenv's Python version; any task run thereafter against the same project path will use that first Python version, as shown above.
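The root cause can be illustrated with a minimal sketch (the venv_dir helper and its hashing scheme are hypothetical, for illustration only): because the directory name is derived from the project path alone, two invocations that share a project path resolve to the same directory regardless of the requested Python version.

```python
import hashlib

def venv_dir(project_path: str) -> str:
    # Hypothetical: derive the virtualenv directory solely from the
    # project path, ignoring the requested Python version.
    digest = hashlib.md5(project_path.encode()).hexdigest()
    return f"build/gradleenv/{digest}"

# Both invocations map to the same directory, so whichever Python
# version initialized it first "wins" for all later tasks.
print(venv_dir(":sdks:python") == venv_dir(":sdks:python"))  # True
```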
Affected Tests
We have Python test suites that run against multiple Python versions. Fortunately, most of them include the Python version in their project paths, e.g. :sdks:python:test-suites:dataflow:py38:setupVirtualenv. For automated Jenkins tests, we also use tasks created for each Python version. The only exception is the cross-language tests, which use a for-each loop to run the same tests for each target Python version. In summary:
- Jenkins Python tests are not affected. In other words, we have good coverage of multiple Python versions.
- Cross-language VR tests are affected. This means we lost test coverage for the second Python version, namely Python 3.8.
- Any tests executed directly from the command line are error-prone, since the -PpythonVersion flag only takes effect for the first task run after a clean.
Solution
The venv module supports a --clear option, which removes the contents of an existing virtualenv directory before initializing a new one.
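A minimal sketch of the fix's effect using the standard-library venv module (venv.EnvBuilder's clear parameter is the programmatic equivalent of the --clear CLI flag): recreating an environment with clear=True removes stale contents left behind by a previous initialization.

```python
import os
import tempfile
import venv

env_dir = os.path.join(tempfile.mkdtemp(), "env")

# First initialization, then simulate stale state left behind.
venv.EnvBuilder(with_pip=False).create(env_dir)
stale = os.path.join(env_dir, "stale-marker.txt")
open(stale, "w").close()

# Recreating with clear=True wipes the directory contents first,
# equivalent to `python -m venv --clear <dir>` on the command line.
venv.EnvBuilder(clear=True, with_pip=False).create(env_dir)
print(os.path.exists(stale))  # False: the stale data was cleared
```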