Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Duplicate
Description
Hi
It would be useful to be able to obtain a data frame of the unique combinations of selected columns directly from a dataset query, for example:
open_dataset("sitc-rev2/parquet/", partitioning = c("Year", "Trade Flow", "Reporter ISO")) %>%
  select(Year, `Reporter ISO`) %>%
  filter(Year >= 1988 & Year <= 1994) %>%
  distinct() %>%
  collect()
However, with the current development version of the arrow package (installed from GitHub), the distinct() step fails with this error:
Error in UseMethod("distinct") : no applicable method for 'distinct' applied to an object of class "arrow_dplyr_query"
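Calling collect() first and then distinct() works, because distinct() is then applied to an in-memory data frame rather than to the Arrow query: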
reporters_1 <- open_dataset("sitc-rev2/parquet/", partitioning = c("Year", "Trade Flow", "Reporter ISO")) %>%
  select(Year, `Reporter ISO`) %>%
  filter(Year >= 1988 & Year <= 1994) %>%
  collect() %>%
  distinct()
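For anyone wanting to try the workaround without the original data, here is a minimal, self-contained sketch. The toy table, its values, the `value` column, and the temporary `toy-trade` directory are all made up for illustration and are not part of the sitc-rev2 dataset; it also partitions by Year only, to keep the example small.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in for the trade data; all values are invented.
trade <- tibble(
  Year = c(1987L, 1988L, 1988L, 1990L, 1990L, 1995L),
  `Reporter ISO` = c("chl", "chl", "chl", "per", "per", "chl"),
  value = c(1, 2, 3, 4, 5, 6)
)

# Write a small Parquet dataset, partitioned by Year, to a temporary directory.
path <- file.path(tempdir(), "toy-trade")
write_dataset(trade, path, format = "parquet", partitioning = "Year")

# select() and filter() are applied to the Arrow query; collect() pulls the
# filtered rows into memory, and distinct() then runs on the data frame.
reporters <- open_dataset(path) %>%
  select(Year, `Reporter ISO`) %>%
  filter(Year >= 1988 & Year <= 1994) %>%
  collect() %>%
  distinct()

print(reporters)
```

The trade-off is that every row passing the filter is loaded into memory before deduplication, which is why a distinct() method for the query object would be preferable on large datasets.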
Issue Links
- duplicates: ARROW-10415 [R] Support for dplyr::distinct() (Resolved)