```
…00)
    polarsTimeSpent = float((polarsEndEpoch-polarsStartEpoch)/1000)
    return polarsTimeSpent

def sparkTest(testDf):
    sparkStartEpoch = int(time.time()*1000)
    idListSpark = testDf.select("id").rdd.flatMap(lambda x: x).collect()
    for n in range(0,calcN):
        groupedSparkDf = testDf.groupBy("id").sum()
    sparkEndEpoch = int(time.time()*1000)
    s…
```

Pandas, Spark, and Polars — When To Use Which?
Martin Karlsson
Ashraf Miah
Jul 10, 2023
The Spark code does not provide the unique values. Not sure why you don't do something similar to this:

```
df.select('col_name').distinct().show()
```

The variable `calcN` isn't explained either. Overall, it seems like a flawed comparison.
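One way to address the `calcN` point is to make the repetition count an explicit, reported parameter and divide by it, so the per-run time is visible. This is a minimal sketch only; `time_repeated` and its `repeats` parameter are hypothetical names standing in for the article's unexplained `calcN`, and it uses `time.perf_counter` rather than the excerpt's `time.time()` arithmetic.

```python
import time
from typing import Callable


def time_repeated(fn: Callable[[], object], repeats: int) -> float:
    """Run `fn` `repeats` times and return the mean wall-clock seconds per run.

    `repeats` plays the role of the article's `calcN` (hypothetical mapping);
    reporting it next to the mean makes results comparable between libraries.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(repeats):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / repeats


if __name__ == "__main__":
    n = 5
    per_run = time_repeated(lambda: sum(range(100_000)), repeats=n)
    print(f"repeats={n}, mean={per_run * 1000:.3f} ms")
```

If `fn` wraps a Spark transformation such as `testDf.groupBy("id").sum()`, it would also need an action (for example `.collect()` or `.count()`) inside it, because Spark evaluates transformations lazily and otherwise little real work is timed.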
Written by Ashraf Miah
CTO, Data Scientist & Chartered Engineer (MEng CEng EUR ING MRAeS) with over 20 years' experience in the Aerospace, Rail & Energy Industries.