Mady•9mo ago

reading from csv pulls in different number of rows

Another weird issue I'm seeing in a WASM notebook: I'm importing data from a web-hosted CSV, and I know the data is currently static. However, I can click the refresh button 3 times in a 10-second window and it pulls in a different number of rows each time. It also intermittently throws an import error. This behavior seems wrong, because I'm not changing the import portion of the script and the data itself is not changing either.
6 Replies
Myles Scolnick•9mo ago
is it possible to share the code to repro this? this does seem like an odd bug
Mady (OP)•9mo ago
marimo | a next-generation Python notebook
Explore data and build apps seamlessly with marimo, a next-generation Python notebook.
Mady (OP)•9mo ago
it does sometimes produce the same result a few times in a row, but it always eventually changes. I'm actually seeing a pattern right now: it pulls in 384 rows on the first run (the correct number), then 242, then it errors
Myles Scolnick•9mo ago
i am getting random rows too (not sure if i see a pattern yet). i can see the network requests returning the correct number of rows, but the chunked parsing may be completing too early. this might be a race condition between the request finishing loading and pandas reading it. we actually patched pandas.read_csv, so you can now do:
import pandas as pd

df = pd.read_csv(
    'https://api.scout.kennan.tech/dump/2024inmis/CSV/'
)
this also seems to fix the issue
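For anyone debugging a similar mismatch, here is a minimal sketch (not code from this thread) of one way to tell a truncated parse from a truncated download: count the data rows in the fully downloaded response body and compare that with the number of rows the DataFrame ended up with. The helper names `count_data_rows` and `rows_missing` and the sample payload are made up for illustration. How the bytes get downloaded is left to the caller, and the sketch assumes a UTF-8 body with a single header row.

```python
import csv
import io


def count_data_rows(payload: bytes, encoding: str = "utf-8") -> int:
    """Count data rows (header excluded) in a fully downloaded CSV body."""
    text = payload.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can
    # handle quoted fields that contain embedded newlines.
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return 0
    # Blank lines come back as empty lists; skip them, which matches
    # the pandas default of skip_blank_lines=True.
    return sum(1 for row in reader if row)


def rows_missing(parsed_rows: int, payload: bytes) -> int:
    """Rows present in the raw body but absent from the parsed frame."""
    return count_data_rows(payload) - parsed_rows


# Made-up sample standing in for the downloaded response body.
sample = b"team,score\n1,10\n2,20\n3,30\n"

print(count_data_rows(sample))  # 3
print(rows_missing(2, sample))  # 1
```

In a notebook, `rows_missing(len(df), body)` returning a positive number while the network tab shows the complete body would point at the parsing step rather than at the server.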
Mady (OP)•9mo ago
that is way better! thank you @Myles Scolnick 🙂
Myles Scolnick•9mo ago
of course! thanks for pointing out the issue and glad we patched this just now