Monday, May 15, 2023

Python: datatable

"Datatable is a Python package for manipulating 2-dimensional tabular data structures (aka data frames). It is close in spirit to pandas or SFrame; however we put specific emphasis on speed and big data support. As the name suggests, the package is closely related to R’s data.table and attempts to mimic its core algorithms and API."

See https://github.com/h2oai/datatable for more details.

Link to website: datatable 1.0.0



Here is a code snippet often used to load a large .CSV file with datatable and then work with the data in pandas:

import datatable as dt

# Parse the CSV with datatable's fread, then convert the resulting Frame to a pandas DataFrame
df = dt.fread("my_large_file.csv").to_pandas()


5 Signs You’ve Become an Advanced Pandas User Without Even Realizing It



Below is an example of the difference in execution time when reading a .CSV file with pd.read_csv and with the datatable library:
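A comparison like this can be reproduced with a small timing script. The sketch below is only an illustration: the helper names time_call and compare_readers are made up for this post, and it assumes both pandas and datatable are installed. It keeps the best of several runs, since the first read of a file is often slower than later ones.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Return (best elapsed seconds, last result) over `repeats` calls of fn."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


def compare_readers(path, repeats=3):
    """Time pandas.read_csv against datatable.fread(...).to_pandas() on one file.

    Hypothetical helper: `path` is any CSV file you want to benchmark.
    """
    # Imported here, outside the timed calls, so import cost is not measured.
    import pandas as pd
    import datatable as dt

    pandas_s, _ = time_call(pd.read_csv, path, repeats=repeats)
    # Include the to_pandas() conversion so both paths end with a pandas DataFrame.
    datatable_s, _ = time_call(
        lambda p: dt.fread(p).to_pandas(), path, repeats=repeats
    )
    return {"pandas": pandas_s, "datatable": datatable_s}
```

Calling compare_readers("my_large_file.csv") returns the two timings in seconds. The numbers depend on the file, the column types, and the machine, so treat them as a rough guide rather than a benchmark.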










The datatable library offers an alternative to pandas for manipulating tabular data, with an emphasis on speed and big data support. The code snippet above loads a large CSV file with datatable's fread and converts the result to a pandas DataFrame, so the file is parsed by datatable while the rest of the workflow stays in pandas.

The advantage comes largely from datatable's multi-threaded CSV reader (fread) and its column-oriented data storage, which can significantly reduce load time and memory use on large files compared to pandas. This matters most in big data scenarios where memory constraints and processing speed are critical.

In the example, the execution time difference between pd.read_csv (pandas) and dt.fread (datatable) can be substantial, though the exact gap depends on the file size, the column types, and the hardware.

It's worth noting that while the datatable library shares similarities with pandas and R's data.table, its syntax and functionality differ from both. For example, rows, columns, and groupings are all selected through a single DT[i, j, by(...)] expression rather than through chained pandas methods. It's therefore important to consult the official documentation and familiarize yourself with the library's specific API when using datatable in your projects.
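To give a feel for the syntax difference, here is a minimal sketch that computes the same result both ways: the mean of the positive values per group. The function names and the "group" and "value" columns are invented for illustration, and both libraries are assumed to be installed.

```python
def mean_by_group_pandas(columns):
    """pandas version: boolean mask, then groupby(...).mean()."""
    import pandas as pd

    df = pd.DataFrame(columns)
    out = df[df["value"] > 0].groupby("group", as_index=False)["value"].mean()
    return dict(zip(out["group"], out["value"]))


def mean_by_group_datatable(columns):
    """datatable version: DT[i, j] to filter rows, then DT[:, j, by(...)] to aggregate."""
    import datatable as dt
    from datatable import f, by

    DT = dt.Frame(columns)
    # i selects rows; ":" in the j position keeps every column.
    positive = DT[f.value > 0, :]
    # j computes the aggregate; by() names the grouping column.
    out = positive[:, dt.mean(f.value), by(f.group)]
    # to_list() returns one list per column: group keys first, then the means.
    groups, means = out.to_list()
    return dict(zip(groups, means))
```

With columns = {"group": ["a", "b", "a", "b"], "value": [1.0, -2.0, 3.0, 4.0]}, both functions should return the same mapping of group to mean. The filtering is done as a separate step here to keep the row selection and the grouping easy to read.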

If you're working with large datasets and performance is a critical factor, the datatable library is well worth exploring.
