Sunday, June 18, 2023

Download Data in Parquet Format

Here you can see how to download a file from Yahoo Finance and save it in both CSV and Parquet format. Note that Parquet compresses the data efficiently, to about 60% of its original size. #Python #pythonprogramming


This is also a very good article about how to write data to Parquet format efficiently:

4 Ways to Write Data To Parquet With Python: A Comparison


@QuantScraper

Lies and Statistics

 “There are three types of lies: lies, damn lies, and statistics…” –Benjamin Disraeli (1804–1881), Prime Minister of Great Britain (1874–1880) #quoteoftheday #quotes #InvestingQuotes @QuantScraper

Thursday, June 15, 2023

Python: Parquet - optimized for big data processing

Parquet is a columnar storage file format designed for efficient data storage and processing. It is optimized for use with big data processing frameworks such as Apache Hadoop and Apache Spark, but it can also be used in standalone Python applications.

The Parquet format offers several advantages over traditional row-based file formats, such as CSV or JSON, especially when working with large datasets:

  1. Columnar storage: Parquet stores data column-wise rather than row-wise. This columnar organization allows for more efficient compression and encoding, as similar data values are stored together, reducing storage space and improving query performance.

  2. Compression: Parquet supports various compression algorithms, such as Snappy, Gzip, and LZO. Compression helps to reduce the size of the data files, resulting in faster I/O operations and lower storage requirements.

  3. Predicate pushdown: Parquet files store statistics for each row group (such as the minimum and maximum value of each column), so a query engine can skip row groups that cannot match the query predicates. Combined with the columnar layout, which lets a reader load only the columns a query needs, this improves query performance by minimizing disk I/O.

  4. Schema evolution: Parquet datasets can handle schema evolution. Because each file carries its own schema, newer files can add columns, and readers that support schema merging can reconcile the differences, without the need to rewrite the files already written.

To work with Parquet files in Python, you can use libraries like pyarrow or pandas that provide convenient APIs for reading and writing Parquet data. These libraries offer methods for converting data between Parquet files and other data structures like DataFrames, enabling seamless integration with existing Python data processing workflows.

@QuantScraper

#Python #pythonprogramming





Sunday, June 4, 2023

Python: CPU usage - psutil Library

A working example that compares two dataframes and measures the CPU usage during the comparison:



In this example, we have two dataframes, df and dg. The compare_dataframes function compares them using the equals method. You can modify the comparison logic to suit your requirements.

Before performing the comparison, the initial CPU usage is obtained using psutil.cpu_percent(). After the comparison, the final CPU usage is obtained, and the difference in CPU usage is calculated.

Finally, the script prints whether the dataframes are equal, along with the CPU usage difference.

Please note that CPU usage varies with the system's specifications, with whatever else is running at the time, and with the complexity of the dataframe operations being performed. Because psutil.cpu_percent() reports system-wide utilization, the difference between two readings can even be negative. This example is a basic approach to measuring CPU usage during a dataframe comparison; you may need to adjust it for your use case.
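The steps described above can be sketched as follows. This is a reconstruction from the description, not the original listing: the example data, the dtype choices, the 0.1 s sampling interval and the memory-usage printout are all assumptions.

```python
import pandas as pd
import psutil


def compare_dataframes(df: pd.DataFrame, dg: pd.DataFrame) -> bool:
    """Return True when both frames have the same shape, dtypes and values."""
    return df.equals(dg)


# Hypothetical example data: dg holds the same values as df in narrower dtypes.
df = pd.DataFrame({"a": range(200_000), "b": [0.5] * 200_000})
dg = df.astype({"a": "int32", "b": "float32"})

# In-memory size of each frame in megabytes.
df_mb = df.memory_usage(deep=True).sum() / 1e6
dg_mb = dg.memory_usage(deep=True).sum() / 1e6
print(f"df: {df_mb:.1f} MB, dg: {dg_mb:.1f} MB")

# System-wide CPU utilization, sampled over 0.1 s before and after the comparison.
initial_cpu = psutil.cpu_percent(interval=0.1)
result = compare_dataframes(df, dg)
final_cpu = psutil.cpu_percent(interval=0.1)
cpu_diff = final_cpu - initial_cpu

print(f"Dataframes are equal: {result}")
print(f"CPU Usage: {cpu_diff:.1f}%")
```

Note that equals returns False here even though the values match, because the column dtypes differ. For a reading tied to this script rather than the whole machine, psutil.Process().cpu_percent() may be a better fit.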

The result in this case follows. df was about 3.2 MB, and dg, which used specific datatypes, was about 1.5 MB.


Dataframes are equal: False
CPU Usage: -1.5%

psutil (python system and process utilities) is a cross-platform library for retrieving information on running processes and system utilization (CPU, memory, disks, network, sensors) in Python. It is useful mainly for system monitoring, profiling, limiting process resources, and managing running processes.

https://psutil.readthedocs.io/en/latest/




Saturday, June 3, 2023

S&P: 20-Day High Not So Bullish

Historical data reveals that a 20-day high hasn't translated into significant bullishness for the E-mini S&P within the following 1-5 days over the past 3 years. The figure provides a visual representation. #ES #ES_F #SP500 $ES $SPY $SPX #NQ #QQQ #NQ_F #ZB_F #GC_F #CL_F #eurusd $EURUSD


@QuantScraper

Yahoo Finance Futures Contracts Historical Data

Futures data downloaded from Yahoo Finance are not adjusted as continuous contracts. When you download futures data from Yahoo Finance or ma...