MapReduce on one local server?

Discussion in 'Data Sets and Feeds' started by blueraincap, May 1, 2022.

  1. I store all my historical data in MySQL on my PC (Dell XPS) and analyze it with some Python scripts on the same PC.

    I never gave MapReduce (Hadoop) any thought, as I have the impression that it is for data spread across multiple computers for use by multiple users. MySQL is working so far, but as I try to learn NoSQL, especially HBase, it seems to be tied to Big Data and distributed processing, so Hadoop constantly comes up in my reading.

    For an independent trader using one PC for database, analysis, and trading, is there anything to gain from a distributed architecture?
     
  2. ET180

    I just write my data to single files. One file per symbol. One file per expiration for futures. I never saw the need to introduce the complexity of a SQL database.
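A minimal sketch of the one-file-per-symbol layout described above, assuming CSV files and a six-column bar format. The directory name `bars`, the column names, and the helper names `append_bars` and `read_bars` are hypothetical; for futures, the key passed as `symbol` could just as well be a per-expiration code.

```python
import csv
from pathlib import Path

DATA_DIR = Path("bars")  # hypothetical directory holding one CSV per symbol
COLUMNS = ["timestamp", "open", "high", "low", "close", "volume"]  # assumed layout


def append_bars(symbol, rows, data_dir=DATA_DIR):
    """Append bar rows to <data_dir>/<symbol>.csv, writing a header if the file is new."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / f"{symbol}.csv"
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(COLUMNS)
        writer.writerows(rows)
    return path


def read_bars(symbol, data_dir=DATA_DIR):
    """Read all bars for one symbol back as a list of dicts (values are strings)."""
    with (data_dir / f"{symbol}.csv").open(newline="") as f:
        return list(csv.DictReader(f))
```

Each symbol can then be loaded on its own without touching the rest of the data, which is most of what a per-symbol query against a database would provide.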
     
  3. Speed. But only if you need it. How much data are you storing? Daily OHLCV on all exchange-traded US stocks? 1-min bars? All trade data and conditions?

    Store only what you use. There are a few hundred million trades each day on the exchanges just for US stocks. If you tried to store that, you'd be over a trillion records after a few years' worth of data. That'll slow down your desktop PC analysis.
     
  4. Hadoop is meant to use a cluster of multiple computers to accomplish a task. If you are running it on a single computer, you are probably better off using a proper language with parallel programming so you can take advantage of all your cores. Python is a toy compared to C++ or C#, so I would start switching to a language that won't let you down.

    Once you switch to a better language, you probably won't need that database; local CSV files do just fine most of the time.
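The post recommends C++ or C#, but the underlying idea (one task per local file, spread across all cores of a single machine) can also be sketched in Python, which the original poster already uses. This is only an illustration under assumptions: each CSV file is assumed to have a `close` column, and the function names `mean_close` and `mean_close_all` are hypothetical.

```python
import csv
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def mean_close(path):
    """Return (symbol, mean close) for one CSV file that has a 'close' column."""
    path = Path(path)
    with path.open(newline="") as f:
        closes = [float(row["close"]) for row in csv.DictReader(f)]
    mean = sum(closes) / len(closes) if closes else float("nan")
    return path.stem, mean  # file name without extension is used as the symbol


def mean_close_all(paths, max_workers=None):
    """Run mean_close over many files in parallel, one task per file."""
    # max_workers=None lets the pool pick a default based on the processor count.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(mean_close, paths))
```

On platforms that start worker processes by spawning a fresh interpreter, `mean_close_all` should be called from under an `if __name__ == "__main__":` guard, for example with `sorted(Path("bars").glob("*.csv"))` as the list of paths.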
     
  5. That's right, I am just a typical dude running typical statistical analysis on 1-min stock data, so I see no benefit in complicating things with Hadoop. I am using MySQL to organize data, but am looking into HBase and OpenTSDB/InfluxDB to learn about time series databases in general. Every single book or blog I read on HBase mentions Hadoop and distributed computing here and there, as if they are inseparable.
     
  6. 2rosy

    Hadoop? It was popular around 2010. For an independent trader, you don't need anything more than a simple language, or even Excel, and data.
     
  7. Zwaen

    Familiar setup :)
    Dell XPS, and I use Python for option data analysis. I think it all depends on your goal. I use options (calendars and combinations) to optimize the drawdown/profit of (simple) strategies. My main analysis is still in Excel, which gives more 'feel' for the data, but since options data can grow into large files, I use Python for ETL and simple data manipulation.

    What is it that you are searching for?