On Clean Data
For those asking for historical data to test their strategies with, here’s a free source of 1-min historical data going back to 2008 (God bless the person who has taken the effort to download the data from their broker and update it daily).
Now, this isn’t clean data. You can’t expect free 1-min data to be completely clean and accurate. But it is good enough for a beginner, or even an intermediate trader, to test strategies with, since most retail traders only trade timeframes of 3-5 minutes or higher. Also, you don’t need any gimmicks or paid data until you truly have a system in place - one that is well backtested with this free data and shows enough proof in forward testing (paper trading) that it actually works.
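If your strategy runs on 5-min bars, you first have to roll the 1-min bars up. Here is a minimal sketch of that, using only the Python standard library. The tuple layout (timestamp, open, high, low, close, volume), the function name and the sample numbers are all my own assumptions for illustration, not the format of the free data set, and it assumes the bar width divides 60.

```python
from datetime import datetime, timedelta

def resample_bars(bars, minutes=5):
    """Aggregate 1-min OHLCV bars into `minutes`-wide bars.

    `bars` is a time-sorted list of (timestamp, open, high, low, close, volume)
    tuples (an assumed layout). Buckets are aligned to the clock, so 09:15-09:19
    becomes the 09:15 bar when minutes=5.
    """
    out = []
    for ts, o, h, l, c, v in bars:
        # Floor the timestamp to the start of its bucket.
        bucket = ts - timedelta(minutes=ts.minute % minutes,
                                seconds=ts.second,
                                microseconds=ts.microsecond)
        if out and out[-1][0] == bucket:
            # Same bucket: keep the first open, extend high/low, take the latest close.
            _, o0, h0, l0, _, v0 = out[-1]
            out[-1] = (bucket, o0, max(h0, h), min(l0, l), c, v0 + v)
        else:
            out.append((bucket, o, h, l, c, v))
    return out

# Ten synthetic 1-min bars starting at 09:15 (made-up prices, not real data).
start = datetime(2020, 1, 1, 9, 15)
one_min = [(start + timedelta(minutes=i), 100 + i, 101 + i, 99 + i, 100.5 + i, 10)
           for i in range(10)]
five_min = resample_bars(one_min, minutes=5)
```

With the synthetic input above you get two 5-min bars, stamped 09:15 and 09:20. If you already work in pandas, DataFrame.resample does the same job in fewer lines.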
TrueData, Globaldatafeeds and other such data vendors charge over 1L (1 lakh rupees) for this kind of data. Such data could be erroneous too, but less so than this free data, as these are registered data vendors of NSE and get their data directly from NSE and other exchanges.
I don’t know if completely clean, error-free data even exists. There are companies such as Thomson Reuters/Refinitiv, Bloomberg, FactSet, etc., which are industry leaders in data solutions. You could eventually get your data from them too. But all that is largely unnecessary for a beginner.
First things first, don’t pay any money to anyone. You don’t need any secrets or even any quant’s help. Just use this data, use Excel, test basic strategies that you can find online, and learn about backtesting thoroughly, all the while watching how the strategies perform.
Daily data you can download from Quandl - that should be mostly clean and free of errors because it’s the data the exchange publishes. You can also write scripts to scrape bhavcopies from the NSE website and put the daily data together yourself. You can use the NSEPY library with Python to do this.
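To give an idea of the "put daily data together yourself" part, here is a rough sketch that merges already-downloaded bhavcopy CSVs into one history per symbol. It does not touch the NSE website or NSEPY at all. The column names (SYMBOL, SERIES, OPEN, HIGH, LOW, CLOSE, TIMESTAMP) and the date format are assumptions on my part, so check them against the files you actually download, and the two sample "files" below are made up.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

def merge_bhavcopies(csv_texts, series="EQ"):
    """Combine daily bhavcopy CSV contents into a date-sorted history per symbol.

    Column names and the %d-%b-%Y date format are assumed, not confirmed.
    """
    history = defaultdict(list)
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            if row["SERIES"].strip() != series:
                continue  # skip anything that isn't the requested series
            day = datetime.strptime(row["TIMESTAMP"].strip(), "%d-%b-%Y").date()
            history[row["SYMBOL"].strip()].append(
                (day, float(row["OPEN"]), float(row["HIGH"]),
                 float(row["LOW"]), float(row["CLOSE"]))
            )
    for rows in history.values():
        rows.sort()  # files may arrive in any order, so sort by date
    return dict(history)

# Two made-up daily files, deliberately passed in the wrong date order.
file_a = ("SYMBOL,SERIES,OPEN,HIGH,LOW,CLOSE,TIMESTAMP\n"
          "ABC,EQ,104,108,103,107,02-Jan-2020\n"
          "XYZ,BE,10,11,9,10,02-Jan-2020\n")
file_b = ("SYMBOL,SERIES,OPEN,HIGH,LOW,CLOSE,TIMESTAMP\n"
          "ABC,EQ,100,105,99,104,01-Jan-2020\n")
merged = merge_bhavcopies([file_a, file_b])
```

In practice you would read each file from disk and pass its contents in. The result is a dict keyed by symbol, with one (date, open, high, low, close) row per trading day.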
For intraday, if you’re trading on a 3-min, 5-min, 10-min or higher timeframe, don’t worry about the cleanliness of the data at first. Focus on getting a strategy with a decent positive expectancy - and given that this free data is, I suppose, obtained from Zerodha PI, it is decent enough to test your strategies on. So do that first, then assume a certain error % and be very generous with that assumption - if your strategy still stands, you’ve got something to trade with.
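One simple way to apply that error assumption is to make every backtested trade worse by a fixed haircut and see whether the expectancy stays positive. This is just one interpretation of "assign an error %", not the only way to do it. The function names, the 0.25 haircut and the trade results below are all made up for illustration.

```python
def expectancy(trade_returns_pct):
    """Average return per trade, in percent."""
    return sum(trade_returns_pct) / len(trade_returns_pct)

def stressed_expectancy(trade_returns_pct, error_pct):
    """Expectancy after making every trade worse by `error_pct` percentage points."""
    return expectancy([r - error_pct for r in trade_returns_pct])

# Made-up per-trade returns (in %) from a hypothetical backtest.
trades = [1.0, -0.5, 2.0, -1.0, 0.5, 1.0]

base = expectancy(trades)                      # 0.5% per trade
stressed = stressed_expectancy(trades, 0.25)   # 0.25% per trade after the haircut
survives = stressed > 0                        # True: still worth a closer look
```

With a flat haircut like this, the strategy survives any assumed error smaller than its base expectancy, so a thin edge will fail the test quickly. That is exactly the point of being generous with the error assumption.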
Then, once you have something tangible that’s tested on this free data and still looks attractive enough to trade after assuming a healthy error percentage, buy clean data and test the strategy on that before you start trading it live.