The reason of all those questions is lack of understanding of multithreading and laws governing computing in general.
In this article we will try to address some of those misunderstandings and misconceptions. We assume that the reader already read Efficient use of multitreading from AmiBroker’s Users’ guide and is fully aware how work is distributed in many threads in the Analysis window. We also assume that the reader already read Peformance Tunning chapter of AmiBroker’s guide. These two parts of the manual explain fundamental concepts and are essential to understanding of what is written below.
Another fundamental reading is Amdahl’s Law article in the Wikipedia that explains theoretical speedup limit of any multi-threaded program. In short Amdahl’s law says that if 95% of your program runs in multiple threads and only 5% of it is serial (single-threaded), the maximum achievable speedup regardless of how many CPUs and how many cores you have is 20x (20 times).
Let us focus on Analysis window performance (Exploration/Scan/Backtesting/Optimization). Any operation in the Analysis window involves:
AmiBroker is highly parallel multithreading application, so most of steps are done in multiple threads. Specifically only first and last (1. and 5.) step is serial, the rest (2, 3, 4) is parallel. It is worth noting that steps 1-4 are done on every symbol, while step 5 is only done once for all symbols. In addition to that program spends some time handling the UI (things like updating UI controls like progress bar and reacting to your mouse / keyboard input) which is of course done in single (main) UI thread.
There is one exception, a special case: Individual optimization. In individual optimization step 1 is done only once (for one symbol), and all other steps 2-5 (so including last one) are done in multiple threads.
Now is where Amdahl’s law kicks in. By adding threads/cores/processors you can only decrease parallel parts (2..4 or 2..5) and ultimately you are limited by the speed of data access. You can’t backtest faster than you can read/prepare the data.
As for data access: the database is shared resource, no matter where it resides. If it resides on hard disk, it is single physical device that does not speed up with increasing number of CPUs. If it resides in RAM, it is still single physical RAM, that has limit on bandwidth and fixed latency regardless how many processors you throw to the mix. Even if it is in L3 (Level 3) cache on the processor, it is still single L3 cache shared by multiple cores. And it is worth nothing, that L3 cache even on most modern processors operate on half the speed of the core, so single core can actually saturate bandwidth of L3 cache if doing nothing but reading or writing large chunks of data from/to it. In many cases this means that processor must wait for memory, unless it is doing complex computations involving only minimum amount of data. These are for example real-world measurement results for triple channel RAM controller on Intel i7 920 CPU (measured using memtest86 program)
Data location | Bandwidth [MB/sec] |
L1 | 52408 |
L2 | 30722 |
L3 | 24521 |
RAM | 11879 |
Only L1 cache runs at full core speed. As you can see L3 cache has half the bandwidth and RAM has 1/4 of bandwidth of L1 cache. Of course disk speeds (even SSD) are far cry behind 11GB/sec offered by RAM.
In case of portfolio backtest: a final backtest phase (portfolio backtesting) is one per backtest, done once for all symbols, so naturally it is done in single thread (as opposed to first phase that is done on every symbol in parallel).
Now, knowing this all you may wonder how to use all that knowledge in practice.
For example it allows to understand the limits of achievable speed gains for given formula and plan your hardware purchases or find ways to improve run times.
As we learned from the above the only parts that can be speed-ed up by adding more cores are those that are run in parallel (multiple threads). In practice it means – your AFL formula code. What is more the more time is spent in parallel part the better it scales on multiple cores. This means that simple formulas DO NOT scale too well because they are too simple to put enough strain on CPU and are mainly memory (data access) bound. All your simple moving average cross overs are just too simple to keep CPU busy for longer time, especially when there is not too much data to process.
Let us take this trivial formula for example:
period = Optimize( "period", 10, 2, 102, 1 );
Buy = Cross( C, MA( C, period ) );
Sell = Cross( MA( C, period ), C )
and run Optimize->Individual Optimize on symbol that has 2000 quotes.
Now switch to “Info” tab in the Analysis window and you will see this output (this example comes from 4 core / 8 thread Intel i7), all times are in seconds:
Individual optimize started.
Completed in 0.49 seconds. Number of rows: 500
( Timings: data: 0.11, setup: 0.00, afl: 0.28, job: 2.97, lock: 0.00,
pbt: 0.00, UI thread: 0.11, worker threads: 3.26/3.26 )
So our 500 step optimization on 2000 quotes took less than half of the second. What you see there are some cryptic numbers that you might wonder what they mean. Here is the explanation (for the backtest/optimization)
a) data – time spent accessing/preparing the data
b) setup – time spent preparing AFL engine
c) afl – time spent executing your formula (first phase of backtest)
d) job – post processing (here signals are collected and trading simulation is performed in case of individual optimize)
e) lock – time spent waiting in critical section / lock accessing shared signal table
f) pbt – portfolio backtesting code (not used in individual optimization)
g) UI thread – time spent in UI thread in total (data + pbt + UI handling) – single threaded time
h) worker threads – time spent in worker (parallel) threads (setup+afl+job+lock) – multi-threaded time
Firstly it may look surprising that “worker threads” time is 3.26 which is way longer than entire optimization took (0.49 seconds). But this time is a SUM of times spent in all 8 threads. They ran in parallel. Each was running for (3.26/8 seconds = 0.4075 seconds), and with only one thread running it would take 3.26s. Now you suddenly realize the power of multi-threading!
So now it would seem that our formula run (0.11 + 3.26)/0.49 = 6.8 times faster than on single core.
You may ask why not 8x? We had 8 threads, didn’t we?
First reason is the Amdahl’s law – serial time (0.11sec) is constant and limits our speed up, no matter how many threads you would put on it, but there is something more.
Let us check how much time would it really take if we limited to one thread only. Try running with #pragma statement limiting number of threads:
#pragma maxthreads 1
period = Optimize( "period", 10, 2, 102, 1 );
Buy = Cross( C, MA( C, period ) );
Sell = Cross( MA( C, period ), C )
Suddenly the result is:
Individual optimize started.
Completed in 1.62 seconds. Number of rows: 500
( Timings: data: 0.07, setup: 0.00, afl: 0.10, job: 1.37, lock: 0.00,
pbt: 0.00, UI thread: 0.07, worker threads: 1.47/1.47 )
What? Entire optimization took just 1.62 seconds when run in single-thread which is just 3.3 times slower than multi-threaded, not 6.8x as we calculated earlier. Why worker thread is 1.47 ? It was 3.26? What happened?
There are couple of reasons for that:
a) Hyper-threading – as soon as you exceed CPU core count and start to rely on hyperthreading (running 2 threads on single core) you find out that hyperthreading does not deliver 2x performance. If your code is NOT doing complicated things like lots of trigonometric functions that put FPU busy or other number crunching, the hyperthreading will not give you 2x performance. On simple tasks it struggles to deliver +30%.
b) Turbo boost – modern CPUs have different settings for single-core turbo boost and multi-core turbo boost. The effect is that CPU can reach raise clock to 4GHz when running single-core only but limit to 3.5GHz when running multi-threaded code. This limits multi-threaded performance and speeds up single-thread apps
c) Concurrent L3 cache / RAM access – when multiple cores run the code accessing L3 cache / RAM they will fight for access slowing them down
The effect of all three factors is amplified by the fact that our formula is extremely simple and does NOT do any complex math, so it is basically data-bound. This is why single-core execution was not as bad as we expected.
But what would happen if we increase the number of bars (keeping formula the same)? Let us try with 12000 bars of data (6 times more data than previously):
8-threads:
Individual optimize started.
Completed in 1.61 seconds. Number of rows: 500
( Timings: data: 0.18, setup: 0.00, afl: 0.81, job: 11.57, lock: 0.00,
pbt: 0.00, UI thread: 0.19, worker threads: 12.38/12.38 )
1-thread:
Individual optimize started.
Completed in 6.90 seconds. Number of rows: 500
( Timings: data: 0.10, setup: 0.00, afl: 0.28, job: 6.48, lock: 0.00,
pbt: 0.00, UI thread: 0.10, worker threads: 6.76/6.76 )
First we observe that although we used 6x more data, the time in multi-threaded case has increased from 0.49 to 1.61 which is only 3.28x. Secondly we see that 8-threaded execution is now 6.90/1.61 = 4.29 times faster than single-threaded.
What happened that multi-threaded performance is now better and it scales better?
Simply – we loaded CPU with more work. That is general rule, the more work you place on the CPU, the more time is spent in parallel section and more gain you get from multi-threading.
So, what would happen if you put CPU to some really heavy-work. It is surprisingly difficult to put i7 CPU into such a hard work that it sits busy doing calculations and not doing too much memory access. You would really need to use functions that do heaps on calculations on very small chunks of data sitting in L1 cache all the time or use some transcendental math functions that require FPU to spend way more than single cycle to derive result. Let us try with combination of raising to power, decimal logarithm and arcus sine.
period = Optimize( "period", 10, 2, 501, 1 );
Buy = Cross( C, MA( C, period ) );
Sell = Cross( MA( C, period ), C );
// add some math to force i7 CPU to sweat a little bit
for( i = 0; i < 100; i++ ) x = acos( log( C ^ period ) )
Once you to run this you will see AmiBroker saturating your CPU (on my end it uses 99% of CPU) for the first time. The results are:
8 threads:
Individual optimize started.
Completed in 39.39 seconds. Number of rows: 500
( Timings: data: 0.14, setup: 0.00, afl: 302.73, job: 9.14, lock: 0.00,
pbt: 0.00, UI thread: 0.14, worker threads: 311.87/311.87 )
1 thread:
Individual optimize started.
Completed in 251.27 seconds. Number of rows: 500
( Timings: data: 0.12, setup: 0.00, afl: 243.92, job: 6.59, lock: 0.00,
pbt: 0.00, UI thread: 0.12, worker threads: 250.51/250.51 )
Now you can see that 8 threaded execution was (251.27/39.39) 6.38 times faster than single-threaded.
This is almost perfect scaling with hyperthreading – remember hyper-threaded thread is NOT fast as separate-core thread. To prove that we can run same code on 4 threads:
#pragma maxthreads 4
period = Optimize( "period", 10, 2, 501, 1 );
Buy = Cross( C, MA( C, period ) );
Sell = Cross( MA( C, period ), C );
// add some math to force i7 CPU to sweat a little bit
for( i = 0; i < 100; i++ ) x = acos( log( C ^ period ) )
With four threads we get:
Individual optimize started.
Completed in 64.63 seconds. Number of rows: 500
( Timings: data: 0.13, setup: 0.00, afl: 250.22, job: 6.91, lock: 0.00,
pbt: 0.00, UI thread: 0.13, worker threads: 257.12/257.12 )
So 4-thread performance was 251.27/64.63 = 3.89 faster than single-thread. And look at the “worker threads” time it is very close to single-thread time (250s vs 257s). This proves our point that except the effect of RAM and L3 congestion and slightly slower turbo boost speed, full-core threads scale perfectly as long as your formula puts them into some real work.
Note: in all those tests we did NOT include the impact of disk speed because we run single-symbol individual optimization which runs out of RAM.
Bottom line is: despite marketing hype buying 32 thread CPU does not buy you 32x performance. Real-world performance depends on many factors including formula complexity, whenever it is heavy on math or not, amount of data, RAM speed, on-chip cache sizes, turbo boost clocks differences between single-thread and multi-thread configurations and so on. The devil is in the details and there are no simple answers. I always say: do not assume. Assumptions are not facts. Unless you measure something you don’t know.
]]>Old format line
$FORMAT Date_DMY,Open,High,Low,Skip,Close,Volume
should be changed to:
$FORMAT Date_DMY,Open,High,Low,Close,Skip,Volume
A ready to use aqh.format file can be downloaded from here (right click and choose “Save target as…”). Place it into “Formats” subdirectory.
You may have noticed over last few days that on some symbols (stocks and ETF that pay dividends) close price is below open, high, low prices.
This is new Yahoo error due to the fact that they added “dividend adjustments” incorrectly
http://forum.amibroker.com/t/amiquote-3-15-is-coming/171/18
In the period May 17 – June 10, Yahoo was NOT adjusting for dividends so OHLC prices were adjusted for splits only.
Charts looked correct (albeit without dividend adjustments).
Then people sent complaints about missing dividend adjustments and Yahoo started adjusting for Dividends, but… only Close price, leaving High, Low, Open adjusted for Splits only.
Here is example of EWJ data downloaded from Yahoo:
02-08-2010, 39.000000, 39.360001, 38.880001, 9.840000, 35.239758, 4488800
The columns are as follows:
Date, AdjOpen, AdjHigh, AdjLow, RawClose, AdjCloseForSplitsAndDividends, Volume
As you can see adjusted for splits÷nds close price (35.239758) is lower than adjusted low (38.880001)
So here is where the mess is coming from.
Yahoo is adjusting High, Low, Open fields for splits but at the same time they are adjusting Close field for splits and dividends.
That is why you are getting Close field below OHL fields and strange looking charts.
Another example – this time straight from Yahoo web site.
What you get:
Clearly visible Adjusted Close is lower than Adjusted Low, because they subtract dividends from Close price, but don’t do that for High, Low, Open prices.
This needs to be fixed by Yahoo on their site. You need to complain to Yahoo to fix their mess.
There is already a thread that describes this data error on Yahoo web site https://forums.yahoo.net/t5/Yahoo-Finance-help/Historical-Data-Split-and-Dividend-Adjustment-Errors-in-EEM/td-p/279800 but apparently Yahoo does not see / understand that yet.
The are two possible ways to fix that on Yahoo side:
1. Yahoo needs to adjust Open, High, Low fields the same way as they are adjusting Close.
..or..
2. Yahoo needs to send RawOpen, RawHigh, RawLow instead of adjusted ones. Then adjusted OHL can be calculated on the fly by deriving split ratio from AdjClose/Close
Method 2 was used in the past (prior to May 17, 2017 changes). It is best solution as it gives access to both adjusted and unadjusted OHLC easily. But any method would do as long as OHLC fields are adjusted the same way (i.e. for both splits AND dividends)
]]>AmiQuote 3.21 has been announced here: http://www.amibroker.com/devlog/2017/07/11/amiquote-3-21-released/
AmiQuote 3.21 is not a standalone installer, therefore – it requires AmiBroker installed on your computer and the installation path should point to the folder where AmiBroker is installed.
To run the installer – use the link at:
http://www.amibroker.com/bin/aq3210.exe (119KB) – 32 bit version
http://www.amibroker.com/bin/aq3210x64.exe (143KB) – 64 bit version
Make sure to download correct version. If you have 32-bit AmiBroker, use 32-bit AmiQuote. If you have 64-bit AmiBroker, use 64-bit AmiQuote. If you have both, install both.
32-bit version of AmiBroker
The correct folder is automatically detected if AmiBroker is installed so you don’t need to change it. The default installation path is the following:
64-bit version of AmiBroker
The correct folder is automatically detected if AmiBroker is installed so you don’t need to change it. The default installation path is the following:
TROUBLESHOOTING:
Three AmiQuote functionalities are affected by recent Yahoo changes
You need to update to the most recent AmiQuote version announced here http://www.amibroker.com/devlog/ and available in the download section http://www.amibroker.com/download.html to be able to use Yahoo data again.
]]>Support response times to those different inquiries obviously vary a lot.
Technically we answer basic questions in 24 hours on week days (Monday-Friday).
Very simple questions get answered even in minutes if you happen to ask them when our support staff is in the office (we are in GMT+1 timezone). If you are in different time zone, we may be currently sleeping so you may need to wait for next day.
This response time applies to questions that are covered already in our Official Knowledge Base, Users’ Knowledge Base, Users’ Manual or internal documentation/resources. It is quite good idea to check those resources yourself as you are very likely to find the answer much quicker.
For more complex questions that need some formulas to be written or checked/verified the response time may be higher (48 hours), as long as this check can be done by our regular support staff.
Some complex issues/questions can not be solved/answered by support staff alone and then they are escalated to development. You need to keep in mind that development is 100% busy all the time, we are not sitting here doing nothing. It is all-day development job that is on-going and those complex support issues must wait in the queue. Also since some of issues/questions require lots of work (setting up environment to mimic customer’s setup, testing, single-step debugging sessions, going through millions of lines of code), it may easily take days or even weeks to complete. If development finds out that the issue is due to software problem, then the problem is either fixed at once or scheduled for fixing. This is a process. So please do not expect “next day response” for those kind of issues. You are also not going to get constant e-mails/updates like “we are working on it”, because we are always working on something in the queue. Please be patient, things are being worked on constantly.
]]>In order to register these programs, it is necessary to launch AmiQuote or AFL Wizard first.
AmiQuote can be launched e.g. from the Windows Start menu or by double-clicking on Quote (Quote.exe) program in AmiBroker/AmiQuote folder.
Once the program is running, we need to enter the unlock codes into Help->Register AmiQuote menu:
Then we can enter our name and the unlock code, then press Update button.
AFL Code Wizard can be launched from Analysis menu inside AmiBroker:
After the program is launched it’s necessary to select Help->Registration details item from the menu.
Then we can enter our name and the unlock code, then press Update button.
]]>When we want to call such function in our formula, we should add function definition into our code, so AmiBroker could identify and interpret custom keyword properly. Consequently, if we use the function in multiple chart panes, each of the formulas should contain the function definition first.
// custom function definition
function myMACD( array, fast, slow )
{
return EMA( array, fast ) - EMA( array, slow );
}
// use of function
Plot( myMACD( High, 12, 26 ), "Open MACD", colorRed )
Since we may potentially define a large group of our own functions, pasting the definitions manually may not be very convenient. To avoid that, we can use #include statement and group our definitions in a separate AFL file which will be called with a single statement from our main code.
To create such file we should do the following:
function myMACD( array, fast, slow )
{
return EMA( array, fast ) - EMA( array, slow );
// include our definitions
#include <myfunctions.afl>
// use of function
Plot( myMACD( High, 12, 26 ), "Open MACD", colorRed
We don’t have to specify the path, because we saved our formula in the folder, which is specified as a ‘default include path’ in Tools–>Preferences–>AFL:
In other cases we should provide full path to the file – #include is a pre-processor command, therefore this time we use single backslashes in the path:
#include “C:\Program Files\AmiBroker\AFL\common.afl”
More information about include command can be found at:
]]>AmiQuote can also communicate with AmiBroker using OLE automation and automatically import downloaded data into AmiBroker if Automatic Import option is selected:
AmiQuote will import data to the database, which is opened in AmiBroker at the time of import.
Additionally, if more than one instance of AmiBroker is opened at the same time with different databases loaded, then AQ will communicate with the instance that was launched first and will import data into the database opened in this instance of AmiBroker.
]]>IQFeed:
eSignal:
As it is explained in the users guide here: http://www.amibroker.com/guide/h_rtsource.html – although AmiBroker is able to handle more symbols in the database than the streaming limit, we should not really exceed the RT subscription limits in continuous screening during session hours.
This is because if we do otherwise and try to access more symbols than our subscription covers, then it would requires lengthy process that includes:
Then such process will be repeated for each new symbol that is included in screening. As a result, that might cause various problems with the data source not able to handle that many backfill requests in a short time, additionally data-vendors may be pro-actively protecting their servers from abusing the streaming limits this way.
Therefore – it is highly recommended to stay within the subscription limits for real-time operation and scanning to avoid problems.
]]>