Tips for CP - Gaurang Tandon
Tips are separated by categories
Interactive problems
Checking interactive problems
There is already a script provided (by Google here). Usage instructions are inside the script. Of course, though, you would have to write the judge script too when using this, not just the solution.
To be fair, in general it is far easier for you to “be the judge yourself” and simulate the judging process, rather than writing out the judge from scratch. Most of the time you can find the mistake in your code that way.
Flushing methods
cout.flush()
in C++fflush(stdout)
in C.
Compilation tricks
Fixing undefined behavior
Using this compilation command (note the flags): g++ -fsanitize=address -fsanitize=undefined -D_GLIBCXX_DEBUG file.cpp
, you can catch out-of-bounds index accesses at runtime. So, you will get an error for them and they will not pass undetected.
For example, try catching the error in this code with and without the above compilation flags, and you’ll see the difference yourself :)
Precompiled header
To enable 2x faster compilation, setup a precompiled header in your bits
directory. In Ubuntu, it is usually located at /usr/include/x86_64-linux-gnu/c++/7/bits
. cd
into it and simply compile the stdc++.h file (g++ stdc++.h
). You should now have a stdc++.h.gch
file. Enjoy! (more info)
Do not use O2 flag
Don’t use the -O2
flag for compilation by default. It takes more time to compile and it enables several optimizations, but it demolishes the print <local_variable>
output that is otherwise available in GDB. Interestingly, codeforces.com already compiles the files with -O2
flag.
To enable O2 anyway, since you don’t get the command line for yourself, you can use pragma directives. The lines to stick at the top are:
#pragma comment(linker, "/stack:200000000")
#pragma GCC optimize ("Ofast")
#pragma GCC target ("sse,sse2,sse3,ssse3,sse4,popcnt,abm,mmx,avx,tune=native")
Read more compilation options on this page. How do the first and last line actually work?
Fixing Time Limit Exceeded
Using int throughout
In several questions where time limit is tight and you need to perform several modulo operations, using long long
datatype can increase your time and memory taken by factor of 2. Most questions can be solved with int type in general. Here is one code for example, that efficiently mixes int
and long long
datatype.
Digit segment tree
Converting a digit segment tree into actual parts (int tree[len][dig_count]
) reduces time taken significantly.
Query function of segment tree
Avoid returning more than a pair in query function of segment tree. Even sorting a vector of length 4 can result in tle! (tl code, ac)
cerr
Even a very innocent looking cerr
can cause your code to TLE. Here is AC and TLE code. The only difference in TLE code is that I’m printing two integers to cerr
. So, always remove all your cerr
statements.
Other miscellaneous problems
- Sorting a very large vector of pairs using a custom comparator function can cause TLE. Example link
- Using
long long
instead ofint
can cause TLE. - Using a set of strings (
set<string>
) can give TLE in a lot of cases. Perform lots of optimizations in the code if you plan to use such a set. For example, an n^2 loop should be done in $n^2/2$ instead. Try to reduce number of other things being used, basically, think for two minutes and make it as optimized as possible. There is a very high possibility you will get TLE if you do not do this (TLE and AC).- Another example for $n^2$ TLE is 270807A.
- Using binary search with sets results in a $\log^2n$ factor and can lead to TLE. Example submission. Use two pointer approach to AC such questions wherever possible (submission)
General coding
Use of auto-keyword
Instead of set<pair<int, pair<int, int>>>::iterator
you can simply do auto it = s.begin()
, similarly, instead of vector<int> row = grid[index]
you can simply do auto row = grid[index]
Local setup code
To exclude some code from executing in online judge (C++), you can use this:
#ifndef ONLINE_JUDGE
//code that should not execute in ONLINE JUDGE
#else
//code that should execute in ONLINE JUDGE
#endif
Uninitialized values inside struct array
Note that values of arrays inside struct are not automatically initialized. So, (struct)->array[i]
is null. Hence, need to manually iniitialize each index of an array property. Else you get run error, like here.
Nonsensical output/Unexpected runtime error
If you get a run error in a line where you shouldn’t, especially free()
related errors, or you are seeing some nonsensical outputs, that means you have undefined behavior somewhere above in your file.
Compile with fsanitize debugging flags to find the problem. See the section on compilation.
- Instead of a binary search tree, if all insertions before all the queries, then please use 1d vector with binary search.
- If stuck and have a brute force, perform complexity analysis and it might even pass. (255C as well as 1229C)
- If you have range queries and point updates, but the updates are offline (all queries occur after all updates), then consider using the “library question” approach (checkposts waala).
Causes for RE
- Accessing an element out of range or an invalidated iterator.
- Popping/accessing elements from empty container.
- Miswritten sorting function (should return true iff a < b)
lower_bound
called on a container which was sorted with a different criterion. For example, container is sorted byPII.second
but yourlower_bound
comparator searches byPII.first
.
Causes for MLE
- A 10^7 vector of long long ints takes roughly 400MB. So, consider using int instead, which takes 200MB.
- Inserting strings into set may give memory limit even if you use
int
and $n^2$ complexity is permitted. (link)
IO
- When using
scanf
, always use%llu
or%lld
to take integer inputs. On codeforces custom invocation you can check that using%lf
or%f
to take integer inputs assigns it a value 0.000000. - Prefer
cin
/cout
as far as possible overscanf
/printf
. With sync with stdio disabled, they’re as fast as or even faster than the C analogs. Also, taking string input with cin is super easy. -
Do not mix
cin
/cout
withscanf
/printf
/getchar
when sync with stdio is disabled. The latter literally meanscin
/cout
(which are part ofiostream
) will NOT be in sync with the others (which are incstdio
).For example, the following code may result in unexpected behavior:
scanf("%d", &a); cin >> b; cout << b; printf("%d", a);
Using vectors
- Always pass large vectors, with roughly $10^6$ elements, to methods in reference form (using the ampersand argument format).
-
Always initialize very large vectors outside of your main loop. For example, the time complexity of the following program is
O(n * k)
, which will TLE if n and k are both very large:for(int i = 0; i < n; i++){ vu freq(k + 1, 0); }
Instead of this, consider doing this:
vu freq(k + 1, 0); for(int i = 0; i < n; i++){ // keep track of which values are // changed from 0 set<lu> usedValues; // every time we use a value // we insert it usedValues.insert(<any used value>); // reset used values for(auto x : usedValues) freq[x] = 0; }
The same problem plagues array initialization (
int arr[k] = {0};
) and cannot be avoided by doingmemset(arr, k, 0);
as the latter also takesO(k)
. vector<vector<int>> v(n,vector<int>(2))
requires 4x more memory thanvector<pair<int,int>> v(n)
. See testing script. This can cost you a TLE vs AC. Discord thread
memset
s
memset
is a string function, so it fills values byte-by-byte. It won’t work for integer arrays! (unless you set to 0) So, a simple code like following:
#include <bits/stdc++.h>
using namespace std;
int main() {
const int sz = 1;
int nums[sz];
memset(nums, 1, sizeof(nums));
cout << nums[sz - 1] << endl;
return 0;
}
produces 16843009
, which is 0x01010101
, basically, the single byte value 0x01
copied four times over. Note that this is the same in both memset
from C and from C++, they both are string functions only.
Fix: use std::fill(dest, dest + n, val)
.
Multisets
multiset.erase(value)
erases all copies ofvalue
inmultiset
. To delete just one value usemultiset.erase(multiset.find(value))
(but first ensure the value exists in multiset).
Integers
- Maximizing greedy consumptions occurs best in greedy space, consider modular groups always. (see 1244C - number theory solution)
- Never print
!integer
directly in codeforces, instead of printing 0 or 1, have faced WA, instead first assign to boolean and then apply ternary like this AC
Floats
- Use epsilon when comparing floating point values, even when with ints. Do not use anything less than 1e-6 unless absolutely sure. See the problem https://codeforces.com/group/K3Zd1r0QSA/contest/102343/submission/61050721 and previous two submissions.
- Be careful in Python when using division and modulo together. Division
/
produces float values and modulo on float values gives very incorrect results. Always use//
if using modulo (and cast 1e10 usingint
.)
Strings
- Always check size of string before applying integer parsing functions. For example, don’t pass strings of length greater than 18 to
stoull
. - Use
<
or>
when attempting lexicographical comparison. It is much better than writing a normal for loop for the same. For example,"abc" > "def"
isfalse
. - If $n^2$ complexity is permitted, consider using tries instead of hashing. Hashing can give wrong answer.
Math
std::pow
is a constant time operation. See SO and code submission that passes on Codeforces.- For taking modulo, the following works for both negative or positive integers:
((value % MOD) + MOD) % MOD
. Simply usingvalue % MOD
produces negative remainder for negative integers and may not be desired. - If mod value
= 1ll << 30
, (or any power of 2), then taking mod is not necessary as it will overflow on its own and adjust accordingly. (use unsigned int if necessary). In fact, doing(1ll * a * b) % mod
will slow your code down. - Never print
pow(a,b)
directly. Always convert to integer first. This is becausepow
returns double type which prints in a different format (1e+6.0
) compared to integers (1000000
). - Do not use
log10(num) / log10(base) + 1
to find number of digits in number in base b. It can give a wrong answer, use the trivial while loop instead. Example WA, AC
Square roots of very large numbers
When taking sqrt of very large numbers (order $10^16$), you can get erroneous with sqrt
or log10
style functions. Always verify that if (int) x = sqrt(N)
, then x * x <= N
, because, of these things: sqrt((ll) (1e16-1)) = 1e8
(can check it’s true) In similar vein, x = log10(999999999999996448.0l)
means x = 18
, so you can double check with pow(10, x)
to be sure the output was incorrect.
General algo
- Stackoverflow: The recursion stack is limited to around 350,000 on an average machine. This solution won’t TLE, but will give a memory limit exceeded error. You’ll get an error similar to
SIGSEGV: Invalid memory reference
, and in gdb, you will get the line number as the start of the method header. In general, avoid using recursion for such large cases, use iteration instead. - When value of
n
is very large, try to come up with some patterns for lower values ofn
using brute force or dp. Like this question can be solved in constant time, however, the formula is not immediately obvious. Use DP to observe a pattern. See my A.cpp submission.
Bit manipulation
1 << x
can overflow ifx
is greater than 31. Use1ll << 31
instead.
Sieve
-
Do not create a sieve unless you want to check the primality of a number more than once. Besides that, decide at all if you actually require the number to be prime in your code. Refer to bad solution, best solution In most cases you can get away with a simple square root based for loop, like so:
for(int i = 2; 1ll * i * i <= x;){ if(x % i == 0){ x /= i; } else i++; } }
since at each step the value of $x$ reduces, in the worst case the complexity is $\sqrt{x}$ when $x$ is prime.
Flows
- Practically, all flow algos (dinics, push relabel, or even edmonds-karp) have complexity
= min( (V + E) * mf, ...)
(mf = max flow). - In practice, even if N=800 and M=1e4, Dinics or even EK might work, so do not hesitate in coding them up. (The problem might have weak testcases, or see the previous point, it is possible that their max flow bound is lower)
-
CLRS question 26.2-13 is nice. “Given graph G, propose modification so that among all current mincuts, in the new graph G’, any mincut will also have least number of edges”. Solution. Short answer: scale all capacities by 2 E and add one.
Graphs
- Kosaraju’s algo works better with stacks than with toposort. See failed toposort and accepted stacks solution.
- When there are only two paths in a grid possible (up and right OR down and right) and so on, consider using a nested for loop instead of bfs type queue to iterate over all elements in order. See, example submission
Gaussian elimination
It has two techniques: full pivoting and partial pivoting. It is known that full pivoting is slower, although guarantees stability in output (see discussion here). Stability means that the error term in the output will not grow too large as compared to correct output. There exists certain matrices - with negative terms in their lower triangle - that fail stability when computed with partial pivoting.
Blog on using gaussian to solve for probabilities and xor values.
Randomization tasks
Use this line: mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
. Have previously got WA when using rand()
or (1ll * rand() * rand() % n + n) % n + 1
(WA, AC)
Specific kinds of problems
Problems where the complement graph is inputted
1242B, 190E
Approaches:
- DFS using a set that keeps self-deleting: this TLEd on 190E so go with DSU approach
- DSU (I didn’t understand this approach :/ explained in editorial of 1242B)
Problems with map
giving TLE
Using unordered_map
instead will usually NOT fix your problem. Instead, focus on reworking your logic a bit so that you can get an AC. See problem 102411M (AC https://codeforces.com/gym/102411/submission/66275452). Also, see https://codeforces.com/blog/entry/60737 but you probably won’t ever need it.
Ordered set
Multiset with ordered set pbds does not work even if you set comparator to less_equal
(specifically, delete or find don’t work). See Anurudh’s explanation here
SPOJ TIPS
- If problem says “Round answer to 6 decimal digits”, it actually wants you to print exactly six decimal digits and not more! So, for your final answer, do
ans = std::round(1e6 * ans) / 1e6
and usesetprecision(6)
. - SPOJ always runs on all tests first and only then gives the verdict. So, it may be possible that you see “running (12)” even though your code failed on first test itself.
- Python 2 is on SPOJ, and it uses
raw_input
instead ofinput
.