It seems like @ZeroDark30's suspicion is correct. I just did some sweeping through my duplicated files and found that most of them contain something similar to the Cyrillic characters that @ZeroDark30 had. I had some Japanese video files with Japanese text in them and found that the duplicated ones have word ...