Archive for the ‘C++ tips’ Category

Well, back to blogging after a really long time. The last few months have been challenging and left me with no time to blog, what with 18 to 20 hour work days and little or almost no sleep. Now that the pressure is a bit off, I am back at my code cafe with a lot learned over the last 4 months, which I will share slowly and steadily so that I don’t miss out on anything.

In this series I will start off with the most challenging problem I faced, which was related to a character set issue. Today there is a lot of talk about Unicode and multi-byte character sets. No doubt they are wonderful and help in reaching a broader audience, but when working in a heterogeneous environment it is sometimes easier to stick to single-byte character sets. In my case the backend was a mainframe, the middleware ran on Linux and the client was either Unix, Windows or AS400.

Problems usually arise when dealing with binary data. Binary data contains bytes with high values (above 127), and a Unicode system such as Linux with a UTF-8 locale represents such a value as a multi-byte character. My middleware server and client dealt purely with bytes, not characters or strings. So when a high-value character arrived from the backend and was passed on to the client, it went out as 2 bytes, which I didn’t want. On the client end I was writing a fixed-length response file, where any extra byte beyond the expected length gives surprising results, and that is what happened. Seen from the byte point of view, each multi-byte character was two separate bytes. The data in the generated response file shifted to the right; I got data loss in the Windows response file and more dangerous results in the AS400 response file.

The fix:

The real fix was to stop Linux from converting high-value characters into multi-byte sequences and have it use a single-byte character set instead. If you are working on a Linux system, setting the environment variable LANG to en_US instead of the default en_US.UTF-8 helps fix the problem.

Just try:

export LANG=en_US (on Red Hat or Fedora systems)

However, if for some reason you cannot set the environment variable, you will have to do a bit of math to get your high-value characters into single-byte form. If you are dealing with strings, break the string into individual chars and treat each one as an integer. Where char is a signed type, any high-value character (a byte above 127) shows up as an integer less than 0. For such characters add 256 to the integer value, which gives the byte value in the range 128 to 255, and use that as the new character. The final character array you receive will be in a single-byte system.
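The "add 256" step above can be sketched roughly as follows. This is only a minimal illustration, not the code I ran in production: the helper name to_byte_values is made up for this example, and it assumes the string already holds one byte per character.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (name invented for this sketch): map every char of a
// byte string to its value in the range 0..255.
std::vector<int> to_byte_values(const std::string& s) {
    std::vector<int> out;
    out.reserve(s.size());
    for (char c : s) {
        int v = c;        // negative for bytes above 127 where char is signed
        if (v < 0) {
            v += 256;     // the "add 256" step: -23 becomes 233 (0xE9)
        }
        out.push_back(v);
    }
    return out;
}
```

On platforms where char is unsigned the values are never negative, so the branch is simply skipped and the result is the same.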

The way to verify the integrity of your string is to compare hex values. Before you convert your string into a single-byte character encoding, grab the hex value of the string. Then after conversion, grab the hex value again. Both should match.
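A hex dump for this kind of check can be as small as the sketch below. The function name to_hex is my own choice for the example; any hex dump tool would do the same job.

```cpp
#include <string>

// Hypothetical helper (name invented for this sketch): return the bytes of
// a string as lowercase hex, two digits per byte.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {   // unsigned char avoids negative values
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}
```

As an illustration of what the dump reveals, the character é is the single byte e9 in ISO-8859-1 but the two bytes c3 a9 in UTF-8, so an unwanted extra byte is immediately visible in the hex output.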

I hope this information helps others who are working in a heterogeneous environment and saves them the amount of time and energy that I had to put in to resolve my problem.

Another tip for optimization if you are working with C++. Hot/cold splitting of structures is about separating the heavily accessed (hot) portions of a data structure from the rarely accessed (cold) portions. An example of the concept is as follows:

Assume you have a class C, which is 400 bytes. 300 of those 400 bytes are needed only to construct an exception object if C::foo() throws. Is it smart to keep these 300 bytes directly as members?

An alternative would be to put those 300 bytes in a C::C_exceptiondata class, and put a C_exceptiondata * in C. The total memory usage would increase (by sizeof(C_exceptiondata*)), but the C_exceptiondata objects may very well end up all swapped out, and would not compete with the hot data for space in the CPU cache. The cache would only need to load the remaining 100 bytes.

The 300 bytes that are rarely used are called “cold”.
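A rough sketch of the split might look like this. Only the names C, foo and C_exceptiondata come from the example above; the member names, the constructor argument and what foo() actually does are assumptions made up for illustration.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

class C {
public:
    // Cold part: only read when foo() has to build an exception.
    struct C_exceptiondata {
        char context[300];
    };

    // The constructor argument is an assumption for this sketch.
    explicit C(const std::string& context)
        : hot_(), cold_(new C_exceptiondata()) {  // () zero-fills both arrays
        // Copy at most 299 chars so the zeroed buffer stays null-terminated.
        context.copy(cold_->context, sizeof(cold_->context) - 1);
    }

    int foo(int x) {
        if (x < 0) {
            // Rare path: the only place the cold data is touched.
            throw std::runtime_error(cold_->context);
        }
        hot_[0] = 1;  // common path works on the hot bytes only
        return x + 1;
    }

private:
    char hot_[100];                          // hot part, stored inline
    std::unique_ptr<C_exceptiondata> cold_;  // cold part, behind a pointer
};

// The object itself now holds the hot bytes plus one pointer.
static_assert(sizeof(C) < 400, "cold data should live outside C");
```

The price is one extra heap allocation per object and a pointer dereference on the rare path, which is usually a fair trade when the cold data really is rarely touched.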