Using signed and unsigned integers in C++

2019-10-22 00:00:00 +0000

I am going to start writing a bit more about C++, as it is the programming language I spend the most time with, and I may have some insights that can help others. This post covers when to use signed and unsigned integers, and why you should mostly use the latter.

Notation

  • A number written $n_{d}$ is a number in decimal
  • A number written $n_{b}$ is a number in binary
  • A number without either subscript is in decimal

What are signed and unsigned integers?

So first, what is an integer? An integer is a whole number, and it can be either positive or negative. We distinguish between non-negative integers ($N^{+}$) and negative integers ($N^{-}$) by saying that $n \in N^{+}$ if $n$ is in the set $[0, \infty)$ and $n \in N^{-}$ if $n$ is in the set $(-\infty, -1]$. NOTE: In computer science we include $0$ in $N^{+}$; that is not always the case in mathematics.

However, in a computer we cannot represent all numbers up to $\infty$, but we can represent a subset of them. A computer represents numbers by combining multiple bits into bytes and setting each individual bit to either $0$ or $1$. If we have $8$ bits, or $1$ byte, we have $2^{8} = 256$ possible combinations of $1$'s and $0$'s. However, how do we know if a number is negative or positive? Well, enter two's complement numbers. A number in two's complement can be negative or positive, and the left-most bit determines the sign: if it is $1$, the number is negative. I will use the example from [1]. If we want to represent the number $-28_{d}$ in two's complement, using $1$ byte, we do the following: 1) Write the number $28_{d}$, which is $00011100_{b}$ 2) Swap all $0$'s with $1$'s and vice versa, giving us $11100011_{b}$ 3) Add $1$, which gives us $11100100_{b}$, and this represents $-28_{d}$ in two's complement.

Now, the observant reader would note that if we use the left-most bit to mark a number as negative, any $n \geq 128$ cannot be expressed as a positive number in two's complement, since $128_{d}$ in $1$ byte is represented as $10000000_{b}$. That puts a limit on how many numbers we can represent: we can represent any non-negative number $n < 128$, including $0$. Now recall that we have $256$ possible combinations and $\frac{256}{2} = 128$, so our $N^{+}$ using $1$ byte for number representation is the set $[0, (\frac{256}{2} - 1)]$. That leaves us with $128$ combinations for negative numbers, so we can say that $N^{-}$ is the set $[-\frac{256}{2}, -1]$. We can write this as:

\begin{equation} N^{+} = [0, (\frac{2^{8}}{2} - 1)] \wedge N^{-} = [-\frac{2^{8}}{2}, -1] \end{equation}

We can make this general: if we say that we have $w$ bits (above, $w = 8$) to represent our numbers, then we have $2^{w}$ different bit patterns. This means that for two's complement we have

\begin{equation} N^{+}_{w} = [0, (\frac{2^{w}}{2} - 1)] \wedge N^{-}_{w} = [-\frac{2^{w}}{2}, -1] \end{equation}

So now that we have that figured out, let us move on to signed vs unsigned numbers.

Signed Integers

A signed integer is an integer in two's complement, and using the left-most bit to determine whether $n \in N^{+}$ or $n \in N^{-}$ is called signing. However, $w$ varies depending on the system architecture, but in general we say that the data type int has $w = 32$ and long has $w = 64$. But there is trouble; I did write "in general". The problem is that some systems restrict the representation to the word size, meaning that on a $16$-bit system $w$ for both int and long could be $16$. Note: To keep up with the C++ standard, systems should conform to the type table presented in [2]. This causes some confusion, and therefore C++ comes to the rescue with the header <cstdint>, which defines the data types int8_t, int16_t, int32_t, and int64_t, where the number Y in intY_t is $w$, so for int8_t, $w = 8$. A side benefit of having an exact $w$ is that you know exactly how many bits your variable will take. One problem, though, is that not all compilers support this header, as the architectures they target do not support these integers.

So when to use int/long vs intY_t? Well, you use intY_t when you want to know precisely how many bytes you will need and when you know that your variables do not go outside the bounds of $N^{+}_{w}$ and $N^{-}_{w}$. So that is what a signed integer is.

Unsigned Integers

Sometimes we do not like negative numbers, and sometimes we do not want to care about them at all; that is when unsigned integers come into play. With unsigned integers we can only represent numbers in $N^{+}$, and since we do not care about negative numbers, all $2^{w}$ bit patterns are now available for non-negative numbers, meaning we can represent larger positive numbers. For unsigned numbers, our $N^{+}$ is the set $[0, (2^{w} - 1)]$; the $-1$ is because one pattern is used to represent $0$.

C++ provides unsigned int and unsigned long, but again $w$ can vary based on the system. As for signed integers, C++ provides uint8_t, uint16_t, uint32_t, and uint64_t via the <cstdint> header, and they have the same properties and use cases as their signed counterparts.

So now we have signed and unsigned integers. When do we use which?

When to use Signed or Unsigned Integers?

It is straightforward: only use signed integers when you need negative numbers; otherwise use unsigned. Why do I say that? First, using unsigned integers removes the need to investigate whether the statement var < 0 is true. I see a lot of people using the signed data types "polluting" their code with checks for whether a number is negative, even though it is never expected to be. This can be avoided by using unsigned integers, as the number can never be negative. Secondly, consider the following C++ code iterating a list by index:

std::vector<std::string> collection(20);
for (int i = 0; i < collection.size(); ++i)
{
    collection[i] = std::to_string(i);
}

Why is the variable i an int? It does not need to be. Also, why are we comparing a signed integer to an unsigned one? The std::vector<T,Allocator>::size method [3] returns a value of type std::size_t [4], which is an unsigned integer. It will not cause compiler or run-time errors; it is just weird to do unless needed. Since we know that collection.size() returns a size_t and that it is an unsigned integer, why not use the same data type and transform our code to:

std::vector<std::string> collection(20);
for (size_t i = 0; i < collection.size(); ++i)
{
    collection[i] = std::to_string(i);
}

NOTE: Fun fact: it is generally recommended to use unsigned numbers for indexing, as you rarely have a negative index.

So we are done, right? Now we know when to use signed and unsigned integers? Well, yes, but I would like to plant a seed in the minds of all developers who do not think about this already. Recall the data types intY_t and uintY_t, or even types I have not mentioned yet, like short. Why are we not using them all the time? From the table showing type sizes presented in [4], we can see that an int is at least $16$ bits or $2$ bytes wide. If we assume a system where an int has $w = 16$ and we have two constants x = 128 and y = 255, both of type int, then we spend $4$ bytes on the two variables. If instead int has $w = 32$, we would have spent $8$ bytes. Now, why am I bringing this up? Well, x and y are constants. Neither exceeds a value of 255, which means they can be represented in a type where $w = 8$, right? If we swapped to, let us say, uint8_t instead of int, then combined x and y would use $2$ bytes. For int with $w = 16$, that means we have reduced the memory consumption roughly by half using uint8_t, and for $w = 32$ we have reduced it by $\frac{3}{4}$.

Now, that may not sound like a lot, but let us put it this way: what if we have $1000$ variables of type int on a $32$-bit system with $w = 32$? Then we would use roughly $3.9$kB of memory to keep the integers in memory. But what if, let us say, half of those variables could be represented as uint16_t? Then we would have used $(500 \cdot 32) + (500 \cdot 16)$ bits to represent them all, which results in $2.9$kB, a reduction of $1.34$x. Now imagine you applied that to all variables, where possible, in a huge software solution. How many MB, and potentially even GB, of memory could you save just by using reasonable data types? I know writing about kB may not be the best example, but the $1.34$x should give you an incentive to use more suitable data types. I am also aware that we have "unlimited" memory, but why not code so you have system resources for the more challenging stuff?

I hope you liked this post. I will write more about C++ and C in the future.

./Lars

References

[Book Review] The Art of Deception: Controlling the Human Element of Security by Kevin Mitnick

2019-09-07 00:00:00 +0000

Over the years, one of the techniques that comes up again and again as one of the most effective hacking techniques is Social Engineering. The technique requires you to be a high-quality con artist in many cases, whether in person, over the phone, or in text. One person who is an amazing Social Engineer is Kevin Mitnick [1]. He has written/co-written a lot of books on IT Security, one of which is The Art of Deception [2]. It is a book I have read a few times from cover to cover and sometimes used as a reference when explaining the idea of social engineering to people. I just never got around to writing a review, so here it is.

Overall

The book is split into four parts, where the first part focuses on the weakest link in most security, not just IT: the human element. This part outlines how humans act and behave, even on a subconscious level, and how it is possible to exploit this in general.

The second part of the book focuses on what attackers can do to use their skills to attack a company and how an attacker can behave when gaining trust/access.

The third part of the book is similar to the second part but focuses on attacks with a higher risk of discovery. As an example, there is a full chapter on Entering the Premises.

The fourth part focuses on how we can increase awareness of Social Engineering intrusions and, in general, increase awareness of security.

The parts are well structured and "build" on top of each other, in the sense that they can be read individually, but you will get a better understanding of why the attacker is doing a specific thing if you have read the chapter/section it builds on top of. However, I highly recommend that on your first read you do not skip any of the first part of the book. It is the foundation for why any of the following attacks work, and it is explained without "too much psychology".

Presentation

All parts of the book first explain a concept from an abstract point of view, followed by one or more well-structured concrete examples. This gives the reader a deeper understanding of how the concept works and how to use it in reality. The examples are also the gateway drug for newcomers to the world of social engineering, as they make it seem approachable. Side note: I was listening to the podcast Darknet Diaries [3], where the IT security professional known as Tinker [4][5] was interviewed in the episode Jeremy from Marketing [6]. During the interview, Tinker talked about Social Engineering a lot, and because I had read Mr. Mitnick's book, the attacks and techniques described were very easy to follow.

One criticism I have is that sometimes the attack seems to expect the target to be of either extremely low IQ or way too trusting. But that is the only criticism I have, and I would highly recommend the book, even to people with no interest in IT whatsoever. They might learn a thing or two.

References

Got my paper accepted for a GlobeCom Workshop

2019-08-25 00:00:00 +0000

Hello creatures of the internet,

I am pleased to announce that I got the good news that my paper Alexandria: A Proof-of-concept Implementation and Evaluation of Generalised Data Deduplication, I know the title is a mouthful, has been accepted for the IEEE GLOBECOM 2019 Workshop on Edge Computing for Cyber Physical Systems.

Best,
Lars Nielsen

Quitting Thunderbird and moving to Evolution and Mail

2019-07-12 00:00:00 +0000

thunderbird-logo

For years I have been using Thunderbird, really enjoying it and recommending it to friends, family, and colleagues, and to be honest that is not something I do lightly. Thunderbird has been easy to set up and use across all my machines (Linux and macOS), and I have not really had any complaints with it, until now.

So why have I started to have complaints? Well, the latest update I have installed is v60.8.0, and for some weird reason the user interface is no longer the same on Linux and macOS, and I have not been able to tell why. One of the differences is that on Linux the search bar for searching in mails is gone, and I have not been able to get it back, even after a LONG time of searching and talking to people, including people who actively develop Thunderbird. This is bad for a guy who receives over a thousand relevant emails every month. I even went as far as uninstalling Thunderbird and reinstalling it on Linux.

Next, after updating to major version v60, I have had nothing but problems with my Google Mail account, yes, I still have one of those: it locks up, gets signed out, and in general behaves weirdly. This is kind of a problem for me, as I really do not like Gmail's web interface and prefer not to use it at all. Then I started having problems with a mail account which Unoeuro hosts for me. Basically, every third time or so I tried to send an e-mail, it would say that I had to type in the login information for the SMTP server, which I had already done.

Finally, the last straw: Exchange support. This is not a problem with v60 but rather Thunderbird in general. A lot of companies and universities, Aarhus University included, use Microsoft Exchange as the mail server, which I am actually not against. This means that my mail client must provide some form of Exchange support. Thunderbird does not offer this out of the box, but there is a plug-in, ExQuilla, which enables it. I love ExQuilla; it made my life a lot easier when using Exchange on Linux. However, again after updating to v60.8.0, problems started with ExQuilla: my credentials for the University's Exchange server kept dropping, and I had to reset my Exchange account often.

Another thing that has been confusing to me is that Apple Mail provides built-in Exchange support for free, and so did Nylas N1 when it existed, so it is weird to me that Thunderbird does not. Other minor things have started to break too, but they are nothing to write home about.

evolution logo

So I started looking at alternatives I could use on both macOS and Linux, and well, I could not find a good one. BUT! BEHOLD! I found out that Evolution, the standard mail client in Gnome, comes with built-in Exchange support out of the box. So I started playing around with Evolution and realised that it fulfilled all my e-mail needs, and the built-in calendar in Gnome also supports Exchange calendars, so Gnome native apps here I came, and so far it has been a month and I am in love with these apps. The calendar has a few bugs when creating events, like starting an event a day early, but that can be solved by dragging the event after creation. Now, these apps are not available on macOS for obvious reasons, so I had to look for alternatives there as well. I was considering Airmail by Bloop S.R.L., and I did test it, but I simply could not get used to how it requires two windows to write mails: one for the mail list and one for composing. So I looked and looked, and every time I ended up back at Apple Mail and Calendar, and they are a bliss to use, so I am back in closed-source Apple land when it comes to mail and calendar on Mac.

NOW! Does this mean a permanent farewell to Thunderbird for me? I honestly hope not; I love the client and have been using it as my main email client since 2005. So what needs to change? Well, I would like a nicer user interface, and I dream of Emacs as an editor for mails. I would also like to see built-in support for Exchange and a proper built-in calendar, just start Sunbird again, it is really nice to have two separate apps for this. Another thing is that the Thunderbird team released this blog post: Thunderbird in 2019, and it contains a lot of promises, promises I have yet to see fulfilled, but they make me hopeful for my return one day. Maybe if I can find the time, I can help a bit, but that remains to be seen.

sunbird logo

Notice: I would like to extend a special thanks to R Kent James for ExQuilla, and I am sad to hear about his health problems. I hope he gets better soon, and best wishes to him and his family.

-Lars

Why I am using Tutanota and Signal

2019-06-03 12:00:00 +0000

Will add references later when I have them

So over the last 5-10 years, many people have become more focused on privacy when communicating, not just with colleagues, customers, and employers, but also in private communications. The reason for this is that we now know that Google, Facebook, and other companies "peek" into our correspondence with others to identify information relevant to them. It is known that Facebook uses the information to tailor advertisements more aggressively towards you, and it is known that companies, not just Facebook, sell the information to other companies, and we simply do not know what they do with it. So how can we, as users of these services, combat this, and can we actually defend combating these practices?

I will start by answering the latter. Well, the answer is complex. If one can remember as far back as the mid-nineties, one would remember that we paid to use certain services such as mail services and search engines, even web browsers if you can believe it. However, this changed with the introduction of products such as Internet Explorer, Altavista, Yahoo, MSN, Hotmail, Google Search, and more; with these products the "free" internet was born. The problem was/is that the companies which provide these products still need to make money, and how can they do that? Well, advertisement. Remember a time before NoScript and uBlock, when the internet was covered in ads? That was one solution. However, how did companies prove that an ad campaign was effective? Well, you could correlate an increase in revenue with the period of the ad, or you could do something much simpler: track how many people click the ad. Sounds familiar? No? Well, this is one of the ways tracking started, there are multiple others, and after some time some people thought: what if we can create a profile of the person to target ads more specifically to that user? This led to user profiling.

So tracking rose from a need to earn money, and by avoiding tracking we reduce a company's money flow. Because we did not want to pay for internet services, we are directly at fault for tracking. However, companies are super invasive and do not necessarily require you to use their system to track you. An example is Facebook: they track whoever visits a website that has a Facebook like button. So if I do not use Facebook, why should I allow Facebook to track me? Well, I should not. Additionally, it is known that Google scans your Gmail content to, amongst other things, build a profile of you. Well, it is fair because Gmail is free, but why then not offer a paid version where you can avoid that? Why can we not opt out of tracking? Simple: too much money is to be made, and tracking is now a core part of a lot of systems.

So to summarise: if you use a service for free, it is fair that you get tracked, it is your own fault; but if you do not use a company's service, they have no right to track you.

But I do not like tracking either way, so how do I avoid it? Well, it is close to impossible to avoid tracking 100%, and I cannot. I use NoScript to block Google Analytics and other fancy stuff, and I use uBlock to say f… you to ads. That is all good and well, but how do I avoid Google, Facebook, or whoever, the NSA for instance, reading my emails and messages? Well, I use products that provide privacy. For email there are multiple options, ProtonMail is a good example, and of those options I went with Tutanota, which is located in Germany. Tutanota encrypts your mail on their server and allows you to send encrypted emails, even to people without Tutanota accounts, whilst avoiding the need to exchange PGP keys. It is super easy to use; the only problem for me is that there is no desktop application. But the web client is pretty good.

Instant messaging, on the other hand… that is a tough one. The reason is that in Denmark most people do not use SMS anymore but use Facebook Messenger or SnapChat, exactly what I am trying to avoid. However, due to the recent public focus on privacy, more and more are switching back to SMS or other services, for instance WhatsApp. But again, a problem: a cell service provider can read your SMSes, and WhatsApp is owned by Facebook. So what do we do? Well, we can look at services such as Telegram, which I also use, but the problem is that companies such as Telegram do not explain how they make money or what legal restrictions they are under. So the main option for me is Signal from Open Whisper Systems. They explain how they make money and what legal restrictions they are under, AND more importantly to me, almost everything they make is open source, so we can evaluate what is actually happening. Signal provides end-to-end encrypted messaging, so they cannot analyse your data and neither can others. So that is why I use Tutanota and Signal.

-Lars
