Monday, January 20, 2014

Security - Malware Detection

Assumption : Source code not available

Static:
n-gram
reverse engineering

Dynamic:
Machine Learning
Graph Mining
Automata (L*)
Mining Frequent Itemset

To Find:
signature (feature - system call+string value)

Direction:
Focus on
1. End-user
2. Calling sequence on internet

Sunday, January 5, 2014

Hybrid concolic testing

Concolic testing could faced path exploration problem, to solve this problem, hybrid concolic testing (ICSE) method is proposed. It is mixed concolic testing with random testing, to increase the exploration to the entire "path space", rather than exploring only a region of the "path space".






Random Testing, Symbolic Testing, Combinatorial Testing, Concolic Testing

Recently have surveyed some automated testing methods, basically it is categorized into 3 categories

1. Random testing
- as its name implies, randomly generated input and test.

2. Symbolic testing
- treat the input (1,2,3) as symbolically (a,b,c), run the program symbolically, find some constraints on the input the makes the program failed. When you run to a if statement (e.g., a=b), then that branch does not failed, add a!=b to the constraint, and reexecute the symbolic execution. This continues until failures occured, and when failure occured, return the accumulated constraint to the user.

3. Combinatorial testing
- basically testing combinatory is untractable, therefore we need to make some heauristic. Some heauristic include pairwise testing, i.e., for each tuple, we only tested it once. 
For example given input of three parameters
Destination (Canada, Mexico, USA); Class (Coach, Business Class); Seat Preference (Aisle, Window)
all possible testing will result in 3x3x2=18 combinations.

In 1997, "The Combinatorial Design Approach to Automatic Test Generation", Telcodia's studies suggest that "most field faults were caused by either incorrect single values or by an interaction of pairs of values.", so to find bug, what we could do is to find every possible pair of input, which result in the following pair. This is called pairwise testing/all pair testing/2-way testing.
For example line 18 is removed as (USA, First Class), (USA,Windows), (First Class Windows) could be found at lines 9,15,16 respectively. This result in 9 columns (3x3=9, where we take the top 2 elements - destination, class which have 3 elements resp. - are multiplied them together).

This could be generalized to 3-way testing (or 4-way testing, etc) and so on (more thorough testing, but need more time), when the results in taking top 3 (or top 4, etc) elements multiplied together.

And some combination of these 3 categories

4. Concolic testing (concrete/random + symbolic testing)
This video gives nice introduction
http://www.youtube.com/watch?v=b8SeZTgwXEY

Problem Using random testing alone
if you generate the x randomly, then you have 1/2^32 chances to hit alone, which is not good, therefore we use symbolic execution approach, which aim to finding constraint on x using the condition, rather testing individual values on x.











Problem Using symbolic testing alone
1. For the following code symbolic testing will unable to count the value of x to execute first branch or second branch. Reason is that (x%10)*4!=17 will gives a list of infinite values of x that are disconnected, which is hard to represent using constraint. Therefore, symbolic execution told that both branches are executable, which results in false positive.









2. If we have function f(x) below, we simply can't use symbolic execution










Concolic Testing
Concolic testing keeps both concrete state and symbolic state when running.
Always use concrete states to decide the path,
The initial value of concrete states is randomly generated, subsequent values are inferred from the negation of path condition
For modulo and function - it chooses one value out of many possible values.











Here we require to negate one of the path condition to allow us explore different paths.
We hit the program errors.


It is an explicit path model checking method as it explores all the possible path in the program
This method shines when we hit the modulo and external function method, where it cannot resolve my traditional symbolic execution method
From the slide above, we use the random number 7. Noted that, this guarantee soundness, but not completeness, it allows it to pass through, but not guarantee to solve all kinds of bug. 

Basically these are the methods for testing I have surveyed so far. The general observation is:
1. black box testing - (e.g., combinatorial testing, and pairwise combination is effective)
2. white box testing - coverage is important (e.g., symbolic and concolic testing)
3. random testing (used for black box testing and white box testing)