Investigating a bug in Reflex
For the last 4-5 days, I’ve been spending a large amount of time investigating a bug and figuring out its fix. This meant diving into rabbit holes, going deeper and deeper, and surrounding myself with documentation and online discussion forums. I’m briefly outlining the process here, mostly because I’m pretty proud of what I did.
So I was going through the open source repository of the Python web framework Reflex and I saw an open issue that prevented people from using Reflex on Windows systems with Python 3.12. Whenever someone used Reflex in this environment and relied on the hot reload feature of the tool, the frontend server just died. In other words, when you make a change in one of the files, you expect the running servers to detect it and reload the website with the new changes. The logs showed that the change had indeed been detected and that the new version of the website had been compiled, but for some reason the frontend server was no longer alive. An early clue that I caught was that the logs showed the character combination ‘[?025h’ whenever this happened. I had also noticed that whenever I stopped Reflex intentionally with a Ctrl+C, the same character combination was printed. So the first thought was: okay, somehow a Ctrl+C from somewhere is affecting the frontend server.
Unpacking UPX
High level Overview of Steps to Unpack UPX dynamically
- Find the original entry point of the packed file
- Run Scylla to extract the payload out
- Turn on the IMAGE_FILE_RELOCS_STRIPPED flag inside the extracted PE file
Finding OEP
There are a lot of different ways you can go about doing this, and the right one depends heavily on the packer used and how complex it is. For a simple hello world program packed with UPX, the following was the easiest way I could find the OEP.
- Break at the entry point
- Observe that the instruction at the entry point is a push instruction. Step over that instruction
- Set a memory breakpoint that will get triggered on accessing the location to which esp/rsp is pointing
- Run the program until the memory breakpoint is hit. This would likely be after a pop instruction (this would be the counterpart to the initial push instruction)
- Observe the disassembly to find a jump instruction in the next few lines. The jump address is likely to be the OEP
- If you allow the program to jump to the OEP, you can observe the pattern of instructions to see that it resembles a new PE executable. If you are doing this as an exercise you would likely have the original unpacked file. You could compare the bytes of the original executable against the bytes at the OEP to confirm your strategy
Running Scylla
Running Scylla is fairly well documented. If it is run as a plugin of a debugger, the module start address, size and such would already be filled in. All you would need to modify is the OEP. With the right OEP, Scylla should be able to get the right import table. There might be unknown chunks in the imports that Scylla obtained. For a hello world program, removing the unknown chunk was enough to get it working. More complicated programs might require more steps to resolve the imports. Dump the payload using Scylla and fix the Import Table using the Rebuild function of Scylla.
Patching the PE file
After all of this, even though the file dumped by Scylla was very similar to the original payload, it was crashing on execution. To fix this, open a PE editing tool such as PE-bear or CFF Explorer and turn on the IMAGE_FILE_RELOCS_STRIPPED flag.
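If you would rather script the flag flip than click through a PE editor, here is a minimal sketch of how it could be done. It assumes a standard PE layout: the offset of the PE signature is stored at 0x3C, and the Characteristics field sits 22 bytes after that signature, with IMAGE_FILE_RELOCS_STRIPPED as bit 0x0001. The function names `byte_at` and `set_relocs_stripped` are made up for this example, and the script edits the file in place, so work on a copy.

```shell
#!/bin/sh
# Sketch: set IMAGE_FILE_RELOCS_STRIPPED (0x0001) in the COFF Characteristics
# field of a PE file, in place. Helper names are hypothetical.

# Print the unsigned value of the single byte at offset $2 in file $1.
byte_at() {
    od -An -tu1 -j "$2" -N 1 "$1" | tr -d ' \n'
}

set_relocs_stripped() {
    file=$1

    # e_lfanew: 4-byte little-endian offset of the "PE\0\0" signature, at 0x3C.
    b0=$(byte_at "$file" 60); b1=$(byte_at "$file" 61)
    b2=$(byte_at "$file" 62); b3=$(byte_at "$file" 63)
    pe_off=$(( b0 + (b1 << 8) + (b2 << 16) + (b3 << 24) ))

    # Sanity check: the signature must start with 'P' (80) and 'E' (69).
    if [ "$(byte_at "$file" "$pe_off")" != 80 ] ||
       [ "$(byte_at "$file" $(( pe_off + 1 )))" != 69 ]; then
        echo "not a PE file: $file" >&2
        return 1
    fi

    # Characteristics: 4-byte signature + 18 bytes into the COFF header.
    ch_off=$(( pe_off + 22 ))
    low=$(byte_at "$file" "$ch_off")

    # RELOCS_STRIPPED is bit 0 of the low byte; write the byte back in place.
    new=$(( low | 1 ))
    printf '%b' "\\0$(printf '%03o' "$new")" |
        dd of="$file" bs=1 seek="$ch_off" conv=notrunc 2>/dev/null
}
```

After sourcing the script, calling `set_relocs_stripped` on a copy of the dumped file should have the same effect as ticking the flag in a PE editor. Only the low byte of Characteristics is touched, so every other flag stays as it was.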
AutoYara
Creating YARA rules to match against malware files using a machine learning model. This was for a course project, where we were supposed to come up with some improvement on an existing paper. So we decided to see if we could get AutoYara up and running and then maybe do some experiments on it. The link to the original work is… Here. Oh, and by the ‘we’ that I use throughout this post, I mean me and my friend Soumya. We also took the guidance of Dr. Marcus Botacin, who was gracious enough to give his two cents about our issues, even though it was not his course or anything.
Learning Awk!
On a path to understand the widely famed power of Awk. Hope to conquer it one day, just for the sake of it. Sed is the next one on the bucket list.
Most basic use case
Say you want to print out the second word from each line.
awk '{print $2}' input_file
Awk follows a <pattern> <action> format. For each line, awk will apply the pattern and see if it matches. If it does match, it’ll perform the action. Now, you can omit either the pattern or the action, but at least one should be present. If the action is omitted, the default action is to just print the line out. If the pattern is omitted, the action is applied to all the lines.
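To make the three cases concrete, here is a small sketch that runs each form against a made-up three-line input. The sample data and the field meanings (name, age, role) are assumptions for illustration only.

```shell
# Made-up sample input: name, age, role.
input_file=$(mktemp)
printf 'alice 30 admin\nbob 25 user\ncarol 41 admin\n' > "$input_file"

# Pattern and action: for lines whose third field is "admin", print the first field.
awk '$3 == "admin" {print $1}' "$input_file"
# alice
# carol

# Pattern only: the default action prints the whole matching line.
awk '$2 > 28' "$input_file"
# alice 30 admin
# carol 41 admin

# Action only: the action runs on every line.
awk '{print $2}' "$input_file"
# 30
# 25
# 41
```

The pattern here is an ordinary comparison on fields, but it could just as well be a regular expression such as `/admin/`.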
Creating this Blog using Jekyll
So, my initial plan to create this blog was to build it from scratch, brick by brick, maybe even make a content management system on the way. However, due to a lack of time and a general decision in life to stop rebuilding the wheel every single time, I decided to use something someone wise has already built.
I did initially go for a framework called CMSjs, which I found here, but for some reason I did not like the way it worked. Finally I settled on Jekyll and decided to get version 1 of the blog up and running as soon as possible instead of trying to find the perfect tool out there. Barry Clark, a Web Developer from NY, has this very straightforward article on setting up a minimal blog by forking a starting point that he has created. Following it was a breeze, and before I knew it I had a pretty decent looking blog in my hands.
ML Based Cyber Defense Papers
For my course at TAMU, ML Based Cyber Defense under Dr. Botacin, I had to write summaries of different research papers in the domain throughout the semester. During each class, a student had to give a seminar on a research paper and the others had to prepare a summary of the paper as well as the discussion that followed the seminar. I am uploading all of the summaries that I created to this blog. But fair warning: finishing the summaries as soon as possible for the sake of submission, in the midst of all the other course work, doesn’t always lead to the best quality writing.
OffensiveZoe - AntiVirus vs Adversarial Attacks
For the course “ML Based CyberDefenses” under Dr. Botacin, we had a very interesting and hands-on in-class competition. You form teams, and then each team comes up with a malware detection system and also generates some adversarial samples (samples that are designed to evade the ML detection models) from existing malware files. The adversarial samples that our “competitors” came up with would be put up against our detection system, and vice versa, to decide how many points each team scores. Luckily, I was able to find a team that valued team-building as much as the actual work, leading to a lot of fun memories. Big shout out to SidBav, Soumya and Veronika. And to Zoe, our team mascot (not to brag, but I won her from a claw machine on the first try).