Deep Dive Experiences

Deep Dive ended about a month ago, and I’ve been busy with work and out-of-work projects since. But I thought I’d write a short synthesis of teaching this course. I have a few words to offer. Here goes:

I certainly learned a lot from teaching this course as well. Aside from reviewing the fundamentals of machine learning, I got some good questions and ideas from my students. For example, throughout the course, I had everyone compete in Kaggle competitions, specifically Random Acts of Pizza and Santander Customer Satisfaction. After each week, we shared our techniques, and I picked up a couple of novel ideas. With each new lecture, we honed our skills and, likewise, climbed higher on the leaderboards.

That’s it for me. Somewhat standard. On the other hand, I thought it would be interesting to share a newcomer’s experiences with machine learning. Hopefully you can also get inspired to learn this amazing discipline.


migz

Can you introduce yourself? Your background? Do you have a machine learning background?

I’m Migz Estrella, a DevOps engineer for GRID, and I’ve been interested in computers since day one. I’d heard a lot about machine learning, but I never thought it would be this prevalent outside academia.

How did you get interested in machine learning?

I’ve always been fascinated by data. Numbers don’t lie. That being said, I believe ML takes it to a whole other level.

Do you have a “favorite” lecture?

Logistic regression models. They’re easy to understand and lay down the fundamentals of machine learning. You have to respect the process.

How about the least “favorite” lecture?

When we started computing accuracy and error metrics.

On your capstone project, what is the problem you’re trying to solve? Why is it important?

I needed to discover which hashes should be grouped together to form a “packaged” application. This will prove useful for discovering new applications used by our customers and, in turn, improving our content coverage.

What techniques did you use?

Mainly, I used DBSCAN, an unsupervised clustering algorithm, to group together hashes that are within short distances of each other.
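
For readers who want a concrete picture, here is a minimal sketch of that kind of grouping using scikit-learn’s DBSCAN. The feature matrix, eps, and min_samples below are illustrative placeholders, not the settings from Migz’s actual capstone.

```python
# Minimal DBSCAN clustering sketch (illustrative values only).
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: one row per file hash,
# columns are whatever numeric features describe each hash.
X = np.random.rand(1000, 8)

# Scale features so the distance threshold (eps) is meaningful.
X_scaled = StandardScaler().fit_transform(X)

# eps and min_samples are placeholders and would need tuning.
clustering = DBSCAN(eps=0.5, min_samples=5).fit(X_scaled)

# Label -1 marks noise points; other labels are cluster ids,
# i.e. candidate "packaged" application groups.
labels = clustering.labels_
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
```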

What are your key findings?

Playing with a huge amount of data (3 million hashes, to be exact) is a herculean task on its own. It’s true what they say: the majority of the work in machine learning involves preparing your data. The classifying part is just a small chunk of it.

If you were to attend Deep Dive: ML again, what area of ML should your lecturer emphasize more?

Clarification of the vocabulary used. I think it’s important to set the definitions straight right at the start of the class. What does “model” mean? Is it the same as a classifier? An algorithm? What about data sets? Something like that.

Okay, what should your lecturer improve in himself?

I felt there were sessions that were a bit rushed. I understand that meeting only once a week has its constraints, but there were days that I just got overloaded with information.

What did you learn that could be immediately used in your career?

Practically, the methodologies as a whole: the proper mindset and attitude, the strategy for solving a given problem, and how to share it with others.

What are the most surprising things that you learned?

People who made ML libraries (scikit-learn, etc.) are legit.

Thanks, Migz! Those guys started the scikit-learn project as a Google Summer of Code project back in 2007. Now it’s an indispensable tool for many data scientists. I enjoyed teaching those libraries to you guys, and I hope you can use them in your careers.


romc

Can you introduce yourself? Your background? Do you have a machine learning background?

Hi, my name is Romano Cabral, Rom for short. I am a software engineer here at Trend Micro. I’ve watched some ML videos on Coursera.

How did you get interested in machine learning?

I became interested in machine learning when Dylan used it in our project called AI (Artificial Intelligence).

Do you have a “favorite” lecture?

My favorite lecture was when we read white papers on different topics. I became fascinated by how ML was used in different types of problems, and I learned the different approaches and techniques used by the authors.

How about the least “favorite” lecture?

My least favorite was the one about NLP [natural language processing]. I think it’s because we did not dig deeper into that topic due to time constraints.

On your capstone project, what is the problem you’re trying to solve? Why is it important?

My capstone is about file identification, where I tried to classify whether a file is raw or extractable (an installer). It is important for our team to be able to process each file correctly.

What techniques did you use?

I used Logistic Regression and Decision Tree for that problem.

What are your key findings?

I noticed that file entropy is a very strong feature for this type of problem. I was able to get around 96% using just 2K files and about six features.
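
As a rough illustration of that kind of setup, here is a small scikit-learn sketch that compares logistic regression and a decision tree on a tabular feature set. The data is randomly generated stand-in data (with one column playing the role of file entropy); it is not Rom’s actual pipeline or features.

```python
# Illustrative comparison of logistic regression vs. decision tree
# on a small tabular feature set (e.g. file entropy, size, ...).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical data: ~2,000 files, 6 numeric features each,
# label 1 = extractable (installer), 0 = raw.
rng = np.random.default_rng(0)
X = rng.random((2000, 6))   # column 0 could be file entropy
y = (X[:, 0] > 0.5).astype(int)

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("decision tree", DecisionTreeClassifier(max_depth=5)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```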

If you were to attend Deep Dive: ML again, what area of ML should your lecturer emphasize more?

I believe the lecturer needs to focus more on the strengths of each algorithm and less on the mathematics, and to encourage everyone to use “GridSearch” more.
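
For readers unfamiliar with it, “GridSearch” refers to exhaustive hyperparameter search, e.g. scikit-learn’s GridSearchCV. The sketch below uses a made-up parameter grid on the bundled digits dataset just to show the pattern.

```python
# Minimal GridSearchCV sketch with an illustrative parameter grid.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV score:", round(search.best_score_, 3))
```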

Okay, what should your lecturer improve in himself?

I think the lecturer is great. 🙂

What did you learn that could be immediately used in your career?

I learned the basics of machine learning which I can use together with programming. 🙂

What are the most surprising things that you learned?

The most surprising thing for me is that ML is more about “file preprocessing and feature engineering” than about applying the algorithm.

Thank you, Rom! Rom was a really proactive student and thoroughly explored hyperparameter search. I enjoyed hearing about your ideas.


anwar

Can you introduce yourself? Your background? Do you have a machine learning background?

Hi, I’m Anwar and I’m a software engineer for GRID. I’m a medical technologist by training, but I really wanted to craft software, so I started my career as a web developer. No machine learning background whatsoever until our Deep Dive class here at Trend Micro.

How did you get interested in machine learning?

I got interested in ML when you (Dylan) gave us a Techie Tuesday and I heard about its applications, especially here at work where we try to identify “good files”.

Do you have a “favorite” lecture?

Yes! The first two lectures were really memorable, especially the ones about linear and logistic regression and about cost functions. That’s where I got my first taste of ML, so I find them memorable.
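
For reference, the cost function from those lectures is the standard cross-entropy loss for logistic regression, reproduced here in its usual form:

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big],
\qquad h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}
```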

How about the least “favorite” lecture?

Probably that one about models that we didn’t study thoroughly? I forgot the names hahaha.

On your capstone project, what is the problem you’re trying to solve? Why is it important?

I’m trying to solve the problem of classifying spyware and potentially unwanted applications (PUAs) using their EULAs (end user license agreements).

It’s important because they are really hard to identify, and there are even disputes regarding their identification. A lot of the time, it depends on the context and the subjective use of the application. Having another way of automatically identifying them would help threat research engineers make decisions with regard to Trend Micro’s services.

What techniques did you use?

I used a simple grep search to find the phrase “license agreement” in my samples. For classification, I got great results using SVM [support vector machines].
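
As a rough sketch of what such a pipeline could look like, the snippet below feeds TF-IDF features into a linear SVM on a few toy EULA-like strings. The texts, labels, and parameters are all made up for illustration and are not Anwar’s actual setup.

```python
# Illustrative EULA text classification: TF-IDF features + linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy examples; real EULA texts would be extracted from the samples.
texts = [
    "This license agreement permits collection of browsing data ...",
    "This license agreement grants you the right to use the software ...",
    "By accepting this agreement you consent to third-party ads ...",
    "This agreement covers standard use of the application ...",
]
labels = [1, 0, 1, 0]  # 1 = spyware/PUA-like EULA, 0 = benign

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

print(model.predict(["This agreement allows us to track your activity ..."]))
```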

What are your key findings?

It is possible to identify spyware and PUAs using their EULAs, but the real challenge is extracting and finding the EULAs that actually belong to spyware and PUAs.

If you were to attend Deep Dive: ML again, what area of ML should your lecturer emphasize more?

I wish he could explain the maths at the same pace as me 🙂

Okay, what should your lecturer improve in himself?

Honestly, I find the lectures interesting and digestible. Maybe if he could go in-depth with the maths without assuming that we can quickly understand it, it would be much more awesome 🙂

What did you learn that could be immediately used in your career?

The basic ML concepts and how to use the right model for the right job. I learned what to look for when researching other techniques and models that weren’t taught to us, and how to use them.

What are the most surprising things that you learned?

To stop worrying and love the maths. To quote our lecturer (roughly): math is just another language to express something. Don’t be intimidated by it.

Much appreciated, Anwar! Yeah, I also regret that support vector machines and matrix decomposition were not examined as thoroughly as they should have been. I believe there is some creative approach to teaching these; we just haven’t found it yet.


Bonus: ML Competition

The preliminary round of the Trend Micro Machine Learning competition ended about a week ago. It involved the ubiquitous MNIST dataset. Both of my advisee teams finished in the top 30! Both teams had members from the Deep Dive sessions we held. So proud!

Here’s Romano again for this competition!

romc

 

What technologies did you use?

For the preliminaries, I used Python Anaconda and Sklearn.

What preprocessing techniques did you use?

I normalized the data and altered the images with rotation and some other computer vision techniques.
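
A minimal sketch of that kind of preprocessing, assuming MNIST-style 28x28 grayscale arrays and using scipy for the rotations; the angles and data here are illustrative stand-ins, not the exact choices used in the competition.

```python
# Illustrative MNIST-style preprocessing: normalization + rotation augmentation.
import numpy as np
from scipy.ndimage import rotate

def normalize(images):
    # Scale pixel values from [0, 255] to [0, 1].
    return images.astype(np.float32) / 255.0

def augment_with_rotations(images, labels, angles=(-10, 10)):
    # Add slightly rotated copies of each image to the training set.
    augmented_images = [images]
    augmented_labels = [labels]
    for angle in angles:
        rotated = rotate(images, angle, axes=(1, 2), reshape=False, mode="nearest")
        augmented_images.append(rotated)
        augmented_labels.append(labels)
    return np.concatenate(augmented_images), np.concatenate(augmented_labels)

# Example with random stand-in data shaped like MNIST (n, 28, 28).
X = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)
y = np.random.randint(0, 10, size=100)

X_norm = normalize(X)
X_aug, y_aug = augment_with_rotations(X_norm, y)
print(X_aug.shape, y_aug.shape)  # (300, 28, 28) (300,)
```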

What models did you use?

I started with DBN [deep belief networks] and eventually moved to CNN [convolutional neural networks], which works best for computer vision problems.

How did you evaluate what works best?

I used validation to determine what works best. I also did trial and error with the model parameters.

Were you surprised by your findings?

I was surprised by the flexibility of CNNs and how they can be combined with other models like Random Forest.

What are your key learnings?

I learned basic deep learning specifically CNN. I also learned the basics of computer vision.

You guys did great. There’s deep literature on deep learning *wink*, as it is the hot topic these days. If you ever come across vision, signal, or natural language problems, know that CNNs are your heavy-duty algorithm for achieving phenomenal results.

By krsnewwave

I'm a software engineer and a data science guy on recommender systems, natural language processing, and computer vision.
