
Archives for April 2016

What is the best part about being a Software Engineer?

April 28, 2016 by Alex Kras 15 Comments


A week ago I saw an article on Hacker News titled “What’s awful about being a {software engineer, tech lead, manager}?”. The article was spot on, and it got a lot of attention. It also left me feeling depressed about my profession.

My great grandmother used to say: “Don’t compare yourself to people who have more than you do. Compare to those who have less.”

Her point, I believe, is very important. No matter how much I have or achieve in life, there will always be somebody smarter than me who has more. It’s easy to get trapped in negative thoughts and forget just how good we have it.


That is why I wanted to write down a list of things that I believe are great about being a Software Engineer. Since I don’t have a ton of tech lead or management experience, my list only covers Software Engineering 🙂

  • Opportunities
    • Money – I am a regular Software Engineer, yet my income is in the top 20% in the US, and an even higher percentile worldwide
    • Location – Software Engineers can find employment almost anywhere
    • Impact – The sky is the limit; you can literally change the world
    • Reach – Products are used by millions of people
    • Options – Many paths to pursue; choose your own adventure
    • Demand – The shortage of Software Engineers is probably here to stay for the next 10+ years
    • Business – It is easy to start your own service or product business
  • Learning
    • Never-ending supply of fun things to learn
    • The job itself keeps the brain active, no need to do puzzles
    • Blogging – a fun way to share and learn
  • Cool Perks
    • Flexible work schedule
    • Snacks and Catered Food
    • Work From Home options
    • Travel opportunities
    • Comfortable work environment (compared to a construction worker’s, for example)
    • Access to expensive software and equipment
    • All other “standard” benefits like medical and paid time off
  • Fun Job
    • Sure we still have to work for a living, but at least our days are comfortable and fun
    • Often I enjoy my work so much that I have to set an alarm to go home on time
    • Debugging is a lot like playing detective
  • Creative
    • Writing clean code is a creative process
    • Instant feedback from your work, you can “see” your code “do” stuff
    • Various cool side-projects to pursue
    • Being able to automate the boring stuff
  • People
    • Working with smart people
    • Networking with even smarter people. (May vary by location, but in the San Francisco/Bay Area it’s fairly easy to meet some of the greatest minds of our time)
  • Altruism
    • Teaching and helping others
    • Volunteering for Non-Profits
    • Contributing to Open Source


Filed Under: Software Engineering

Dead Simple JavaScript Next Boilerplate

April 12, 2016 by Alex Kras 1 Comment


TL;DR: Visit this repo for a minimal ES6+ boilerplate.

Lately a lot of people have been writing about how much they miss the good old days of JavaScript development. Five years ago, creating a new web project looked as follows:

  1. Create an index.html file.
  2. Create a JavaScript file.
  3. Reference your JavaScript file from your index.html file.
  4. Open your index.html file in your browser of choice.

Today, starting a project might look more like:


  1. Install or update Node.
  2. Install or update NPM.
  3. Install Babel and any other build tools that you use.
  4. Copy over your old configs for all of the build tools.
  5. Realize that some of the build tools have had breaking changes since you last used them.
  6. Decide that you want to understand the latest and greatest instead of sticking with the old versions.
  7. Spend a few hours reading various docs on how to get all your build tools to work.
  8. Create an index.html file.
  9. Reference your transpiled JavaScript file from your index.html file.
  10. Open your index.html file in your browser of choice.

Sure, you could still do it the old way, but then you would miss out on all the good stuff that has happened in the JavaScript community in the past five years. On the flip side, setting all of this up takes so much activation energy that sometimes I find myself giving up on a project before writing a single line of code.

Finding a better way

I’ve created this repo, which I intend to keep “dead simple”. No bells and whistles. I want the experience to be as close as possible to the way it was five years ago, while at the same time giving me the necessary pieces of the modern stack.

The master branch contains the minimum setup needed to run Babel to transpile ES6 to ES5.

Since it runs on npm scripts, in theory it should not have any external dependencies beyond Node and npm. All that is needed is:

  1. Clone and rename the repo (and maybe remove the .git folder).
  2. Run npm install
  3. In one terminal tab run npm run watch
  4. In another terminal tab run npm start
  5. Open your browser to http://127.0.0.1:8080
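
For reference, the npm scripts behind steps 3 and 4 boil down to something like the following (a sketch of the idea; the exact commands in the repo’s package.json may differ):

{
  "scripts": {
    "watch": "babel src --watch --out-dir dist",
    "start": "http-server ."
  }
}

Here npm run watch keeps re-transpiling files on every change, while npm start serves the project with a static server such as http-server, which listens on http://127.0.0.1:8080 by default.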

But I want more

So do I. Right off the bat, I wanted to at least have eslint support. Instead of hooking it into the main (master) branch, I’ve created a new recipe branch called eslint. That way I can clone the needed branch and go from there.
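
For example, to start a new project from the eslint recipe (the repo URL here is a placeholder):

git clone -b eslint https://github.com/<user>/<repo>.git my-project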

What’s next

I plan to add more recipes to the repo as I go along. The next candidate, for example, is adding support for React.

Since I know a lot of people suffer from a similar problem, I wanted to open this up to the public. Please let me know if you think something can be improved, or suggest a new recipe.


Filed Under: Front End, JavaScript

TypeError – The Header Content Contains Invalid Characters

April 8, 2016 by Alex Kras 2 Comments


A few weeks ago our production code started to throw a weird error that looked like this:

_http_outgoing.js:351
      throw new TypeError('The header content contains invalid characters');
      ^

TypeError: The header content contains invalid characters
    at ServerResponse.OutgoingMessage.setHeader (_http_outgoing.js:351:13)
 

This was very strange, since it was coming from code that had worked for a long time. The only thing that had changed was the Node version.


I tried Googling for the issue, but all I could find were references to a bunch of GitHub issues from all kinds of projects.

After some research, I was able to narrow the error down to a change in Node version 4.3.0, introduced via this commit.

Why

The update was done for security reasons and to better comply with the HTTP spec. You can read more about the underlying security issues in the commit itself.

You can read more about the actual spec in this Stack Overflow question, which also provides relevant links to the HTTP 1.1 spec (section 2.2 and section 4.2).

The TL;DR version is as follows:

  1. Each header looks something like this: Accept-Encoding: gzip,deflate
  2. Accept-Encoding – is the header name
  3. gzip,deflate – is the header value
  4. Only some characters are allowed in the header value (mostly printable ASCII)
  5. Even fewer characters are allowed in the header name
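
To make the last two points concrete, here is a rough JavaScript approximation of the checks Node now performs (a sketch based on my reading of the commit, not the actual Node source):

// Header names must be valid HTTP "tokens": a small set of ASCII characters
var TOKEN = /^[a-zA-Z0-9_!#$%&'*+.^`|~-]+$/;

function isValidHeaderName(name) {
  return TOKEN.test(name);
}

// Header values may contain tabs and printable characters up to 0xFF;
// other control characters and DEL are rejected
function isValidHeaderValue(value) {
  for (var i = 0; i < value.length; i++) {
    var code = value.charCodeAt(i);
    if ((code <= 31 && code !== 9) || code === 127 || code > 255) {
      return false;
    }
  }
  return true;
}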

An unwanted consequence of this update is that various Node projects (that did not follow the spec but used to work just fine) suddenly broke.

What to do about it

There are two possible reasons for seeing this error:

  1. One of your project’s dependencies was sending headers that did not comply with the spec
  2. Your own code did not comply with the spec

Case 1

If one of your project’s dependencies did not comply with the spec, chances are they already have a fix in place.

Simply update to the latest version and it should solve the problem.

If the dependency does not have a fix yet, all you can do is file an issue and hope that they will resolve it soon.


Case 2

If your own code did not follow the spec, you’ll have to look at your specific implementation.

In many cases the issue can be remedied short term by removing invalid characters from your headers. The long-term fix should involve re-engineering your app to never send invalid headers in the first place.

I’ve created this repo to demo an approach that can be taken for a short-term fix.

Overview of the Demo repo

The demo repo has two important files:

  • rules.js
  • example.js

rules.js

If you look at the rules.js file you will see that it exports 4 functions:

  • For the header name
    • validHeaderName – Accepts a header name and returns true if it is valid
    • cleanHeaderName – Accepts a header name and returns a valid header name, with the unwanted characters removed
  • For the header value
    • validHeaderValue – Accepts a header value and returns true if it is valid
    • cleanHeaderValue – Accepts a header value and returns a valid header value, with the unwanted characters removed

The two validation functions, validHeaderName and validHeaderValue, use copies of checkIsHttpToken and checkInvalidHeaderChar from the Node source.

These functions will need to be updated as the Node implementation changes.

The clean functions use the same logic as the validation functions, but instead of simply returning true or false, they return a new header name or value with the unwanted characters removed.
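
For illustration, a clean function built on the same rules might look roughly like this (a sketch of the idea, not the repo’s exact code):

function cleanHeaderValue(value) {
  // Strip control characters (except tab), DEL, and anything above 0xFF
  return String(value).replace(/[^\t\x20-\x7e\x80-\xff]/g, "");
}

function cleanHeaderName(name) {
  // Keep only valid HTTP token characters
  return String(name).replace(/[^a-zA-Z0-9_!#$%&'*+.^`|~-]/g, "");
}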

It may be helpful to look at the test file for rules.js to see some examples of valid and invalid headers.

example.js

In example.js you can see a sample implementation of how you might want to hook something like rules.js into your own code.

It’s a module that exports a single function called cleanHeaders, which accepts HTTP headers represented as a JavaScript object and goes through every header name (key) and value, removing unwanted characters.

You can check out the test file for example.js to see how it works, but it boils down to something like this:

var clean = require("../example");

var someHeaders = {
  "some(name": "жsome value",             // "(" is not a valid name character
  "some other name": "some other valueж"  // spaces in names and "ж" are invalid
};

// Returns a copy of the headers with the invalid characters stripped out
var cleanHeaders = clean(someHeaders);
 

If you run your headers through cleanHeaders in your application, the error should go away.

Note: You may notice that example.js makes two passes: once to check whether a header is valid, and a second time to remove unwanted characters. I personally prefer that approach, because I didn’t want to mess with headers that did not need to be messed with. But you could run everything through cleanHeaderName and cleanHeaderValue right away for better performance.

Conclusion

I thought about releasing this module as an npm package, but for now I’ve decided against it, mainly to avoid a left-pad type of situation. If you WOULD like to have it released as an npm module, please let me know in the comments or on GitHub.

I hope this post was helpful, but please take everything with a grain of salt.


Filed Under: Node.Js

I Tried To Virtually Stalk Mark Zuckerberg

April 4, 2016 by Alex Kras 44 Comments


Part 1. A Naive Dream


The Dream

In late 2015, I finished reading Automate the Boring Stuff with Python and was very inspired to try to automate something in my life.

At the same time, I have always been fascinated by Mark Zuckerberg, the Bill Gates of our time. A lot of people love to hate on Mark, but I actually like the guy. Sure, he got lucky in life, but he must also be doing something right for Facebook to stay successful.


In any case, one day I had a “brilliant” idea: what if I wrote a script that followed Mark’s public posts and sent me a text whenever he posted something new? Then I could be the first person to comment on his posts.

After a while, Mark would notice my comments and begin to wonder: “Who is this guy who always posts meaningful responses to my posts?” Then he would invite my wife and me to his house for dinner, and our kids would become friends. 🙂

So, without further ado I got to work.

I briefly considered using the Facebook API to get notified of Mark’s posts, but I’ve had mixed experiences with APIs in the past, hitting rate limits pretty quickly and running into other related problems. Plus, I wanted to use my Automate the Boring Stuff with Python knowledge 🙂

So I went the other route and wrote a Selenium script (which was really easy to do using the selenium module in Python) that would:

  1. Log in to Facebook
  2. Use the current timestamp as the time of the last post
  3. Keep checking every 60 seconds whether Mark had posted something new
  4. Send me a text via the Twilio API, with a link to the new post

I happen to own a small server, so I set the script to run indefinitely in a headless browser (PhantomJS) and began to wait.
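
The core of the script looked roughly like this (a simplified sketch: the CSS selector, phone numbers, and credentials are placeholders, and the Facebook login step is omitted):

import time

from selenium import webdriver
from twilio.rest import Client

driver = webdriver.PhantomJS()                # headless browser
twilio = Client("ACCOUNT_SID", "AUTH_TOKEN")  # placeholder credentials

last_post = None
while True:
    driver.get("https://www.facebook.com/zuck")
    # Grab the permalink of the newest visible post; the selector is a
    # guess, since Facebook's markup changes all the time
    links = driver.find_elements_by_css_selector("a[href*='/posts/']")
    newest = links[0].get_attribute("href") if links else None
    if newest and last_post and newest != last_post:
        twilio.messages.create(to="+1555XXXXXXX", from_="+1555YYYYYYY",
                               body="New post from Mark: " + newest)
    last_post = newest
    time.sleep(60)  # check every 60 seconds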

Paradise Lost

It took a couple of days for Mark to post something, and I began to worry that my script did not work.

At some point I had to go to the post office. Afterwards, I drove back home, parked my car, checked my phone, and saw a new text from my script. My heart started beating really fast, and I rushed to open the link. I soon realized that the post had been made 5 minutes earlier and that I had missed the notification while driving. By then, the post already had thousands of comments…

Oh well, I thought, there is always next time. Sure enough, within a day I had another text. This time it came in under a minute after the original post. I quickly opened the link, only to discover that Mark’s post already had close to 100 comments.

Now don’t get me wrong, I am not stupid. I knew that Mark’s posts were popular and would get a lot of comments.


I even tried to estimate the rate at which people were posting replies. I looked through Mark’s older posts and saw some with tens of thousands of comments. If you take 10,000 comments and divide them over 24 hours, then over 60 minutes, you get about 7 comments per minute.

What I didn’t account for in my estimate was that those comments were not evenly distributed over time, so I had a very small chance of being the first to comment.

I knew my dream was slipping away, so I considered my options 🙂

I could set my script to run more often than every 60 seconds, to give myself an earlier warning. But by doing so I would risk showing up on Facebook’s radar as a spammer, and it just didn’t feel right to bombard their servers.

Another option I considered was posting an automated reply, in order to be one of the first people to comment. That approach, however, would defeat the purpose of saying something meaningful and would not help me become friends with Mark.

I decided against both ideas and admitted defeat. I also realized that I could turn this (failed) experiment into an interesting data exploration project.

Part 2. Data Analysis

Scraping

Having made a big error in my estimate of the rate at which people were replying to Mark, I was curious to explore what people were saying and when. To do that, I needed a data set of comments.

Without putting much thought into it, I decided to scrape one of Mark’s most recent posts at the time:

“Merry Christmas and happy holidays from Priscilla, Max, Beast and me! Seeing all the moments of joy and friendship…” – Posted by Mark Zuckerberg on Friday, December 25, 2015

My first approach was to modify my notification script to:

  1. Log in to Facebook
  2. Go to the post that Mark has made
  3. Click the “Show More Comments” link until all comments were loaded
  4. Scrape and parse the HTML for comments

Once again, I underestimated the scale of the operation. There were just too many comments (over 20,000), and it was too much for a browser to handle. Both Firefox and PhantomJS kept crashing before they could load all of the comments.

I had to find another way.

I proceeded to examine how the “View more comments” requests were made, using the Network tab in Chrome Developer Tools. Chrome allows you to right-click on any request and copy it via the “Copy as cURL” option.

[Screenshot: the “Copy as cURL” option in Chrome Developer Tools]

I ran the resulting cURL command in my terminal, and it returned some JSON. Bingo!

At that point, all I had to do was figure out how the pagination of comments was done. It turned out to be a simple variable in the query string of the request, which acted as a pointer to the next set of comments to fetch.

I converted the cURL command to Python Requests code via this online tool.

After that, I wrote a script (sketched below) that would:

  1. Start at pagination 0
  2. Make a comments request
  3. Store it in memory
  4. Increment the pagination
  5. Sleep for a random amount of time, between 5 and 10 seconds
  6. Repeat the loop until no more comments were found
  7. Save all of the comments to a JSON file
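
In sketch form (the endpoint URL, query parameter, and response shape below are placeholders reconstructed from the copied cURL request, not a documented API):

import json
import random
import time

import requests

COMMENTS_URL = "https://www.facebook.com/ajax/..."  # taken from "Copy as cURL"
HEADERS = {"user-agent": "Mozilla/5.0 ..."}         # plus cookies and friends

all_comments = []
page = 0
while True:
    resp = requests.get(COMMENTS_URL, params={"page": page}, headers=HEADERS)
    batch = resp.json().get("comments", [])  # the key name is a guess
    if not batch:
        break  # no more comments
    all_comments.extend(batch)
    page += 1
    time.sleep(random.uniform(5, 10))  # sleep 5-10 seconds between requests

with open("comments.json", "w") as f:
    json.dump(all_comments, f)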

I ended up with an 18 MB minified JSON file containing about 20,000 comments.

Analyzing the data

First I looked at the distribution of comments over time.

As can be seen in the two plots below, it looked a lot like exponential decay, with most of the comments being made in the first two hours.

[Plot: time to comment, first two hours]

[Plot: time to comment, first 24 hours]

The first 1,500 comments were made within the first 10 minutes. No wonder I had a hard time making it to the top.

Next I wanted to see what people were saying.

I created a word cloud of the most commonly used keywords in the comments, using a Python library called (surprise, surprise) Word Cloud.
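
Generating it boils down to just a few lines (a sketch; the "text" field name is an assumption about how the comments were stored):

import json

from wordcloud import WordCloud

with open("comments.json") as f:
    comments = json.load(f)

# Join all comment text into one big string and let the library do the rest
text = " ".join(c["text"] for c in comments)
WordCloud(width=1200, height=800).generate(text).to_file("word-cloud.png")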

[Image: word cloud of the most common keywords in the comments]

Looking at the word cloud, I realized that I might have picked the wrong day for this experiment. Most people responded in kind to Mark’s wishes of Merry Christmas and Happy New Year. That was great news for Mark, but kind of boring from a data exploration standpoint.

Digging Deeper

After I finished the word cloud, I spent WAY TOO MUCH TIME trying to gain a deeper understanding of the data.

The data set turned out to be a bit too big for me to iterate on quickly, and all of the positive comments created too much noise.

I decided to narrow down the data set by removing all comments containing any of the following word stems. A word stem is simply the shortest version of a word that still makes sense. For example, by removing comments containing the stem thank, I was able to remove both comments with the words thank you and comments with the word thanks. I used the nltk library to convert words to their stems.
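
For example, with nltk’s Porter stemmer (which particular stemmer was used is my assumption):

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["thank", "thanks", "thanking", "happy", "baby"]])
# ['thank', 'thank', 'thank', 'happi', 'babi']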

I organized the stems by the type of comment they usually belonged to:

  • Happy New Year Wishes
    • new
    • happ
    • year
    • wish
    • bless
    • congrat
    • good luck
    • same
    • best
    • hope
    • you too
  • Comment on Photo of the Family
    • photo
    • baby
    • babi
    • beautiful
    • pic
    • max
    • family
    • famy
    • cute
    • child
    • love
    • nice
    • daughter
    • sweet
  • Thanking Mark for creating Facebook
    • thank
    • connect
    • help

After removing all of the typical comments, I ended up with 2,887 “unusual” comments.

Digging Even Deeper

I also recently finished reading Data Smart, from which I learned that network analysis can be used to identify data points that belong together, also known as clusters.

One of the examples in the book used Gephi, an amazing piece of software that makes cluster analysis very easy and fun. I wanted to analyze the “unusual” comments in Gephi, but first I had to find a way to represent them as a network.

In order to do that, I did the following (a code sketch follows the list):

  1. Removed meaningless words such as “and” or “or” (also known as stop words) from every comment, using the nltk library
  2. Broke the remaining words in every comment into a list of word stems
  3. For every comment, calculated an intersection with every other comment
  4. Recorded a score for every possible intersection
  5. Removed all intersections with a score of 0.3 or less
  6. Saved every comment as a node in a Gephi graph, and every intersection score as an undirected edge
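
The first two steps boil down to something like this (a sketch; the exact tokenization is an assumption):

import re

from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

stop = set(stopwords.words("english"))  # requires nltk.download("stopwords") once
stemmer = PorterStemmer()

def to_stems(comment):
    # Lowercase, pull out the words, drop stop words, stem the rest
    words = re.findall(r"[a-z']+", comment.lower())
    return [stemmer.stem(w) for w in words if w not in stop]

print(to_stems("Mark, I love you"))  # ['mark', 'love']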

By now you might be wondering how the intersection score was calculated. You may also be wondering what the heck a Gephi graph is, but I’ll get to that a bit later.

Calculating the Intersection Score
Let’s say we have two comments:

["mark", "love"]          # From "Mark, I love you"
# and
["mark", "love", "more"]  # From "Mark, I love you more"
 

We can find the score as follows:

def findIntersection(first, second):
    intersection = set(first) & set(second)    # Words present in both lists
    intersectionLength = float(len(intersection))
    wordCount = len(first) + len(second)       # Total length of both comments
    if wordCount == 0:                         # Corner case: two empty comments
        return 0
    else:
        return intersectionLength / wordCount  # Intersection score between the two comments
 

So for our example above:

  1. The intersection of the two comments is ["mark", "love"], which is 2 words
  2. The total length of both comments is 5 words
  3. The intersection score is 2/5 = 0.4

Note: I could have used the average length of the two comments (i.e. (2+3)/2 = 2.5) instead of the total length (5), but it would not have made any difference, since the score was calculated the same way for all of the comments. So I decided to keep it simple.

Once I had all of the intersections calculated, I saved all comments in a nodes.csv file with the following format:

Id;Label
1;Mark, I love you
 

I saved all of the intersections in an edges.csv file with the following format:

Source;Target;Weight;Type
1;2;0.4;"Undirected"
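
One way to produce these files (the original export code isn’t shown, so this is a sketch) is with Python’s csv module:

import csv

# Assume "edges" is a list of (source_id, target_id, score) tuples
edges = [(1, 2, 0.4)]

with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=";")
    writer.writerow(["Source", "Target", "Weight", "Type"])
    for source, target, score in edges:
        writer.writerow([source, target, score, "Undirected"])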
 

Analyzing the Network

This was all that was needed to import my data into Gephi as a network graph. You can read more about Gephi file formats here, and this video provides a good introduction to Gephi and how it can be used.

Once I had imported my data into Gephi, I ran a force-directed layout algorithm called “Force Atlas 2”, which resulted in the following network graph.

I manually added the text in red to summarize some of the clusters. If you click on the image, you will be taken to a full-screen representation of the graph. It is pretty big, so you might have to zoom out and scroll for a while before you see the data.

[Image: the resulting network graph of comment clusters, with summaries annotated in red]

Some Notes on the Results

I was really happy to finally see my approach working (after many days of trying).

I had been staring at those comments for a long time and had seen some references to “money”. So I was not surprised to see a couple of clusters asking Mark “Where is my money?”.

I was very surprised, however, to see a cluster of comments mentioning a specific number: 4.5 million, to be exact. I had no idea where this number was coming from, but a quick Google search pointed me to this hoax. It turns out a lot of people were duped into believing that Mark would give away 4.5 million to 1,000 lucky people. All you had to do was post a “Thank you” message of sorts.

Other than that, I didn’t see anything very interesting. There were some spammers, some people asking Mark to ban other people from Facebook, some aggression towards Mark, and a lot more of the general types of comments that I had not filtered out.

I also noticed some weaknesses in my approach. For example, there were two clusters around the word “precious”. This was probably caused by dropping relationships with an intersection score below 0.3. Since I did not use the average length of the two comments, the 0.3 threshold really meant that two comments had to be at least 60% similar, which was probably too strict and split what should have been one cluster. On the flip side, it helped reduce the number of edges, focusing on the most important connections.

Please let me know in the comments if you find anything else noteworthy, or if you have suggestions on how the intersection score could be improved.

Conclusion

It is hard being a celebrity.

I started this journey naively assuming that I could get Mark’s attention by simply posting a comment on his timeline. I did not realize just how much social media attention an average celebrity gets.

It would probably take a dedicated Data Scientist working full time just to get insight into all of the comments that Mark receives. While Mark can afford to hire such a person, my bet is that he is using his resources for more meaningful things.

That being said, this has been a great learning experience for me. Gephi is a magical tool, and I highly recommend checking it out.

If you want some inspiration for automating things, I highly recommend reading Automate the Boring Stuff with Python.

If you are looking for a good entry level text on Data Science, I found Data Smart to be an informative read, although hard to follow at times.

Also note that I destroyed all of my data sets to comply as best I can with Facebook’s Terms of Service. Scraping content without permission is also against Facebook’s Terms of Service, but I avoided thinking about that until after I had done all of my analysis.

I am hoping that Facebook will overlook my transgression, but I wanted to make sure I don’t send anybody else down the wrong path without a proper warning.

If all else fails, you can always follow me on Twitter 🙂


Filed Under: data
