danielnee.com
cheatsheet | Daniel Nee
http://danielnee.com/tag/cheatsheet
Hadoop Command Line Cheatsheet. February 16, 2015. Useful commands when using Hadoop on the command line. Full reference can be found in the Hadoop documentation. hdfs dfs -ls [-R] dir. List the contents of the provided directory. hdfs dfs -put local file dst file. Put the local file to the provided HDFS location. hdfs dfs -get hdfs loc local loc. Copy the file to the local file system. hdfs dfs -cat file; hdfs dfs -text file. hdfs dfs -text file | head. hdfs dfs -cp source dst; hdfs dfs -mv source dst. By default the f...
danielnee.com
dn | Daniel Nee
http://danielnee.com/author/dn
All posts by dn. April 27, 2016. Links to various useful external resources. Will be continually updated. http://drivendata.github.io/cookiecutter-data-science/. NumPy for R users. http://mathesaurus.sourceforge.net/r-numpy.html. Computing PCA with SVD. April 25, 2015. PCA is a great tool for performing dimensionality reduction. Two reasons you might want to use SVD to compute PCA: Spark's PCA implementation currently doesn't support very wide matrices. The SVD implementation does, however. Hadoop 2 is a ...
danielnee.com
Hadoop 2 Introduction | Daniel Nee
http://danielnee.com/2015/02/hadoop-2-introduction
February 16, 2015. Hadoop 2 is a complete overhaul of some of the core Hadoop libraries. It is a fundamental shift in the way applications run on top of Hadoop, and it is worth understanding these changes. In Hadoop 1, the programming API (MapReduce) and resource management of the cluster were all bundled together. In Hadoop 2, resource management is now handled by YARN (Yet Another Resource Negotiator). Runs on a single master node. Global resource scheduler across the cluster. Sits on each slave node. Hadoo...
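The split described in the excerpt (a global ResourceManager on the master, a NodeManager on each slave) is wired up in yarn-site.xml. A minimal sketch, assuming a hypothetical master host name not taken from the post:

```xml
<configuration>
  <!-- Assumption: rm-host.example.com stands in for the cluster's master node -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm-host.example.com</value>
  </property>
  <!-- Lets MapReduce jobs shuffle intermediate data under YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```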
danielnee.com
Computing PCA with SVD | Daniel Nee
http://danielnee.com/2015/04/computing-pca-with-svd
Computing PCA with SVD. April 25, 2015. PCA is a great tool for performing dimensionality reduction. Two reasons you might want to use SVD to compute PCA: SVD is more numerically stable if the columns are close to collinear. I have seen this happen in text data, when certain terms almost always appear together. Spark's PCA implementation currently doesn't support very wide matrices. The SVD implementation does, however. Singular Value Decomposition (SVD). For a matrix, the singular value decomposition gives.
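The idea in the excerpt (obtain the principal components from an SVD of the centred data rather than from the covariance matrix) can be sketched in NumPy; the toy data and variable names here are illustrative, not taken from the post:

```python
import numpy as np

# Toy data: 100 samples, 5 features (random, for illustration only)
rng = np.random.RandomState(0)
X = rng.randn(100, 5)

# 1. Centre the columns (PCA requires zero-mean features)
Xc = X - X.mean(axis=0)

# 2. Thin SVD of the centred matrix: Xc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# 3. The principal axes are the rows of Vt; the principal
#    component scores are U * s (equivalently Xc @ Vt.T)
components = Vt
scores = U * s

# 4. Variance explained by each component
explained_var = s**2 / (len(X) - 1)
```

Because the covariance matrix is never formed explicitly, small singular directions (near-collinear columns) are resolved more accurately, which is the numerical-stability point the post makes.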
danielnee.com
principal component analysis | Daniel Nee
http://danielnee.com/tag/principal-component-analysis
Tag Archives: principal component analysis. Computing PCA with SVD. April 25, 2015. PCA is a great tool for performing dimensionality reduction. Two reasons you might want to use SVD to compute PCA: SVD is more numerically stable if the columns are close to collinear. I have seen this happen in text data, when certain terms almost always appear together. Spark's PCA implementation currently doesn't support very wide matrices. The SVD implementation does, however. Singular Value Decomposition (SVD).
danielnee.com
sbt | Daniel Nee
http://danielnee.com/tag/sbt
Setting up IntelliJ for Spark. January 20, 2015. Brief guide to setting up IntelliJ to build Spark applications. Create a new Scala project and give it an appropriate name. Move to the project root. Run the following: mkdir -p src/main/scala; mkdir -p src/test/scala; mkdir project. In the project directory you just created, create a new file called plugins.sbt with the following content: addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0"). Create the build file. At the project root level, run the following.
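The excerpt cuts off at the build-file step. A minimal build.sbt along these lines would fit the setup described; the project name, Scala version, and Spark version below are assumptions consistent with the post's date (early 2015), not taken from the post:

```scala
name := "my-spark-app"   // hypothetical project name

version := "0.1"

scalaVersion := "2.10.4"   // assumption: Scala line compatible with Spark 1.x

// Spark core dependency; the version is an assumption
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"
```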
danielnee.com
Programming | Daniel Nee
http://danielnee.com/category/programming
February 16, 2015. Hadoop 2 is a complete overhaul of some of the core Hadoop libraries. It is a fundamental shift in the way applications run on top of Hadoop, and it is worth understanding these changes. In Hadoop 1, the programming API (MapReduce) and resource management of the cluster were all bundled together. In Hadoop 2, resource management is now handled by YARN (Yet Another Resource Negotiator). Runs on a single master node. Global resource scheduler across the cluster. Sits on each slave node. Hadoo...
danielnee.com
Machine Learning | Daniel Nee
http://danielnee.com/category/machine-learning
Category Archives: Machine Learning. Computing PCA with SVD. April 25, 2015. PCA is a great tool for performing dimensionality reduction. Two reasons you might want to use SVD to compute PCA: SVD is more numerically stable if the columns are close to collinear. I have seen this happen in text data, when certain terms almost always appear together. Spark's PCA implementation currently doesn't support very wide matrices. The SVD implementation does, however. Singular Value Decomposition (SVD). In particular...
danielnee.com
Hadoop Command Line Cheatsheet | Daniel Nee
http://danielnee.com/2015/02/hadoop-command-line-cheatsheet
Hadoop Command Line Cheatsheet. February 16, 2015. Useful commands when using Hadoop on the command line. Full reference can be found in the Hadoop documentation. hdfs dfs -ls [-R] dir. List the contents of the provided directory. hdfs dfs -put local file dst file. Put the local file to the provided HDFS location. hdfs dfs -get hdfs loc local loc. Copy the file to the local file system. hdfs dfs -cat file; hdfs dfs -text file. hdfs dfs -text file | head. hdfs dfs -cp source dst; hdfs dfs -mv source dst. By default the f...
danielnee.com
scala | Daniel Nee
http://danielnee.com/tag/scala
Setting up IntelliJ for Spark. January 20, 2015. Brief guide to setting up IntelliJ to build Spark applications. Create a new Scala project and give it an appropriate name. Move to the project root. Run the following: mkdir -p src/main/scala; mkdir -p src/test/scala; mkdir project. In the project directory you just created, create a new file called plugins.sbt with the following content: addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0"). Create the build file. At the project root level, run the following.